Aggregation by Dynamic Intervals
【Question】
I have a table emp:
Empno |
Age |
Deptno |
101 |
10 |
10 |
104 |
30 |
10 |
105 |
60 |
10 |
106 |
60 |
20 |
107 |
20 |
20 |
I wanted the output in the below format: Please help me.
Deptno |
Age<=20 |
Age<=40 |
Age<=60 |
10 |
1 |
1 |
1 |
20 |
1 |
0 |
1 |
【Answer】
You want to count age values according to three listed intervals. With fixed intervals, we can write the aggregations in one SQL query:
SELECT deptno, NVL (SUM (CASE WHEN age BETWEEN 0 AND 20 THEN 1 END),0)"Age <=20",NVL (SUM (CASE WHEN age BETWEEN 21 AND 40 THEN 1 END),0)"Age <=40",NVL (SUM (CASE WHEN age BETWEEN 41 AND 60 THEN 1 END),0) "Age <=60" FROM emp GROUP BY deptno;
But with dynamic intervals, like “Age<=20,Age<=40,Age<=60” and “Age<=20&Salary>3000, Age<=50”, it’s hard to aggregate simply in SQL. We often turn to Java or reporting tools to generate the SQL query. The workarounds are still complicated.
SPL supports result sets with dynamic columns and can pass the interval conditions as parameter values. It’s easy to solve your problem in esProc SPL. Suppose the conditional parameters are [“Age<=20”,“Age>20 && Age<=40”, “Age>40 && Age<=60”], we can write the following code:
A |
|
1 |
=[“Age<=20”,“Age>20 && Age<=40”, “Age>40 && Age<=60”] |
2 |
=connect(“database”) |
3 |
=A2.query(“select * from emp”) |
4 |
= A1.(“count(”+~+“):”+~) |
5 |
= A3.groups(Deptno;${A4.concat@c()}) |
A1: It is a sequence of conditions;
A2: Connect to the database;
A3: Retrieve data from emp table;
A4: Rewrite each member of A1’s sequence of conditions in the format of count(condition):condition. The part before semicolon is the expression performing a count operation by the condition, the part after it is the resulting field name;
A5: Query data for the fields defined in A4 group by group by Deptno. concat() function concatenates members in A1’s sequence with delimiters and returns result as a string. @c option means concatenating them with comma. ${} is a macro replacement which converts the concatenated string into expressions that can be identified by groups() function. For example, count(Age<=20):Age<=20 means counting by condition Age<=20 and the resulting field name is Age<=20. Functions that such a macro replacement applies include sum() and so on.
A1
A3
A4
A5
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL