6.21 Grouped subsets: filter grouped subsets by aggregate value

 

Strictly speaking, grouping and aggregation are two independent actions though they always appear side by side. This gives an impression that the two must come together. Actually, the duo sets a limit for grouping operations because more than the several SQL aggregate functions can be performed after the grouping operation.

ID NAME BIRTHDAY STATE DEPT SALARY
1 Rebecca 1974/11/20 California R&D 7000
2 Ashley 1980/07/19 New York Finance 11000
3 Rachel 1970/12/17 New Mexico Sales 9000
4 Emily 1985/03/07 Texas HR 7000
5 Ashley 1975/05/13 Texas R&D 16000

Find employees whose ages are below the average age of the department.
SPL allows defining an operation on each grouped subsets after the grouping operation in A.group() function. The operation can be not only the existing SQL SUM, COUNT, etc. but a more complex one.

SPL script:

A
1 =T(“Employee.csv”)
2 =A1.group(DEPT; (a=~.avg(age(BIRTHDAY)), ~.select(age(BIRTHDAY)<a)):YOUNG)
3 =A2.conj(YOUNG)

A1 Import Employee table.
A2 Group A1’s table by department and select records where the employee age is below the average age of each group. In an aggregate operation defined in A.group() function, we can use a temporary variable to make the computation simpler and easier to understand.
A3 Concatenate the selected records.