6.22 Grouped subsets: filter grouped subsets and group the selected subsets

 

Select states which more than 50 employees are based and calculate average salary of each department in these states.

ID NAME BIRTHDAY STATE DEPT SALARY
1 Rebecca 1974/11/20 California R&D 7000
2 Ashley 1980/07/19 New York Finance 11000
3 Rachel 1970/12/17 New Mexico Sales 9000
4 Emily 1985/03/07 Texas HR 7000
5 Ashley 1975/05/13 Texas R&D 16000

A grouping operation itself is complex. Retaining grouped subsets and reusing them can increase computing efficiency. A.group() function retains grouped subsets after grouping so that we can further perform grouping & aggregation on the grouped subsets.

SPL script:

A
1 =T(“Employee.csv”)
2 =A1.group(STATE).select(~.len()>50)
3 =A2.conj(~.groups(DEPT; avg(SALARY):AVG_SALARY).derive(A2.~.STATE:STATE))

A1 Import Employee table.
A2 Group A1’s table by state and get the grouped subsets containing more than 50 employees.
A3 In each subset of state, group records by department, calculate average salary in each department and concatenate all result sets.