6.20 Grouped subsets: get subsets instead of aggregate values

 

A grouping operation is to put members in a set having same attribute to the same group, such as grouping an employee table by department and each group of employees has same department value, and grouping a sales table by order year and sales records in each group are in the same year.
In essence, the grouping operation splits a set into multiple subsets according to the specified rule. This means the operation will return a set of sets. We call each member set the grouped subset.
On most occasions people are interested in aggregate values of the grouped subsets, so the grouping operation is always accompanied with a further aggregation. Yet on certain occasions, it is the grouped subsets rather than the aggregate values in which we are more interested.
Let’s look at an example. Here’s part of the data in employee table:

ID NAME BIRTHDAY STATE DEPT SALARY
1 Rebecca 1974/11/20 California R&D 7000
2 Ashley 1980/07/19 New York Finance 11000
3 Rachel 1970/12/17 New Mexico Sales 9000
4 Emily 1985/03/07 Texas HR 7000
5 Ashley 1975/05/13 Texas R&D 16000

Find employees born on the same day.
SPL provides A.group() to achieve a real grouping operation that returns a result set that is a set made up of multiple grouped subsets.

SPL script:

A
1 =T(“Employee.csv”)
2 =A1.group(BIRTHDAY)
3 =A2.select(~.len()>1).conj()

A1 Import Employee table.
A2 Use A.group() to group records by birthday.
A3 Get groups containing more than one member, which are subsets whose members having same birthday, and concatenate those subsets.