6.20 Grouped subsets: get subsets instead of aggregate values
A grouping operation is to put members in a set having same attribute to the same group, such as grouping an employee table by department and each group of employees has same department value, and grouping a sales table by order year and sales records in each group are in the same year.
In essence, the grouping operation splits a set into multiple subsets according to the specified rule. This means the operation will return a set of sets. We call each member set the grouped subset.
On most occasions people are interested in aggregate values of the grouped subsets, so the grouping operation is always accompanied with a further aggregation. Yet on certain occasions, it is the grouped subsets rather than the aggregate values in which we are more interested.
Let’s look at an example. Here’s part of the data in employee table:
ID | NAME | BIRTHDAY | STATE | DEPT | SALARY |
---|---|---|---|---|---|
1 | Rebecca | 1974/11/20 | California | R&D | 7000 |
2 | Ashley | 1980/07/19 | New York | Finance | 11000 |
3 | Rachel | 1970/12/17 | New Mexico | Sales | 9000 |
4 | Emily | 1985/03/07 | Texas | HR | 7000 |
5 | Ashley | 1975/05/13 | Texas | R&D | 16000 |
… | … | … | … | … | … |
Find employees born on the same day.
SPL provides A.group() to achieve a real grouping operation that returns a result set that is a set made up of multiple grouped subsets.
SPL script:
A | |
---|---|
1 | =T(“Employee.csv”) |
2 | =A1.group(BIRTHDAY) |
3 | =A2.select(~.len()>1).conj() |
A1 Import Employee table.
A2 Use A.group() to group records by birthday.
A3 Get groups containing more than one member, which are subsets whose members having same birthday, and concatenate those subsets.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/2bkGwqTj
Youtube 👉 https://www.youtube.com/@esProc_SPL