3.2 Non-equivalence grouping

 

3.1 Equivalence grouping


3.2.1 Alignment grouping

Count the number of male employees in the specified departments [‘Administration’, ‘HR’, ‘Marketing’, ‘Sales’] in the employee table.

SPL

A
1 =file(“EMPLOYEE.csv”).import@tc()
2 =A1.select(GENDER==“M”)
3 [Administration, HR, Marketing, Sales]
4 =A2.align@a(A3,DEPT).new(A3(#):DEPT,~.len():NUM)

SPL provides the align@a() function to implement alignment grouping. This function can produce the same result as group function, and both return the grouped subset.

SQL

SELECT d.DEPT, COUNT(GENDER) AS MaleCount
FROM (
    SELECT COLUMN_VALUE AS DEPT 
    FROM TABLE(SYS.ODCIVARCHAR2LIST('Administration', 'HR', 'Marketing', 'Sales'))) d
LEFT JOIN EMPLOYEE e ON d.DEPT = e.DEPT AND e.GENDER = 'M'
GROUP BY d.DEPT;

SQL can only use left join to do this task.

Python

Python does not provide a dedicated method similar to alignment grouping, so it has to hard code.

3.2.2 Enumeration grouping

Calculate the average salary of employees of different generations (generation X: born between 1965 and 1980, generation Y: born between 1980 and 1995, generation Z: born btw 1995 and 2010).

SPL

A B
1 =file(“EMPLOYEE.csv”).import@tc()
2 [?>=date(“1965-01-01”)&&?<date(“1980-01-01”),?>=date(“1980-01-01”)&&?<date(“1995-01-01”),?>=date(“1995-01-01”)&&?<date(“2010-01-01”)] /Grouping conditions
3 [X,Y,Z]
4 =A1.run(BIRTHDAY=date(BIRTHDAY))
5 =A1.enum(A2,BIRTHDAY).new(A3(#):GENERATION,~.avg(SALARY):SALARY) /Enumeration grouping
6 [1965-01-01,1980-01-01,1995-01-01,2010-01-01]
7 =A1.align@a(A3,A3(A6.pseg(BIRTHDAY))).new(A3(#):GENERATION,~.avg(SALARY):SALARY) /Use the alignment grouping function to do enumeration grouping

A2: Write the conditions of different generations as string to form a sequence, and use ‘?’ to represent the grouping key value to be passed to calculation.
A5: The enum() function uses the grouping key of each member in the set to be grouped to calculate these conditions in turn. If the result is true, the member is put into the corresponding group.
A6: Put the segmentation years of the generations in a sequence.
A7: Use the align()and pseg() methods to do the enumeration grouping task.

SQL

WITH t AS (SELECT 'X' AS generation, DATE '1965-01-01' AS s,DATE '1980-01-01' AS e FROM DUAL 
    UNION ALL
        SELECT 'Y' AS generation, DATE '1980-01-01' AS s,DATE '1995-01-01' AS e FROM DUAL
    UNION ALL
        SELECT 'Z' AS generation, DATE '1995-01-01' AS s,DATE '2010-01-01' AS e FROM DUAL)
SELECT generation,AVG(SALARY) FROM t LEFT JOIN EMPLOYEE e on e.BIRTHDAY>=t.s AND e.BIRTHDAY<t.e GROUP BY generation;

Python

Python does not provide a dedicated method similar to enumeration grouping, so it has to hard code.

3.2.3 Overlapped enumeration grouping

To group the employees by age and calculate the number of each group (when grouping conditions overlap, list all the employees satisfying the conditions. The grouping conditions are [year of service<5, 5<=years of service>=10, years of service>=10, years of service>=15]).

SPL

A
1 =file(“EMPLOYEE.csv”).import@tc()
2 [?<5,?>=5 && ?<10,?>=10,?>=15]
3 [Within five years,Five to ten years,More than ten years,Over fifteen years]
4 =A1.derive(age(HIREDATE):EMPLOYED)
5 =A4.enum@r(A2, EMPLOYED).new(A3(#):EMPLOYEED,~.len():NUM)

The @r option in the enum@r() function allows members repeatedly existing in different subsets.

SQL

WITH t AS (SELECT 'Within five years' AS employeed, 0 AS s,5 AS e FROM DUAL 
    UNION ALL
        SELECT 'Five to ten years' AS employeed, 5 AS s,10 AS e FROM DUAL 
    UNION ALL
        SELECT 'More than ten years' AS employeed, 10 AS s,100 AS e FROM DUAL 
    UNION ALL
        SELECT 'Over fifteen years' AS employeed, 15 AS s,100 AS e FROM DUAL)
SELECT employeed,COUNT(h.HIRED) AS NUM 
FROM t 
LEFT JOIN (
    SELECT TRUNC(MONTHS_BETWEEN(SYSDATE, HIREDATE)/12) AS HIRED 
    FROM EMPLOYEE) h 
ON h.HIRED>=t.s AND h.HIRED <t.e 
GROUP BY employeed;

Python

Python can only implement this task through hard coding.


3.3 Ordered grouping
Example codes for comparing SPL, SQL, and Python