4.4 Group & aggregate

 

4.3 Set aggregation


1. The average salary of each department
2. The highest-paid member in each department
3. The three highest-paid employees in each department
4. The employee with the smallest EID in each department (EID field is in order)

SPL

A B
1 =file(“EMPLOYEE.csv”).import@tc()
2 =A1.groups(DEPT;avg(SALARY):avg_salary) /Average salary of each department
3 =A1.groups(DEPT;maxp(SALARY):max_salary_emp) /The highest-paid member in each department (one member)
4 =A1.groups(DEPT;maxp@a(SALARY):max_salary_emp) /The highest-paid members in each department (multiple members)
5 =A1.groups(DEPT;top(-3;SALARY):max3_age_emp) /The three highest-paid employees in each department
6 =A1.group@1(DEPT) /The employee with the smallest EID in each department

The group@1() function in A6 is to retrieve the first member of a group. Since the EID field is originally ordered, the first member of group after grouping is the employee with the smallest EID in the corresponding department.

SQL

1. The average salary of each department

SELECT DEPT, AVG(SALARY) AS AVERAGE_SALARY
FROM EMPLOYEE
GROUP BY DEPT;

2. The highest-paid member in each department (one member)

SELECT*
FROM(
    SELECTEMPLOYEE.*,
        ROW_NUMBER()OVER(PARTITIONBYDEPTORDERBYSALARYDESC)ASrn
    FROMEMPLOYEE)subquery
WHERErn=1;

3. The highest-paid members in each department (multiple members)

SELECTE.*FROMEMPLOYEEE
INNERJOIN(
    SELECTDEPT,MAX(SALARY)ASMAX_SALARY
    FROMEMPLOYEE
    GROUPBYDEPT)M
ONE.DEPT=M.DEPTANDE.SALARY=M.MAX_SALARY
ORDERBYE.DEPT;

4. The three highest-paid employees in each department

SELECT*FROM(
    SELECTEMPLOYEE.*,
        ROW_NUMBER()OVER(PARTITIONBYDEPTORDERBYSALARYDESC)ASrn
    FROMEMPLOYEE)subquery
WHERErn<=3;

5. The employee with the smallest EID in each department (EID field is in order)

SELECT * FROM (
    SELECT EMPLOYEE.*,
        ROW_NUMBER() OVER (PARTITION BY DEPT ORDER BY NULL) AS rn
    FROM EMPLOYEE) subquery
WHERE rn = 1
ORDER BY EID;

Python

emp = pd.read_csv('../EMPLOYEE.csv')
#Average salary of each department
avg_salary_by_dept = emp.groupby('DEPT')['SALARY'].mean()
#The highest-paid member in each department (one member)
highest_paid_employee_by_dept = emp.groupby('DEPT').apply(
    lambda x: x.nlargest(1, 'SALARY')).reset_index(drop=True) 
#The highest-paid members in each department (multiple members)
highest_paid_employee_by_dept_all = emp.groupby('DEPT').apply(
    lambda x: x.nlargest(1, 'SALARY',keep='all')).reset_index(drop=True)
#The three highest-paid employees in each department
top_3_employees_by_dept = emp.groupby('DEPT').apply(
    lambda x: x.nlargest(3, 'SALARY')).reset_index(drop=True)
#The employee with the smallest EID in each department
first_employee_by_dept = emp.groupby('DEPT').first()

5.1 Simple iterative operations
Example codes for comparing SPL, SQL, and Python