4.4 Group & aggregate
1. The average salary of each department
2. The highest-paid member in each department
3. The three highest-paid employees in each department
4. The employee with the smallest EID in each department (EID field is in order)
SPL
A | B | |
---|---|---|
1 | =file(“EMPLOYEE.csv”).import@tc() | |
2 | =A1.groups(DEPT;avg(SALARY):avg_salary) | /Average salary of each department |
3 | =A1.groups(DEPT;maxp(SALARY):max_salary_emp) | /The highest-paid member in each department (one member) |
4 | =A1.groups(DEPT;maxp@a(SALARY):max_salary_emp) | /The highest-paid members in each department (multiple members) |
5 | =A1.groups(DEPT;top(-3;SALARY):max3_age_emp) | /The three highest-paid employees in each department |
6 | =A1.group@1(DEPT) | /The employee with the smallest EID in each department |
The group@1() function in A6 is to retrieve the first member of a group. Since the EID field is originally ordered, the first member of group after grouping is the employee with the smallest EID in the corresponding department.
SQL
1. The average salary of each department
SELECT DEPT, AVG(SALARY) AS AVERAGE_SALARY
FROM EMPLOYEE
GROUP BY DEPT;
2. The highest-paid member in each department (one member)
SELECT*
FROM(
SELECTEMPLOYEE.*,
ROW_NUMBER()OVER(PARTITIONBYDEPTORDERBYSALARYDESC)ASrn
FROMEMPLOYEE)subquery
WHERErn=1;
3. The highest-paid members in each department (multiple members)
SELECTE.*FROMEMPLOYEEE
INNERJOIN(
SELECTDEPT,MAX(SALARY)ASMAX_SALARY
FROMEMPLOYEE
GROUPBYDEPT)M
ONE.DEPT=M.DEPTANDE.SALARY=M.MAX_SALARY
ORDERBYE.DEPT;
4. The three highest-paid employees in each department
SELECT*FROM(
SELECTEMPLOYEE.*,
ROW_NUMBER()OVER(PARTITIONBYDEPTORDERBYSALARYDESC)ASrn
FROMEMPLOYEE)subquery
WHERErn<=3;
5. The employee with the smallest EID in each department (EID field is in order)
SELECT * FROM (
SELECT EMPLOYEE.*,
ROW_NUMBER() OVER (PARTITION BY DEPT ORDER BY NULL) AS rn
FROM EMPLOYEE) subquery
WHERE rn = 1
ORDER BY EID;
Python
emp = pd.read_csv('../EMPLOYEE.csv')
#Average salary of each department
avg_salary_by_dept = emp.groupby('DEPT')['SALARY'].mean()
#The highest-paid member in each department (one member)
highest_paid_employee_by_dept = emp.groupby('DEPT').apply(
lambda x: x.nlargest(1, 'SALARY')).reset_index(drop=True)
#The highest-paid members in each department (multiple members)
highest_paid_employee_by_dept_all = emp.groupby('DEPT').apply(
lambda x: x.nlargest(1, 'SALARY',keep='all')).reset_index(drop=True)
#The three highest-paid employees in each department
top_3_employees_by_dept = emp.groupby('DEPT').apply(
lambda x: x.nlargest(3, 'SALARY')).reset_index(drop=True)
#The employee with the smallest EID in each department
first_employee_by_dept = emp.groupby('DEPT').first()
5.1 Simple iterative operations
Example codes for comparing SPL, SQL, and Python
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL