"9.1 Grouping and aggregation 1. Please count the number of occurrences of each letter in the str .."

mars RaqForum 25 No.
1 Reply • 306 View • 7 Months ago

SPL Programming Exercise - Chapter 9 Grouping

9.1 Grouping and aggregation

1. Please count the number of occurrences of each letter in the string “esProc SPL is a smart desktop data analysis tool” (ignoring case, a and A are the same letter)

(1) Program to generate a student score table, as shown in the following figure. The student ID ranges from 1 to 50, with 1 to 25 being Class “one” and after the 25th being Class “Two”. The Subject is “English”, “Math”, “PE”, “Science”, and “Arts”, and the Score is a random number within 100.

(2) Calculate the total and average scores for each student, as well as the subject with the highest score

(3) Calculate the average score, highest and lowest scores for each subject in each class

(4) Arrange the scores of each subject in descending order for each class and save them as multiple Excel sheets, with a naming format of Class_Subjects

(5) Calculate the median score and number of students who fail (below 60 points) in each subject of each class

(6) Select individuals with scores of 90 or above in each subject of each class

3. Randomly generate a series of email addresses containing duplicate values, such as

(1) Remove duplicate email addresses

(2) Identify duplicate email addresses

4. The behavior and activities of some players on a certain game platform are as follows:

Each row of data records the number of games opened by a player using the same device on the same day before logging off the platform (possibly 0).

(1) Check the date when each player first logged into the platform.

Result example:

(2) Check which devices each player has used to log in

Result example:

(3) Calculate the ratio of players who log in again on the second day of their first login

There is an employee table

ID is the primary key (column with unique value) of this table.

Each row of the table represents the name of the employee, their department, and the ID of their manager.

If the managerId is empty, then the employee does not have a manager.

No employee will become their own manager.

Write a program to identify the names of managers who have at least five direct reports

9.2 Enumeration and alignment

1. The salary table for employees is as follows: calculate the average salary of departments A, B, and C

id	name	department	salary
101	John	A	7000
102	Dan	B	11000
103	James	A	9000
104	Amy	B	7000
105	Anne	B	16000
106	Ron	A	8000
107	Van	C	20000
108	Mike	C	7500
109	Jack	D	18000
110	Kelly	D	9500
111	Bonnie	C	12000
112	Adam	C	8500

2. According to the order table, list the total number of orders for each month in 2013 in order

3. According to the posting record table, group posts by label and count the frequency of each label appearing in descending order

ID	Title	Author	Label
1	Easy analysis of Excel	John	Excel,ETL,Import,Export
2	Early commute: Easy to pivot excel	Dan	Excel,Pivot,Python
3	Initial experience of SPL	James	Basics,Introduction
4	Talking about set and reference	Amy	Set,Reference,Dispersed,SQL
5	Early commute: Better weapon than Python	John	Python,Contrast,Install
6	…	…	…

Result example:

Label	Count
SPL	7
SQL	6
Basics	5
…	…

4. According to the employee salary table, divide into three groups of [L, M, H] based on salaries below 8000, 8000~12000, and above 12000, and calculate the number of people in each group. (Implemented separately using enumeration and sequence number grouping)

id	name	department	salary
101	John	A	7000
102	Dan	B	11000
103	James	A	9000
104	Amy	B	7000
105	Anne	B	16000
106	Ron	A	8000
107	Van	C	20000
108	Mike	C	7500
109	Jack	D	18000
110	Kelly	D	9500
111	Bonnie	C	12000
112	Adam	C	8500

Result example:

5. According to the employee salary table, group employees by age criteria (less than 35 years old, less than 45 years old) (can be repeated), calculate the average salary, and assign those who do not meet the criteria to a new group.

id	name	birthday	salary
101	John	1974/11/20	7000
102	Dan	1980/7/19	11000
103	James	1970/12/17	9000
104	Amy	1985/3/7	7000
105	Anne	1975/5/13	16000
106	Ron	1988/3/7	8000
107	Van	1994/11/20	20000
108	Mike	1990/7/19	7500
109	Jack	1970/12/17	18000
110	Kelly	1985/3/7	9500
111	Bonnie	1975/5/13	12000
112	Adam	1995/5/13	8500

9.3 Order-related grouping

1. Divide employees into three groups on average based on their start time (assigned to a certain group according to the order of remainder), and calculate the average salary for each group. Partial data is as follows:

ID	NAME	BIRTHDAY	ENTRYDATE	DEPT	SALARY
1	Rebecca	1974/11/20	2005/3/11	R&D	7000
2	Ashley	1980/7/19	2008/3/16	Finance	11000
3	Rachel	1970/12/17	2010/12/1	Sales	9000
4	Emily	1985/3/7	2006/8/15	HR	7000
5	Ashley	1975/5/13	2004/7/30	R&D	16000
…	…	…	…	…	…

2. According to the medal table of previous Olympic Games, determine the country with the longest number of consecutive first place medals and its medal information. Partial data is as follows:

Game	Nation	Gold	Silver	Copper
30	USA	46	29	29
30	China	38	27	23
30	UK	29	17	19
30	Russia	24	26	32
30	Korea	13	8	7
…	…	…	…	…

Game represents the number of Olympic Games

3. What is the maximum consecutive rising days of the closing price of the Shanghai Composite Index in 2020? (On the first trading day, the index rose.) Partial data is as follows:

DATE	CLOSE	OPEN	VOLUME	AMOUNT
2020/1/2	3085.198	3066.3357	292470208	3.27E+11
2020/1/3	3083.786	3089.022	261496667	2.90E+11
2020/1/6	3083.408	3070.9088	312575842	3.31E+11
2020/1/7	3104.802	3085.4882	276583111	2.88E+11
2020/1/8	3066.893	3094.2389	297872553	3.07E+11
…	…	…	…	…

9.4 Expansion and transpose

1. According to the student score table, calculate the highest scores for each subject in each class and display them in columns. Partial data is as follows:

CLASS	STUDENTID	SUBJECT	SCORE
1	1	English	84
1	1	Math	77
1	1	PE	69
1	2	English	81
1	2	Math	80
…	…	…	…

Expected results:

CLASS	MAX_MATH	MAX_ENGLISH	MAX_PE
1	97	96	97
2	97	96	97
…	…	…	…

2. Generate a list of each medal based on the Olympic medal tally. Partial data is as follows:

Game	Nation	Gold	Silver	Copper
30	USA	46	29	29
30	China	38	27	23
30	UK	29	17	19
30	Russia	24	26	32
30	Korea	13	8	7
…	…	…	…	…

Expected results:

Game	Nation	Medal_type	Medals
30	USA	Gold	46
30	USA	Silver	29
30	USA	Copper	29
30	China	Gold	38
30	China	Silver	27
30	China	Copper	23
…	…	…	…

3. The salary table for employees is as follows:

Name	Dept	Area	Salary
David	Sales	Beijing	8000
Daniel	R&D	Beijing	15000
Andrew	Sales	Shanghai	9000
Robert	Sales	Beijing	26000
Rudy	R&D	Shanghai	23000
…	…	…	…

Calculate the average salary of each department in different areas, and the expected results are:

Dept	Beijing	Shanghai	…
R&D	13000	11000	…
Sales	15000	14000	…

4. Sales table classified by channel, recorded by year and month. Partial data is as follows:

YEAR	MONTH	ONLINE	STORE
2020	1	2440	3746.2
2020	2	1863.4	448
2020	3	1813	624.8
2020	4	670.8	2464.8
2020	5	3730	724.5
…	…	…	…

Expected to query the monthly sales revenue of each channel in 2020, expected results:

CATEGORY	1	2	3	…
ONLINE	2440	1863.4	1813	…
STORE	3746.2	448	624.8	…

Suggested answers:

9.1 Grouping and aggregation

	A
1	esProc SPL is a smart desktop data analysis tool
2	=lower(A1).split("“).select(~!=”")
3	=A2.group(~:letter;~.len():letter_count)

	A
1	=[“English”,“Math”,“PE”,“Science”,“Arts”]
2	=A1.(50.new(if(~<=25,“one”,“Two”):Class, ~:Id, A1.~:Subject, rand(100):Score)).conj().sort(Id)
3	=A2.groups(Id;sum(Score):Total,maxp(Score).Subject:Max_sub)
4	=A2.groups(Class,Subject;avg(Score):Average,max(Score):Max,min(Score):Min)
5	=A2.group(Class,Subject;~.sort@z(Score):Subscore)
6	=A5.(file(~.Class/“_”/~.Subject/“.xlsx”).xlsexport@t(~.Subscore))
7	=A2.groups(Class,Subject;median(,Score):Median_socre,count(Score<60):Failed_count)
8	=A2.group(Class,Subject;~.select(Score>=90):Above_90)

	A
1	=30.(char(97+rand(26))/“@raqsoft.com”)
2	=A1.id()
3	=A1.group().select(~.len()>1).(~(1))

	A
1	=T(“player_event.xlsx”)
2	=A1.groups(player_id;min(event_date):first_login)
3	=A1.group(player_id;~.(device_id).id():device)
4	=A1.group(player_id;~.(event_date).contain(min(event_date)+1):continu_login)
5	=A4.count(continu_login)/A4.len()

	A
1	=T(“employee.xlsx”)
2	=A1.group(managerId).select(~.len()>=5).(~.managerId)
3	=A2.(A1.select(A2.~==id).name)

9.2 Enumeration and alignment

	A
1	=T(“salary.xlsx”)
2	[A,B,C]
3	=A1.align@a(A2,department)
4	=A3.new(A2(#):department,~.avg(salary):avg_salary)

	A
1	=T(“orders.xlsx”)
2	=A1.select(year(Orderdate)==2013)
3	=A2.groups@n(month(Orderdate):Month;count(id):OrderCount)

	A
1	=T(“post_record.xlsx”)
2	=A1.conj(Label.split(“,”)).id()
3	=A1.align@ar(A2.len(),A2.pos(Label.split(“,”)))
4	=A3.new(A2(#):Label,~.count():Count).sort@z(Count)

	A
1	=T(“salary.xlsx”)
2	[?<8000,?>=8000 && ?<12000,?>=12000]
3	[L,M,H]
4	=A1.enum(A2,salary)
5	=A4.new(A3(#):Grade,count(~):Count)

	A
1	=T(“salary.xlsx”)
2	[8000,12000]
3	[L,M,H]
4	=A1.group@n(A2.pseg(salary)+1)
5	=A4.new(A3(#):Grade,count(~):Count)

	A
1	=T(“salary.xlsx”)
2	[?<35,?<45]
3	[below 35,below 45]
4	=A1.enum@nr(A2,age(birthday))
5	=A4.new(if (#>A2.len(), “Other”,A3(#)):AGE,~.avg(salary):AvgSalary)

9.3 Order-related grouping

	A
1	=T(“Employee.csv”).sort(ENTRYDATE)
2	=A1.group@n((#-1)*3\A1.len()+ 1)
3	=A2.new(#:GROUP_NO, ~.avg(SALARY):AVG_SALARY)

	A
1	=T(“Olympic.txt”)
2	=A1.sort@z(Game,Gold,Silver,Copper)
3	=A2.group@o1(Game)
4	=A3.group@o(Nation)
5	=A4.maxp@a(~.len())

	A
1	=T(“SSEC.csv”)
2	=A1.select(year(DATE)==2020).sort(DATE)
3	=A2.group@i(CLOSE<CLOSE[-1])
4	=A3.max(~.len())

9.4 Expansion and transpose

	A
1	=T(“Scores.csv”)
2	=A1.groups(CLASS,SUBJECT; max(SCORE):MAX_SCORE)
3	=A2.pivot(CLASS; SUBJECT, MAX_SCORE; “Math”:“MAX_MATH”, “English”:“MAX_ENGLISH”, “PE”:“MAX_PE”)

	A
1	=T(“Olympic.txt”)
2	=A1.pivot@r(Game,Nation;Medal_type,Medals;Gold, Silver, Copper)

	A
1	=T(“salary.txt”)
2	=A1.pivot@s(Dept; Area, avg(Salary))

	A
1	=T(“MonthSales.csv”).select(YEAR==2020)
2	=A1.pivot@r(YEAR,MONTH; CATEGORY, AMOUNT)
3	=A2.pivot(CATEGORY; MONTH, AMOUNT)

In logical order, use the function A.pivot@r()to convert the column to row first, and then use the function A.pivot() to convert the row to column.

SPL Official Website 👉 https://www.scudata.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL

SPL Learning Material 👉 https://c.scudata.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/2bkGwqTj

Youtube 👉 https://www.youtube.com/@esProc_SPL

programming exercise(11)

eBook

mars • 306 View • 7 Months ago

SPL Programming Exercise - Chapter 9 Grouping

9.1 Grouping and aggregation

9.2 Enumeration and alignment

9.3 Order-related grouping

9.4 Expansion and transpose

Suggested answers:

ToC