"Perform distinct count when summarizing data in a table sequence. Now we are trying to find the .."

mars RaqForum 25 No.
204 View • 1 Years ago

2.15 Grouping: distinct count

Perform distinct count when summarizing data in a table sequence. Now we are trying to find the most suitable field for cooperating with Pclass to form the primary key.

PassengerId	Survived	Pclass	Name	Sex	Age
1	0	3	“Braund, Mr. Owen Harris”	male	22
2	1	1	“Cumings, Mrs. John Bradley”	female	38
3	1	3	“Heikkinen, Miss. Laina”	female	26
4	1	1	“Futrelle, Mrs. Jacques Heath”	female	35
5	0	3	“Allen, Mr. William Henry”	male	35
6	0	3	“Moran, Mr. James”	male
7	0	1	“McCarthy, Mr. Timothy J”	male	54
…	…	…	…	…	…

SPL script:

	A	B	C
1	=T(“titanic_train.xlsx”)
2	=A1.group(Pclass)
3	=A1.fno()	>result=[]
4	for A3	=A2.new(~.field(A4).icount():Dcount,~.len():Dlen)
5		=B4.select(DCount==Dlen)
6		if B5.len()==A2.len()	>result.insert(0,A1.fname(A4))
7	return result

A1 Load data table.
A2 Group table by Pclass.
A3 Get the ordinal number of each field.
A4 Loop each column to
B4 Use icount() to calculate the non-duplicate values of the current field in each group;
B5 Get records where the number of distinct values of the current field is equivalent to the length of the current group;
B6 If the number of records in B5 is equivalent to the number of groups in A2, the current field does not have duplicate values in all groups;
C6 Add name of the current field to the result set.
A7 Return result set.

SPL Official Website 👉 http://www.scudata.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProc

SPL Learning Material 👉 http://c.scudata.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/ydhVnFH9

Youtube 👉 https://www.youtube.com/@esProc_SPL

spl-cookbook(224)

eBook

mars • 204 View • 1 Years ago