2.15 Grouping: distinct count


Perform distinct count when summarizing data in a table sequence. Now we are trying to find the most suitable field for cooperating with Pclass to form the primary key.

PassengerId Survived Pclass Name Sex Age
1 0 3 “Braund, Mr. Owen Harris” male 22
2 1 1 “Cumings, Mrs. John Bradley” female 38
3 1 3 “Heikkinen, Miss. Laina” female 26
4 1 1 “Futrelle, Mrs. Jacques Heath” female 35
5 0 3 “Allen, Mr. William Henry” male 35
6 0 3 “Moran, Mr. James” male
7 0 1 “McCarthy, Mr. Timothy J” male 54

SPL script:

1 =T(“titanic_train.xlsx”)
2 =A1.group(Pclass)
3 =A1.fno() >result=[]
4 for A3 =A2.new(~.field(A4).icount():Dcount,~.len():Dlen)
5 =B4.select(DCount==Dlen)
6 if B5.len()==A2.len() >result.insert(0,A1.fname(A4))
7 return result

A1 Load data table.
A2 Group table by Pclass.
A3 Get the ordinal number of each field.
A4 Loop each column to
B4 Use icount() to calculate the non-duplicate values of the current field in each group;
B5 Get records where the number of distinct values of the current field is equivalent to the length of the current group;
B6 If the number of records in B5 is equivalent to the number of groups in A2, the current field does not have duplicate values in all groups;
C6 Add name of the current field to the result set.
A7 Return result set.