Group & Aggregate with SPL

Question

I have a CSV file with the following values:

#BOF

userID;gender;movieID;rating

1;m;100;50

1;m;101;100

1;m;102;0

2;f;100;100

2;f;101;80

3;m;104;70

4;m;104;80

5;f;100;75

#EOF

 

I want to know how many movies each user has rated. Assume that there are hundreds of thousands of users. I tried to code it in Java (using Eclipse) like this:

while ((strLine = br.readLine()) != null) {

String[] strings = strLine.split(";");

But then I got stuck. I am new to this, so it probably looks easy, but it isn't for me yet.

 

Answer

Coding a group & aggregate operation in Java takes some work because the language has no single built-in function that parses delimited text and groups it by a column. Here I get it done with SPL (Structured Process Language):

     A
1    =file("d:\\source.csv").read@n()
2    =A1.to(2,A1.len()-1)
3    =A2.concat("\n")
4    =A3.import@t(;";")
5    =A4.groups(userID;count(movieID))

 

A1: Read in the contents of source.csv and return the lines as a sequence of strings; each line is a member.

A2: Take the members of A1's sequence from the second to the second-to-last, which drops the #BOF and #EOF marker lines.

A3: Join members of A2’s sequence into a string with the delimiter “\n”.

A4: Parse A3’s string into records, using “;” as the field delimiter and the first line as column names (the @t option), and return them as a table sequence.

A5: Group the records by userID and count the movieID values in each group, which gives the number of movies each user rated.
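
For comparison, since the question was asked in Java, here is a rough sketch of the same steps (skip the marker lines and the header, split on ";", count per userID). This is not part of the SPL solution. The class name RatingCounter and the method countPerUser are made up for illustration, and the sketch assumes every data row has the four fields shown in the sample.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; the names here are illustrative only.
class RatingCounter {

    // Counts the rated movies per user from the raw lines of the file,
    // including the #BOF/#EOF markers and the header row.
    static Map<String, Integer> countPerUser(List<String> lines) {
        // LinkedHashMap keeps users in the order they first appear.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            // Skip blank lines, the #BOF/#EOF markers, and the header row.
            if (trimmed.isEmpty() || trimmed.startsWith("#")
                    || trimmed.startsWith("userID")) {
                continue;
            }
            String[] fields = trimmed.split(";");
            if (fields.length < 4) {
                continue; // assumption: rows with missing fields are ignored
            }
            // fields[0] is userID; add 1 to that user's running count.
            counts.merge(fields[0], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "#BOF",
                "userID;gender;movieID;rating",
                "1;m;100;50", "1;m;101;100", "1;m;102;0",
                "2;f;100;100", "2;f;101;80",
                "3;m;104;70", "4;m;104;80", "5;f;100;75",
                "#EOF");
        System.out.println(countPerUser(sample)); // prints {1=3, 2=2, 3=1, 4=1, 5=1}
    }
}
```

To run it against the real file, the same loop body could sit inside the while ((strLine = br.readLine()) != null) loop from the question, so the file is processed line by line and only the per-user counts are held in memory.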

The SPL script can be easily integrated into a Java application. See How to Call an SPL Script in Java for more details.