Subgroup Aggregation and Getting Top N

Question

Source: SQL Query with Time Interval

I want to get best 3 day of users between “2014-07-01” and “2014-08-01”

Could someone help me? I’ve been stuck here for 3 days.

In real score table entries are from 10:00 to 22:00 and 1 entry for each hour.

A total of 12 entries each day and each player (Sometimes it could be less 1 or 2).

This is the output I’m trying to get:

ID | User_ID | Username | Sum(Score) | Date

 

\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-

 

1| 1| Xxx | 52| 2014-07-01

 

2| 1| Xxx | 143| 2014-07-02

 

3| 2| Yyy | 63| 2014-07-01

 

...

 

Score table:

 

ID | User_ID | Score | Datetime

 

\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-

 

1| 1| 35| 2014-07-0111:00:00

 

2| 1| 17| 2014-07-0112:00:00

 

3| 2| 36| 2014-07-0111:00:00

 

4| 2| 27| 2014-07-0112:00:00

 

5| 1| 66| 2014-07-0211:00:00

 

6| 1| 77| 2014-07-0212:00:00

 

7| 2| 93| 2014-07-0212:00:00

 

...

 

User table:

 

ID | Username

 

\-\-\-\-\-\-\-\-\-\-\-\-\-\-

 

1| Xxx

 

2| Yyy

 

3| Zzz

 

...

 

A solution:

I think you need to aggregate first by date, and then choose the first three usingrow_number(). To do the aggregation:

 

select s.user_id,sum(s.datetime,'day')as theday,sum(score)as score,

 

row_number()over(partitionby s.user_id orderby sum(score)desc)as seqnum

 

from scores s

 

groupby s.user_id;

 

To get the rest of the information, use this as a subquery or CTE:

 

select u.*,s.score

 

from(select s.user_id,sum(s.datetime,'day')as theday, sum(s.score)as score,

 

row_number()over(partitionby s.user_id orderby sum(s.score)desc)as seqnum

 

from scores s

 

groupby s.user_id

 

)s join

 

users u

 

on s.user_id =u.users_id

 

whereseqnum <=3

 

orderby u.user_id,s.score desc;

 

Answer

This is a typical intra-group computation. Here is the logic:

1. Join the two source tables and group records by User_ID. Each group includes all records of a user.

2. For each group, group records by Datetime and sum the scores each day.

3. Find the records containing top 3 scores in each subgroup.

4. Concatenate the selected records.

The logic is clear but SQL has difficulty in expressing the algorithm. Here I choose to get it done in SPL. The Structured Process Language can phrase an intra-group computation conveniently. Here’s the SPL script:

A

1

=dataSource.query("select s.ID   ID,s.User_ID User_ID,u.Username Username,s.Score Score,s.Datetime Datetime   from Scoretable s join Usertable u on s.User_ID = u.ID")

2

=A1.group(User_ID)

3

=A2.(~.groups(User_ID,Username,date(Datetime):day;   sum(Score):sumScore))

4

=A3.(~.top(-3;sumScore))

5

=A4.conj()

 

1. The sign “~” represents each group of records.

2. SPL’s group() function groups data but won’t aggregate it (A2).

3. SPL’s groups() function groups data while performs aggregation (A3).

4. An SPL script is easily integrated with a Java program or a reporting tool.

More similar examples can be found in SPL Simplified SQL Case Details: Intra-group Computing.