SPL: taking the record of maximum values/maximum values in group

 

Sometimes, we don’t care about what the exact maximum value is, we are rather concerned about the record in which the maximum value is located. For example, look up the student ID with the highest math score, select the order with the highest monthly sales, and so on.

 

1. Take the record of maximum values

To get the record where the maximum value is, we need to find out which row the maximum value is in (possibly multiple rows if there are multiple maxima), and then return to the corresponding record. SPL provides the A.maxp() function which is used to select the record where the maximum value is located.

[e.g. 1] According to the report card, find the ID of the student with the highest math score in Class One. Some of the figures are as follows:

CLASS

STUDENTID

SUBJECT

SCORE

1

1

English

84

1

1

Math

77

1

1

PE

69

1

2

English

81

1

2

Math

80

 

SPL script is as follows:

说明: IMG_256

A

1

=T("Scores.csv").select(SUBJECT=="Math"&&CLASS==1)

2

=A1.maxp(SCORE).STUDENTID

A1: import the score table and query the math grades of Class One.

A2: employ A.maxp() function to select the record with the highest score and extract the student ID from the record.

 

The record where the maximum value is located is not necessarily exclusive. If we want to get all the records with the highest scores, the option @a of A.maxp() function could be used:

说明: IMG_256

A

2

=A1.maxp@a(SCORE).(STUDENTID)

A2: use the option @a of A.maxp() function to select all the records with the highest scores, and then extract all the student IDs.

 

The execution results of A1.maxp@a (SCORE) are as follows:

CLASS

STUDENTID

SUBJECT

SCORE

1

10

Math

97

1

13

Math

97

 

We can see that there are two students in Class One with the highest scores in math. Their student numbers are 10 and 13 respectively.

Similarly, SPL also provides the A.minp()function for taking the record where the minimum value is located. If we want to get all the records where the minimum value are, the option @a is also supported in A.minp() function.

 

2. Take the record of maximum values in group

In general, people are more interested in aggregated values after grouping, therefore, grouping operations are often accompanied by further aggregation of subsets. By aggregating groups together, we can easily calculate the maximum or minimum values for each group. However, there are times when we are more interested in these grouped subsets than in aggregated values. For example, we want to know some information like the customers’ names, sales personnel, and so on, regardless of the exact number of highest sales.

 

SPL provides the group() function that we can use to implement authentic grouping, and the result set after grouping is a set of many grouped subsets. This allows the group function to support the operation of taking the record where the maximum value is in each grouped subsets. Let’s look at a simple example of how to get the record where the maximum value is in group.

[e.g. 2] According to the sales table, find the record of the highest monthly sales in 2014. Some of the figures are as follows:

ID

CUSTOMERID

ORDERDATE

SELLERID

AMOUNT

10400

EASTC

2014/01/01

1

3063.0

10401

HANAR

2014/01/01

1

3868.6

10402

ERNSH

2014/01/02

8

2713.5

10403

ERNSH

2014/01/03

4

1005.9

10404

MAGAA

2014/01/03

2

1675.0

 

We firstly select the sales records of 2014, group them by month, then take out the records with the highest sales from the sales records for each month, and finally merge those records together.

The SPL script looks like this:

说明: IMG_256

A

1

=T("Sales.csv").select(year(ORDERDATE)==2014)

2

=A1.group(month(ORDERDATE);~.maxp@a(AMOUNT):MAX_AMOUNT)

3

=A2.conj(MAX_AMOUNT)

A1: import the sales table and select the sales figures of 2014.

A2: employ the A.group() function to group the sales records by month and select the records with the highest sales in each grouped subsets.

A3: concatenate the maximum records of all month.