Non-basic Aggregation Examples

 

An aggregate operation summarizes data and returns the result. An aggregation is always preceded by a grouping operation. Common aggregate calculations include sum, max, min, count and logic operations. In this article, you’ll find illustrations of aggregate problems and their simple and efficient solutions with esProc scripts. Looking Non-basic Aggregation Examples for details.

 

1. SUM aggregate after enumeration grouping

Example 1 Based on the GDP table, calculate the GDP per capita for direct-controlled municipalities, first-tier cities and second-tier cities respectively. Below is part of the source table:

ID

City

GDP

Population

1

Shanghai

32679

2418

2

Beijing

30320

2171

3

Shenzhen

24691

1253

4

Guangzhou

23000

1450

5

Chongqing

20363

3372

SPL script


A

B

1

=connect("db")

/ Connect to database

2

=A1.query("select * from   GDP")

/ Query GDP table

3

[["Beijing","Shanghai","Tianjin","Chongqing"].pos(?)>0,
["Beijing","Shanghai","Guangzhou","Shenzhen"].pos(?)>0,
["Chengdu","Hangzhou","Chongqing","Wuhan","Xian","Suzhou",
"Tianjin","Nanjing","Changsha","Zhengzhou","Dongguan","Qingdao",
"Shenyang","Ningbo","Kunming"].pos(?)>0]

/ Enumerate direct-controlled cities,   first-tier cities and second-tier cities respectively

4

=A2.enum@r(A3,City)

/ Group records in GDP table according to the enumerated sequences   of cities

5

=A4.new(A3(#):Area,~.sum(GDP)/~.sum(Population)*10000:CapitaGDP)

/ Calculate GDP per capita in each   group, during which sum() function is used to calculate sum

A5’s result:

Area

CapitaGDP

["Beijing","Shanghai","Tianjin","Chongqing"].pos(?)>0

107345.03

["Beijing","Shanghai","Guangzhou","Shenzhen"].pos(?)>0

151796.49

["Chengdu","Hangzhou","Chongqing","Wuhan","Xi’an","Suzhou","Tianjin","Nanjing",
"Changsha","Zhengzhou","Dongguan","Qingdao","Shenyang","Ningbo","Kunming"].pos(?)>0

106040.57

 

2. Merge overlapping time intervals

Example 2 The following is part of the orders table. We want to merge records of customer ANATR that have overlapping time periods (between order date and finish date).

OrderID

Customer

SellerId

OrderDate

FinishDate

10308

ANATR

7

2012/09/18

2012/10/16

10309

ANATR

3

2012/09/19

2012/10/17

10625

ANATR

3

2013/08/08

2013/09/05

10702

ANATR

1

2013/10/13

2013/11/24

10759

ANATR

3

2013/11/28

2013/12/26

SPL script


A

B

1

=connect("db")

/ Connect to data source

2

=A1.query("select * from Orders   where  Customer='ANATR'order by   OrderDate")

/ Get orders records of customer ANATR   and sort them by order date

3

=A2.group@i(OrderDate>max(FinishDate[,-1]))

/ Group the selected orders records,   during which a new group is created when the order date of the current record   is later than all previous finish dates

4

=A3.new(Customer,~.min(OrderDate):OrderDate,~.max(FinishDate):FinishDate)

/ For each group, min()function gets   the earliest order date and max() function gets the latest finish date, and   use them respectively as the new order date and the new finish date

A4’s result:

Customer

OrderDate

FinishDate

ANATR

2012/09/18

2012/10/17

ANATR

2013/08/08

2013/09/05

ANATR

2013/10/13

2013/11/24

ANATR

2013/11/28

2013/12/29

 

3. Grouping & conditional COUNT aggregate

Example 4 The following is part of the scores table. We want to, for each subject, find the number of students in class one who fail in this subject.

CLASS

STUDENTID

SUBJECT

SCORE

Class  one

1

English

84

Class  one

1

Math

77

Class  one

1

PE

69

Class  one

2

English

81

Class  one

2

Math

80

SPL script


A

B

1

=connect("db")

/ Connect to database

2

=A1.query("select * from Scores   where  CLASS='Class one'")

/ Get scores records of students in   class one

3

=A2.groups(SUBJECT;  count(SCORE<60):FailCount)

/ Perform grouping & aggregation,   during which count() function calculates the number of students who fail in   the subject

A3’s result:

SUBJECT

FailCount

English

2

Math

0

PE

2

 

4. Perform logical AND on a set of Boolean values

Example 5 Based on the following primary school online-learning terminal tables (as shown by pic 2), we want to find if all students use mobile phones to learn. Pic 1 is the directory where the tables for classes of all grades are stored.

undefined

ID

STUDENT_NAME

TERMINAL

1

Rebecca  Moore

Phone

2

Ashley  Wilson

Phone,PC,Pad

3

Rachel  Johnson

Phone,PC,Pad

4

Emily  Smith

Phone,Pad

5

Ashley  Smith

Phone,PC

6

Matthew  Johnson

Phone

7

Alexis  Smith

Phone,PC

8

Megan  Wilson

Phone,PC,Pad

SPL script


A

B

C

1

=directory@ps("D:/Primary  School")


/ Traverse the target directory   recursively to list all files

2

for A1

=file(A2).xlsimport@t()

/ Import the Excel files of all   classes circularly

3


=B2.([TERMINAL,"Phone"].ifn().split@c().pos("Phone")    > 0)|@

/ ifn() function makes sure that a   null terminal value is treated as being able to use phone and returns True

4

=B3.cand()


/ A.cand() function checks whether B3’s   members are all True

A4’s result:

Value

false

 

5. Perform logical OR on a set of Boolean values

Example 6 The following is part of the sales data. We want to find if there is at least one month in 2014 when customer RATTC’s amount ranks in top 3.

OrderID

Customer

SellerId

OrderDate

Amount

10400

EASTC

1

2014/01/01

3063.0

10401

HANAR

1

2014/01/01

3868.6

10402

ERNSH

8

2014/01/02

2713.5

10403

ERNSH

4

2014/01/03

1005.9

10404

MAGAA

2

2014/01/03

1675.0

SPL script


A

B

1

=connect("db").query("select    * from sales")

/ Connect to the data source to query sales table

2

=A1.select(year(OrderDate)==2014)

/ Get records of 2014

3

=A2.group(month(OrderDate))

/ Group records of 2014 by months

4

=A3.(~.groups(Customer;  sum(Amount):Amount))

/ Group records in each group by   customers and calculate each customer’s total amount

5

=A4.new(~.top(-3; Amount):Top3)

/ Loop through records of each month   to get customers whose total amounts rank in top 3

6

=A5.(Top3.(Customer).pos("RATTC")>0)

/ Among each month’s top 3, check   whether customer RATTC is included

7

=A6.cor()

/ A.cor() function checks if there is   a True in A6’s members

A7’s result:

Value

false

 

Find more examples in SPL CookBook.