Membership Test for Structured Data

 

Abstract

Sometimes we need to judge whether a member belongs to a certain set or not when selecting data from a table, such as getting housing prices of major cities from house price table, getting VIP customers sales records from sales table, etc. This article illustrates how to deal with these membership issues fast and efficiently, and offers esProc sample scripts.

 

1. Set membership test

Get records from a table according to whether a value is included in a specified set (where the number of members is within 10).

Example 1Calculate the average salary in each department in first-tier cities based on the employee table. Below is part of the source data:

ID

NAME

CITY

SALARY

1

Rebecca

Tianjin

7000

2

Ashley

Tianjin

11000

3

Rachel

Shijiazhuang

9000

4

Emily

Shenzhen

7000

5

Ashley

Nanjing

16000

Analysis

When getting data from the employee table, we need to check whether the city where an employee settles is included in the constant set of [Beijing, Shanghai, Guangzhou, Shenzhen]. When the number of members in the set is within 10, we can use A.contain() function to get the eligible records.

SPL script


A

B

1

=connect("db").query("select  * from Employee")

/ Connect to database and query Employee table

2

[Beijing, Shanghai, Guangzhou, Shenzhen]

/ Define a constant set of first-tier cities

3

=A1.select(A2.contain(CITY))

/ Use A.contain() function to get records where the CITY value is contained in A2s constant set

4

=A3.groups(DEPT; avg(SALARY):SALARY)

/ Group selected records by DEPT and calculate average salary in each group

A4s result:

DEPT

SALARY

Finance

7833.33

HR

7187.5

Marketing

7977.27

Get records from a table according to whether a value is included in a specified set (that has members over 10).  

Example 2Calculate the sales amount of every big customer in each month of the year 2014 based on the sales table. Below is part of the source data:

ID

Customer

SellerId

Date

Amount

10400

EASTC

1

2014/01/01

3063.0

10401

HANAR

1

2014/01/01

3868.6

10402

ERNSH

8

2014/01/02

2713.5

10403

ERNSH

4

2014/01/03

1005.9

10404

MAGAA

2

2014/01/03

1675.0

Analysis

Similarly, we need to check whether the Customer value in each sales record is included in the constant set of big customers. In this case the number of members in the set is relatively large (above 10). We first sort the constant set, and then use @b option with A.contain() function to perform a binary search.

SPL script


A

B

1

=connect("db").query("select  * from Sales")

/ Connect to database and query Sales table

2

=["SAVEA","QUICK","ERNSH","HUN","RATTC","HANAR","FOLKO","QUEEN,MEREP","WHITC","FRANK","KOENE"].sort()

/ Define a constant set of big customers and sort it

3

=A1.select(year(Date)==2014 &&  A2.contain@b(Customer))

/ Get records of big customers in 2014. As set A is already ordered, use @b option with A.contain() function to perform a binary search

4

=A3.groups(month(Date):Month;  sum(Amount):Amount)

/ Group the selected records by month and sum the sales amounts in each month

A4s result:

STATE

SALARY

California

7700.0

Texas

7592.59

New  York

7677.77

Florida

7145.16

Other

7308.1

2. Foreign key mapping membership test

Get records from a table according to whether a record is included in the foreign key references in its associated table.

Example 3Find how many students are there in each class who select the Matlab course. Below are SelectCourse table and Course table:

undefined 

Analysis

We need to check whether the CourseID in a record is Matlab when getting records from the SelectCourse table. Before that we get a set of course records from the Course table where the course name is Matlab, and then get records from the SelectCourse table where the CourseID is contained in the set.

SPL script


A

B

1

=connect("db")

/Connect to database

2

=A1.query("select * from Course")

/ Query Course table

3

=A1.query("select * from  SelectCourse")

/ Query SelectCourse table

4

=A2.select(Name=="Matlab")

/ Get Course table records where the course name is Matlab

5

=A3.join@i(CourseID, A4:ID)

/ Use @i option with A.join() function to perform a join filtering

6

=A5.groups(Class; count(1):SelectCount)

/ Group the selected records and count the students who select the specified course in each class

A6s result:

Class

SelectCount

Class  1

3

Class  2

5

 

3. Non-foreign key mapping membership test

Get records from a table according to whether a record is included in the non-foreign key references in its associated table.

Example 4Find the number of students in each class who have a score of above 80 for any subject. Below are Score table and Student table:

undefined 

Analysis

When getting records from the Student table, we need to check whether the current student has a subject whose score is above 80. First we select all Score table records where the Score value is greater than 80 and perform distinct by StudentID to generate a set of records of unique student IDs whose corresponding scores are above 80. Then we get Student table records whose IDs are included in that set of records.

SPL script


A

B

1

=connect("db")

/ Connect to database

2

=A1.query("select * from  Student")

/ Query Student table

3

=A1.query("select * from Score")

/ Query Score table

4

=A3.select(Score>80)

/ Get Score table records where score value is greater than 80

5

=A4.id(StudentID)

/ id function performs distinct operation by student ID

6

=A2.join@i(ID, A5)

/ Use A.join@i() function to perform join filtering

7

=A6.groups(Class; count(1):StudentCount)

/ Group the joining records and count the eligible students in each class

A7s result:

Class

StudentCount

Class  1

9

Class  2

11

In an optimized and efficient way, test and get records from a table according to whether a record is included in the non-foreign key references in its associated table.

Example 5Get the number of customers in each city who placed an order in 2014. Below are Sales table and Customer table:

undefined 

Analysis

When getting records from the Sales table, we need to check whether the current customer placed an order in 2014. First we select sales table records of 2014 and perform distinct by customer ID to generate a set of records of unique customer IDs who have sales records in 2014. Then we get Costomer table records whose IDs are included in that set of records.

SPL script


A

B

1

=connect("db")

/ Connect to database

2

=A1.query("select * from  Customer")

/ Query Customer table

3

=A1.query("select * from Sales where  year(Date)=2014 order by CustomerID")

/ Query sales records of 2014 and sort them by customer ID

4

=A3.groups@o(ID)

/ As the selected records are already ordered, groups() function works with @o option to perform distinct by customer ID

5

=A2.join@i(ID, A4:CustomerID)

/Use A.join@i() function to perform a join filtering

6

=A5.groups(City; count(1):CustomerCount)

/ Group the joining records and count the eligible customers in each city

A6s result:

City

CustomerCount

Dongying

6

Tangshan

7

 

4. Foreign key mapping non-membership test

Get records from a table according to whether a record is not included in the non-foreign key references in its associated table.

Example 6Find the total sales amount of each new customer in the year of 2014. Below are Sales table and Customer table:

undefined 

Analysis

When getting records from the Sales table, we need to make sure the current customer didnt place an order in 2014. First we select sales table records of 2014 and then we get those whose customer IDs are not included in the Customer table.

SPL script


A

B

1

=connect("db")

/ Connect to database

2

=A1.query("select * from Sales where  year(OrderDate)=2014")

/ Query sales records of 2014

3

=A1.query("select * from  Customer")

/ Query Customer table

4

=A2.join@d(CustomerID ,A3:ID)

/Use A.join@d() function to get sales records whose customer IDs are not included in Customer table

5

=A4.groups(CustomerID; sum(Amount):Amount)

/ Group the selected records and sum sales amounts of each customer

A5s result:

CustomerID

Amount

DOS

11830.1

HUN

57317.39

 

Find more examples in SPL CookBook