Existence Checking for Structured Data

 

When retrieving data from a table, we sometimes need to check whether a member exists according to a specific condition. The condition could come from another data table. For instance, there is a score field in score table and we want to get records from student table where every subject gets a score above 80. So through a series of examples, let’s see how to handle such existence checking tasks conveniently and efficiently. The article provides esProc sample script for each example. Looking Existence Checking for Structured Data for details.

 

1. Existence checks of foreign key mapping

Find records according to the existence of foreign key mapping based on two associated tables.

Example 1 Based on the following Score table and Student table, calculate the average score of each boy in class one.

undefined

Analysis

When retrieving data from Score table, we check whether the current record is about a boy in class one. Select it out if it is.

SPL script


A

B

1

=connect("db")

/Connect to database

2

=A1.query("select * from Student")

/Query Student table

3

=A1.query("select * from  Score")

/Query Score table

4

=A3.select(Class=="Class 1"  && Gender=="Male")

/ Select records of boys in class one

5

=A2.join@i(Class:StudentID,   A4:Class:ID)

/ A.join@i() function joins Student table and the selected Score records and performs filtering

6

=A5.groups(StudentID;   avg(Score):Score)

/ Group A5’s records and calculate each   boy’s average score

A6’s result:

StudentID

Score

1

76

3

74

When the foreign key table is too large to fit into memory, we can use cursor to retrieve data from source tables and then perform order-based merge between cursors.

Example 2 Based on the following are Order table and Detail table, find the number of orders that don’t use the discount in each month of the year 2014.

undefined

Analysis

When retrieving data from the Order table, we check whether the discount in the current record is 0 and select it out if it is.

SPL script


A

B

1

=connect("db")

/ Connect to database

2

=A1.cursor("select * from Order   where year(Date)=2014  order by   ID")

/ Create cursor of Order table and get records of 2014

3

=A1.cursor("select * from Detail   order  by ID")

/ Create cursor of Detail table

4

=A3.select(Discount==0)

/ Get Detail records that don’t use the discount

5

=joinx(A2:Order,ID;A4:Detail,ID)

/ joinx function performs order-based   merge between A2 and A4’s cursors

6

=A5.groups(month(Order.Date):Month;  icount(Order.ID):OrderCount)

/ Group merged records by month and   count orders in each month

A6’s result:

Month

OrderCount

1

16

2

25

2. Checks through non-equi join

We search for data in a table according to whether a specific condition is met through the non-equi join.

Example 3 Based on the following order data, we want to find the amount of orders that span over a year. Below is part of the source data:

ID

NUMBER

AMOUNT

DELIVERDATE

ARRIVALDATE

10814

1

408.0

2014/01/05

2014/04/18

10814

2

204.0

2014/02/21

2014/04/05

10814

3

102.0

2014/03/14

2014/04/06

10814

4

102.0

2014/04/09

2014/04/27

10814

5

102.0

2014/05/04

2014/07/04

10848

1

873.0

2014/01/06

2014/04/21

Analysis

When retrieving data from the Order table, we check whether there are any records whose order span over a year and select them out if there is.

SPL script


A

B

1

=connect("db")

/ Connect to database

2

=A1.query("select * from   Detail")

/ Query Detail table

3

=A2.group(ID)

/ Group Detail table by ID

4

=A3.select(interval(~.min(DELIVERDATE),    ~.max(ARRIVALDATE)) > 365)

/ Get records of same order ID where   the latest arrival date is at least 365 days after the earlier deliver date

5

=A4.new(ID, ~.sum(AMOUNT):Amount)

/ Create a new data table and sum   amounts of each eligible order

A5’s result:

ID

Amount

10998

6800.0

11013

4560.0

11032

20615.0

 

3. Non-existence checks of foreign key mapping

Find records according to the non-existence of foreign key mapping based on two associated tables.

Example 4Based on the following Student table and Score table, find students who get scores above 80 for all subjects:

undefined

Analysis

When retrieving data from the Student table, we check whether there is a score below 80 in the current record and, if it doesn’t have any, select it out.

SPL script


A

B

1

=connect("db")

/ Connect to database

2

=A1.query("select * from  Student")

/ Query Student table

3

=A1.query("select * from   Score")

/ Query Score table

4

=A3.select(Score<=80)

/ Get Student records where there is a score of or below 80

5

=A4.id(StudentID)

/ Perform distinct over the selected   records

6

=A2.join@d(ID, A5)

/ A.join@d() function gets unmatching   records from the Student table

A6’s result:

ID

Class

Name

2

Class  1

Ashley

16

Class  2

Alexis

 

4. Checks through double negatives

Find matching records through double negaives.

Example 5 Based on the following Student table, SelectCourse table and Course table, find students who select all courses.

undefined

Analysis

When retrieving records from the Student table, we check whether there is any course the current student doesn’t select and, if there isn’t one, select it out. We can also think the opposite in the positive way to select the records where the number of selected courses is equal to that of all courses.

SPL script


A

B

1

=connect("db")

/ Connect to database

2

=A1.query("select * from  Student")

/ Query Student table

3

=A1.query("select * from Course")

/ Query Course table

4

=A1.query("select * from  SelectCourse")

/ Query SelectCourse table

5

=A4.groups(StudentID;  icount(CourseID):CourseCount)

/ Group the SelectCourse table by student ID and count the courses selected   by each student

6

=A5.select(CourseCount==A3.len())

/ Get student IDs that select all   courses

7

=A2.join@i(ID, A6:StudentID)

/ A.join@i() function performs   filtering join

A7’s result:

ID

Name

Class

4

Emily  Smith

Class  1

 

5. Checks of existence of any values meeting a specific condition

Find records according to the existence of any two values that meet a specific condition based on two associated tables.

Example 6 Based on the following Student table and Score table, find Student records where there are any two scores whose difference is over 30.

undefined

Analysis

When retrieving records from the Student table, we check if there are records where scores of any subjects have a 30 difference between them, and select it out if there are.

SPL script


A

B

1

=connect("db")

/ Connect to database

2

=A1.query("select * from  Student")

/Query Student table

3

=A1.query("select * from   Score")

/ Query Score table

4

=A3.group(StudentID)

/ Group Score table by student ID

5

=A4.select(~.max(Score)-~.min(Score)>30)

/ Get students whose highest score and   lowest score have a 30 difference between them

6

=A5.id(StudentID)

/ Perform distinct by student ID

7

=A2.join@i(ID,A6)

/ A.join@i() performs a join filtering

A7’s result:

ID

Name

Class

4

Emily  Smith

Class  1

8

Megan

Class  1

 

6. Checks of all eligible members meeting a specific condition

Find all records meeting a specific condition from a data table.

Example 7Based on the following Employee table, find the employees whose salaries are higher than those of all salespeople. Below is part of the source data:

ID

NAME

DEPT

SALARY

1

Rebecca

R&D

7000

2

Ashley

Finance

11000

3

Rachel

Sales

9000

4

Emily

HR

7000

5

Ashley

R&D

16000

Analysis

When retrieving data from the Employee table, we check whether the current employee’s salary is higher than those of all salespeople, and select it out if it is.

SPL script


A

B

1

=connect("db")

/ Connect to database

2

=A1.query("select * from  Employee")

/Query Employee table

3

=A2.select(DEPT:"Sales").max(SALARY)

/Get records of sales department and get the highest salary   in the department

4

=A2.select(SALARY>A3)

/ Find records where salaries are   higher than the highest salary in sales department

A4’s result:

ID

NAME

DEPT

SALARY

5

Ashley

R&D

16000

20

Alexis

Administration

16000

22

Jacob

R&D

18000

47

Elizabeth

Marketing

17000

 

Read SPL CookBook to find more related examples.