The Select Operation on Structured Data

 

A select operation gets one or more members from a set according to a specific condition. It is similar to a locate operation. But the latter is concerned with the positions of certain members while the select operation wants the detailed information of a member record. Getting the incomes of the employees whose hire durations are over 10 years and finding names of customers whose order amounts are above 10,000 are among the many examples of select operations. But to perform selections conveniently and efficiently, you’d better read this article carefully where various scenarios are listed and sample scripts in esProc SPL are provided. Looking The Select Operation on Structured Data for details.

 

1. Get members meeting the specific condition

Example 1 List European and African cities with a population of more than 2 million and their populations in column groups (each column group is ordered by population in descending order). Below is part of the world’s urban population table:

Continent

Country

City

Population

Africa

Egypt

Cairo

6789479

Asia

China

Shanghai

24240000

Europe

Britain

London

7285000

The expected result:

Europe    City

Population

Africa    City

Population

Moscow

8389200

Cairo

6789479

London

7285000

Kinshasa

5064000

St  Petersburg

4694000

Alexandria

3328196

SPL script


A

B

1

=connect("db").query("select    * from World where Continent in('Europe','Africa')   and Population >=  2000000")

/ Connect to database and get records   of European and African cities having a population of over 2 million

2

=A1.select(Continent:"Europe")

/ A.select()function gets records of   Europe

3

=A1.select(Continent:"Africa")

/A.select()function gets records of   Africa

4

=create('Europe   City',Population,'Africa  City',   Population)

/ Create an empty table according to   the target structure

5

=A4.paste(A2.(City),A2.(Population),A3.(City),A3.(Population))

/ A.paste()function pastes values to   corresponding columns

A4’s result:

Europe    City

Population

Africa    City

Population

Moscow

8389200

Cairo

6789479

London

7285000

Kinshasa

5064000

St  Petersburg

4694000

Alexandria

3328196

 

Example 2 Get full names and salaries of employees in R&D department in the New York state. Below is part of the source data:

ID

NAME

SURNAME

STATE

DEPT

SALARY

1

Rebecca

Moore

California

R&D

7000

2

Ashley

Wilson

New  York

Finance

11000

3

Rachel

Johnson

New  Mexico

Sales

9000

4

Emily

Smith

Texas

HR

7000

5

Ashley

Smith

Texas

R&D

16000

SPL script


A

B

1

=connect("db")

/ Connect to the data source

2

=A1.query("select * from  Employee")

/ Import Employee table

3

=A2.select(STATE=="New   York"&&DEPT=="R&D")

/ A.select()function gets records of   employees of R&D department in New York; the double ampersand sign ("&&")   is used to connect two conditions that need to be met at the same time

4

=A3.new(NAME+""+SURNAME:FULLNAME,   SALARY)

/ Use the plus sign (+) to connect the   name string and surname string into a full name

A4’s result:

FULLNAME

SALARY

Matthew  Johnson

6000

Lauren  Thomas

12000

Brooke  Williams

12000

 

Example 3 Find the number of employees who are below 30 or over 50 in each department. Below is part of the source data:

ID

NAME

BIRTHDAY

STATE

DEPT

SALARY

1

Rebecca

1974/11/20

California

R&D

7000

2

Ashley

1980/07/19

New  York

Finance

11000

3

Rachel

1970/12/17

New  Mexico

Sales

9000

4

Emily

1985/03/07

Texas

HR

7000

5

Ashley

1975/05/13

Texas

R&D

16000

SPL script


A

B

1

=connect("db")

/ Connect to the data source

2

=A1.query("select * from  Employee")

/ Import Employee table

3

=A2.select((age=age(BIRTHDAY),   age<30 ||  age>50))

/ A.select()function gets records of   employees below 30 or over 50; use the double vertical bar ("||")   to connect two conditions that are met if either of them are met

4

=A3.groups(DEPT; count(~):Count)

/ Group the selected records and count   the employees

A4’s result:

DEPT

Count

4

9

5

5

6

3

7

6

8

1

 

2. Get the record(s) holding the maximum/minimum value

Example 4 Based on the scores table, get the student ID with the lowest math score. Below is part of the source table:

CLASS

STUDENTID

SUBJECT

SCORE

Class  one

1

English

84

Class  one

1

Math

77

Class  one

1

PE

69

Class  one

2

English

81

Class  one

2

Math

80

SPL script


A

B

1

=connect("db")

/ Connect to the database

2

=A1.query("select * from Scores   where  SUBJECT='Math'and CLASS='Class   one'")

/ Get math records for class one

3

=A2.minp(SCORE)

/A.minp() function gets the record   having the lowest score

4

=A3.STUDENTID

/ Get the student ID from A3’s record

There could be more than one record that contains the minimum value. To return all eligible records, you can use @a option in A.minp() function:


A

B

3

=A2.minp@a(SCORE)

/ A.minp() function works with @a   option to get all records containing the lowest score

4

=A3.(STUDENTID)

/ Get student IDs from A3’s records

A3’s result:

CLASS

STUDENTID

SUBJECT

SCORE

Class  one

5

Math

60

Class  one

14

Math

60

A4’s result:

Member

5

14

 

Example 5 Based on the Olympics medal table, get all information of the country that holds the top place for the longest time in terms of total medals. Below is part of the source table:

Game

Nation

Gold

Silver

Copper

30

USA

46

29

29

30

China

38

27

23

30

UK

29

17

19

30

Russia

24

26

32

30

Korea

13

8

7

SPL script


A

B

1

=file("Olympic.csv").import@cqt()

/ Import the Olympic medal tally file

2

=A1.sort@z(Game,   1000000*Gold+1000*Silver+Copper)

/ Sort records by games and the totals

3

=A2.group@o1(Game)

/ Get the one record for each game,   which is the first record because records are already sorted in descending   order

4

=A3.group@o(Nation)

/ Group the selected records by   nations in their original order

5

=A4.maxp(~.len())

/ A.maxp() selects the group having   the most records, which are the country that holds the top place in a row for   the longest time

A5’s result:

Game

Nation

Gold

Silver

Copper

10

USA

41

32

30

9

USA

22

18

16

8

USA

45

27

27

7

USA

41

27

28

 

3. Get data by ranges

Sometimes we need to first find the sequence number of the range where a value belongs to and then get its corresponding members in a set. Examples include returning the assessment (Excellent, Good, Average, Failed) according to a student’s score, and finding a household’s income range (Below or near poverty level, Low income, middle class, Upper middle class) according to its annually income.

Example 6 Based on the score table, find the number of students in Excellent, Pass, and Failed ranges respectively for the English subject. Below is part of the source data:

CLASS

STUDENTID

SUBJECT

SCORE

Class  one

1

English

84

Class  one

1

Math

77

Class one

1

PE

69

Class  one

2

English

81

Class  one

2

Math

80

SPL script


A

B

1

=connect("db").query("select    * from Scores where   SUBJECT='English'")

/ Connect to the database and get   records of English subject

2

=create(Assessment,Score).record(["fail",0,"pass",60,"excellent",90])

/ Create a lookup table of score ranges   and assessments

3

=A1.derive(A2.segp(Score,SCORE).Assessment:Assessment)

/ A.segp() function gets assessment   for each score according to the sequence number of the range where the score   falls in

4

=A3.groups(Assessment;count(1):Count)

/ Group A3’s records by assessment and   count students in each group

A4’s result:

Assessment

Count

excellent

6

Fail

4

Pass

18

 

4. Get the top/bottom N values

Example 7 Based on the score table, for each subject in every class, find the top 2 students’ IDs in terms of score. Below is part of the source data:

CLASS

STUDENTID

SUBJECT

SCORE

Class  one

1

English

84

Class  one

1

Math

77

Class  one

1

PE

69

Class  one

2

English

81

Class  one

2

Math

80

SPL script


A

B

1

=connect("db")

/ Connect to the database

2

=A1.query("select * from   Scores")

/ Query Scores table

3

=A2.group(CLASS,SUBJECT;~.top(-2;SCORE):TOP2)

/ A.top() function gets the records of   top 2 students for each subject in each class; -2 means getting two records   in descending order

4

=A3.conj(TOP2)

/ Concatenate all top 2 records   together

 

A4’s result:

CLASS

STUDENTID

SUBJECT

SCORE

Class  one

4

English

96

Class  one

9

English

93

Class  one

13

Math

97

Class  one

10

Math

97

 

5. Query records by primary key values

It’s common to locate records according to primary key values, such as finding employee records by employee IDs, querying detailed orders data by orders IDs, etc.

Example 8 Based on the associated Course table and SelectCourse table, list a table of course selection information where each course occupies a column. Below is part of the Course table:

ID

NAME

TEACHERID

1

Environmental  protection and …

5

2

Mental  health of College Students

1

3

Computer  language Matlab

8

Here is part of the SelectCourse table:

ID

STUDENT_NAME

COURSE

1

Rebecca

2,7

2

Ashley

1,8

3

Rachel

2,7,10

The expected result:

ID

STUDENT_NAME

COURSE1

COURSE2

COURSE3

1

Rebecca

Mental  health of College Students

Into  Shakespeare


2

Ashley

Environmental  protection and …

Modern  economics


3

Rachel

Mental  health of College Students

Into  Shakespeare

Music  appreciation

SPL script


A

B

1

=connect("db")

/Connect to the database

2

=A1.query("select * from  Course").keys(ID)

/ Query Course table and set ID as the primary key

3

=A1.query("select * from  SelectCourse")

/ Query SelectCourse table

4

=A3.run(COURSE=COURSE.split@cp())

/ Split each Course value in SelectCourse table by comma and   reassign it to the field

5

=A4.max(COURSE.len())

/ Get the largest number of selected   courses

6

=create(ID,STUDENT_NAME,  ${A5.("COURSE"+string(~)).concat@c()})

/ Create an empty table where the   number of course columns is the number A5 gets

7

>A4.run(A6.record([ID,STUDENT_NAME]|COURSE.(A2.find(~).Name)))

/ Loop through the SelectCourse table to join and append   student IDs, names and the courses obtained through A.find() function to A6’s   table

A6’s result:

ID

STUDENT_NAME

COURSE1

COURSE2

COURSE3

1

Rebecca

Mental  health of College Students

Into  Shakespeare


2

Ashley

Environmental  protection and …

Modern  economics


3

Rachel

Mental  health of College Students

Into  Shakespeare

Music  appreciation

 

Find more examples in SPL CookBook.