Accessing Members of a Structured Data Set by Sequence Numbers

 

Sequence numbers (or indexes/subscripts) can be used to access members of an ordered set. There are various requirements in daily analytic scenarios, such as using one sequence number to access one member each time, or using multiple sequence numbers to access a number of members at once, or using one or more sequence numbers to access one or more members from back to front, or even accessing members according to a specified span, etc. This article will illustrate how to implement these requirements efficiently and quickly and offer sample scripts in esProc SPL. Looking Accessing Members of a Structured Data Set by Sequence Numbers for details.

 

1. Access one member by one sequence number each time

We can get one record from a data table according to one sequence number.

Example 1 Get the information of the first and the last transaction days in Shanghai Stock Exchange in the year of 2019. Below is part of the source data:

Date

Open

Close

Amount

2019/12/31

3036.3858

3050.124

2.27E11

2019/12/30

2998.1689

3040.0239

2.67E11

2019/12/27

3006.8517

3005.0355

2.58E11

2019/12/26

2981.2485

3007.3546

1.96E11

2019/12/25

2980.4276

2981.8805

1.9E11

SPL script


A

B

1

=file("000001.csv").import@ct()

/ Import the source data file

2

=A1.select(year(Date)==2019).sort(Date)

/ Select records of 2019 and sort them   by date

3

=A2(1)|A2.m(-1)

/ Get records of the first and the   last transaction days in SSE. A2(1) function gets the first record of the   table sequence and A2.m(-1) function gets the last one

A3’s result:

Date

Open

Close

Amount

2019/01/02

2497.8805

2465.291

9.76E10

2019/12/31

3036.3858

3050.124

2.27E11

At certain occasions we need to get the sequence number of member in the specified position, from back to front sometimes. To get the sequence number of the second to last transaction date from the above records ordered by date, for instance, we can use the parameter -2 to do this.

Example 2 Based on the EMPLOYEE table, calculate the average salary for the states of [California, Texas, New York, Florida] and for other states as a whole, which are classified as “Other”. Below is part of the source data:

ID

NAME

STATE

SALARY

1

Rebecca

California

7000

2

Ashley

New  York

11000

3

Rachel

New  Mexico

9000

4

Emily

Texas

7000

5

Ashley

Texas

16000

SPL script


A

B

1

=connect("db")

/ Connect to database

2

=A1.query("select * from  EMPLOYEE")

/ Query EMPLOYEE table

3

[California,Texas,New York,Florida]

/ Define a sequence of states

4

=A2.align@an(A3,STATE)

/ Group records of EMPLOYEE table by the specified   states; @a option enables returning all matching records for each group, and   @n option creates a new group to hold the unmatching records

5

=A4.new(if  (#>A3.p(-1),"Other",STATE):STATE,~.avg(SALARY):AvgSalary)

/ Calculate the average salary in each   group and generate a new table sequence; A.p(-1) function gets the sequence   number of the last member; change the last group name to “Other”

A5’s result:

STATE

SALARY

California

7700.0

Texas

7592.59

New  York

7677.77

Florida

7145.16

Other

7308.1

 

2. Access members by multiple sequence numbers each time

At times we need to get a number of records according to multiple sequence numbers. For example, we can use a set of sequence numbers [4,5,6] to access records of the second quarter from a certain year’s sales table ordered by months; or we can access the weekend duty records using a set of sequence numbers [1,7] from a weekly on-duty table.

Example 3 The following is part of a table that records daily attendant information:

Per_Code

in_out

Date

Time

Type

1110263

1

2013-10-11

09:17:14

In

1110263

6

2013-10-11

11:37:00

Break

1110263

5

2013-10-11

11:38:21

Return

1110263

0

2013-10-11

11:43:21

NULL

1110263

6

2013-10-11

13:21:30

Break

1110263

5

2013-10-11

14:25:58

Return

1110263

2

2013-10-11

18:28:55

Out

We want to group the table every 7 records and then convert it to the following format:

Per_Code

Date

In

Out

Break

Return

1110263

2013-10-11

9:17:14

18:28:55

11:37:00

11:38:21

1110263

2013-10-11

9:17:14

18:28:55

13:21:30

14:25:58

SPL script


A

B

1

=connect("db")

/ Connect to database

2

=A1.query("select * from   DailyTime  order by   Per_Code,Date,Time")

/ Query the source table and sort it   by code, date and time

3

=A2.group(Per_Code,Date)

/ Group records by code and date

4

=create(Per_Code,Date,In,Out,Break,Return)

/ Create an empty table that stores   the final result

5

=A3.(~([1,7,2,3,1,7,5,6]))

/ A([1,7,2,3,1,7,5,6]) function gets   records from each group in order to generate the ordered whole record of a   date

6

=A5.conj([~.Per_Code,~.Date]|~.(Time).m([1,2,3,4])|[~.Per_Code,~.Date]|~.(Time).m([5,6,7,8]))

/ Concatenate values of all record in   each group to one set, during which A.m() is used to access multiple members

7

>A4.record(A6)

/ Populate values to A4’s table

A4’s result:

Per_Code

Date

In

Out

Break

Return

1110263

2013-10-11

9:17:14

18:28:55

11:37:00

11:38:21

1110263

2013-10-11

9:17:14

18:28:55

13:21:30

14:25:58

We can also access members from back to front.

Example 5 Calculate the growth rate of each closing price in SSE’s last 10 trading dates of the year 2019 compared with the previous closing price. Below is part of the source data:

Date

Open

Close

Amount

2019/12/31

3036.3858

3050.124

2.27E11

2019/12/30

2998.1689

3040.0239

2.67E11

2019/12/27

3006.8517

3005.0355

2.58E11

2019/12/26

2981.2485

3007.3546

1.96E11

2019/12/25

2980.4276

2981.8805

1.9E11

SPL script


A

B

1

=file("000001.csv").import@ct()

/Import the source data file

2

=A1.select(year(Date)==2019).sort(Date)

/ Select records of 2019 and sort them   by date

3

=A2.p(to(-10,-1))

/ A.p() function returns the sequence   numbers of the last records

4

=A3.new(A2(~).Date:Date,  string(A2(~).Close/A2(~-1).Close-1,   "0.000%" ):Increase)

/ Circularly calculate the growth rate   of each closing price compared with the previous one

A4’s result:

Date

Increase

2019/12/18

-0.178%

2019/12/19

0.001%

2019/12/20

-0.402%

2019/12/23

-1.404%

2019/12/24

0.673%

 

3. Access members by the specified span

Another scenario is to access a series of members beginning from the specific sequence number according to a specified span. To get one sample from every 10 records in a data table, for instance, we can begin from the first record and get one within every ten. Another instance is to find all multiples of 3 from the natural numbers from 1 to 100. To do that, we can begin from the third number and get one every 3 numbers.

Example 6 Find the prime numbers within 100.

SPL script


A

B

1

=to(100)

/ Define a set of numbers from 1 to   100

2

=to(2,10)

/ Define a set of numbers from 2 to 10

3

=A2.(A1.step(~,~*2))

/ A1.step(~,~*2) calculates the multiples   (n times and n>1) of each member in A2 within 100

4

=A1.to(2,)\A3.conj()

Get all prime numbers within 100 by   removing 1 and all composite numbers within the same range; A3.conj() finds   all composite numbers within 100

A4’s result:

Member

2

3

5

7

11

13

17

19

 

Find more examples in SPL CookBook.