Alignment Operations between Ordered Sets

 

Abstract

We can perform alignment operations on two ordered sets. Such as comparing the sizes of two sets, checking whether two sets are equal, etc. Read this article to find ways of handling these operations conveniently and efficiently and their esProc sample scripts.

 

The alignment operations over ordered sets include comparison operators (>, <, ==, etc.) and basic arithmetic operations (represented by +, -, *, /, %, \). The sets under discussion are ordered sets whose members are already sorted out. The alignment operations on two ordered sets compare their members or do arithmetic over members in corresponding positions in order.

1. Size comparison

We can use operators ">" and "<" to compare corresponding members of two sets from the first pair in order. To compare [1,3,1] and [1,2,2], for instance, we begin from the first pair of members, [1,1]. They are equal, and then go on to compare the second pair. Since 3>2, there is no need to compare other members and we get [1,3,1] > [1,2,2].

Example 1Based on the Olympic Medal table, find the Games when China is before Russia in the table. Below is part of the table:

Game

Nation

Medal

Game

30

USA

46,29,29

30

30

China

38,27,23

30

30

UK

29,17,19

30

30

Russia

24,26,32

30

30

Korea

13,8,7

30

According to the Olympic rule of ranking, nations are ranked in descending order by the number of gold medals they have won. If nations have same number of gold medals, they will be ranked by the number of silver medals in descending order; and then by the number of copper medals for nations having same total silver medals. Nations getting same total copper medals come equal.

SPL script


A

B

1

=file("Olympic.csv").import@cqt()

/ Import the Olympic medal table for all games

2

=A1.run(Medal=Medal.split@c())

/ Split each Medal value into a set of medal totals by comma

3

=A2.group(Game)

/ Group A2s records by game

4

=A3.select(~.select(Nation=="China").Medal>~.select(Nation=="Russia").Medal)

/ Use ">" to compare Chinas medal total set and Russias in the order of gold, silver and copper, and select the games when China goes ahead

5

=A4.(Game)

/ List the eligible games

A5s result:

Game

23

25

28

29

30

 

2. Equality comparison

Its common to compare two ordered sets to find if they are equal. Specific scenarios include the comparison of file content or table values, and others (compare [1,2,3] and [2,1,3] to see if they are equal, for instance). Generally two sets are regarded as unequal if any members in the corresponding positions are not consistent. But in certain occasions the order is unimportant and the greatest concern is to find if both sets have same members.

Example 2The following file stores random samples. We want to find if two random samples get same IDs. Below is part of the source data:

ID

Predicted_Y

Original_Y

10

0.012388464367608093

0.0

11

0.01519899123978988

0.0

13

0.0007920238885061248

0.0

19

0.0012656367468159102

0.0

21

0.009460545997473379

0.0

23

0.024176791871681664

0.0

SPL script


A

B

1

=file("p_old.csv").import@ct()

/ Import the first file exported

2

=file("p_new.csv").import@ct()

/ Import the second file exported

3

=cmp(A1.(ID),A2.(ID))

/Use cmp() function to compare if the IDs in the two files are equal (the corresponding member values are equal and their orders are consistent) 

A3s result:

Member

0

The result is 0, which means the two files contain completely same IDs.

If the order of IDs are different, we can use eq() function to find whether the two sets have same members.


A

B

3

=A1.(ID).eq(A2.(ID))

/Use eq() function to compare if the two files contain same IDs whose order can be different 

 

3. Basic arithmetic operations on sets

The alignment operations include the basic arithmetic calculations (+,-,*,/,%,\). For instance, the set of sales amounts of store A in 3 days is [2,3,4] and that of sales amount of store B is [3,1,3], so the set of total sales amounts of the two stores is [5,4,7].

Example 3Calculate the daily relative yield between SZSE 300 (399007) and SZSE Component Index (399001) during the period of December 24 -26, 2019. Below is part of the source data:

Date

Code

Name

Open

Close

Amount

2020/2/18

399001

Shenzhen

11244.7651

11306.4863

3.19E+11

2020/2/17

399001

Shenzhen

10974.9328

11241.4993

3.12E+11

2020/2/14

399001

Shenzhen

10854.4551

10916.3117

2.77E+11

2020/2/13

399001

Shenzhen

10936.5011

10864.3222

2.87E+11

2020/2/12

399001

Shenzhen

10735.0475

10940.7952

2.66E+11

SPL script


A

B

1

=connect("db")

/Connect to data sourece

2

=["399007","399001"].(A1.query("select  * from StockIndex where code=? and date between'2019-12-23'and'2019-12-26'",~))

/ Import data of SZSE 300 and SZSE Component Index from Dec. 23-26, 2019; the data of Dec. 23 is for calculating the growth rate

3

=A2.(~.calc(to(2,4),Close/Close[-1]))

/ Calculate the growth rate for each of the 3 days

4

=A3(1)--A3(2)

/ Use "--" to perform alignment subtraction between the two set to get the relative yields; the sign ?? (?{+,-,*,/,%,\}) is used to perform alignment operations between sets

A4s result:

Member

0.0031349096521252617

0.0011897141619391371

-4.4910504685946595E-4

 

Find more examples in SPL CookBook