SPL: Dynamic Transposition

 

The transposition function is commonly presented in the front-end such as reports, which transposes the queried data into a specific display layout. For example, row-to-column transposition, column-to-row transposition, more complex dynamictransposition, etc. Dynamic row-to-column transposition means the fields generated during transposition can not be specified in advance, but can only be determined dynamically according to the values of the original fields. So let’s take a look at how SPL handles the dynamic transposition.

1. Automatically generate a column according to field values

[Example 1] Based on the employee table, calculate the average salary of each department in different areas. Some of the data is shown below:

ID

NAME

SURNAME

STATE

DEPT

SALARY

1

Rebecca

Moore

California

R&D

7000

2

Ashley

Wilson

New York

Finance

11000

3

Rachel

Johnson

New Mexico

Sales

9000

4

Emily

Smith

Texas

HR

7000

5

Ashley

Smith

Texas

R&D

16000

The layout of the target table is expected to be as follows:

DEPT

California

Colorado

Florida

Administration



9333.333

Finance

8000

5000

10000

HR

10000


7000

The row-to-column transposition needs to get its target fields from the original table. The A.pivot() function in SPL can do such a transposition, which will automatically extract names of the target fields when no target fields are specified.

The SPL script is:


A

1

=T("Employee.csv")

2

=A1.groups(STATE,DEPT;avg(SALARY):AVG_SALARY)

3

=A2.pivot(DEPT; STATE, AVG_SALARY)

A1: Import the employee table.

A2: Group the table by STATE and DEPT to calculate the average salary of each department in each state.

A3: Use the A.pivot() function to transpose rows to columns, and retrieve the names of target fields automatically when they are not specified.

2. Dynamically generate column names through calculation

[Example 2] Based on the income details table, get the income information of each employee with categories generated automatically. Below is part of the source table:

NAME

SOURCE

INCOME

David

Salary

8000

David

Bonus

15000

Daniel

Salary

9000

Andrew

Shares

26000

Andrew

Sales

23000

The layout of the target table is expected to be as follows:

NAME

SOURCE1

INCOME1

SOURCE2

INCOME2

Andrew

Shares

26000

Sales

23000

Daniel

Salary

9000



David

Salary

8000

Bonus

15000

Robert

Bonus

13000



The target fields are not retrieved from a certain field dynamically but need to be calculated dynamically. Here the A.pivot() function is no longer available. We can generate the target data structure according to the group with the most income categories, and then fill it with data.

The SPL script is:


A

1

=T("Income.txt").group(NAME)

2

=A1.max(~.len())

3

=create(NAME, ${A2.("SOURCE"/~/", INCOME"/~).concat@c()})

4

>A1.(A3.record(~.NAME | ~.conj([SOURCE, INCOME])))

A1: Import the income details table, and group it by NAME.

A2: Calculate the maximum number of members in groups, i.e., the largest number of income categories.

A3: Generate column names dynamically according to the result in A3, and create an empty table.

A4: Loop through each group, and fill in the names, income sources, and amounts with the table created in A3.

3. Dynamic row-to-column transposition through table join

[Example 3] Based on the following ORDERS table, ORDER_DETAIL table, and PRODUCT table, generate a table recording information of products purchased each day by each customer in the year 2014. The relationship between ORDERS table and ORDER_DETAIL table is one to many, where each order corresponds to multiple order detail records. The relationship between ORDER_DETAIL table and PRODUCT table is many to one, where the product ID field of ORDER_DETAIL table points to the ID field of PRODUCT table. Below are parts of the three tables:

ORDERS:

ID

CUSTOMERID

EMPLOYEEID

ORDER_DATE

ARRIVAL_DATE

10248

VINET

5

2012/07/04

2012/08/01

10249

TOMSP

6

2012/07/05

2012/08/16

10250

HANAR

4

2012/07/08

2012/08/05

10251

VICTE

3

2012/07/08

2012/08/05

10252

SUPRD

4

2012/07/09

2012/08/06

ORDER_DETAIL:

ID

ORDER_NUMBER

PRODUCTID

PRICE

COUNT

DISCOUNT

10814

1

48

102.0

8

0.15

10814

2

48

102.0

8

0.15

10814

3

48

306.0

24

0.15

10814

4

48

102.0

8

0.15

10814

5

48

204.0

16

0.15

PRODUCT:

ID

NAME

SUPPLIERID

CATEGORY

1

Apple Juice

2

1

2

Milk

1

1

3

Tomato sauce

1

2

4

Salt

2

2

5

Sesame oil

2

2

The layout of the target table is expected to be as follows:

ORDER_DATE

CUSTOMERID

PRODUCT1

COUNT1

PRODUCT2

COUNT2

2014/1/5

VICTE

Corn flakes

8




2014/1/23

CONSH

Chicken

3

Cake

9


We can first join up the three source tables through their associative relationships and perform the subsequent calculations using the same method in the previous two examples.

The SPL script is:


A

1

=T("Orders.txt").select(year(ORDER_DATE)==2014)

2

=T("Product.txt")

3

=T("OrderDetail.txt").switch(PRODUCTID,A2:ID).group(ID)

4

=join(A1:ORDERS,ID;A3:DETAIL,ID)

5

=create(DATE,CUSTOMERID,${A4.max(DETAIL.len()).("PRODUCT"/~/","/"COUNT"/~).concat@c()})

6

>A4.run(A5.record([ORDERS.ORDER_DATE,ORDERS.CUSTOMERID]|DETAIL.([PRODUCTID.NAME,COUNT]).conj()))

A1: Import the Orders table and select records of the year 2014.

A2: Import the Product table.

A3: Import the OrderDetail table, join it with the Product table through the PRODUCTID filed, convert the PRODUCTID filed values to corresponding product records, and then group the joining result by order ID.

A4: Join up Order table with OrderDetail table through order ID.

A5: Generate the target data structure according to the group with the most OrderDetail records, and create an empty table.

A6: Loop through the order data and fill them in the table created in A5 in turn.

In fact, this case is the same as the previous ones essentially with only one more step, that is, joining up the three tables through their associative relationships. The subsequent operations are the same, creating the target data structure as needed and filling it with data.