Comparison of SQL & SPL: Dynamic Transposition

 


Abstract

Data transposition aims to convert queried data into a specified format to display using front-end applications, such as reporting tools. There are row-to-column transposition, column-to-row transposition, and more complicated dynamic transposition. This essay focuses on solutions and basic principles of SQL and SPL, the two commonly used programming languages, in handling transposition scenarios, and tries to find the convenient and efficient way for you through sample programs in SQL and SPL.

 

Dynamic row-to-column transposition cannot specify to-be-generated fields in advance but can only define them on an ad hoc basis according to the original fields.

 

1. Automatically generate a column according to field values

Example 1Based on the following employee table, calculate the average salary in different regions for each department. Below is part of the source table:

ID

NAME

SURNAME

STATE

DEPT

SALARY

1

Rebecca

Moore

California

R&D

7000

2

Ashley

Wilson

New   York

Finance

11000

3

Rachel

Johnson

New   Mexico

Sales

9000

4

Emily

Smith

Texas

HR

7000

5

Ashley

Smith

Texas

R&D

16000

Below is the result of the expected layout:

DEPT

California

Colorado

Florida

Administration



9333.333

 

Finance

8000

5000

10000

HR

10000


7000

 

SQL solution:

   Field names in the result set are extracted from STATE field values in the original table. As SQL does not allow using a non-constant expression as the PIVOT/UNPIVOT value, PIVOT cannot be used to handle the dynamic row-to-column transposition. SQL also does not support returning a dynamic data structure for achieving the dynamic row-to-column transposition.

To deal with this task using the database, we can turn to the stored procedure to create a dynamic SQL statement. Since what we focus on is the comparison of SQL and SPL, we just skip the stored procedure solution.

 

SPL solution:

   This row-to-column transposition gets its target fields from the original table. SPL offers A.pivot() function to do this. When no target fields are specified, the function will automatically extract names for the target fields.


A

1

=T("Employee.csv")

2

=A1.groups(STATE,DEPT;avg(SALARY):AVG_SALARY)

3

=A2.pivot(DEPT; STATE, AVG_SALARY)

A1: Import the Employee table.

A2: Group the table by STATE and DEPT to calculate the average salary in each state for each department.

A3: A.pivot() function is used to perform the row-to-column transposition. It will automatically extract names for the target fields if they are not specified.

 

2. Dynamically generate column names through calculation

Example 2Based on the following income details table, get the income information of each employee with categories generated automatically. Below is part of the source table:

NAME

SOURCE

INCOME

David

Salary

8000

David

Bonus

15000

Daniel

Salary

9000

Andrew

Shares

26000

Andrew

Sales

23000

Below is the result of the expected layout:

NAME

SOURCE1

INCOME1

SOURCE2

INCOME2

Andrew

Shares

26000

Sales

23000

Daniel

Salary

9000



David

Salary

8000

Bonus

15000

Robert

Bonus

13000



 

SQL solution:

   In this case, column names need to be dynamically generated through calculation according to the original field values. The number of columns, even the column names, cannot be defined in advance. SQL cannot implement this type of dynamic row-to-column transposition.

 

SPL solution:

   The target fields are not extracted from a specific field dynamically but need to be calculated dynamically. Here A.pivot() function is not suitable. We can first generate the target data structure according to the group with the most income categories and then populate data into it.


A

1

=T("Income.txt").group(NAME)

2

=A1.max(~.len())

3

=create(NAME,   ${A2.("SOURCE"/~/", INCOME"/~).concat@c()})

4

>A1.(A3.record(~.NAME | ~.conj([SOURCE,   INCOME])))

A1: Import the Income table and group it by name.

A2: Get the largest number of members among the groups, which is the largest number of income categories.

A3: Create an empty table by generating column names dynamically according to A3’s result.

A4: Loop through each group to populate the names, income sources, and amounts to A3’s empty table.

3. Dynamic row-to-column transposition through table join

Example 3Based on the following ORDERS table, ORDER_DETAIL table, and PRODUCT table, generate a table recording information of product purchased each day by each customer in the year 2014. The relationship between ORDERS table and ORDER_DETAIL table is one to many. Each order corresponds to multiple order detail records. The relationship between ORDER_DETAIL table and PRODUCT table is many to one. The ID field of ORDER_DETAIL table points to the ID field of PRODUCT table. Below are parts of the three tables:

ORDERS:

ID

CUSTOMERID

EMPLOYEEID

ORDER_DATE

ARRIVAL_DATE

10248

VINET

5

2012/07/04

2012/08/01

10249

TOMSP

6

2012/07/05

2012/08/16

10250

HANAR

4

2012/07/08

2012/08/05

10251

VICTE

3

2012/07/08

2012/08/05

10252

SUPRD

4

2012/07/09

2012/08/06

ORDER_DETAIL:

ID

ORDER_NUMBER

PRODUCTID

PRICE

COUNT

DISCOUNT

10814

1

48

102.0

8

0.15

10814

2

48

102.0

8

0.15

10814

3

48

306.0

24

0.15

10814

4

48

102.0

8

0.15

10814

5

48

204.0

16

0.15

PRODUCT:

ID

NAME

SUPPLIERID

CATEGORY

1

Apple Juice

2

1

2

Milk

1

1

3

Tomato sauce

1

2

4

Salt

2

2

5

Sesame oil

2

2

 

Below is result of the expected layout

ORDER_DATE

CUSTOMERID

PRODUCT1

COUNT1

PRODUCT2

COUNT2

2014/1/5

VICTE

Corn flakes

8




2014/1/23

CONSH

Chicken

3

Cake

9


 

SQL solution:

   SQL cannot implement this type of dynamic row-to-column transposition, too.

 

SPL solution:

   Join up the three source tables through their associated relationships and do the subsequent calculations using the methods for handling the previous two examples.


A

1

=T("Orders.txt").select(year(ORDER_DATE)==2014)

2

=T("Product.txt")

3

=T("OrderDetail.txt").switch(PRODUCTID,A2:ID).group(ID)

4

=join(A1:ORDERS,ID;A3:DETAIL,ID)

5

=create(DATE,CUSTOMERID,${A4.max(DETAIL.len()).("PRODUCT"/~/","/"COUNT"/~).concat@c()})

6

>A4.run(A5.record([ORDERS.ORDER_DATE,ORDERS.CUSTOMERID]|DETAIL.([PRODUCTID.NAME,COUNT]).conj()))

A1: Import Orders table and select records of the year 2014.

A2: Import Product table.

A3: Import OrderDetail table and join it with Product table through PRODUCTID field; replace PRODCUTID field values with corresponding Product records; and then group the joining result by order ID.

A4: Join up Orders table with OrderDetail table through order ID.

A5: Generate target data structure according to the group containing the most OrderDetail records and create an empty table.

A6: Loop through A4’s table to populate data into A5’s empty table in turn.

 

   Actually, this case is essentially the same as the previous two cases. It just has one more step to join up the three tables together. The steps after that are the same.

 

Summary

Through the three essays about the comparison of SQL and SPL in handling transposition, we can see that SQL’s static transposition methods PIVOT and UNPIVOT have limited applications and are supported only by certain database products. The SQL query will often be too complicated when it tries to handle certain complex static transposition scenarios. Moreover, SQL lacks a standard method of dealing with them. For dynamic transpositions, SQL just cannot achieve them. Usually, it turns to the more roundabout way, like the stored procedure, to generate the dynamic SQL query.

SPL provides flexible and adaptable method for handling various complicated transposition tasks. More importantly, the language has a clear and stable logic. It will first create the target data structure and then populated the calculated result to the table.

   When query is complex, SQL statement becomes even more complicated. Temporary tables and nested queries are methods it often uses. This makes it hard to write and maintain the SQL statement. SPL, by contrast, can write the logic according to the natural way of thinking and generate concise code step by step.

The SPL-driven esProc is a professional data computation engine. It is based on ordered sets and provides a complete set of set-oriented operations, making it the combination of JAVA advantages and SQL merits. Transpositions become easy and simple with SPL.