SPL: One-to-many Table Associations

 

A JOIN operation is used to join records of two or more tables. This essay explains SPL’s way of handling JOINs and the underlying principles.

Table association relationships are data dependence relationships between tables. There are four types of table association relationships – one-to-one, many-to-one, one-to-many and many-to-many. We can combine two or more tables through one of those relationships to achieve the multi-table join query.

In a one-to-many association relationship, one record of a table corresponds to any number of records in the other table. In a one-to-many relationship, the table at the “one”-side is the primary table and table at the “many”-side is the sub table (or the slave table). Take ORDERS table and ORDER_DETAIL table as an example, each order detail record corresponds to one order ID but one order ID may correspond to multiple order detail records. We call ORDERS table the primary table and ORDER_DETAIL table the sub table.

Example 1 Based on ORDERS table and ORDER_DETAIL table, we are trying to calculate the total amount of each order. Below is part of the data:

ORDERS:

ID

CUSTOMERID

EMPLOYEEID

ORDER_DATE

ARRIVAL_DATE

10248

VINET

5

2012/07/04

2012/08/01

10249

TOMSP

6

2012/07/05

2012/08/16

10250

HANAR

4

2012/07/08

2012/08/05

10251

VICTE

3

2012/07/08

2012/08/05

10252

SUPRD

4

2012/07/09

2012/08/06

ORDER_DETAIL:

ID

ORDER_NUMBER

PRODUCTID

PRICE

COUNT

DISCOUNT

10814

1

48

102.0

8

0.15

10814

2

48

102.0

8

0.15

10814

3

48

306.0

24

0.15

10814

4

48

102.0

8

0.15

10814

5

48

204.0

16

0.15

When only one pair of primary-child tables is involved, we use join() function to join the two tables by matching part of the child table’s primary key to the primary table’s primary key.

SPL script for achieving the join query:


A

1

=T("Orders.txt")

2

=T("OrderDetail.txt")

3

=join(A1:Orders,ID; A2:Detail,ID)

4

=A3.groups(Orders.ID; sum(Detail.PRICE*Detail.COUNT):AMOUNT)

A1: Import ORDERS table from the corresponding file.

A2: Import ORDER_DETAIL table from the corresponding file.

A3: Join the two tables on order ID.

A4: Group records by order ID and sum the amounts in each group.

Example 2 The ORDERS table has another sub table that records payment status. Now we want to find orders that are not fully paid, i.e., those whose total payment is less than the total order amount. Below is part of the data:

ORDERS:

ID

CUSTOMERID

EMPLOYEEID

ORDER_DATE

ARRIVAL_DATE

10248

VINET

5

2012/07/04

2012/08/01

10249

TOMSP

6

2012/07/05

2012/08/16

10250

HANAR

4

2012/07/08

2012/08/05

10251

VICTE

3

2012/07/08

2012/08/05

10252

SUPRD

4

2012/07/09

2012/08/06

ORDER_DETAIL:

ID

ORDER_NUMBER

PRODUCTID

PRICE

COUNT

DISCOUNT

10814

1

48

102.0

8

0.15

10814

2

48

102.0

8

0.15

10814

3

48

306.0

24

0.15

10814

4

48

102.0

8

0.15

10814

5

48

204.0

16

0.15

ORDER_PAYMENT:

ID

PAY_DATE

AMOUNT

CHANNEL

INSTALMENTS

10814

2014/01/05

816.0

3

0

10848

2014/01/23

800.25

2

1

10848

2014/01/23

800.25

0

0

10848

2014/01/23

800.25

3

1

10966

2014/03/20

572.0

2

1

It is wrong to simply join the three table together because there are a many-to-many relationship between ORDER-DETAIL records and ORDER_PAYMENT records.

Yet if each of the sub tables is grouped by order ID, the ID field becomes the unique, actual primary key. Then we can join the three tables on ID fields because they are now homo-dimension tables having one-to-one relationships.

SPL script for achieving the join query:


A

1

=T("Orders.txt")

2

=T("OrderDetail.txt")

3

=T("OrderPayment.txt")

4

=A2.groups(ID; sum(PRICE*COUNT):AMOUNT)

5

=A3.groups(ID; sum(AMOUNT):PAY_AMOUNT)

6

=join(A1:Orders,ID; A4:Detail,ID; A5:Payment,ID)

7

=A6.new(Orders.ID, Detail.AMOUNT, Payment.PAY_AMOUNT)

8

=A7.select(PAY_AMOUNT<AMOUNT)

A1: Import ORDERS table from the corresponding file.

A2: Import ORDER_DETAIL table from the corresponding file.

A3: Import ORDER_PAYMENT table from the corresponding file.

A4: Group ORDER_DETAIL records by order ID and sum the amounts in each group.

A5: Group ORDER_PAYMENT records by order ID and sum the payment amounts in each group.

A6: Use join() function to join ORDERS table, and the summarized ORDER_DETAIL and ORDER_PAYMENT tables on order ID.

A7: Create a new table sequence made up of three fields – order ID, order amount and payment amount.

A8: Find from A7 the records where the payment amount is less than the order amount, i.e., the orders not full paid.

Generally, SPL uses two steps to handle the one-to-many association (between the primary table and the sub table). First, group each sub table by part of the primary key (which is the primary table’s primary key) to make they have same actual primary key with the primary table. Second, join tables on their primary keys (or actual primary keys).

OrderDetail.txt

OrderPayment.txt

Orders.txt