"A JOIN operation is used to join records of two or more tables. This essay explains SPL’s way of .."

Hoo RaqForum 19 No.
1 Reply • 471 View • 2 Years ago

SPL: One-to-many Table Associations

A JOIN operation is used to join records of two or more tables. This essay explains SPL’s way of handling JOINs and the underlying principles.

Table association relationships are data dependence relationships between tables. There are four types of table association relationships – one-to-one, many-to-one, one-to-many and many-to-many. We can combine two or more tables through one of those relationships to achieve the multi-table join query.

In a one-to-many association relationship, one record of a table corresponds to any number of records in the other table. In a one-to-many relationship, the table at the “one”-side is the primary table and table at the “many”-side is the sub table (or the slave table). Take ORDERS table and ORDER_DETAIL table as an example, each order detail record corresponds to one order ID but one order ID may correspond to multiple order detail records. We call ORDERS table the primary table and ORDER_DETAIL table the sub table.

【Example 1】 Based on ORDERS table and ORDER_DETAIL table, we are trying to calculate the total amount of each order. Below is part of the data:

ORDERS:

ID	CUSTOMERID	EMPLOYEEID	ORDER_DATE	ARRIVAL_DATE
10248	VINET	5	2012/07/04	2012/08/01
10249	TOMSP	6	2012/07/05	2012/08/16
10250	HANAR	4	2012/07/08	2012/08/05
10251	VICTE	3	2012/07/08	2012/08/05
10252	SUPRD	4	2012/07/09	2012/08/06
…	…	…	…	…

ORDER_DETAIL:

ID	ORDER_NUMBER	PRODUCTID	PRICE	COUNT	DISCOUNT
10814	1	48	102.0	8	0.15
10814	2	48	102.0	8	0.15
10814	3	48	306.0	24	0.15
10814	4	48	102.0	8	0.15
10814	5	48	204.0	16	0.15
…	…	…	…	…	…

When only one pair of primary-child tables is involved, we use join() function to join the two tables by matching part of the child table’s primary key to the primary table’s primary key.

SPL script for achieving the join query:

	A
1	=T("Orders.txt")
2	=T("OrderDetail.txt")
3	=join(A1:Orders,ID; A2:Detail,ID)
4	=A3.groups(Orders.ID; sum(Detail.PRICE*Detail.COUNT):AMOUNT)

A1: Import ORDERS table from the corresponding file.

A2: Import ORDER_DETAIL table from the corresponding file.

A3: Join the two tables on order ID.

A4: Group records by order ID and sum the amounts in each group.

【Example 2】 The ORDERS table has another sub table that records payment status. Now we want to find orders that are not fully paid, i.e., those whose total payment is less than the total order amount. Below is part of the data:

ORDERS:

ID	CUSTOMERID	EMPLOYEEID	ORDER_DATE	ARRIVAL_DATE
10248	VINET	5	2012/07/04	2012/08/01
10249	TOMSP	6	2012/07/05	2012/08/16
10250	HANAR	4	2012/07/08	2012/08/05
10251	VICTE	3	2012/07/08	2012/08/05
10252	SUPRD	4	2012/07/09	2012/08/06
…	…	…	…	…

ORDER_DETAIL:

ID	ORDER_NUMBER	PRODUCTID	PRICE	COUNT	DISCOUNT
10814	1	48	102.0	8	0.15
10814	2	48	102.0	8	0.15
10814	3	48	306.0	24	0.15
10814	4	48	102.0	8	0.15
10814	5	48	204.0	16	0.15
…	…	…	…	…	…

ORDER_PAYMENT:

ID	PAY_DATE	AMOUNT	CHANNEL	INSTALMENTS
10814	2014/01/05	816.0	3	0
10848	2014/01/23	800.25	2	1
10848	2014/01/23	800.25	0	0
10848	2014/01/23	800.25	3	1
10966	2014/03/20	572.0	2	1
…	…	…	…	…

It is wrong to simply join the three table together because there are a many-to-many relationship between ORDER-DETAIL records and ORDER_PAYMENT records.

Yet if each of the sub tables is grouped by order ID, the ID field becomes the unique, actual primary key. Then we can join the three tables on ID fields because they are now homo-dimension tables having one-to-one relationships.

SPL script for achieving the join query:

	A
1	=T("Orders.txt")
2	=T("OrderDetail.txt")
3	=T("OrderPayment.txt")
4	=A2.groups(ID; sum(PRICE*COUNT):AMOUNT)
5	=A3.groups(ID; sum(AMOUNT):PAY_AMOUNT)
6	=join(A1:Orders,ID; A4:Detail,ID; A5:Payment,ID)
7	=A6.new(Orders.ID, Detail.AMOUNT, Payment.PAY_AMOUNT)
8	=A7.select(PAY_AMOUNT<AMOUNT)

A1: Import ORDERS table from the corresponding file.

A2: Import ORDER_DETAIL table from the corresponding file.

A3: Import ORDER_PAYMENT table from the corresponding file.

A4: Group ORDER_DETAIL records by order ID and sum the amounts in each group.

A5: Group ORDER_PAYMENT records by order ID and sum the payment amounts in each group.

A6: Use join() function to join ORDERS table, and the summarized ORDER_DETAIL and ORDER_PAYMENT tables on order ID.

A7: Create a new table sequence made up of three fields – order ID, order amount and payment amount.

A8: Find from A7 the records where the payment amount is less than the order amount, i.e., the orders not full paid.

Generally, SPL uses two steps to handle the one-to-many association (between the primary table and the sub table). First, group each sub table by part of the primary key (which is the primary table’s primary key) to make they have same actual primary key with the primary table. Second, join tables on their primary keys (or actual primary keys).

OrderDetail.txt

OrderPayment.txt

Orders.txt

SPL Official Website 👉 https://www.scudata.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL

SPL Learning Material 👉 https://c.scudata.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/2bkGwqTj

Youtube 👉 https://www.youtube.com/@esProc_SPL

esProc