Comparison of SQL & SPL: Join Operations (Ⅲ)
【Abstract】
The join operation is used to combine records of two or more tables. This essay explains the handling of join operations using SQL and SPL. By exploring solutions and basic principles of the two commonly used programming languages and providing sample programs written in them, we try to find the faster and more efficient way for you.
In this essay, we focus on the handling of table joins with manytomany relationship.
The manytomany relationship exists between tables where one record in table A corresponds to any number of records in table B and vice versa.
We use cross join to handle this type of relationship. One record of table A will join up with each record of table B to generate new records, so the number of rows in the returned result set is the product of numbers of rows in the two tables. SQL CROSS JOIN is used to do the cross join. SPL has xjoin() function to do the job. In realworld situations, the manytomany relationship is rare. Now let’s look at how to handle such a scenario through matrix multiplication.
【Example 1】Calculate the product of two matrices. The two matrix tables are stored respectively in MATRIXA and MATRIXB:
MATRIXA:
ROW 
COL 
VALUE 
1 
1 
1 
1 
2 
2 
1 
3 
3 
2 
1 
4 
2 
2 
5 
2 
3 
6 
MATRIXB:
ROW 
COL 
VALUE 
1 
1 
1 
1 
2 
4 
2 
1 
2 
2 
2 
5 
3 
1 
3 
3 
2 
6 
The mathematical formula for solving this task is as follows:
SQL solution:
The matrix multiplication involves manytomany relationship, which is handled in SQL through the cross join. First, we cross product the two matrix tables to select records where COL value in table A is equivalent to ROW value in table B. Second, sum values with same COL and ROW values. Below are SQL statements:
SELECT
"ROW",COL,SUM(VALUE1*VALUE2) VALUE
FROM (
SELECT
A."ROW",A.VALUE VALUE1,B.COL,B.VALUE VALUE2
FROM MATRIXA A
CROSS JOIN
MATRIXB B
WHERE A.COL=B."ROW"
)
GROUP BY "ROW",COL
ORDER BY "ROW",COL
SPL solution:
SPL offers xjoin() function to perform cross join.
A 

1 
=T("MatrixA.csv") 
2 
=T("MatrixB.csv") 
3 
=xjoin(A1:A; A2:B, A.COL==A2.ROW) 
4 
=A3.groups(A.ROW, B.COL; sum(A.VALUE * B.VALUE):VALUE) 
A1: Import MatrixA table.
A2: Import MatrixB table.
A3: The xjoin() function performs cross join, as well as the conditional filtering at the same time.
A4: Group rows by the condition that COL value is equivalent to ROW value and sum products of values in each group.
A nonequijoin is one where the joining condition is not equivalence comparison. Examples of such joins include finding whether an age is included in a specific age group and whether the income belongs to a certain income level.
【Example 2】Find which age group a person falls in based on COMMUNITY table and AGE_GROUP. Below is part of the source data:
COMMUNITY:
ID 
NAME 
AGE 
1 
David 
28 
2 
Daniel 
15 
3 
Andrew 
65 
4 
Rudy 

… 
… 
… 
AGE_GROUP:
GROUP_NAME 
START 
END 
Children 
0 
15 
Youth 
16 
40 
Middle 
41 
60 
Old 
61 
100 
SQL solution:
There are no age record for some community residents, so we use LEFT JOIN to handle this task:
SELECT
ID,NAME,AGE,GROUP_NAME
FROM COMMUNITY C
LEFT JOIN
AGE_GROUP A
ON A."START"<=C.AGE AND A."END">=C.AGE
ORDER BY ID
SPL solution:
SPL offers xjoin() function to perform cross join. The function works with @1 option to enable a left join.
A 

1 
=T("Community.txt") 
2 
=T("AgeGroup.txt") 
3 
=xjoin@1(A1:C; A2:A, A2.START<=C.AGE && A2.END>=C.AGE) 
4 
=A3.new(C.ID, C.NAME, C.AGE,A.GROUP_NAME) 
A1: Import Community table.
A2: Import AgeGroup table.
A3: The xjoin@1() function left joins the two tables according to the first table and get records corresponding to each group. When a record does not match any age group, just keep it and set its corresponding age group as null.
A4: Return ages groups for all community residents.
Summary
SQL defines the join operation simply as performing Cartesian product on two sets (tables) and then filtering by a specific condition. The definition covers not only the commonly seen equijoins, but the equijoins with manytomany relationship and even nonequijoins. An experienced programmer knows that most joins in realworld situations are equijoins. The oversimplified SQL definition does not reflect the features of equijoins, and thus the language cannot make the most use of those features to facilitate code writing and task implementation. When a query is too complicated (such as when the number of involved tables are many and when there are nested queries), it is hard to write and optimize code with SQL.
SPL makes use of those features to distinguish equijoins and nonequijoins and to offer separate solutions for equijoins with three different types of associative relationships. The language has join()function to handle onetoone relationship (homodimension tables) and onetomany relationship (primary and sub tables) and provides A.switch()function to deal with manytoone relationship (foreign key table).
Besides, SQL’s use of temporary table and nested query makes it hard to write and maintain a SQL query. SPL, however, can compose concise code step by step according to the natural way of thinking.
The SPLdriven esProc is the professional data computation engine. It is orderedsetbased and offers a complete set of grouping functions, which combines advantages of both Java and SQL. A join operation will thus become simple and easy with SPL.
Chinese version