# Comparison of SQL & SPL: Recursion Operation

An recursion operation is defined as the process of directly or indirectly calling a procedure repeatedly. The Tower of Hanoi puzzle is one typical example of this. This essay introduces the methods and basic principles of SQL and SPL, the two common programming languages, in handling recursion scenarios, and finds the efficient and fast solutions for you through sample programs. Looking Comparison of SQL & SPL: Recursion Operation for details.

## Ⅰ. Search for all references recursively

Example 1According to a company’s organizational structure below, get the hierarchical level of each department (headquarters are level 1, branches are level 2, and so on).

 ID ORG_NAME PARENT_ID 1 Head Office 0 2 Beijing Branch Office 1 3 Shanghai Branch Office 1 4 Chengdu Branch Office 1 5 Beijing R&D Center 2 … … …

SQL solution:

To get this done, we need to loop through each record to find all superiors for each organization recursively. According to the natural way of thinking, the expected recursive process is that, for each record of each organization, we can locate the parent record through its superior’s ID and then the grandparent record through the corresponding ID, and so on. SQL does not have the concepts of explicit record and reference, so we can only get records having the superior-subordinate relationship recursively and sort them in a specific order, as shown below:

SQL offers WITH statement to do the recursion query, as shown below:

WITH CTE1 (ID,ORG_NAME,PARENT_ID,O_ID,O_ORG_NAME) AS(

SELECT

ID,ORG_NAME,PARENT_ID,ID O_ID,ORG_NAME O_ORG_NAME

FROM ORGANIZATION

UNION ALL

SELECT

ORG.ID,ORG.ORG_NAME,ORG.PARENT_ID,CTE1.O_ID,CTE1.O_ORG_NAME

FROM ORGANIZATION ORG

INNER JOIN CTE1

ON ORG.ID=CTE1.PARENT_ID

)

SELECT

O_ID ID,MIN(O_ORG_NAME) ORG_NAME,COUNT(*) "LEVEL"

FROM (

SELECT

ID,ORG_NAME,PARENT_ID,O_ID,O_ORG_NAME

FROM(

SELECT *

FROM CTE1

ORDER BY O_ID,ID

)

)

GROUP BY O_ID

ORDER BY ID

With Oracle, you can also use START WITH … CONNECT BY … PRIOR … to do the recursive query, as shown below:

SELECT

MAX(ID) ID, MAX(ORG_NAME) ORG_NAME, COUNT(*) "LEVEL"

FROM (

SELECT

ID,ORG_NAME,SUM(ID) OVER (ORDER BY RN) GROUP_ID

FROM (

SELECT

CASE WHEN LAG(PARENT_ID,1) OVER(ORDER BY RN ASC)=0

OR LAG(PARENT_ID,1) OVER(ORDER BY RN ASC) IS NULL THEN ID

ELSE NULL END ID,

CASE WHEN LAG(PARENT_ID,1) OVER(ORDER BY RN ASC)=0

OR LAG(PARENT_ID,1) OVER(ORDER BY RN ASC) IS NULL THEN ORG_NAME

ELSE NULL END ORG_NAME,

RN

FROM (

SELECT

ID,ORG_NAME,PARENT_ID,ROWNUM RN

FROM ORGANIZATION ORG

CONNECT BY ORG.ID=PRIOR ORG.PARENT_ID

)

)

)

GROUP BY GROUP_ID

ORDER BY GROUP_ID

The START WITH …  statement does not generate simpler code. Though the program returns records of all superiors in turn for every organization to the result set, we need to number records in each group and compare each record with its neighboring one to get the initial position of a group because a SQL set is unordered. As SQL GROUP BY only supports the equi-grouping, we need to add group marks for a further grouping. It would be much simpler if the language supported conditional grouping (according to PARENT_ID=0, for instance) as SPL does.

With databases that neither support WITH nor START WITH …, SQL is unable to achieve recursive queries directly but it has to use the stored procedure to write an recursion function (The details will be skipped here).

SPL solution:

SPL provides A.prior(F) function to perform a recursive query to find all references by default.

 A 1 =T("Organization.txt") 2 >A1.switch(PARENT_ID,A1:ID) 3 =A1.new(ID,ORG_NAME,~.prior(PARENT_ID).len():LEVEL)

A1: Import Organization table from the source file.

A2: Objectify the ID foreign key of the superior organization to convert it into the corresponding record and achieve self-join.

A3: Create a new table made up of ID, organization name and level. A.prior() function calculates the department level by getting the number of levels of records that a record references.

The SPL script of handling an recursion problem is concise because SPL’s recursive logic is natural. A.prior() function returns a set of all records of the parent organization for the current organization. Then it’s easy to perform count or any other operation, even a complicated one.

SPL supports retrieving a data table from the database, too. We can change A1 in the above script as follows:

 A 1 =connect("db").query("SELECT * FROM ORGANIZATION")

## Ⅱ. Search for all references recursively until a specific value appears

Example 2According to a company’s organizational structure below, get Beijing Branch Office’s subordinate organizations and its superiors’ names (separate multilevel organizations by comma).

 ID ORG_NAME PARENT_ID 1 Head Office 0 2 Beijing Branch Office 1 3 Shanghai Branch Office 1 4 Chengdu Branch Office 1 5 Beijing R&D Center 2 … … …

SQL solution:

After the analysis of example 1, the first idea for doing this is that, during the recursive process for searching for the superiors for each organization, stop the recursion if the specified value (the Beijing Branch Office) is found, keep its records and filter away records of the organizations that cannot be found. It is difficult for SQL to hit the target with one round of recursion. So we divide the task into two steps. First, find all subordinate organizations of Beijing Branch Office; second, as the solution to the above problem does, find all superior organizations for each of those organizations until Beijing Branch Office appears. Below is the SQL queries:

WITH CTE1 (ID,ORG_NAME,PARENT_ID) AS(

SELECT

ID,ORG_NAME,PARENT_ID

FROM ORGANIZATION

WHERE ORG_NAME='Beijing Branch Office'

UNION ALL

SELECT

ORG.ID,ORG.ORG_NAME,ORG.PARENT_ID

FROM ORGANIZATION ORG

INNER JOIN CTE1

ON ORG.PARENT_ID=CTE1.ID

)

,CTE2 (ID,ORG_NAME,PARENT_ID,O_ID,GROUP_NUM) AS(

SELECT

ID,ORG_NAME,PARENT_ID,ID O_ID,1 GROUP_NUM

FROM CTE1

UNION ALL

SELECT ORG.ID,ORG.ORG_NAME,ORG.PARENT_ID,

CTE2.O_ID,CTE2.GROUP_NUM+1 GROUP_NUM

FROM ORGANIZATION ORG

INNER JOIN CTE2

ON ORG.ID=CTE2.PARENT_ID AND

CTE2.ORG_NAME<>'Beijing Branch Office'

)

SELECT

MAX(O_ID) ID, MAX(O_ORG_NAME) ORG_NAME,

MAX(PARENT_NAME) PARENT_NAME

FROM(

SELECT

O_ID, O_ORG_NAME,

WM_CONCAT(ORG_NAME) OVER

(PARTITION BY O_ID ORDER BY O_ID,GROUP_NUM) PARENT_NAME

FROM(

SELECT

ID,PARENT_ID,O_ID,GROUP_NUM,

CASE WHEN GROUP_NUM=1 THEN NULL ELSE ORG_NAME END ORG_NAME,

CASE WHEN GROUP_NUM=1 THEN ORG_NAME ELSE NULL END O_ORG_NAME

FROM (

SELECT

ID,ORG_NAME,PARENT_ID,O_ID,GROUP_NUM,ROWNUM RN

FROM CTE2

ORDER BY O_ID,GROUP_NUM

)

)

)

GROUP BY O_ID

ORDER BY O_ID

With Oracle, you can also use START WITH … CONNECT BY … PRIOR … to do the recursive query, as shown below:

WITH CTE1 AS (

SELECT ID,ORG_NAME,PARENT_ID,ROWNUM RN

FROM ORGANIZATION ORG

CONNECT BY ORG.PARENT_ID=PRIOR ORG.ID

)

,CTE2 AS (

SELECT

ID,ORG_NAME,PARENT_ID,RN,

CASE WHEN LAG(ORG_NAME,1) OVER(ORDER BY RN ASC)= 'Beijing Branch Office' OR

LAG(ORG_NAME,1) OVER(ORDER BY RN ASC) IS NULL THEN 1 ELSE 0 END FLAG

FROM(

SELECT ID,ORG_NAME,PARENT_ID,ROWNUM RN

FROM CTE1

CONNECT BY CTE1.ID=PRIOR CTE1.PARENT_ID

)

)

SELECT

MAX(ID) ID, MAX(O_ORG_NAME) ORG_NAME,

MAX(PARENT_NAME) PARENT_NAME

FROM(

SELECT

ID,O_ORG_NAME,GROUP_ID,

WM_CONCAT(ORG_NAME) OVER

(PARTITION BY GROUP_ID ORDER BY RN) PARENT_NAME

FROM(

SELECT

ID, ORG_NAME, O_ORG_NAME,RN,

SUM(ID) OVER (ORDER BY RN) GROUP_ID

FROM(

SELECT

PARENT_ID,RN,

CASE WHEN FLAG=1 THEN NULL ELSE ORG_NAME END ORG_NAME,

CASE WHEN FLAG=1 THEN ID ELSE NULL END ID,

CASE WHEN FLAG=1 THEN ORG_NAME ELSE NULL END O_ORG_NAME

FROM CTE2

)

)

)

GROUP BY GROUP_ID

ORDER BY GROUP_ID

SPL solution:

SPL offers A.prior(F,r) function to do the recursive query to find the refences until a specific value appears:

 A 1 =T("Organization.txt") 2 >A1.switch(PARENT_ID,A1:ID) 3 =A1.select@1(ORG_NAME=="Beijing Branch Office") 4 =A1.new(ID,ORG_NAME,~.prior(PARENT_ID,A3)   :PARENT_NAME) 5 =A4.select(PARENT_NAME!=null) 6 =A5.run(PARENT_NAME=PARENT_NAME.(PARENT_ID.ORG_NAME).concat@c())

A1: Import the Organization table.

A2: Objectify the ID foreign key of the superior organization to convert it into the corresponding record and achieve foreign key objectification.

A3: Get records of Beijing Branch Office.

A4: Create a new table made up of ID, organization name and the set of records of all superior organizations.

A5: Get records that has at least one superior organization, that is, those of Beijing Branch Office.

A6: Join up names of superior organizations into a comma-separated string in a circular way.

The SPL solution is logically clear and concise. It uses records of Beijing Branch Office as parameters to do the recursive search of references.

## Ⅲ. Search for leaf records recursively

Example 3According to the following Chinese administrative divisions table, find regions and counties under Hebei province. Below is part of the source table:

 ID NAME PARENT_ID 1 China 0 11 Beijing 1 12 Tianjin 1 13 Hebei 1 … … … 1301 Shijiazhuang 13 1302 Tangshan 13 … … …

SQL solution:

First, we find all records of Hebei province and then find records that are not superior regions of other regions. Below are the SQL queries:

WITH CTE1 (ID,NAME,PARENT_ID,PARENT_NAME) AS(

SELECT

ID,NAME,PARENT_ID,NULL PARENT_NAME

FROM CHINA_REGION

WHERE NAME='Hebei'

UNION ALL

SELECT

T.ID,T.NAME,T.PARENT_ID,CTE1.NAME PARENT_NAME

FROM CHINA_REGION T

INNER JOIN CTE1

ON T.PARENT_ID=CTE1.ID

)

SELECT ID,NAME,PARENT_NAME

FROM CTE1 T1

WHERE NOT EXISTS(

SELECT 1

FROM CTE1 T2

WHERE T1.ID=T2.PARENT_ID

)

ORDER BY ID

SPL solution:

SPL has P.nodes@d(F,r) function to find all leaves recursively:

 A 1 =T("ChinaRegion.csv") 2 >A1.switch(PARENT_ID,A1:ID) 3 =A1.select@1(NAME=="Hebei") 4 =A1.nodes@d(PARENT_ID,A3) 5 =A4.new(ID,NAME,PARENT_ID.NAME:PARENT_NAME)

A1: Import ChinaRegion table.

A2: Objectify the ID foreign key of the superior region to convert it into the corresponding record and achieve foreign key objectification.

A3: Get records of Hebei province.

A4: Use P.nodes() function to perform the recursive search; @d option is used to recursively find all leaves.

A5: Create a new table made up of ID, region names and names of superior regions.

## IV. Traverse all files under a directory

Example 4According to the following terminal device use for online teaching in a primary school, find the proportion of each type of terminal. Below is the directory structure:

Files under the directory is of Excel format. Below is part of the source data:

 ID STUDENT_NAME TERMINAL 1 Rebecca Moore Phone 2 Ashley Wilson Phone,PC,Pad 3 Rachel Johnson Phone,PC,Pad 4 Emily Smith PC,Pad 5 Ashley Smith PC 6 Matthew Johnson Phone 7 Alexis Smith Phone,PC 8 Megan Wilson Phone,PC,Pad … … …

Since SQL cannot retrieve a directory and the files under it, a SQL solution is thus impossible. Let’s look at how SPL handles the recursive traversal of directories and files.

SPL solution:

SPL offers directory@s() function to search for names of all files under all subdirectories in an recursive way:

 A 1 =directory@ps("C:/Primary School") 2 >A1.run(t=T(~),@+=t.len(),~=t.conj(TERMINAL.split@c())) 3 =A1.conj().groups(~:TERMINAL;count(~)/A2:PERCENTAGE)

A1Use directory() function to import the file directory. @s option is used to get file names recursively under all subdirectories.

A2Read in files circularly, split each string of terminals by comma, and concatenate all unique terminals.

A3Group records by terminal and calculate the proportion for each.

## V. Concatenate field values in a JSON file recursively

Example 5Below is the JSON data of worldwide new COVID-19 confirmed cases in a certain time point. The task is to get the total confirmed cases across the world. Below is part of the source data:

[

{Region:"USA",Confirmed:[

{Region:"California",Confirmed:3671902},

{Region:"New York",Confirmed:1139740},

{Region:"Virginia",Confirmed:620801},

{Region:"Florida",Confirmed:2064525},

…

]},

{Region:"Brazil",Confirmed:[…]},

{Region:"India",Confirmed: […]},

{Region:"France",Confirmed: […]}

…

]

As SQL cannot retrieve a JSON file, we will skip the SQL solution. Below is the SPL way of handling the recursive traversal of a JSON file.

SPL solution:

SPL offers A.field@r()function to get field values recursively, and A.conj@r() function to concatenate members of a sequence recursively. Below is the SPL script:

 A 1 =json(file("COVID-19.json").read()) 2 =A1.field@r("Confirmed") 3 =A2.conj@r().sum()

A1: Read in the JSON file.

A2: GET ALL Confirmed field values recursively.

A3: Concatenate all numbers of confirmed cases and sum them.

## Summary

SQL has its method of achieving recursion problems, but the method not convenient to use. This is because SQL does not have the concepts of explicit records and reference, and row numbers should be used in the result of an recursive operation to mark the superior-subordinate relationship. Besides, SQL is based on unordered sets, which makes it troublesome to perform subsequent operations after recursions.

SPL supports the data structure of different data types and offers the concept of explicit records. More importantly, SPL’s logic in handle recursion problems is clear. It uses an recursion function after the self-join and then a set operation to finish the job. SPL supplies flexible and adaptable recursion features to handle common recursion scenarios, such as directory recursion and file content recursion.

With a complex query, the complexity of SQL solution increases sharply by resorting to temporary table and nested queries. This makes it hard to write and maintain the SQL statements. SPL, however, organizes an algorithm in a natural way of thinking and generates concise code.

The SPL-driven and ordered-set-based esProc is the professional data computation engine that provides a complete set of set-oriented operations. The tool combines merits of both Java and SQL and makes it easy to handle recursion scenarios.