SPL Achieves Oracle Special Uses for MySQL

  Oracle provides some special syntax and functions, whose migration to MySQL is annoying. This article explains how to achieve the effect of Oracle special functions in esProc for MySQL as well as all the other database products.

1.Recursive statements

a)

select employee_id,first_name,last_name,manager_id
from hr.employees
start with employee_id=102
connect by prior employee_id = manager_id
A
1

=connect("orcl")

2 =A1.query@x("select employee_id, first_name, last_name, manager_id from hr.employees")
3 =A2.keys(EMPLOYEE_ID)
4 =A2.select@1(EMPLOYEE_ID==102)
5 =A2.switch(MANAGER_ID, A2)
6 =A2.nodes(MANAGER_ID, A4)
7 =(A4|A6).new(EMPLOYEE_ID, FIRST_NAME, LAST_NAME, MANAGER_ID.EMPLOYEE_ID:MANAGER_ID)

(1) A3 sets a key for A2’s table sequence;
(2) A4 gets the starting employee record;
(3) A5 replaces values of MANAGER_ID field in A2 with records for the purpose of performing recursion;
(4) A6 gets all child nodes of the starting employee record.
001png

b)

select employee_id, first_name,last_name,manager_id
from hr.employees
start with employee_id=104
connect by prior manager_id = employee_id
A
1 =connect("orcl")
2 =A1.query@x("select employee_id, first_name, last_name, manager_id from hr.employees")
3 =A2.keys(EMPLOYEE_ID)
4 =A2.switch(MANAGER_ID, A2)
5 =A2.select@1(EMPLOYEE_ID==104)
6 =A5.prior(MANAGER_ID)
7 =A6.new(EMPLOYEE_ID, FIRST_NAME, LAST_NAME, MANAGER_ID.EMPLOYEE_ID:MANAGER_ID)

(1) A6 gets all parent nodes of the starting employee record.
002png

c)

select employee_id,last_name,manager_id,sys_connect_by_path(last_name,'/') path from hr.employees
start with employee_id=102
connect by prior employee_id = manager_id
A
1 =connect("orcl")
2 =A1.query@x("select employee_id, last_name, manager_id,null path from hr.employees")
3 =A2.keys(EMPLOYEE_ID)
4 =A2.select@1(EMPLOYEE_ID==102)
5 =A2.switch(MANAGER_ID, A2)
6 =A2.nodes(MANAGER_ID, A4)
7 =A4|A6
8 =A7.run(PATH=if(EMPLOYEE_ID==102, "/"+LAST_NAME, MANAGER_ID.PATH+"/"+LAST_NAME))
9 =A7.new(EMPLOYEE_ID, LAST_NAME, MANAGER_ID.EMPLOYEE_ID:MANAGER_ID, PATH)

(1) Since the parent nodes of every record of A7 precede the current node, A8 is able to modify the PATH value for every record front to back.
003png

2.Nested aggregate functions

select avg(max(salary)) avg_max, avg(min(salary)) avg_min
from hr.employees
group by department_id
A
1 =connect("orcl")
2 =A1.query@x("select * from hr.employees")
3 =A2.groups(DEPARTMENT_ID;max(SALARY):m1, min(SALARY):m2)
4 =A3.group(;~.avg(m1):avg_max,~.avg(m2):avg_min)

(1) In A2, A1.query can be replaced by A1.cursor.
004png

3. Aggregation analysis functions – FIRST & LAST

SELECT department_id,
    MIN(salary) KEEP (DENSE_RANK FIRST ORDER BY commission_pct) worst,
    MAX(salary) KEEP (DENSE_RANK LAST ORDER BY commission_pct) best
FROM hr.employees
GROUP BY department_id
ORDER BY department_id
A
1 =connect("orcl")
2 =A1.query@x("select * from hr.employees order by department_id,commission_pct")
3 =A2.group@o(DEPARTMENT_ID)
4 =A3.new(DEPARTMENT_ID,~.minp@a(ifn(COMMISSION_PCT,2)).min(SALARY):worst, ~.maxp@a(ifn(COMMISSION_PCT,2)).max(SALARY):best)
5 =A4.sort(ifn(DEPARTMENT_ID,power(2,32)))

(1) Since A2 is already sorted by DEPARTMENT_ID, A3 is able to group records uing group@o;
(2) Oracle FIRST/LAST_VALUE function gets the first/last group of sorted data. In Oracle, records with null values are put at the end, so the LAST_VALUE function gets the null values which are put in the last group. In esProc, the maxp/minp function removes null values when getting the maximum/minimum value. In A4, expression ifn(COMMISSION_PCT,2) is used to make sure the maximum will be retrieved by removing the null values;
(3) A5 uses power(2,32), which is greater than all DEPARTMENT_ID values, to ensure a row where DEPARTMENT_ID is null is put at the end.

  To manipulate a huge amount of data, we can use the cursor:

A
1 =connect("orcl")
2 =A1.cursor@x("select * from hr.employees")
3 =A2.groups(DEPARTMENT_ID; min([if(COMMISSION_PCT,2),SALARY]):m1, max([if(COMMISSION_PCT,2), SALARY]):m2)
4 =A3.new(DEPARTMENT_ID, m1(2):worst, m2(2):best)
5 =A4.sort(ifn(DEPARTMENT_ID,power(2,32)))

(1) In A3, min([if(COMMISSION_PCT,2), SALARY]) gets the lowest salary when COMMISSION_PCT value is the smallest, i.e. when COMMISSION_PCT value ranks the first. Same operation for getting the maximum value.

005png

4.ratio_to_report function

a)

SELECT last_name, salary, RATIO_TO_REPORT(salary) OVER () AS rr
FROM hr.employees
WHERE job_id = 'PU_CLERK'
ORDER BY last_name
A
1 =connect("orcl")
2 =A1.query@x("select last_name,salary from hr.employees where job_id='PU_CLERK' order by last_name")
3 =A2.sum(SALARY)
4 =A2.new(LAST_NAME, SALARY, SALARY/A3:RR)

006png

b)

SELECT department_id,last_name, salary, RATIO_TO_REPORT(salary) OVER (partition by department_id) AS rr
FROM hr.employees
WHERE department_id in (20,60)
ORDER BY department_id,last_name
A
1 =connect("orcl")
2 =A1.query@x("select department_id,last_name,salary from hr.employees where department_id in (20,60) order by department_id,last_name")
3 =A2.groups@o(DEPARTMENT_ID;sum(SALARY):sum)
4 =A2.switch(DEPARTMENT_ID, A3)
5 =A2.new(DEPARTMENT_ID.DEPARTMENT_ID:DEPARTMENT_ID, LAST_NAME, SALARY, SALARY/DEPARTMENT_ID.sum:RR)

(1) Since data is already sorted by DEPARTMENT_ID in A2, A3 is able to use groups@o to perform aggregation by group.
007png

5.Multilevel grouping

SELECT department_id, job_id, sum(salary) total
FROM hr.employees
WHERE department_id in (30, 50)
GROUP BY grouping sets( (department_id, job_id), department_id)
A
1 =connect("orcl")
2 =A1.query@x("select department_id, job_id, salary from hr.employees where department_Id in (30,50) order by department_id, job_id")
3 =A2.groups@o(DEPARTMENT_ID, JOB_ID; sum(SALARY):TOTAL)
4 =A3.group@o(DEPARTMENT_ID, null:JOB_ID; ~.sum(TOTAL):TOTAL)
5 =[A3,A4].merge(DEPARTMENT_ID, ifn(JOB_ID,fill("z", 10)))

(1) Because both A3 and A4 are ordered by DEPARTMENT_ID, they can be merged in A5. Expression ifn(JOB_ID,fill(“z”,10))) makes sure that records where JOB_ID is null are put at the end.

  Cursor can be used to manipulate a huge amount of data:

A
1 =connect("orcl")
2 =A1.cursor@x("select department_id,job_id,sum(salary) total from hr.employees where department_id in (30,50) group by department_id, job_id order by department_id,job_id")
3 =A2.group(DEPARTMENT_ID)
4 =A3.(~.insert(0, ~.groups@o(DEPARTMENT_ID, null:JOB_ID;sum(TOTAL):TOTAL)))
5 =A4.fetch()
6 =A5.conj()

(1) In A3, A2.group() function requires that A2 should be ordered by DEPARTMENT_ID;
(2) A4 calculates sum over each group and inserts the result at the end of the group.

  The channel function can also be used:

A
1 =connect("orcl")
2 =A1.cursor@x("select department_id, job_id, salary from hr.employees where department_Id in (30,50) order by department_id, job_id")
3 =channel().group@o(DEPARTMENT_ID, JOB_ID; sum(SALARY):TOTAL)
4 >A2.push(A3)
5 =channel().groups@o(DEPARTMENT_ID, null:JOB_ID; sum(TOTAL):TOTAL)
6 >A3.push(A5)
7 =A3.fetch()
8 for A2,1000
9 =A3.result()|A5.result()
10 =A9.sort(DEPARTMENT_ID)

(1) A3 creates a channel, with finding sum within a group operation attached;
(2) A4 is ready to push A2’s data into A3’s channel, but the operation is only executed as A2’s data is really fetched.
(3) A5 creates a channel, with finding sum within a group operation attached;
(4) A6 pushes A3’s result into A5’s channel. Or you can push A2’s data directly into A5’s channel, but this involves more operations and the execution time is longer;
(5) A7 fetches A3’s data;
(6) Retrieve data from A2 in batches. To reduce memory consumption, each batch only contains 1000 records;
(7) A10 sorts the union result of A3 and A5. Records where JOB_ID is null are put at the end as the whole algorithm is coherent and consistent.
008png