Learn performance optimization skills from TPCH tests - Q21
I. Query Requirement
Q21 queries for suppliers who cannot ship required parts in a timely manner.
Q21 is characterized by four tables join operations with grouping, sorting, aggregation, EXISTS sub-query and NOT EXISTS sub-query. Query statements do not grammatically limit how many tuples are returned, but the TPC-H standard stipulates that only the first 100 rows of the query results are returned (usually depending on the application).
II. Oracle Execution
The query SQL written in Oracle is as follows:
select * from (
select /*+ parallel(n) */
s_name,
count(*) as numwait
from
supplier,lineitem l1,orders,nation
where
s_suppkey = l1.l_suppkey
and o_orderkey = l1.l_orderkey
and o_orderstatus = 'F'
and l1.l_receiptdate > l1.l_commitdate
and exists (
select
*
from
lineitem l2
where
l2.l_orderkey = l1.l_orderkey
and l2.l_suppkey <> l1.l_suppkey
)
and not exists (
select
*
from
lineitem l3
where
l3.l_orderkey = l1.l_orderkey
and l3.l_suppkey <> l1.l_suppkey
and l3.l_receiptdate > l3.l_commitdate
)
and s_nationkey = n_nationkey
and n_name = 'CHINA'
group by
s_name
order by
numwait desc,
s_name
) where rownum<=100;
Where /*+ parallel(n) */ is the parallel query syntax of Oracle, and n is the parallel number.
Script execution time, Unit: seconds
Number of parallel |
1 |
2 |
4 |
8 |
12 |
Oracle |
978 |
634 |
463 |
363 |
326 |
III. SPL Optimization
The main query of this problem is relatively simple, that is, the joins about the primary-sub table and the foreign key table introduced earlier. The main problem here is two sub-queries with EXISTS. After carefully analyzing these two sub-queries, it can be found that they are all operations on the LINEITEM records under the same l_orderkey.
We know that LINEITEM has been sorted by l_orderkey, and sub-table records can be regarded as the set fields of the primary table. If we orderly group the join result of ORDERS and LINEITEM according to orderkey (but do not aggregate), we can get small subsets of LINEITEM records with the same l_orderkey, and then calculate the two EXISTS conditions based on the small subsets, which will be simpler this way.
There is no explicit set data type in SQL, which makes it impossible to describe this operation process. While SPL provides this data type, and can implement this idea.
The SPL script is as follows:
A |
|
1 |
=now() |
2 |
>name="CHINA" |
3 |
=file("nation.btx").import@b().select(N_NAME==name).derive@o().keys@i(N_NATIONKEY) |
4 |
=file("supplier.ctx").open().cursor@m(S_SUPPKEY,S_NAME;A3.find(S_NATIONKEY)).fetch().keys@im(S_SUPPKEY) |
5 |
=file("orders.ctx").open().cursor@m(O_ORDERKEY;O_ORDERSTATUS=="F") |
6 |
=file("lineitem.ctx").open().news(A5,L_ORDERKEY,L_SUPPKEY,L_COMMITDATE,L_RECEIPTDATE) |
7 |
=A6.group(L_ORDERKEY) |
8 |
=A7.conj(if( (t=~.select(~.L_RECEIPTDATE>~.L_COMMITDATE)) && (s=t.m(1).L_SUPPKEY)&& !t.select@1(~.L_SUPPKEY!=s) && ~.select@1(~.L_SUPPKEY!=s),t,null) ) |
9 |
=A8.switch@i(L_SUPPKEY,A4) |
10 |
=A9.groups@u(L_SUPPKEY.S_NAME:s_name;count(1):numwait) |
11 |
=A10.top(100;[-numwait,s_name]) |
12 |
return interval@ms(A1,now()) |
A6 joins ORDERS and LINEITEM, and A7 orderly groups it into subsets of LINEITEM records with the same l_orderkey value. In A8, these small subsets are judged with EXISTS condition to filter out the unsatisfactory ones. The latter code joins other foreign key tables and performs ordinary grouping operations.
Script execution time, Unit: seconds
Number of parallel |
1 |
2 |
4 |
8 |
12 |
Oracle |
978 |
634 |
463 |
363 |
326 |
SPL composite table |
259 |
163 |
83 |
40 |
27 |
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL