Learn performance optimization skills from TPCH tests - Q22
I. Query Requirement
Q22 queries to identity the geographies where there are customers likely to make a purchase. This query counts the number of consumers in specified countries who are more positive than the average but have not placed orders for seven years. It also reflects the attitude of ordinary consumers, that is, purchasing intention.
Q22 is characterized by two tables join operations with grouping, sorting, aggregation, sub-query and NOT EXISTS sub-query.
II. Oracle Execution
The query SQL written in Oracle is as follows:
select /*+ parallel(n) */
cntrycode,
count(*) as numcust,
sum(c_acctbal) as totacctbal
from
(
select
substr(c_phone,1,2) as cntrycode,
c_acctbal
from
customer
where
substr(c_phone,1,2) in
('11', '14', '15', '19', '20', '21', '23')
and c_acctbal > (
select
avg(c_acctbal)
from
customer
where
c_acctbal > 0.00
and substr(c_phone,1,2) in
('11', '14', '15', '19', '20', '21', '23')
)
and not exists (
select
*
from
orders
where
o_custkey = c_custkey
)
) custsale
group by
cntrycode
order by
cntrycode;
Where /*+ parallel(n) */ is the parallel query syntax of Oracle, and n is the parallel number.
Script execution time, Unit: seconds
Number of parallel |
1 |
2 |
4 |
8 |
12 |
Oracle |
128 |
86 |
70 |
53 |
48 |
III. SPL Optimization
The outer layer of this query is a conventional grouping aggregation, whose main complexity lies in the two sub-queries used for conditions in the inner layer. These two sub-queries can correspond to some CUSTOMER records that satisfy the conditions, while the latter sub-query is modified with NOT EXIST, so the final result will be the difference set of the two set of CUSTOMER records corresponding to the two sub-queries.
The SPL script is as follows:
A |
|
1 |
=now() |
2 |
=["11","14","15","19","20","21","23"] |
3 |
=file("customer.ctx").open().cursor@m(C_CUSTKEY,C_PHONE,C_ACCTBAL;C_ACCTBAL>0.0 && A2.contain(left(C_PHONE,2))).fetch() |
4 |
=A3.avg(C_ACCTBAL) |
5 |
=A3.select(C_ACCTBAL>A4).derive@o().keys@i(C_CUSTKEY) |
6 |
=file("orders.ctx").open().cursor@m(O_CUSTKEY;A5.find(O_CUSTKEY)) |
7 |
=A6.groups(O_CUSTKEY:C_CUSTKEY) |
8 |
=[A5,A7].merge@d(C_CUSTKEY) |
9 |
=A8.groups(left(C_PHONE,2):cntrycode;count(1):numcust, sum(C_ACCTBAL):totacctbal) |
10 |
return interval@ms(A1,now()) |
The results of grouping operation in SPL are ordered, that is, A7 is ordered to C_CUSTKEY, while A5 itself comes from CUSTOMER table, which is also ordered to C_CUSTKEY, thus two ordered sets can use merge algorithm to calculate the difference set at high speed (in A8).
Script execution time, Unit: seconds
Number of parallel |
1 |
2 |
4 |
8 |
12 |
Oracle |
128 |
86 |
70 |
53 |
48 |
SPL composite table |
102 |
53 |
29 |
24 |
19 |
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL