Performance optimization case course: TPCH-Q22
select
cntrycode,
count(*) as numcust,
sum(c_acctbal) as totacctbal
from (
select
substr(c_phone,1,2) as cntrycode,
c_acctbal
from
customer
where
substr(c_phone,1,2) in ('11', '14', '15', '19', '20', '21', '23')
and c_acctbal > (
select
avg(c_acctbal)
from
customer
where
c_acctbal > 0.00
and substr(c_phone,1,2) in ('11', '14', '15', '19', '20', '21', '23')
)
and not exists (
select
*
from
orders
where
o_custkey = c_custkey
)
) custsale
group by
cntrycode
order by
cntrycode;
The main query’s outer layer is a regular grouping and aggregation operation, and its inner layer contains two conditional subqueries.
The two subqueries, where the second one is preceded by not exists, correspond to the customer record sets that meet their respective condition. The final result is the difference set of the two sets.
1. Data storage
There is no special requirement for data tables, store them in order by primary key.
We can directly use orders.ctx and customer.ctx from Q3. Copy them to the main directory of this query.
2. General method
Calculation code:
A |
|
1 |
=now() |
2 |
=["11","14","15","19","20","21","23"] |
3 |
=file("customer.ctx").open().cursor@m(C_CUSTKEY,C_PHONE,C_ACCTBAL;C_ACCTBAL>0.0 && A2.contain(left(C_PHONE,2))).fetch() |
4 |
=A3.avg(C_ACCTBAL) |
5 |
=A3.select@m(C_ACCTBAL>A4).derive@o().keys@im(C_CUSTKEY) |
6 |
=file("orders.ctx").open().cursor@m(O_CUSTKEY;O_CUSTKEY:A5) |
7 |
=A6.run(O_CUSTKEY.C_CUSTKEY=null).skip() |
8 |
=A5.select@m(C_CUSTKEY) |
9 |
=A8.groups(left(C_PHONE,2):cntrycode;count(1):numcust, sum(C_ACCTBAL):totacctbal) |
10 |
=interval@ms(A1,now()) |
A5 is equivalent to selecting the records meeting condition specified in the first subquery. A6 associates with A5 and selects the records for the second subquery. A7 sets C_CUSTKEY values of these records selected in A6 as null, which means deleting them from A5 and is equivalent to achieving the not exists condition in the second subquery. A8 selects records satisfying the two conditional subqueries.
Test result:
Test items |
Execution time (seconds) |
General method |
6 |
3. Data conversion
For this query, we need to use the dimension table primary key sequence-numberization method mentioned in previous articles. Copy orders_5.ctx and customer_5.ctx in Q3 to the main directory of this query.
Calculation code:
A |
B |
|
1 |
=now() |
|
2 |
=["11","14","15","19","20","21","23"] |
|
3 |
=file("customer_5.ctx").open() |
=A3.cursor@m().skip().(null) |
4 |
=A3.cursor@m(C_CUSTKEY,C_ACCTBAL,C_PHONE;C_ACCTBAL>0.0 && A2.contain(left(C_PHONE,2))).fetch().@m(B3(C_CUSTKEY)=~) |
|
5 |
=B3.avg(C_ACCTBAL) |
|
6 |
=B3.@m(if(C_ACCTBAL>A5,~,null)) |
|
7 |
=file("orders_5.ctx").open().cursor@m(O_CUSTKEY;A6(O_CUSTKEY)).run(A6(O_CUSTKEY)=null).skip() |
|
8 |
=A6.select@m(~).groups@m(left(C_PHONE,2):cntrycode;count(1):numcust, sum(C_ACCTBAL):totacctbal) |
|
9 |
=interval@ms(A1,now()) |
B3 and A6 are aligned sequence, and their values are null or records of customer table.
Test result:
Test items |
Execution time (seconds) |
General method |
6 |
Data conversion |
2 |
4. Column-wise computing
Calculation code:
A |
B |
|
1 |
=now() |
|
2 |
=["11","14","15","19","20","21","23"] |
|
3 |
=file("customer_5.ctx").open() |
=A3.cursor@m().skip().(null) |
4 |
=A3.cursor@mv(C_CUSTKEY,C_ACCTBAL,C_PHONE;C_ACCTBAL>0.0 && A2.contain(left(C_PHONE,2))).fetch().@m(B3(C_CUSTKEY)=~) |
|
5 |
=B3.avg(C_ACCTBAL) |
|
6 |
=B3.@m(if(C_ACCTBAL>A5,~,null)) |
|
7 |
=file("orders_5.ctx").open().cursor@mv(O_CUSTKEY;A6(O_CUSTKEY)).run(A6(O_CUSTKEY)=null).skip() |
|
8 |
=A6.select(~).groups@m(left(C_PHONE,2):cntrycode;count(1):numcust, sum(C_ACCTBAL):totacctbal) |
|
9 |
=interval@ms(A1,now()) |
Test result:
Test items |
Execution time (seconds) |
General method |
6 |
Data conversion |
2 |
Column-wise computing |
1 |
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL
Chinese version