Performance Optimization Exercises Using TPC-H – Q22
Ⅰ SQL code and analysis
Below is the SQL query statement:
select
cntrycode,
count(*) as numcust,
sum(c_acctbal) as totacctbal
from (
select
substr(c_phone,1,2) as cntrycode,
c_acctbal
from
customer
where
substr(c_phone,1,2) in ('11', '14', '15', '19', '20', '21', '23')
and c_acctbal > (
select
avg(c_acctbal)
from
customer
where
c_acctbal > 0.00
and substr(c_phone,1,2) in ('11', '14', '15', '19', '20', '21', '23')
)
and not exists (
select
*
from
orders
where
o_custkey = c_custkey
)
) custsale
group by
cntrycode
order by
cntrycode;
The main query’s outer layer is a regular grouping and aggregation operation, and its inner layer contains two conditional subqueries.
Ⅱ SPL solution
The two subqueries, where the second one is preceded by not exists, correspond to certain customer records meeting their respective condition. The final result should be the difference set of customer record sets corresponding to the two subqueries.
A |
|
1 |
=now() |
2 |
=["11","14","15","19","20","21","23"] |
3 |
=file("customer.ctx").open().cursor@m(C_CUSTKEY,C_PHONE,C_ACCTBAL;C_ACCTBAL>0.0 && A2.contain(left(C_PHONE,2))).fetch() |
4 |
=A3.avg(C_ACCTBAL) |
5 |
=A3.select@m(C_ACCTBAL>A4).derive@o().keys@im(C_CUSTKEY) |
6 |
=file("orders.ctx").open().cursor@m(O_CUSTKEY;O_CUSTKEY:A5) |
7 |
=A6.run(O_CUSTKEY.C_CUSTKEY=null).skip() |
8 |
=A5.select@m(C_CUSTKEY) |
9 |
=A8.groups(left(C_PHONE,2):cntrycode;count(1):numcust, sum(C_ACCTBAL):totacctbal) |
10 |
=interval@ms(A1,now()) |
A5 selects records meeting condition specified in the first subquery. A6 associates with A5 and selects target records for the second subquery. A7 resets C_CUSTKEY values of all these selected records as null, which means deleting them from A5, in order to achieve the not exists condition on the second subquery. A8 selects records satisfying the two conditional subqueries.
Ⅲ Further optimization
1. Optimization method
In this example, we will use dimension table primary key numberization method explained in Q2 – customer table’s C_CUSTKEY field and orders table’s O_CUSTKEY field have been converted in the previous examples.
2. Code for data conversion
Copy customer_18.ctx and orders_21.ctx, and rename them customer_22.ctx and orders_22.ctx respectively.
3. Code after data conversion
First, we need to preload the dimension table. Below is preloading code:
A |
|
1 |
>env(customer, file("customer_22.ctx").open().import()) |
Before performing the query, we need to first run the preloading code to load the small dimension table into memory.
Computing code:
A |
|
1 |
=now() |
2 |
=["11","14","15","19","20","21","23"] |
3 |
=customer.@m(if(C_ACCTBAL>0.0 && A2.contain(left(C_PHONE,2)),C_ACCTBAL,null)) |
4 |
=A3.avg() |
5 |
=A3.@m(~>A4) |
6 |
=file("orders_22.ctx").open().cursor@m(O_CUSTKEY;A5(O_CUSTKEY)).run(A5(O_CUSTKEY)=false).skip() |
7 |
=customer(A5.pselect@am(~)).groups@m(left(C_PHONE,2):cntrycode;count(1):numcust, sum(C_ACCTBAL):totacctbal) |
8 |
=interval@ms(A1,now()) |
Ⅳ Using enterprise edition’s column-wise computation
1. Original data
A |
|
1 |
=now() |
2 |
=["11","14","15","19","20","21","23"] |
3 |
=file("customer.ctx").open().cursor@mv(C_CUSTKEY,C_PHONE,C_ACCTBAL;C_ACCTBAL>0.0 && A2.contain(left(C_PHONE,2))).fetch() |
4 |
=A3.avg(C_ACCTBAL) |
5 |
=A3.select@mv(C_ACCTBAL>A4).derive@o().keys@im(C_CUSTKEY) |
6 |
=file("orders.ctx").open().cursor@mv(O_CUSTKEY;O_CUSTKEY:A5) |
7 |
=A6.run(O_CUSTKEY.C_CUSTKEY=null).skip() |
8 |
=A5.select@mv(C_CUSTKEY) |
9 |
=A8.groups@m(left(C_PHONE,2):cntrycode;count(1):numcust, sum(C_ACCTBAL):totacctbal) |
10 |
=interval@ms(A1,now()) |
2. Optimized data
First, we need to preload the dimension table. Below is preloading code:
A |
|
1 |
>env(customer, file("customer_22.ctx").open().import@v()) |
Before performing the query, we need to first run the preloading code to load the small dimension table into memory.
Computing code:
A |
|
1 |
=now() |
2 |
=["11","14","15","19","20","21","23"] |
3 |
=customer.@m(if(C_ACCTBAL>0.0 && A2.contain(left(C_PHONE,2)),C_ACCTBAL,null)) |
4 |
=A3.avg() |
5 |
=A3.@m(~>A4) |
6 |
=file("orders_22.ctx").open().cursor@mv(O_CUSTKEY;A5(O_CUSTKEY)).run(A5(O_CUSTKEY)=false).skip() |
7 |
=customer(A5.pselect@am(~)).groups@m(left(C_PHONE,2):cntrycode;count(1):numcust, sum(C_ACCTBAL):totacctbal) |
8 |
=interval@ms(A1,now()) |
Ⅴ Test result
Unit: Second
Regular |
Column-wise |
|
Before optimization |
7.5 |
4.2 |
After optimization |
3.9 |
2.2 |
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/2bkGwqTj
Youtube 👉 https://www.youtube.com/@esProc_SPL