Performance Optimization Exercises Using TPC-H – Q11
Ⅰ SQL code and analysis
Below is the SQL query statement:
select
ps_partkey,
sum(ps_supplycost * ps_availqty) as value
from
partsupp,
supplier,
nation
where
ps_suppkey = s_suppkey
and s_nationkey = n_nationkey
and n_name = 'CHINA'
group by
ps_partkey
having
sum(ps_supplycost * ps_availqty) > (
select
sum(ps_supplycost * ps_availqty) * 0.000001
from
partsupp,
supplier,
nation
where
ps_suppkey = s_suppkey
and s_nationkey = n_nationkey
and n_name = 'CHINA'
)
order by
value desc;
If we regard the following subquery as a view V,
select
ps_partkey,
sum(ps_supplycost * ps_availqty) as value
from
partsupp,
supplier,
nation
where
ps_suppkey = s_suppkey
and s_nationkey = n_nationkey
and n_name = 'CHINA'
group by
ps_partkey
The body of the original query statement is equivalent to:
select
ps_partkey,
value
from V
where value>0.000001*(select sum(value) from V)
Here view V is a grouping result set that contains a relatively small number of records. Traversing V is much less computation-intensive than traversing partsupp table directly.
Ⅱ SPL solution
A |
|
1 |
=now() |
2 |
>name="CHINA" |
3 |
>percent=0.000001 |
4 |
=file("nation.btx").import@b().select@1(N_NAME== name).N_NATIONKEY |
5 |
=file("supplier.ctx").open().cursor@m(S_SUPPKEY;S_NATIONKEY==A4).fetch().keys@i(S_SUPPKEY) |
6 |
=file("partsupp.ctx").open().cursor@m(PS_PARTKEY,PS_AVAILQTY,PS_SUPPLYCOST;A5.find(PS_SUPPKEY)) |
7 |
=A6.groups@o(PS_PARTKEY;sum(PS_SUPPLYCOST*PS_AVAILQTY):value) |
8 |
=A7.sum(value)*percent |
9 |
=A7.select(value>A8).sort@z(value) |
10 |
=interval@ms(A1,now()) |
The partsupp table is ordered by the composite primary key consisting of PS_PATKEY field and PS_SUPPKEY field. This is equivalent to the table being ordered by PS_PARTKEY, so we can use the order-based grouping strategy to perform grouping operation on this field in order to increase performance of computing view V. A7 uses groups@o to perform the order-based grouping, which amounts to computing view V. Then A8 and A9 traverse A7 twice to get the final result.
Ⅲ Further optimization
1. Optimization method
In this example, we will use the dimension table primary key numberization method explained in Q2 – supplier table’s S_SUPPKEY field and partsupp table’s PS_SUPPKEY field have been converted in the previous examples.
2. Code for data conversion
Copy nation_10.btx, supplier_9.ctx and partsupp_9.ctx, and rename them nation_11.btx, supplier_11.ctx and partsupp_11.ctx respectively.
3. Code after data conversion
First, we need to preload dimension tables. Below is preloading code:
A |
|
1 |
>env(nation, file("nation_11.btx").import@b()) |
2 |
>env(supplier, file("supplier_11.ctx").open().import()) |
Before performing the query, we need to first run the preloading code to load small dimension tables into memory.
Computing code:
A |
|
1 |
=now() |
2 |
>name="CHINA" |
3 |
>percent=0.000001 |
4 |
=nation.select@1(N_NAME== name).N_NATIONKEY |
5 |
=supplier.@m(S_NATIONKEY==A4) |
6 |
=file("partsupp_11.ctx").open().cursor@m(PS_PARTKEY,PS_AVAILQTY,PS_SUPPLYCOST;A5(PS_SUPPKEY)) |
7 |
=A6.groups@o(PS_PARTKEY;sum(PS_SUPPLYCOST*PS_AVAILQTY):value) |
8 |
=A7.sum(value)*percent |
9 |
=A7.select(value>A8).sort@z(value) |
10 |
=interval@ms(A1,now()) |
Ⅳ Using enterprise edition’s column-wise computation
1. Original data
A |
|
1 |
=now() |
2 |
>name="CHINA" |
3 |
>percent=0.000001 |
4 |
=file("nation.btx").import@b().select@1(N_NAME== name).N_NATIONKEY |
5 |
=file("supplier.ctx").open().cursor@mv(S_SUPPKEY;S_NATIONKEY==A4).fetch().keys@i(S_SUPPKEY) |
6 |
=file("partsupp.ctx").open().cursor@mv(PS_PARTKEY,PS_AVAILQTY,PS_SUPPLYCOST;A5.find(PS_SUPPKEY)) |
7 |
=A6.groups@o(PS_PARTKEY;sum(PS_SUPPLYCOST*PS_AVAILQTY):value) |
8 |
=A7.sum(value)*percent |
9 |
=A7.select@v(value>A8).sort@z(value) |
10 |
=interval@ms(A1,now()) |
2. Optimized data
First, we need to preload dimension tables. Below is preloading code:
A |
|
1 |
>env(nation, file("nation_11.btx").import@bv()) |
2 |
>env(supplier, file("supplier_11.ctx").open().import@v()) |
Before performing the query, we need to first run the preloading code to load small dimension tables into memory.
Computing code:
A |
|
1 |
=now() |
2 |
>name="CHINA" |
3 |
>percent=0.000001 |
4 |
=nation.select@1(N_NAME== name).N_NATIONKEY |
5 |
=supplier.(S_NATIONKEY==A4) |
6 |
=file("partsupp_11.ctx").open().cursor@mv(PS_PARTKEY,PS_AVAILQTY,PS_SUPPLYCOST;A5(PS_SUPPKEY)) |
7 |
=A6.groups@o(PS_PARTKEY;sum(PS_SUPPLYCOST*PS_AVAILQTY):value) |
8 |
=A7.sum(value)*percent |
9 |
=A7.select@v(value>A8).sort@z(value) |
10 |
=interval@ms(A1,now()) |
Ⅴ Test result
Unit: Second
Regular |
Column-wise |
|
Before optimization |
2.7 |
1.1 |
After optimization |
2.0 |
0.8 |
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/2bkGwqTj
Youtube 👉 https://www.youtube.com/@esProc_SPL