Performance Optimization Exercises Using TPC-H – Q8
Ⅰ SQL code and analysis
Below is the SQL query statement:
select
o_year,
sum(case
when nation = 'CHINA' then volume
else 0
end) / sum(volume) as mkt_share
from
(
select
extract(year from o_orderdate) as o_year,
l_extendedprice * (1 - l_discount) as volume,
n2.n_name as nation
from
part,
supplier,
lineitem,
orders,
customer,
nation n1,
nation n2,
region
where
p_partkey = l_partkey
and s_suppkey = l_suppkey
and l_orderkey = o_orderkey
and o_custkey = c_custkey
and c_nationkey = n1.n_nationkey
and n1.n_regionkey = r_regionkey
and r_name = 'ASIA'
and s_nationkey = n2.n_nationkey
and o_orderdate between date '1995-01-01' and date '1996-12-31'
and p_type = 'STANDARD POLISHED BRASS'
) all_nations
group by
o_year
order by
o_year;
This is a grouping & aggregation operation on the filtered result set of multi-table association that contains a primary-sub table association between orders table and lineitem table.
Ⅱ SPL solution
The working principle is similar to that in Q5.
A |
|
1 |
=now() |
2 |
1995-01-01 |
3 |
1996-12-31 |
4 |
>nation="CHINA" |
5 |
>name="ASIA" |
6 |
>type="STANDARD POLISHED BRASS" |
7 |
=file("region.btx").import@b().select@1(R_NAME==name).R_REGIONKEY |
8 |
=file("nation.btx").import@b() |
9 |
=A8.select@1(N_NAME==nation).N_NATIONKEY |
10 |
=A8.select(N_REGIONKEY==A7).derive@o().keys(N_NATIONKEY) |
11 |
=file("supplier.ctx").open().cursor@m(S_SUPPKEY;S_NATIONKEY==A9).fetch().keys@im(S_SUPPKEY) |
12 |
=file("part.ctx").open().cursor@m(P_PARTKEY;P_TYPE==type).fetch().keys@im(P_PARTKEY) |
13 |
=file("customer.ctx").open().cursor@m(C_CUSTKEY;A10.find(C_NATIONKEY)).fetch().keys@im(C_CUSTKEY) |
14 |
=file("orders.ctx").open().cursor@m(O_ORDERKEY,O_ORDERDATE;O_ORDERDATE>=A2 && O_ORDERDATE <=A3 && A13.find(O_CUSTKEY)) |
15 |
=file("lineitem.ctx").open().news(A14,L_SUPPKEY,L_EXTENDEDPRICE,L_DISCOUNT,O_ORDERDATE;A12.find(L_PARTKEY)) |
16 |
=A15.run(L_EXTENDEDPRICE*=(1-L_DISCOUNT)) |
17 |
=A16.groups(year(O_ORDERDATE):o_year;sum(if(A11.find(L_SUPPKEY),L_EXTENDEDPRICE,0)):s1,sum(L_EXTENDEDPRICE):s2) |
18 |
=A17.new(o_year,s1/s2:mkt_share) |
19 |
=interval@ms(A1,now()) |
Ⅲ Further optimization
1. Optimization method
In this example, we will use the date-integer conversion optimization method explained in Q1 – the orders table’s OR_ORDERDATE field is already converted in Q3, the string-integer conversion method explained in Q1 – the part table’s P_TYPE field is already converted in Q2, and the dimension table primary key numberization method explained in Q2 – region table’s R_REGIONKEY field, nation table’s N_NATIONKEY and N_REGIONKEY fields, part table’s P_PARTKEY field, supplier table’s S_SUPPKEY and S_NATIONKEY fields, customer table’s C_CUSTKEY and C_NATIONKEY fields, orders table’s O_CUSTKEY field, and lineitem table’s L_SUPPKEY field are already converted in the previous examples. Here we only need to convert lineitem table’s L_PARTKEY field.
2. Code for data conversion
2.1 Conversion on region table, nation table, part table, supplier table, customer table and orders table
Copy region_5.btx, nation_7.btx, part_2.ctx, supplier_7.ctx, customer_7.ctx and orders_7.ctx, and rename them region_8.btx, nation_8.btx, part_8.ctx, supplier_8.ctx, customer_8.ctx, and orders_8.ctx respectively.
2.2 Conversion on lineitem table
A |
|
1 |
=file("part.ctx").open().import(P_PARTKEY).keys@im(P_PARTKEY) |
2 |
=file("lineitem_7.ctx").open().cursor() |
3 |
=A2.run(L_PARTKEY=A1.pfind(L_PARTKEY)) |
4 |
=file("lineitem_8.ctx").create@p(#L_ORDERKEY,#L_LINENUMBER,L_PARTKEY, L_SUPPKEY, L_QUANTITY, L_EXTENDEDPRICE,L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE,L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT) |
5 |
>A4.append(A3) |
3. Code after data conversion
The dimension tables and the value list of string fields need to be preloaded. Below is preloading code:
A |
|
1 |
>env(p_type,file("p_type.txt").import@si()) |
2 |
>env(region, file("region_8.btx").import@b()) |
3 |
>env(nation, file("nation_8.btx").import@b()) |
4 |
>env(supplier, file("supplier_8.ctx").open().import()) |
5 |
>env(customer, file("customer_8.ctx").open().import()) |
6 |
>env(part, file("part_8.ctx").open().import()) |
Before performing the query, we need to first run the preloading code to load small dimension tables into memory.
Computing code:
A |
|
1 |
=now() |
2 |
=days@o(date("1995-01-01")) |
3 |
=days@o(date("1996-12-31")) |
4 |
>nationName="CHINA" |
5 |
>name="ASIA" |
6 |
>type=p_type.pos@b("STANDARD POLISHED BRASS") |
7 |
=region.pselect(R_NAME==name) |
8 |
=nation.pselect(N_NAME==nationName) |
9 |
=nation.(N_REGIONKEY==A7) |
10 |
=supplier.@m(S_NATIONKEY==A8) |
11 |
=part.@m(P_TYPE==type) |
12 |
=customer.@m(A9(C_NATIONKEY)) |
13 |
=file("orders_8.ctx").open().cursor@m(O_ORDERKEY,O_ORDERDATE;O_ORDERDATE>=A2 && O_ORDERDATE <=A3 && A12(O_CUSTKEY)) |
14 |
=file("lineitem_8.ctx").open().news(A13,L_SUPPKEY,L_EXTENDEDPRICE,L_DISCOUNT,O_ORDERDATE;A11(L_PARTKEY)) |
15 |
=A14.run(L_EXTENDEDPRICE*=(1-L_DISCOUNT)) |
16 |
=A15.groups(year(O_ORDERDATE):o_year;sum(if(A10(L_SUPPKEY),L_EXTENDEDPRICE,0)):s1,sum(L_EXTENDEDPRICE):s2) |
17 |
=A16.new(o_year,s1/s2:mkt_share) |
18 |
=interval@ms(A1,now()) |
Ⅳ Using enterprise edition’s column-wise computation
1. Original data
A |
|
1 |
=now() |
2 |
1995-01-01 |
3 |
1996-12-31 |
4 |
>nation="CHINA" |
5 |
>name="ASIA" |
6 |
>type="STANDARD POLISHED BRASS" |
7 |
=file("region.btx").import@b().select@1(R_NAME==name).R_REGIONKEY |
8 |
=file("nation.btx").import@b() |
9 |
=A8.select@1(N_NAME==nation).N_NATIONKEY |
10 |
=A8.select(N_REGIONKEY==A7).derive@o().keys(N_NATIONKEY) |
11 |
=file("supplier.ctx").open().cursor@mv(S_SUPPKEY;S_NATIONKEY==A9).fetch().keys@im(S_SUPPKEY) |
12 |
=file("part.ctx").open().cursor@mv(P_PARTKEY;P_TYPE==type).fetch().keys@im(P_PARTKEY) |
13 |
=file("customer.ctx").open().cursor@mv(C_CUSTKEY;A10.find(C_NATIONKEY)).fetch().keys@im(C_CUSTKEY) |
14 |
=file("orders.ctx").open().cursor@mv(O_ORDERKEY,O_ORDERDATE;O_ORDERDATE>=A2 && O_ORDERDATE <=A3 && A13.find(O_CUSTKEY)) |
15 |
=file("lineitem.ctx").open().cursor@v(L_ORDERKEY,L_SUPPKEY,L_EXTENDEDPRICE,L_DISCOUNT;A12.find(L_PARTKEY);A14) |
16 |
=A15.derive@o(L_EXTENDEDPRICE*(1-L_DISCOUNT):dp) |
17 |
=A16.joinx@im(L_ORDERKEY,A14:O_ORDERKEY,O_ORDERDATE) |
18 |
=A16.groups(year(O_ORDERDATE):o_year;sum(if(A11.find(L_SUPPKEY),dp,0)):s1,sum(dp):s2) |
19 |
=A18.new(o_year,s1/s2:mkt_share) |
20 |
=interval@ms(A1,now()) |
We do not use news()function to perform the primary-sub table association between orders table and lineitem table. Instead, joinx@im is used to perform an order-based merge join after cursors are synchronously segmented. Of course, the news() function is applicable, but joinx() function is slightly faster in our testing.
A16 uses derive@o to add a new column dp, so there is no need to compute dp again in A18.
2. Optimized data
The dimension tables and the value list of string fields need to be preloaded. Below is preloading code:
A |
|
1 |
>env(p_type,file("p_type.txt").import@si()) |
2 |
>env(region, file("region_8.btx").import@bv()) |
3 |
>env(nation, file("nation_8.btx").import@bv()) |
4 |
>env(supplier, file("supplier_8.ctx").open().import@v()) |
5 |
>env(customer, file("customer_8.ctx").open().import@v()) |
6 |
>env(part, file("part_8.ctx").open().import@v()) |
Before performing the query, we need to first run the preloading code to load small dimension tables into memory.
Computing code:
A |
|
1 |
=now() |
2 |
=days@o(date("1995-01-01")) |
3 |
=days@o(date("1996-12-31")) |
4 |
>nationName="CHINA" |
5 |
>name="ASIA" |
6 |
>type=p_type.pos@b("STANDARD POLISHED BRASS") |
7 |
=region.pselect(R_NAME==name) |
8 |
=nation.pselect(N_NAME==nationName) |
9 |
=nation.(N_REGIONKEY==A7) |
10 |
=supplier.(S_NATIONKEY==A8) |
11 |
=part.(P_TYPE==type) |
12 |
=customer.(A9(C_NATIONKEY)) |
13 |
=file("orders_8.ctx").open().cursor@mv(O_ORDERKEY,O_ORDERDATE;O_ORDERDATE>=A2 && O_ORDERDATE<=A3 && A12(O_CUSTKEY)) |
14 |
=file("lineitem_8.ctx").open().cursor@mv(L_ORDERKEY,L_SUPPKEY,L_EXTENDEDPRICE,L_DISCOUNT;A11(L_PARTKEY);A13) |
15 |
=A14.joinx@im(L_ORDERKEY,A13:O_ORDERKEY,O_ORDERDATE) |
16 |
=A15.derive@o(L_EXTENDEDPRICE*(1-L_DISCOUNT):dp) |
17 |
=A16.groups(year(O_ORDERDATE):o_year;sum(if(A10(L_SUPPKEY),dp,0)):s1,sum(dp):s2) |
18 |
=A17.new(o_year,s1/s2:mkt_share) |
19 |
=interval@ms(A1,now()) |
Ⅴ Test result
Unit: Second
Regular |
Column-wise |
|
Before optimization |
20.5 |
8.0 |
After optimization |
13.3 |
5.1 |
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/2bkGwqTj
Youtube 👉 https://www.youtube.com/@esProc_SPL