Performance Optimization Exercises Using TPC-H – Q7
Ⅰ SQL code and analysis
Below is the SQL query statement:
select
supp_nation,
cust_nation,
l_year,
sum(volume) as revenue
from
(
select
n1.n_name as supp_nation,
n2.n_name as cust_nation,
extract(year from l_shipdate) as l_year,
l_extendedprice * (1 - l_discount) as volume
from
supplier,
lineitem,
orders,
customer,
nation n1,
nation n2
where
s_suppkey = l_suppkey
and o_orderkey = l_orderkey
and c_custkey = o_custkey
and s_nationkey = n1.n_nationkey
and c_nationkey = n2.n_nationkey
and (
(n1.n_name = 'CHINA' and n2.n_name = 'RUSSIA')
or (n1.n_name = 'RUSSIA' and n2.n_name = 'CHINA')
)
and l_shipdate between date '1995-01-01' and date '1996-12-31'
) shipping
group by
supp_nation,
cust_nation,
l_year
order by
supp_nation,
cust_nation,
l_year;
This is a grouping & aggregation operation on the filtered result set of multi-table association that contains a primary-sub table association between orders table and lineitem table.
Ⅱ SPL solution
The working principle is similar to that in Q5.
A |
|
1 |
=now() |
2 |
1995-01-01 |
3 |
1996-12-31 |
4 |
>name1="CHINA" |
5 |
>name2="RUSSIA" |
6 |
=file("nation.btx").import@b().keys@i(N_NATIONKEY) |
7 |
>n1=A6.select@1(N_NAME==name1).N_NATIONKEY |
8 |
>n2=A6.select@1(N_NAME==name2).N_NATIONKEY |
9 |
=file("supplier.ctx").open().cursor@m(S_SUPPKEY,S_NATIONKEY;S_NATIONKEY==n1 || S_NATIONKEY==n2).fetch().keys@im(S_SUPPKEY) |
10 |
=file("customer.ctx").open().cursor@m(C_CUSTKEY,C_NATIONKEY;C_NATIONKEY==n1 || C_NATIONKEY==n2).fetch().keys@im(C_CUSTKEY) |
11 |
=file("orders.ctx").open().cursor@m(O_ORDERKEY,O_CUSTKEY;O_CUSTKEY:A10) |
12 |
=file("lineitem.ctx").open().news(A11,L_SUPPKEY,L_EXTENDEDPRICE,L_DISCOUNT,L_SHIPDATE,O_CUSTKEY;L_SHIPDATE>=A2 && L_SHIPDATE <=A3,L_SUPPKEY:A9) |
13 |
=A12.select(O_CUSTKEY.C_NATIONKEY!=L_SUPPKEY.S_NATIONKEY) |
14 |
=A13.groups(L_SUPPKEY.S_NATIONKEY:supp_nation,O_CUSTKEY.C_NATIONKEY:cust_nation,year(L_SHIPDATE):l_year; sum(L_EXTENDEDPRICE*(1-L_DISCOUNT)): volume) |
15 |
=A14.run(supp_nation=A6.find(supp_nation).N_NAME,cust_nation=A6.find(cust_nation).N_NAME) |
16 |
=interval@ms(A1,now()) |
Note that A6’s nation table data is used respectively in A7 and A8 for performing filtering according to foreign key matching. This is different from SQL alias syntax.
Ⅲ Further optimization
1. Optimization method
In this example, we will use the date-integer conversion optimization method explained in Q1, where lineitem table’s L_SHIPDATE is already converted, and the dimension table primary key numberization –nation table’s N_NATIONKEY and N_REGIONKEY fields, supplier table’s S_SUPPKEY and S_NATIONKEY fields, customer table’s C_CUSTKEY field and C_NATIONKEY field, orders table’s O_CUSTKEY, and lineitem table’s L_SUPPKEY field are already converted.
2. Code for data conversion
Copy nation_5.btx, supplier_5.ctx, customer_5.ctx, orders_5.ctx and lineitem_5.ctx, and rename them nation_7.btx, supplier_7.ctx, customer_7.ctx, orders_7.ctx and lineitem_7.ctx.
3. Code after data conversion
The dimension tables and the value list of string fields need to be preloaded. Below is preloading code:
A |
|
1 |
>env(nation, file("nation_7.btx").import@b()) |
2 |
>env(customer, file("customer_7.ctx").open().import()) |
3 |
>env(supplier, file("supplier_7.ctx").open().import()) |
Before performing the query, we need to first run the preloading code to load small dimension tables into memory.
Computing code:
A |
|
1 |
=now() |
2 |
=days@o(date("1995-01-01")) |
3 |
=days@o(date("1996-12-31")) |
4 |
[CHINA,RUSSIA] |
5 |
=nation.(A4.pos(N_NAME)) |
6 |
=supplier.@m(if(A5(S_NATIONKEY),S_NATIONKEY,null)) |
7 |
=customer.@m(if(A5(C_NATIONKEY),C_NATIONKEY,null)) |
8 |
=file("orders_7.ctx").open().cursor@m(O_ORDERKEY,O_CUSTKEY;O_CUSTKEY:A7:#) |
9 |
=file("lineitem_7.ctx").open().news(A8,L_SUPPKEY,L_EXTENDEDPRICE,L_DISCOUNT,L_SHIPDATE,O_CUSTKEY;L_SHIPDATE>=A2 && L_SHIPDATE <=A3,L_SUPPKEY:A6:#) |
10 |
=A9.select(O_CUSTKEY!=L_SUPPKEY).run(O_CUSTKEY=A5(O_CUSTKEY),L_SUPPKEY=A5(L_SUPPKEY)) |
11 |
=A10.groups@u(year(L_SHIPDATE)*9+L_SUPPKEY*3+O_CUSTKEY:gk; sum(L_EXTENDEDPRICE*(1-L_DISCOUNT)): volume) |
12 |
=A11.new(A4(gk%9\3):supp_nation,A4(gk%3):cust_nation,gk\9:l_year,volume).sort(supp_nation,cust_nation,l_year) |
13 |
=interval@ms(A1,now()) |
A10 replaces values of O_CUSTKEY and L_SUPPKEY with corresponding values of A5’s alignment sequence, which are either 1 or 2. A11 uses the grouping key technique explained in Q1 to convert the grouping operation by three fields into one by a single filed.
After date values are converted to integers, we can still use the year() function to get the year part, as A11 does.
Same optimization methods are used here as those in Q5.
Ⅳ Using enterprise edition’s column-wise computation
1. Original data
A |
|
1 |
=now() |
2 |
1995-01-01 |
3 |
1996-12-31 |
4 |
>name1="CHINA" |
5 |
>name2="RUSSIA" |
6 |
=file("nation.btx").import@b().keys@i(N_NATIONKEY) |
7 |
>n1=A6.select@1(N_NAME==name1).N_NATIONKEY |
8 |
>n2=A6.select@1(N_NAME==name2).N_NATIONKEY |
9 |
=file("supplier.ctx").open().cursor@mv(S_SUPPKEY,S_NATIONKEY;S_NATIONKEY==n1 || S_NATIONKEY==n2).fetch().keys@im(S_SUPPKEY) |
10 |
=file("customer.ctx").open().cursor@mv(C_CUSTKEY,C_NATIONKEY;C_NATIONKEY==n1 || C_NATIONKEY==n2).fetch().keys@im(C_CUSTKEY) |
11 |
=file("orders.ctx").open().cursor@mv(O_ORDERKEY,O_CUSTKEY;O_CUSTKEY:A10) |
12 |
=file("lineitem.ctx").open().news(A11,L_SUPPKEY,L_EXTENDEDPRICE,L_DISCOUNT,L_SHIPDATE,O_CUSTKEY;L_SHIPDATE>=A2 && L_SHIPDATE<=A3,L_SUPPKEY:A9) |
13 |
=A12.select@v(O_CUSTKEY.C_NATIONKEY!=L_SUPPKEY.S_NATIONKEY) |
14 |
=A13.groups(L_SUPPKEY.S_NATIONKEY:supp_nation,O_CUSTKEY.C_NATIONKEY:cust_nation,year(L_SHIPDATE):l_year; sum(L_EXTENDEDPRICE*(1-L_DISCOUNT)): volume) |
15 |
=A14.new(A6.find(supp_nation).N_NAME:supp_nation,A6.find(cust_nation).N_NAME:cust_nation,l_year,volume) |
16 |
=interval@ms(A1,now()) |
2. Optimized data
The dimension tables and the value list of string fields need to be preloaded. Below is preloading code:
A |
|
1 |
>env(nation, file("nation_7.btx").import@bv()) |
2 |
>env(customer, file("customer_7.ctx").open().import@v()) |
3 |
>env(supplier, file("supplier_7.ctx").open().import@v()) |
Before performing the query, we need to first run the preloading code to load small dimension tables into memory.
Computing code:
A |
|
1 |
=now() |
2 |
=days@o(date("1995-01-01")) |
3 |
=days@o(date("1996-12-31")) |
4 |
[CHINA,RUSSIA] |
5 |
=nation.(A4.pos(N_NAME)) |
6 |
=supplier.(A5(S_NATIONKEY)) |
7 |
=customer.(A5(C_NATIONKEY)) |
8 |
=file("orders_7.ctx").open().cursor@mv(O_ORDERKEY,O_CUSTKEY;A7(O_CUSTKEY)) |
9 |
=file("lineitem_7.ctx").open().news(A8,L_SUPPKEY,L_EXTENDEDPRICE,L_DISCOUNT,L_SHIPDATE,O_CUSTKEY;L_SHIPDATE>=A2 && L_SHIPDATE<=A3 && A6(L_SUPPKEY)) |
10 |
=A9.derive@o(A5(supplier(L_SUPPKEY).S_NATIONKEY):s,A5(customer(O_CUSTKEY).C_NATIONKEY):c).select@v(s!=c) |
11 |
=A10.groups@u(year(L_SHIPDATE)*9+c*3+s:gk; sum(L_EXTENDEDPRICE*(1-L_DISCOUNT)):volume) |
12 |
=A11.new(A4(gk%3):supp_nation,A4(gk%9\3):cust_nation,gk\9:l_year,volume).sort(supp_nation,cust_nation,l_year) |
13 |
=interval@ms(A1,now()) |
Ⅴ Test result
Unit: Second
Regular |
Column-wise |
|
Before optimization |
16.0 |
10.1 |
After optimization |
11.0 |
6.6 |
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL