Performance Optimization Exercises Using TPC-H – Q15
Ⅰ SQL code and analysis
Below is the SQL query statement:
create view revenue (supplier_no, total_revenue) as
select
l_suppkey,
sum(l_extendedprice * (1 - l_discount))
from
lineitem
where
l_shipdate >= date '1995-04-01'
and l_shipdate < date '1995-04-01' + interval '3' month
group by
l_suppkey;
select
s_suppkey,
s_name,
s_address,
s_phone,
total_revenue
from
supplier,
revenue
where
s_suppkey = supplier_no
and total_revenue = (
select
max(total_revenue)
from
revenue
)
order by
s_suppkey;
drop view revenue;
In the code, we first generate view revenue, then associate it with supplier table, and select the record containing the maximum value.
Ⅱ SPL solution
The SPL query has two phases. The first phase generates view revenue, and the second phase gets the record where total_revenue is the biggest from the view. The former is a regular grouping & aggregation operation, where parallel processing is used to increase performance. SPL offers A.maxp method that can directly return the record holding the maximum value.
A |
|
1 |
=now() |
2 |
1995-4-1 |
3 |
=elapse@m(A2,3) |
4 |
=file("lineitem.ctx").open().cursor@m(L_SUPPKEY,L_EXTENDEDPRICE,L_DISCOUNT;L_SHIPDATE>=A2 && L_SHIPDATE<A3) |
5 |
=A4.groups@u(L_SUPPKEY:supplier_no;sum(L_EXTENDEDPRICE*(1-L_DISCOUNT)):total_revenue) |
6 |
=A5.maxp@a(total_revenue) |
7 |
=file("supplier.ctx").open() |
8 |
=A6.joinx@q(supplier_no,A7:S_SUPPKEY,S_NAME,S_ADDRESS,S_PHONE) |
9 |
=interval@ms(A1,now()) |
A5 gets view revenue. A6 uses maxp@a to return the record having the biggest total_revenue through one traversal. After the target records are obtained, we search supplier table for the other fields. This helps reduce computations.
Ⅲ Further optimization
1. Optimization method
In this example, we will use the date-integer conversion optimization method explained in Q1, where lineitem table’s L_SHIPDATE has been converted, and the dimension table primary key numberization method – supplier table’s S_SUPPKEY field and lineitem table’s L_SUPPKEY field have been converted in the previous examples.
2. Code for data conversion
Copy supplier_11.ctx and lineitem_14.ctx, and rename them supplier_15.ctx and lineitem_15.ctx respectively.
3. Code after data conversion
First, we need to preload the dimension table. Below is preloading code:
A |
|
1 |
>env(supplier, file("supplier_15.ctx").open().import()) |
Before performing the query, we need to first run the preloading code to load the small dimension table into memory.
Computing code:
A |
|
1 |
=now() |
2 |
1995-4-1 |
3 |
=days@o(elapse@m(A2,3)) |
4 |
=days@o(A2) |
5 |
=file("lineitem_15.ctx").open().cursor@m(L_SUPPKEY,L_EXTENDEDPRICE,L_DISCOUNT;L_SHIPDATE>=A4 && L_SHIPDATE< A3) |
6 |
=A5.groups@n(L_SUPPKEY:supplier_no;sum(L_EXTENDEDPRICE*(1-L_DISCOUNT)):total_revenue) |
7 |
=A6.maxp@a(total_revenue) |
8 |
=A7.new(supplier_no,total_revenue,(s=supplier(supplier_no)).S_NAME,s.S_ADDRESS,s.S_PHONE) |
9 |
=interval@ms(A1,now()) |
A6 uses groups@n to perform grouping operation, with same working principle used in Q13. A7 uses maxp function to get the record containing the maximum value.
Ⅳ Using enterprise edition’s column-wise computation
1. Original data
A |
|
1 |
=now() |
2 |
1995-4-1 |
3 |
=elapse@m(A2,3) |
4 |
=file("lineitem.ctx").open().cursor@mv(L_SUPPKEY,L_EXTENDEDPRICE,L_DISCOUNT;L_SHIPDATE>=A2 && L_SHIPDATE<A3) |
5 |
=A4.groups@u(L_SUPPKEY:supplier_no;sum(L_EXTENDEDPRICE*(1-L_DISCOUNT)):total_revenue) |
6 |
=A5.maxp@a(total_revenue) |
7 |
=file("supplier.ctx").open() |
8 |
=A6.joinx@q(supplier_no,A7:S_SUPPKEY,S_NAME,S_ADDRESS,S_PHONE) |
9 |
=interval@ms(A1,now()) |
2. Optimized data
First, we need to preload the dimension table. Below is preloading code:
A |
|
1 |
>env(supplier, file("supplier_15.ctx").open().import@v()) |
Before performing the query, we need to first run the preloading code to load the small dimension table into memory.
Computing code:
A |
|
1 |
=now() |
2 |
1995-4-1 |
3 |
=days@o(elapse@m(A2,3)) |
4 |
=days@o(A2) |
5 |
=file("lineitem_15.ctx").open().cursor@mv(L_SUPPKEY,L_EXTENDEDPRICE,L_DISCOUNT;L_SHIPDATE>=A4 && L_SHIPDATE< A3) |
6 |
=A5.groups@u(L_SUPPKEY:supplier_no;sum(L_EXTENDEDPRICE*(1-L_DISCOUNT)):total_revenue) |
7 |
=A6.maxp@a(total_revenue) |
8 |
=A7.new(supplier_no,total_revenue,(s=supplier(supplier_no)).S_NAME,s.S_ADDRESS,s.S_PHONE) |
9 |
return interval@ms(A1,now()) |
Ⅴ Test result
Unit: Second
Regular |
Column-wise |
|
Before optimization |
19.6 |
5.6 |
After optimization |
10.0 |
3.7 |
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL