Performance Optimization Exercises Using TPC-H – Q4
Ⅰ SQL code and analysis
Below is the SQL query statement:
select
o_orderpriority,
count(*) as order_count
from
orders
where
o_orderdate >= date '1995-10-01'
and o_orderdate < date '1995-10-01' + interval '3' month
and exists (
select * from lineitem
where l_orderkey = o_orderkey and l_commitdate < l_receiptdate
)
group by
o_orderpriority
order by
o_orderpriority;
This code contains an exists clause for finding records meeting l_commitdate < l_receiptdate from the sub table lineitem. We can always rewrite the exists clause as a join query, as shown below:
select
o_orderpriority,
count(*) as order_count
from (
select
distinct l_orderkey,o_orderpriority
from orders
join lineitem on
l_orderkey = o_orderkey
where
o_orderdate >= date '1995-10-01'
and o_orderdate < date '1995-10-01' + interval '3' month
and l_commitdate < l_receiptdate
)
group by
o_orderpriority
order by
o_orderpriority;
The rewritten statement still focuses on the primary-sub table association. And the association operation aims to generate a filtering condition on the primary table. The intermediate result set, actually, corresponds to orders table, so we need to perform distinct operation on the association result according to orderkey.
Ⅱ SPL solution
Perform association using the order-based merge explained in essay Q3, and get the filtering condition on the primary table:
A |
|
1 |
=now() |
2 |
1995-10-01 |
3 |
=elapse@m(A2,3) |
4 |
=file("orders.ctx").open().cursor@m(O_ORDERKEY,O_ORDERPRIORITY;O_ORDERDATE>=A2 && O_ORDERDATE<A3) |
5 |
=file("lineitem.ctx").open().news@r(A4,O_ORDERPRIORITY;L_COMMITDATE<L_RECEIPTDATE) |
6 |
=A5.groups(O_ORDERPRIORITY;count(1):order_count) |
7 |
=interval@ms(A1,now()) |
A4 and A5 use the above-mentioned technique. A5 uses news@r() to group lineitem table by L_ORDERDEY while filtering away groups that do not meet L_COMMITDATE<L_RECEIPTDATE and retaining one record in each group.
A6 groups A5’s result by O_ORDERPRIORITY and counts records in each group.
Ⅲ Further optimization
1. Optimization method
In this example, we need to use the “date-integer conversion” optimization method explained in essay Q1. The orders table’s O_ORDERDATE field is already converted in essay Q3. Here we just need to convert lineitem table’s L_COMMITDATE field and L_RECEIPTDATE field.
Another optimization method involved in the example is string-integer conversion, which will convert values of O_ORDERPRIORITY into integers.
2. Code for data conversion
2.1 Conversion on orders table
A |
|
1 |
=file("orders.ctx").open().cursor(O_ORDERPRIORITY) |
2 |
=A1.id(O_ORDERPRIORITY).sort() |
3 |
=file("o_orderpriority.txt").export(A2) |
4 |
=file("orders_3.ctx").open().cursor() |
5 |
=A4.run(O_ORDERPRIORITY=A2.pos@b(O_ORDERPRIORITY)) |
6 |
=file("orders_4.ctx").create(#O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE, O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT) |
7 |
>A6.append(A5) |
2.2 Conversion on lineitem table
A |
|
1 |
=file("lineitem_3.ctx").open().cursor() |
2 |
=A1.run(L_COMMITDATE=days@o(L_COMMITDATE), L_RECEIPTDATE=days@o(L_RECEIPTDATE)) |
3 |
=file("lineitem_4.ctx").create@p(#L_ORDERKEY,#L_LINENUMBER,L_PARTKEY, L_SUPPKEY, L_QUANTITY, L_EXTENDEDPRICE,L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE,L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT) |
4 |
>A3.append(A2) |
A2 converts values of L_COMMITDATE field and L_RECEIPTDATE field into small integers.
3. Code after data conversion
A |
|
1 |
>o_orderpriority=file("o_orderpriority.txt").import@si() |
2 |
=now() |
3 |
1995-10-01 |
4 |
=days@o(elapse@m(A3,3)) |
5 |
=days@o(A3) |
6 |
=file("orders_4.ctx").open().cursor@m(O_ORDERKEY,O_ORDERPRIORITY;O_ORDERDATE>=A5 && O_ORDERDATE<A4) |
7 |
=file("lineitem_4.ctx").open().news@r(A6,O_ORDERPRIORITY;L_COMMITDATE<L_RECEIPTDATE) |
8 |
=A7.groups(O_ORDERPRIORITY;count(1):order_count) |
9 |
=A8.run(O_ORDERPRIORITY=o_orderpriority(O_ORDERPRIORITY)) |
10 |
=interval@ms(A2,now()) |
A9 restores the integer type O_ORDERPRIORITY field to the original string type one.
Ⅳ Using enterprise edition’s column-wise computation
1. Original data
A |
|
1 |
=now() |
2 |
1995-10-01 |
3 |
=elapse@m(A2,3) |
4 |
=file("orders.ctx").open().cursor@mv(O_ORDERKEY,O_ORDERPRIORITY;O_ORDERDATE>=A2 && O_ORDERDATE<A3) |
5 |
=file("lineitem.ctx").open().news@r(A4,O_ORDERPRIORITY;L_COMMITDATE<L_RECEIPTDATE) |
6 |
=A5.groups(O_ORDERPRIORITY;count(1):order_count) |
7 |
=interval@ms(A1,now()) |
The cursor()function in A4 works with @v option to enable retrieving data using column-wise cursor. A5 uses news() function to create a column-wise, synchronously grouped cursor according to A4.
2. Optimized data
A |
|
1 |
>o_orderpriority=file("o_orderpriority.txt").import@si() |
2 |
=now() |
3 |
1995-10-01 |
4 |
=days@o(elapse@m(A3,3)) |
5 |
=days@o(A3) |
6 |
=file("orders_4.ctx").open().cursor@mv(O_ORDERKEY,O_ORDERPRIORITY;O_ORDERDATE>=A5 && O_ORDERDATE<A4) |
7 |
=file("lineitem_4.ctx").open().news@r(A6,O_ORDERPRIORITY;L_COMMITDATE<L_RECEIPTDATE) |
8 |
=A7.groups(O_ORDERPRIORITY;count(1):order_count) |
9 |
=A8.new(o_orderpriority(O_ORDERPRIORITY):O_ORDERPRIORITY,order_count) |
10 |
=interval@ms(A2,now()) |
Ⅴ Test result
Unit: Second
Regular |
Column-wise |
|
Before optimization |
6.5 |
5.2 |
After optimization |
4.7 |
3.3 |
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/2bkGwqTj
Youtube 👉 https://www.youtube.com/@esProc_SPL