Performance Optimization Exercises Using TPC-H – Q15

 

SQL code and analysis

Below is the SQL query statement:

create view revenue (supplier_no, total_revenue) as
    select
        l_suppkey,
        sum(l_extendedprice * (1 - l_discount))
    from
        lineitem
    where
        l_shipdate >= date '1995-04-01'
        and l_shipdate < date '1995-04-01' + interval '3' month
    group by
        l_suppkey;

select
    s_suppkey,
    s_name,
    s_address,
    s_phone,
    total_revenue
from
    supplier,
    revenue
where
    s_suppkey = supplier_no
    and total_revenue = (
        select
            max(total_revenue)
        from
            revenue
    )
order by
    s_suppkey;

drop view revenue;

In the code, we first generate view revenue, then associate it with supplier table, and select the record containing the maximum value.

SPL solution

The SPL query has two phases. The first phase generates view revenue, and the second phase gets the record where total_revenue is the biggest from the view. The former is a regular grouping & aggregation operation, where parallel processing is used to increase performance. SPL offers A.maxp method that can directly return the record holding the maximum value.


A

1

=now()

2

1995-4-1

3

=elapse@m(A2,3)

4

=file("lineitem.ctx").open().cursor@m(L_SUPPKEY,L_EXTENDEDPRICE,L_DISCOUNT;L_SHIPDATE>=A2 && L_SHIPDATE<A3)

5

=A4.groups@u(L_SUPPKEY:supplier_no;sum(L_EXTENDEDPRICE*(1-L_DISCOUNT)):total_revenue)

6

=A5.maxp@a(total_revenue)

7

=file("supplier.ctx").open()

8

=A6.joinx@q(supplier_no,A7:S_SUPPKEY,S_NAME,S_ADDRESS,S_PHONE)

9

=interval@ms(A1,now())

A5 gets view revenue. A6 uses maxp@a to return the record having the biggest total_revenue through one traversal. After the target records are obtained, we search supplier table for the other fields. This helps reduce computations.

Further optimization

1. Optimization method

In this example, we will use the date-integer conversion optimization method explained in Q1, where lineitem table’s L_SHIPDATE has been converted, and the dimension table primary key numberization method – supplier table’s S_SUPPKEY field and lineitem table’s L_SUPPKEY field have been converted in the previous examples.

2. Code for data conversion

Copy supplier_11.ctx and lineitem_14.ctx, and rename them supplier_15.ctx and lineitem_15.ctx respectively.

3. Code after data conversion

First, we need to preload the dimension table. Below is preloading code:


A

1

>env(supplier, file("supplier_15.ctx").open().import())

Before performing the query, we need to first run the preloading code to load the small dimension table into memory.

Computing code:


A

1

=now()

2

1995-4-1

3

=days@o(elapse@m(A2,3))

4

=days@o(A2)

5

=file("lineitem_15.ctx").open().cursor@m(L_SUPPKEY,L_EXTENDEDPRICE,L_DISCOUNT;L_SHIPDATE>=A4 && L_SHIPDATE< A3)

6

=A5.groups@n(L_SUPPKEY:supplier_no;sum(L_EXTENDEDPRICE*(1-L_DISCOUNT)):total_revenue)

7

=A6.maxp@a(total_revenue)

8

=A7.new(supplier_no,total_revenue,(s=supplier(supplier_no)).S_NAME,s.S_ADDRESS,s.S_PHONE)

9

=interval@ms(A1,now())

A6 uses groups@n to perform grouping operation, with same working principle used in Q13. A7 uses maxp function to get the record containing the maximum value.

Using enterprise edition’s column-wise computation

1. Original data


A

1

=now()

2

1995-4-1

3

=elapse@m(A2,3)

4

=file("lineitem.ctx").open().cursor@mv(L_SUPPKEY,L_EXTENDEDPRICE,L_DISCOUNT;L_SHIPDATE>=A2 && L_SHIPDATE<A3)

5

=A4.groups@u(L_SUPPKEY:supplier_no;sum(L_EXTENDEDPRICE*(1-L_DISCOUNT)):total_revenue)

6

=A5.maxp@a(total_revenue)

7

=file("supplier.ctx").open()

8

=A6.joinx@q(supplier_no,A7:S_SUPPKEY,S_NAME,S_ADDRESS,S_PHONE)

9

=interval@ms(A1,now())

2. Optimized data

First, we need to preload the dimension table. Below is preloading code:


A

1

>env(supplier, file("supplier_15.ctx").open().import@v())

Before performing the query, we need to first run the preloading code to load the small dimension table into memory.

Computing code:


A

1

=now()

2

1995-4-1

3

=days@o(elapse@m(A2,3))

4

=days@o(A2)

5

=file("lineitem_15.ctx").open().cursor@mv(L_SUPPKEY,L_EXTENDEDPRICE,L_DISCOUNT;L_SHIPDATE>=A4 && L_SHIPDATE< A3)

6

=A5.groups@u(L_SUPPKEY:supplier_no;sum(L_EXTENDEDPRICE*(1-L_DISCOUNT)):total_revenue)

7

=A6.maxp@a(total_revenue)

8

=A7.new(supplier_no,total_revenue,(s=supplier(supplier_no)).S_NAME,s.S_ADDRESS,s.S_PHONE)

9

return interval@ms(A1,now())

Test result

Unit: Second


Regular

Column-wise

Before optimization

19.6

5.6

After optimization

10.0

3.7