Performance Optimization Exercises Using TPC-H – Q14

 

SQL code and analysis

Below is the SQL query statement:

select
    100.00 * sum(
        case when p_type like 'PROMO%' then l_extendedprice * (1 - l_discount)
            else 0 end)
        / sum(l_extendedprice * (1 - l_discount) ) as promo_revenue
from
    lineitem,
    part
where
    l_partkey = p_partkey
    and l_shipdate >= date '1995-04-01'
    and l_shipdate < date '1995-04-01' + interval '1' month;

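This query aggregates over the filtered result of a two-table join: lineitem rows shipped in the given month are joined to part on the part key, and the share of discounted revenue that comes from promotional parts is computed.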
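To make the semantics of the query concrete, here is a minimal Python sketch that evaluates the same filter, join and conditional sum over a few in-memory rows. The sample rows and the function name promo_revenue are hypothetical and serve only as illustration; the column names follow the SQL above.

```python
from datetime import date

# Hypothetical miniature tables; column names follow the SQL statement.
part = [
    {"p_partkey": 1, "p_type": "PROMO BURNISHED COPPER"},
    {"p_partkey": 2, "p_type": "STANDARD POLISHED TIN"},
]
lineitem = [
    {"l_partkey": 1, "l_extendedprice": 100.0, "l_discount": 0.5,
     "l_shipdate": date(1995, 4, 10)},
    {"l_partkey": 2, "l_extendedprice": 200.0, "l_discount": 0.0,
     "l_shipdate": date(1995, 4, 20)},
    {"l_partkey": 1, "l_extendedprice": 500.0, "l_discount": 0.0,
     "l_shipdate": date(1995, 5, 1)},  # shipped outside the month
]


def promo_revenue(lineitem, part, start, end):
    """Percentage of discounted revenue from PROMO parts shipped in [start, end)."""
    type_by_key = {p["p_partkey"]: p["p_type"] for p in part}
    promo = total = 0.0
    for row in lineitem:
        if not (start <= row["l_shipdate"] < end):
            continue  # date filter from the WHERE clause
        p_type = type_by_key.get(row["l_partkey"])
        if p_type is None:
            continue  # inner join: rows without a matching part are dropped
        revenue = row["l_extendedprice"] * (1 - row["l_discount"])
        total += revenue
        if p_type.startswith("PROMO"):  # p_type like 'PROMO%'
            promo += revenue
    return 100.0 * promo / total


result = promo_revenue(lineitem, part, date(1995, 4, 1), date(1995, 5, 1))
```

The sketch assumes at least one row passes the filter; with no qualifying rows the final division would fail.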

SPL solution

This is an ordinary join-then-sum query with no special structure to exploit, so the main lever is parallel processing: both tables are read in parallel with cursor@m.


     A
1    =now()
2    1995-4-1
3    =elapse@m(A2,1)
4    =file("part.ctx").open().cursor@m(P_PARTKEY,P_TYPE).fetch().keys@i(P_PARTKEY)
5    =file("lineitem.ctx").open().cursor@m(L_PARTKEY,L_EXTENDEDPRICE,L_DISCOUNT;L_SHIPDATE>=A2 && L_SHIPDATE<A3,L_PARTKEY:A4)
6    =A5.run(L_EXTENDEDPRICE*=(1-L_DISCOUNT),L_DISCOUNT=if(pos@h(L_PARTKEY.P_TYPE,"PROMO"),L_EXTENDEDPRICE,0))
7    =A6.total(sum(L_DISCOUNT),sum(L_EXTENDEDPRICE))
8    =100.00*A7(1)/A7(2)
9    =interval@ms(A1,now())
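The parallel plan above can be pictured as splitting the filtered rows into segments, summing each segment independently, and combining the partial sums at the end. The Python sketch below shows that general pattern only. It is not a description of how SPL implements cursor@m; the sample rows and the names partial_sums and parallel_promo_revenue are hypothetical, and Python threads are used just to show the structure, not to demonstrate a speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rows that already passed the date filter:
# (is_promo, discounted_revenue).
rows = [(True, 50.0), (False, 200.0), (True, 150.0), (False, 100.0), (False, 500.0)]


def partial_sums(segment):
    """Return (promo_sum, total_sum) for one segment of rows."""
    promo = sum(rev for is_promo, rev in segment if is_promo)
    total = sum(rev for _, rev in segment)
    return promo, total


def parallel_promo_revenue(rows, workers=2):
    # Split the rows into contiguous segments, at most one per worker.
    size = max(1, -(-len(rows) // workers))  # ceiling division
    segments = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sums, segments))
    # Sums are associative, so the partial results combine by simple addition.
    promo = sum(p for p, _ in partials)
    total = sum(t for _, t in partials)
    return 100.0 * promo / total


share = parallel_promo_revenue(rows)
```

Because both aggregates are plain sums, the result does not depend on how the rows are segmented, which is what makes this query easy to parallelize.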

Further optimization

1. Optimization method

This example reuses two optimizations from earlier queries. The first is the date-to-integer conversion explained in Q1, which has already been applied to the L_SHIPDATE field of lineitem. The second is the numberization of dimension table primary keys explained in Q2; the L_PARTKEY field of lineitem was already converted in the previous example. In that example the P_PARTKEY field of the part table was converted as well, and its P_TYPE field was also converted to an integer. The integer P_TYPE is not needed here, because this query tests the PROMO prefix of the type string, so we regenerate the composite table part.
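The two conversions can be sketched in Python as follows. This is an illustration under stated assumptions, not the SPL implementation: dates are encoded here as days since 1970-01-01, which may differ from the encoding SPL uses, and the sample rows and the helper date_to_int are hypothetical.

```python
from datetime import date

EPOCH = date(1970, 1, 1)


def date_to_int(d):
    """Encode a date as days since EPOCH (an assumed encoding, for illustration)."""
    return (d - EPOCH).days


# Hypothetical rows with arbitrary original part keys.
part = [
    {"P_PARTKEY": 1007, "P_TYPE": "PROMO PLATED STEEL"},
    {"P_PARTKEY": 2044, "P_TYPE": "ECONOMY BRUSHED BRASS"},
]
lineitem = [
    {"L_PARTKEY": 2044, "L_SHIPDATE": date(1995, 4, 3)},
    {"L_PARTKEY": 1007, "L_SHIPDATE": date(1995, 4, 17)},
]

# Key numberization: replace each part key by its 1-based row position ...
position = {p["P_PARTKEY"]: i for i, p in enumerate(part, start=1)}
for p in part:
    p["P_PARTKEY"] = position[p["P_PARTKEY"]]
# ... and rewrite the foreign key in lineitem to the same positions,
# converting the ship date to an integer in the same pass.
for row in lineitem:
    row["L_PARTKEY"] = position[row["L_PARTKEY"]]
    row["L_SHIPDATE"] = date_to_int(row["L_SHIPDATE"])

# After conversion the join is a list index and the date filter
# is an integer comparison.
lo, hi = date_to_int(date(1995, 4, 1)), date_to_int(date(1995, 5, 1))
joined_types = [
    part[row["L_PARTKEY"] - 1]["P_TYPE"]
    for row in lineitem
    if lo <= row["L_SHIPDATE"] < hi
]
```

The point of both conversions is to move work from query time to data preparation time: the conversion is paid once, while every later query gets cheaper comparisons and lookups.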

2. Code for data conversion

2.1 Conversion on part table


     A
1    =file("part.ctx").open().cursor().fetch()
2    =A1.run(P_PARTKEY=#)
3    =file("part_14.ctx").create(#P_PARTKEY, P_NAME, P_MFGR, P_BRAND, P_TYPE, P_SIZE, P_CONTAINER, P_RETAILPRICE, P_COMMENT)
4    >A3.append(A2.cursor())

2.2 Conversion on lineitem table

Copy lineitem_13.ctx and rename it lineitem_14.ctx.

3. Code after data conversion

First, we need to preload the dimension table. Below is preloading code:


     A
1    >env(part, file("part_14.ctx").open().import())

Run the preloading code before the query so that the small dimension table is already in memory.

Computing code:


     A
1    =now()
2    1995-4-1
3    =days@o(elapse@m(A2,1))
4    =days@o(A2)
5    =part.@m(pos@h(P_TYPE,"PROMO"))
6    =file("lineitem_14.ctx").open().cursor@m(L_PARTKEY,L_EXTENDEDPRICE,L_DISCOUNT;L_SHIPDATE>=A4 && L_SHIPDATE<A3)
7    =A6.run(L_EXTENDEDPRICE*=(1-L_DISCOUNT),L_DISCOUNT=if(A5(L_PARTKEY),L_EXTENDEDPRICE,0))
8    =A7.total(sum(L_DISCOUNT),sum(L_EXTENDEDPRICE))
9    =100.00*A8(1)/A8(2)
10   =interval@ms(A1,now())
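In the code above, A5 evaluates the PROMO test once per part, and A7 then looks the result up by position using the numberized L_PARTKEY. A minimal Python sketch of that idea follows. The sample data and variable names are hypothetical, and a plain list of booleans stands in for the sequence computed in A5.

```python
# Hypothetical preloaded part table; after key numberization,
# row i (1-based) holds the part whose key is i.
part_types = ["PROMO PLATED STEEL", "ECONOMY BRUSHED BRASS", "PROMO ANODIZED TIN"]

# Evaluate the prefix test once per part (compare with A5).
is_promo = [t.startswith("PROMO") for t in part_types]

# Hypothetical filtered lineitem rows: (L_PARTKEY, L_EXTENDEDPRICE, L_DISCOUNT).
lineitem = [(1, 100.0, 0.5), (2, 200.0, 0.0), (3, 300.0, 0.5), (2, 100.0, 0.0)]

promo = total = 0.0
for partkey, price, discount in lineitem:
    revenue = price * (1 - discount)
    total += revenue
    if is_promo[partkey - 1]:  # positional lookup: no hashing, no string test
        promo += revenue

promo_share = 100.0 * promo / total
```

The string comparison runs once per part rather than once per lineitem row, and the per-row join becomes a direct index into a list.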

Using enterprise edition’s column-wise computation

1. Original data


     A
1    =now()
2    1995-4-1
3    =elapse@m(A2,1)
4    =file("part.ctx").open().cursor@mv(P_PARTKEY,P_TYPE).fetch().keys@i(P_PARTKEY)
5    =file("lineitem.ctx").open().cursor@mv(L_PARTKEY,L_EXTENDEDPRICE,L_DISCOUNT;L_SHIPDATE>=A2 && L_SHIPDATE<A3).join(L_PARTKEY,A4,P_TYPE)
6    =A5.derive@o(L_EXTENDEDPRICE*(1-L_DISCOUNT):dp,if(pos@h(P_TYPE,"PROMO"),dp,0.0):dp1)
7    =A6.total(sum(dp1),sum(dp))
8    =100.00*A7(1)/A7(2)
9    =interval@ms(A1,now())
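The derive step in A6 adds two computed columns, dp and dp1, which A7 then sums. As a rough picture of what computing column by column means, the Python sketch below stores each field as its own list and produces each new column in one pass. It illustrates the general idea only; the source does not describe how the enterprise edition implements column-wise computation, and the sample data and the function derive are hypothetical.

```python
# Hypothetical column store: one list per field, all of the same length.
columns = {
    "L_EXTENDEDPRICE": [100.0, 200.0, 300.0],
    "L_DISCOUNT": [0.5, 0.0, 0.5],
    "P_TYPE": ["PROMO PLATED STEEL", "ECONOMY BRUSHED BRASS", "PROMO ANODIZED TIN"],
}


def derive(columns):
    """Add the computed columns dp and dp1, one whole column at a time."""
    dp = [
        price * (1 - disc)
        for price, disc in zip(columns["L_EXTENDEDPRICE"], columns["L_DISCOUNT"])
    ]
    dp1 = [
        rev if p_type.startswith("PROMO") else 0.0
        for rev, p_type in zip(dp, columns["P_TYPE"])
    ]
    return {**columns, "dp": dp, "dp1": dp1}


derived = derive(columns)
promo_share = 100.0 * sum(derived["dp1"]) / sum(derived["dp"])
```

Unlike the run step in the row-wise version, which overwrites L_EXTENDEDPRICE and L_DISCOUNT in place, this form leaves the input columns untouched and adds new ones.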

2. Optimized data

First, we need to preload the dimension table. Below is preloading code:


     A
1    >env(part, file("part_14.ctx").open().import@v())

Run the preloading code before the query so that the small dimension table is already in memory.

Computing code:


     A
1    =now()
2    1995-4-1
3    =days@o(elapse@m(A2,1))
4    =days@o(A2)
5    =part.(pos@h(P_TYPE,"PROMO"))
6    =file("lineitem_14.ctx").open().cursor@mv(L_PARTKEY,L_EXTENDEDPRICE,L_DISCOUNT;L_SHIPDATE>=A4 && L_SHIPDATE<A3)
7    =A6.derive@o(L_EXTENDEDPRICE*(1-L_DISCOUNT):dp,if(A5(L_PARTKEY),dp,0):dp1)
8    =A7.total(sum(dp1),sum(dp))
9    =100.00*A8(1)/A8(2)
10   =interval@ms(A1,now())

Test result

Unit: seconds

                       Regular    Column-wise
Before optimization    14.2       6.3
After optimization     6.6        2.8
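The relative gains can be read off the timings with a little arithmetic. The Python sketch below computes the speedup ratios from the four figures reported in the test result; the dictionary layout and the helper speedup are hypothetical conveniences.

```python
# Timings in seconds, copied from the test result.
timings = {
    ("before", "regular"): 14.2,
    ("before", "column-wise"): 6.3,
    ("after", "regular"): 6.6,
    ("after", "column-wise"): 2.8,
}


def speedup(slow, fast):
    """How many times faster the second configuration is than the first."""
    return timings[slow] / timings[fast]


conversion_regular = speedup(("before", "regular"), ("after", "regular"))
conversion_columnwise = speedup(("before", "column-wise"), ("after", "column-wise"))
overall = speedup(("before", "regular"), ("after", "column-wise"))
```

With these figures, data conversion alone gives roughly a 2.2x speedup in both the regular and the column-wise mode, and the two combined run about 5x faster than the unoptimized regular code.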