Performance Optimization Exercises Using TPC-H – Q11

 

SQL code and analysis

Below is the SQL query statement:

select
    ps_partkey,
    sum(ps_supplycost * ps_availqty) as value
from
    partsupp,
    supplier,
    nation
where
    ps_suppkey = s_suppkey
    and s_nationkey = n_nationkey
    and n_name = 'CHINA'
group by
    ps_partkey
having
    sum(ps_supplycost * ps_availqty) > (
        select
            sum(ps_supplycost * ps_availqty) * 0.000001
        from
            partsupp,
            supplier,
            nation
        where
            ps_suppkey = s_suppkey
            and s_nationkey = n_nationkey
            and n_name = 'CHINA'
    )
order by
    value desc;

If we regard the following subquery as a view V,

select
    ps_partkey,
    sum(ps_supplycost * ps_availqty) as value
from
    partsupp,
    supplier,
    nation
where
    ps_suppkey = s_suppkey
    and s_nationkey = n_nationkey
    and n_name = 'CHINA'
group by
    ps_partkey

The body of the original query statement is equivalent to:

select
    ps_partkey,
    value
from V
where value>0.000001*(select sum(value) from V)

Here view V is a grouping result set that contains a relatively small number of records. Traversing V is much less computation-intensive than traversing partsupp table directly.

SPL solution


A

1

=now()

2

>name="CHINA"

3

>percent=0.000001

4

=file("nation.btx").import@b().select@1(N_NAME== name).N_NATIONKEY

5

=file("supplier.ctx").open().cursor@m(S_SUPPKEY;S_NATIONKEY==A4).fetch().keys@i(S_SUPPKEY)

6

=file("partsupp.ctx").open().cursor@m(PS_PARTKEY,PS_AVAILQTY,PS_SUPPLYCOST;A5.find(PS_SUPPKEY))

7

=A6.groups@o(PS_PARTKEY;sum(PS_SUPPLYCOST*PS_AVAILQTY):value)

8

=A7.sum(value)*percent

9

=A7.select(value>A8).sort@z(value)

10

=interval@ms(A1,now())

The partsupp table is ordered by the composite primary key consisting of PS_PATKEY field and PS_SUPPKEY field. This is equivalent to the table being ordered by PS_PARTKEY, so we can use the order-based grouping strategy to perform grouping operation on this field in order to increase performance of computing view V. A7 uses groups@o to perform the order-based grouping, which amounts to computing view V. Then A8 and A9 traverse A7 twice to get the final result.

Further optimization

1. Optimization method

In this example, we will use the dimension table primary key numberization method explained in Q2 – supplier table’s S_SUPPKEY field and partsupp table’s PS_SUPPKEY field have been converted in the previous examples.

2. Code for data conversion

Copy nation_10.btx, supplier_9.ctx and partsupp_9.ctx, and rename them nation_11.btx, supplier_11.ctx and partsupp_11.ctx respectively.

3. Code after data conversion

First, we need to preload dimension tables. Below is preloading code:


A

1

>env(nation, file("nation_11.btx").import@b())

2

>env(supplier, file("supplier_11.ctx").open().import())

Before performing the query, we need to first run the preloading code to load small dimension tables into memory.

Computing code:


A

1

=now()

2

>name="CHINA"

3

>percent=0.000001

4

=nation.select@1(N_NAME== name).N_NATIONKEY

5

=supplier.@m(S_NATIONKEY==A4)

6

=file("partsupp_11.ctx").open().cursor@m(PS_PARTKEY,PS_AVAILQTY,PS_SUPPLYCOST;A5(PS_SUPPKEY))

7

=A6.groups@o(PS_PARTKEY;sum(PS_SUPPLYCOST*PS_AVAILQTY):value)

8

=A7.sum(value)*percent

9

=A7.select(value>A8).sort@z(value)

10

=interval@ms(A1,now())

Using enterprise edition’s column-wise computation

1. Original data


A

1

=now()

2

>name="CHINA"

3

>percent=0.000001

4

=file("nation.btx").import@b().select@1(N_NAME== name).N_NATIONKEY

5

=file("supplier.ctx").open().cursor@mv(S_SUPPKEY;S_NATIONKEY==A4).fetch().keys@i(S_SUPPKEY)

6

=file("partsupp.ctx").open().cursor@mv(PS_PARTKEY,PS_AVAILQTY,PS_SUPPLYCOST;A5.find(PS_SUPPKEY))

7

=A6.groups@o(PS_PARTKEY;sum(PS_SUPPLYCOST*PS_AVAILQTY):value)

8

=A7.sum(value)*percent

9

=A7.select@v(value>A8).sort@z(value)

10

=interval@ms(A1,now())

2. Optimized data

First, we need to preload dimension tables. Below is preloading code:


A

1

>env(nation, file("nation_11.btx").import@bv())

2

>env(supplier, file("supplier_11.ctx").open().import@v())

Before performing the query, we need to first run the preloading code to load small dimension tables into memory.

Computing code:


A

1

=now()

2

>name="CHINA"

3

>percent=0.000001

4

=nation.select@1(N_NAME== name).N_NATIONKEY

5

=supplier.(S_NATIONKEY==A4)

6

=file("partsupp_11.ctx").open().cursor@mv(PS_PARTKEY,PS_AVAILQTY,PS_SUPPLYCOST;A5(PS_SUPPKEY))

7

=A6.groups@o(PS_PARTKEY;sum(PS_SUPPLYCOST*PS_AVAILQTY):value)

8

=A7.sum(value)*percent

9

=A7.select@v(value>A8).sort@z(value)

10

=interval@ms(A1,now())

Test result

Unit: Second


Regular

Column-wise

Before optimization

2.7

1.1

After optimization

2.0

0.8