Learn performance optimization skills from TPCH tests - Q10

 

I Query Requirement

Q10 is to query the customers with problematic freight and the losses caused by it within three months from a certain time in each country.

Q10 is characterized by multi-table join query operation with grouping, sorting and aggregation operations. Query statements do not grammatically limit how many tuples are returned, but according to the TPC-H standard, only the first 10 rows of the query results are returned(usually depending on the application).

II Oracle Execution

The query SQL written in Oracle is as follows:

 select * from (

         select  /*+ parallel(n) */

                   c_custkey,c_name,

                   sum(l_extendedprice * (1 - l_discount)) as revenue,

                   c_acctbal,n_name,c_address,c_phone,c_comment

         from

                   customer,orders,lineitem,nation

         where

                   c_custkey = o_custkey

                   and l_orderkey = o_orderkey

                   and o_orderdate >= date '1993-05-01'

                   and o_orderdate < date '1993-05-01' + interval '3' month

                   and l_returnflag = 'R'

                   and c_nationkey = n_nationkey

         group by

                   c_custkey,

                   c_name,

                   c_acctbal,

                   c_phone,

                   n_name,

                   c_address,

                   c_comment

         order by

                   revenue desc

) where rownum <=10;

Where /*+ parallel(n) */ is the parallel query syntax of Oracle, and n is the parallel number.

Script execution time, Unit: seconds

 

Number of parallel

1

2

4

8

12

Oracle

591

399

313

237

215

 

III SPL Optimization

The optimization principle of JOIN between orders and lineitem primary-sub tables here is similar to that in Q3.

The SPL script is as follows

 


A

1

=now()

2

1993-5-1

3

=elapse@m(A2,3)

4

=file("orders.ctx").open().cursor@m(O_ORDERKEY,O_CUSTKEY,O_ORDERDATE;O_ORDERDATE>=A2   && O_ORDERDATE<A3)

5

=file("lineitem.ctx").open().news(A4,L_ORDERKEY,L_EXTENDEDPRICE,L_DISCOUNT,L_RETURNFLAG,O_CUSTKEY,O_ORDERDATE;L_RETURNFLAG=="R")

6

=A5.groups@u(O_CUSTKEY:c_custkey;sum(L_EXTENDEDPRICE*(1-L_DISCOUNT)):revenue)

7

=A6.top(-10;revenue)

8

=file("nation.btx").import@b().keys@i(N_NATIONKEY)

9

=file("customer.ctx").open()

10

=A7.joinx@q(c_custkey,A9:C_CUSTKEY,C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_COMMENT).fetch()

11

=A10.switch(C_NATIONKEY,A8:N_NATIONKEY)

12

=A11.new(c_custkey:C_CUSTKEY,C_NAME,revenue,C_ACCTBAL,C_NATIONKEY.N_NAME,C_ADDRESS,C_PHONE,C_COMMENT)

13

=A12.sort@z(revenue)

14

return interval@ms(A1,now())

 

First, get the grouping and aggregation operation on the join result of orders and lineitem tables done, then perform the related foreign key table join operation based on the intermediate result. The latter join calculation should not be done before grouping, instead, it can be limited to the required 10 records, otherwise the amount of calculation will increase.

After calculating A7, because the customer table is ordered to C_CUSTKEY, the relevant records can be quickly retrieved from the customer table by orderly matching in A7.cursor(). joinx@q, and then next JOIN can be done without traversing the customer table to reduce the amount of data reading.

 

Script execution time, Unit: seconds

 

Number of parallel

1

2

4

8

12

Oracle

591

399

313

237

215

SPL composite table

108

61

36

23

21