Learn performance optimization skills from TPCH tests - Q15

 

I.   Query Requirement

  Q15 queries the information of the supplier (ranking first) who contributed the most to the total revenue in a certain period of time, which can be used to decide which first-class suppliers will be given rewards, more orders, special certifications, encouragement and so on.

  The characteristic of Q15 is joining ordinary table and view with grouping, sorting, aggregation and aggregation on sub-query.

II.  Oracle Execution

  The query SQL written in Oracle is as follows:

    create view revenue (supplier_no, total_revenue) as
         select
                   l_suppkey,
                   sum(l_extendedprice * (1 - l_discount))
         from
                   lineitem
         where
                   l_shipdate >= date '1995-04-01'
                   and l_shipdate < date '1995-04-01' + interval '3' month
         group by
                   l_suppkey;
 
         select  /*+ parallel(n) */
                   s_suppkey,
                   s_name,
                   s_address,
                   s_phone,
                   total_revenue
         from
                   supplier,
                   revenue
         where
                   s_suppkey = supplier_no
                   and total_revenue = (
                            select
                                     max(total_revenue)
                            from
                                     revenue
                   )
         order by
                   s_suppkey;
 
    drop view revenue;

Where /*+ parallel(n) */ is the parallel query syntax of Oracle, and n is the parallel number.

Script execution time, Unit: seconds

Number of parallel

1

2

4

8

12

Oracle

361

276

218

170

155

III. SPL Optimization

  This query is divided into two stages, first calculating the view revenue and then finding the record in revenue where total_revenue reaches its maximum value. The former is a conventional grouping aggregation, using parallel and columnar storage to improve performance. The latter can only be written as a sub-query in SQL, which requires two traversals. SQL does not have reference and set data types. Therefore it cannot directly return the record where the maximum value is located. In contrast, SPL provides such syntax which can return either the maximum value itself or the record where the maximum value is located during one traversal.

 

  The SPL script is as follows: 


A

1

=now()

2

1995-4-1

3

=elapse@m(A2,3)

4

=file("lineitem.ctx").open().cursor@m(L_SUPPKEY,L_EXTENDEDPRICE,L_DISCOUNT;L_SHIPDATE>=A2   &&L_SHIPDATE< A3)

5

=A4.groups@n(L_SUPPKEY:supplier_no;sum(L_EXTENDEDPRICE*(1-L_DISCOUNT)):total_revenue)

6

=A5.maxp@a(total_revenue)

7

=file("supplier.ctx").open()

8

=A6.joinx@q(supplier_no,A7:S_SUPPKEY,S_NAME,S_ADDRESS,S_PHONE).fetch()

9

return interval@ms(A1,now())

A5 calculates the view revenue, and A6 uses maxp@a to traverse once and return the records with the largest total_revenue, then goes to the supplier table to retrieve other fields, reducing the amount of calculation.

 

Script execution time, Unit: seconds

Number of parallel

1

2

4

8

12

Oracle

361

276

218

170

155

SPL composite table

99

52

29

20

18