Learn performance optimization skills from TPCH tests - Q16

 

I.   Query Requirement

  Q16 queries to obtain the number of suppliers that can supply parts under specified contribution conditions, which can be used to determine whether there are sufficient suppliers when the order quantity is large and the task is urgent.

  Q16 is characterized by two table join operations with grouping, sorting, aggregation, DISTINCT and NOT IN sub-query.

II.  Oracle Execution

  The query SQL written in Oracle is as follows:

    select  /*+ parallel(n) */
         p_brand,p_type,p_size,
         count(distinct ps_suppkey) as supplier_cnt
    from
         partsupp,part
    where
         p_partkey = ps_partkey
         and p_brand <> 'Brand#21'
         and p_type not like 'SMALL%'
         and p_size in (2, 15, 17, 23, 25, 41, 44, 45)
         and ps_suppkey not in (
               select
                    s_suppkey
               from
                    supplier
               where
                    s_comment like '%Customer%Complaints%'
         )
    group by
         p_brand,
         p_type,
         p_size
    order by
         supplier_cnt desc,
         p_brand,
         p_type,
         p_size;

  Where /*+ parallel(n) */ is the parallel query syntax of Oracle, and n is the parallel number.

  Script execution time, Unit: seconds

Number of parallel

1

2

4

8

12

Oracle

57

34

23

15

13

III. SPL Optimization

  This query uses two foreign key tables PART and SUPPLIER for match filtering and then grouping, and two foreign key table fields will be used when grouping. The techniques mentioned before will be applied at this point: first the foreign key table is filtered according to conditions, and then the join field of the primary table is matched with the foreign key table, where the matched field is converted to the record pointer of the foreign key table, so that the records that do not match can be filtered out directly, and the foreign key table field can be directly referred in the subsequent grouping operation.

 

  The SPL script is as follows: 


A

1

=now()

2

>brand="Brand#21"

3

>type="SMALL"

4

>sizes=[2,15,17,23,25,41,44,45]

5

=file("part.ctx").open().cursor@m(P_PARTKEY,P_BRAND,P_TYPE,P_SIZE;P_BRAND!=brand     && !pos@h(P_TYPE,type)   &&     sizes.contain@b(P_SIZE)).fetch().keys@im(P_PARTKEY)

6

=file("supplier.ctx").open().cursor@m(S_SUPPKEY;!like(S_COMMENT,"*Customer*Complaints*")).fetch().keys@im(S_SUPPKEY)

7

=file("partsupp.ctx").open().cursor@m(PS_PARTKEY,PS_SUPPKEY;PS_PARTKEY:A5,PS_SUPPKEY:A6)

8

=A7.groups@u(PS_PARTKEY.P_BRAND,PS_PARTKEY.P_TYPE,PS_PARTKEY.P_SIZE;icount(PS_SUPPKEY):supplier_cnt)

9

=A8.sort(-supplier_cnt,P_BRAND,P_TYPE,P_SIZE)

10

return   interval@ms(A1,now())

  A5 and A6 read and filter foreign key tables respectively. A7 creates a cursor while matching and filtering.

  Note that contain@b is used for IN judgment in A5, indicating that binary search will be used here. When the set members in IN judgment are large, the comparisons can be decreased by sorting the members first and applying binary search later, thus improving the performance.

 

  Script execution time, Unit: seconds

Number of parallel

1

2

4

8

12

Oracle

57

34

23

15

13

SPL composite table

39

26

18

11

10

 

  This query involves a small amount of data, and the results of SPL operation are not very different from those of SQL.