Learn performance optimization skills from TPCH tests - Q20

 

I.   Query Requirement

  Q20 is to identify suppliers in a given nation that can provide a more competitive price of a particular part in a certain year. The so-called more competitive suppliers refer to those suppliers who have an excess of a given part available; an excess is defined to be more than 50% of the certain parts that the supplier shipped for a given nation in a given year.    

  Q20 is characterized as two tables join operations with sorting, aggregation, IN sub-query and ordinary sub-query.

II.   Oracle Execution

  The query SQL written in Oracle is as follows:

    select  /*+ parallel(n) */
	s_name,s_address
    from
	supplier,nation
    where
	s_suppkey in (
	    select
		ps_suppkey
	    from
		partsupp
	    where
		ps_partkey in (
		    select
			p_partkey
		    from
			part
		    where
			p_name like 'bisque%'
		)
		and ps_availqty > (
		    select
			0.5 * sum(l_quantity)
		    from
			lineitem
		    where
			l_partkey = ps_partkey
			and l_suppkey = ps_suppkey
			and l_shipdate >= date '1995-01-01'
			and l_shipdate < date '1995-01-01' + interval '1' year
		)
	)
	and s_nationkey = n_nationkey
	and n_name = 'CHINA'
    order by
	s_name;

  Where /*+ parallel(n) */ is the parallel query syntax of Oracle, and n is the parallel number.

  Script execution time, Unit: seconds

Number of parallel

1

2

4

8

12

Oracle

437

307

242

201

175

III. SPL Optimization

  This query looks complex with many layers nested, mainly because SQL does not advocate step-by-step operation. If we divide it into several steps, it will be much clearer:

  1. Filter the NATION table according to conditions
  2. Filter the PART table according to conditions
  3. Match and filter the SUPPLIER table with the result of 1 as a foreign key table
  4. Match and filter the PARTSUPP table with the results of 2 and 3 as a foreign key table
  5. Use the result of 4 as a foreign key table to be joined on the LINEITEM table, and select the PS_AVAILQTY field, then group and aggregate by L_PARTKEY and L_SUPPKEY. As mentioned earlier, after grouping, the sub-query associated with the primary table with equivalent conditions can be rewritten as a joint statement with the primary table. Select L_SUPPKEY that satisfies the conditions after grouping and aggregation.
  6. Use the result of 5 as a foreign key table to filter the SUPPLIER table

  The whole process is to generate intermediate foreign key tables to do matching and filtering repeatedly. 

  The SPL script is as follows: 


A

1

=now()

2

1995-1-1

3

=elapse@y(A2,1)

4

>partname="bisque"

5

>nationname="CHINA"

6

=file("nation.btx").import@b().select(N_NAME==nationname).derive@o().keys@i(N_NATIONKEY)

7

=file("part.ctx").open().cursor@m(P_PARTKEY;pos@h(P_NAME,partname)).fetch().keys@im(P_PARTKEY)

8

=file("supplier.ctx").open().cursor@m(S_SUPPKEY,S_NAME,S_ADDRESS;A6.find(S_NATIONKEY)).fetch().keys@im(S_SUPPKEY)

9

=file("partsupp.ctx").open().cursor@m(PS_PARTKEY,PS_SUPPKEY,PS_AVAILQTY;A7.find(PS_PARTKEY),A8.find(PS_SUPPKEY)).fetch().keys@im(PS_PARTKEY,PS_SUPPKEY)

10

=file("lineitem.ctx").open().cursor@m(L_PARTKEY,L_SUPPKEY,L_QUANTITY;L_SHIPDATE>=A2     && L_SHIPDATE<A3)

11

=A10.join@i(L_PARTKEY:L_SUPPKEY,A9:PS_PARTKEY:PS_SUPPKEY,PS_AVAILQTY)

12

=A11.groups@u(L_PARTKEY,L_SUPPKEY,PS_AVAILQTY;sum(L_QUANTITY):quantity)

13

=A12.select(PS_AVAILQTY*2>quantity).id(L_SUPPKEY)

14

=A8.switch@i(S_SUPPKEY,A13)

15

=A14.new(S_NAME,S_ADDRESS).sort@o(S_NAME)

16

return interval@ms(A1,now())

 

  Script execution time, Unit: seconds

Number of parallel

1

2

4

8

12

Oracle

437

307

242

201

175

SPL composite table

128

65

36

21

15