Learn performance optimization skills from TPCH tests - Q21

 

I.   Query Requirement

  Q21 queries for suppliers who cannot ship required parts in a timely manner.

  Q21 is characterized by four tables join operations with grouping, sorting, aggregation, EXISTS sub-query and NOT EXISTS sub-query. Query statements do not grammatically limit how many tuples are returned, but the TPC-H standard stipulates that only the first 100 rows of the query results are returned (usually depending on the application).

II.  Oracle Execution

  The query SQL written in Oracle is as follows:

    select * from (
	select  /*+ parallel(n) */
	    s_name,
	    count(*) as numwait
	from
	    supplier,lineitem l1,orders,nation
	where
	    s_suppkey = l1.l_suppkey
	    and o_orderkey = l1.l_orderkey
	    and o_orderstatus = 'F'
	    and l1.l_receiptdate > l1.l_commitdate
	    and exists (
		select
		    *
		from
		    lineitem l2
		where
		    l2.l_orderkey = l1.l_orderkey
		    and l2.l_suppkey <> l1.l_suppkey
	    )
	    and not exists (
		select
		    *
		from
		    lineitem l3
		where
		    l3.l_orderkey = l1.l_orderkey
		    and l3.l_suppkey <> l1.l_suppkey
		    and l3.l_receiptdate > l3.l_commitdate
	    )
	    and s_nationkey = n_nationkey
	    and n_name = 'CHINA'
	group by
	    s_name
	order by
	    numwait desc,
	    s_name
    ) where rownum<=100;

  Where /*+ parallel(n) */ is the parallel query syntax of Oracle, and n is the parallel number.

  Script execution time, Unit: seconds

Number of parallel

1

2

4

8

12

Oracle

978

634

463

363

326

III. SPL Optimization

  The main query of this problem is relatively simple, that is, the joins about the primary-sub table and the foreign key table introduced earlier. The main problem here is two sub-queries with EXISTS. After carefully analyzing these two sub-queries, it can be found that they are all operations on the LINEITEM records under the same l_orderkey.

  We know that LINEITEM has been sorted by l_orderkey, and sub-table records can be regarded as the set fields of the primary table. If we orderly group the join result of ORDERS and LINEITEM according to orderkey (but do not aggregate), we can get small subsets of LINEITEM records with the same l_orderkey, and then calculate the two EXISTS conditions based on the small subsets, which will be simpler this way.

  There is no explicit set data type in SQL, which makes it impossible to describe this operation process. While SPL provides this data type, and can implement this idea.

 

  The SPL script is as follows 


A

1

=now()

2

>name="CHINA"

3

=file("nation.btx").import@b().select(N_NAME==name).derive@o().keys@i(N_NATIONKEY)

4

=file("supplier.ctx").open().cursor@m(S_SUPPKEY,S_NAME;A3.find(S_NATIONKEY)).fetch().keys@im(S_SUPPKEY)

5

=file("orders.ctx").open().cursor@m(O_ORDERKEY;O_ORDERSTATUS=="F")

6

=file("lineitem.ctx").open().news(A5,L_ORDERKEY,L_SUPPKEY,L_COMMITDATE,L_RECEIPTDATE)

7

=A6.group(L_ORDERKEY)

8

=A7.conj(if(   (t=~.select(~.L_RECEIPTDATE>~.L_COMMITDATE)) &&   (s=t.m(1).L_SUPPKEY)&& !t.select@1(~.L_SUPPKEY!=s) &&   ~.select@1(~.L_SUPPKEY!=s),t,null) )

9

=A8.switch@i(L_SUPPKEY,A4)

10

=A9.groups@u(L_SUPPKEY.S_NAME:s_name;count(1):numwait)

11

=A10.top(100;[-numwait,s_name])

12

return interval@ms(A1,now())

  A6 joins ORDERS and LINEITEM, and A7 orderly groups it into subsets of LINEITEM records with the same l_orderkey value. In A8, these small subsets are judged with EXISTS condition to filter out the unsatisfactory ones. The latter code joins other foreign key tables and performs ordinary grouping operations.

 

  Script execution time, Unit: seconds

Number of parallel

1

2

4

8

12

Oracle

978

634

463

363

326

SPL composite table

259

163

83

40

27