Learn performance optimization skills from TPCH tests - Q22

 

I.   Query Requirement

  Q22 queries to identity the geographies where there are customers likely to make a purchase. This query counts the number of consumers in specified countries who are more positive than the average but have not placed orders for seven years. It also reflects the attitude of ordinary consumers, that is, purchasing intention.

  Q22 is characterized by two tables join operations with grouping, sorting, aggregation, sub-query and NOT EXISTS sub-query.

II.  Oracle Execution

  The query SQL written in Oracle is as follows:

    select  /*+ parallel(n) */
	cntrycode,
	count(*) as numcust,
	sum(c_acctbal) as totacctbal
    from
	(
	    select
		substr(c_phone,1,2) as cntrycode,
		c_acctbal
	    from
		customer
	    where
		substr(c_phone,1,2) in
		    ('11', '14', '15', '19', '20', '21', '23')
		and c_acctbal > (
		    select
			avg(c_acctbal)
		    from
			customer
		    where
			c_acctbal > 0.00
			and substr(c_phone,1,2) in
			    ('11', '14', '15', '19', '20', '21', '23')
	        )
		and not exists (
	            select
		        *
		    from
			orders
		    where
			o_custkey = c_custkey
		)
	) custsale
    group by
	cntrycode
    order by
	cntrycode;

  Where /*+ parallel(n) */ is the parallel query syntax of Oracle, and n is the parallel number.

  Script execution time, Unit: seconds

Number of parallel

1

2

4

8

12

Oracle

128

86

70

53

48

III. SPL Optimization

  The outer layer of this query is a conventional grouping aggregation, whose main complexity lies in the two sub-queries used for conditions in the inner layer. These two sub-queries can correspond to some CUSTOMER records that satisfy the conditions, while the latter sub-query is modified with NOT EXIST, so the final result will be the difference set of the two set of CUSTOMER records corresponding to the two sub-queries.

 

  The SPL script is as follows 

IMG_256

A

1

=now()

2

=["11","14","15","19","20","21","23"]

3

=file("customer.ctx").open().cursor@m(C_CUSTKEY,C_PHONE,C_ACCTBAL;C_ACCTBAL>0.0     && A2.contain(left(C_PHONE,2))).fetch()

4

=A3.avg(C_ACCTBAL)

5

=A3.select(C_ACCTBAL>A4).derive@o().keys@i(C_CUSTKEY)

6

=file("orders.ctx").open().cursor@m(O_CUSTKEY;A5.find(O_CUSTKEY))

7

=A6.groups(O_CUSTKEY:C_CUSTKEY)

8

=[A5,A7].merge@d(C_CUSTKEY)

9

=A8.groups(left(C_PHONE,2):cntrycode;count(1):numcust,     sum(C_ACCTBAL):totacctbal)

10

return interval@ms(A1,now())

 

  The results of grouping operation in SPL are ordered, that is, A7 is ordered to C_CUSTKEY, while A5 itself comes from CUSTOMER table, which is also ordered to C_CUSTKEY, thus two ordered sets can use merge algorithm to calculate the difference set at high speed (in A8).

 

  Script execution time, Unit: seconds

Number of parallel

1

2

4

8

12

Oracle

128

86

70

53

48

SPL composite   table

102

53

29

24

19