Learn performance optimization skills from TPCH tests - Q13

 

I.   Query Requirement

  Q13 queries the number of orders received by consumers, including those who did not and do not have an order record.

  Q13 is characterized by query operation with grouping, sorting, aggregation, sub-query and left outer join operation.

II.  Oracle Execution

  The query SQL written in Oracle is as follows:

    select  /*+ parallel(n) */
	c_count,
	count(*) as custdist
    from (
	select
	    c_custkey,
	    count(o_orderkey) c_count
	from
	    customer left outer join orders on
		c_custkey = o_custkey
		and o_comment not like '%special%accounts%'
	group by
	    c_custkey
    ) c_orders
    group by
	c_count
    order by
	custdist desc,
	c_count desc;

  Where /*+ parallel(n) */ is the parallel query syntax of Oracle, and n is the parallel number.

  Script execution time, Unit: seconds

Number of parallel

1

2

4

8

12

Oracle

316

175

89

48

37

III. SPL Optimization

  This query can be simply considered as two rounds of routine grouping on orders. The first round groups and calculates the order number of each customer by custkey, and the second one groups and calculates the number of customers for each order number.

  Notice that there is a left join in the original SQL, which will include the customers who have not placed an order (the order number is 0), but these data will be omitted in the above two rounds of grouping, which needs to be supplemented later.

 

  The SPL script is as follows


A

1

=now()

2

>filter="*special*accounts*"

3

=file("orders.ctx").open().cursor@m(O_CUSTKEY;!like(O_COMMENT,filter))

4

=A3.groups@u(O_CUSTKEY;count(1):c_count)

5

=A4.len()

6

=A4.cursor@m().groups@u(c_count;count(1):custdist)

7

=file("customer.ctx").open().cursor().skip()

8

=A6.insert(0,0,A7-A5)

9

=A6.sort@z(custdist,c_count)

10

return interval@ms(A1,now())

 

  A4 does the first round of grouping, A6 does the second; A7 calculates the number of all customers, subtracts the number of customers who have placed orders and gets the number of customers who havent, and adds it to A6 to sort together.

 

  Script execution time, Unit: seconds

Number of parallel

1

2

4

8

12

Oracle

316

175

89

48

37

SPL composite table

352

218

131

99

109

 

  For this query, SPL is slower than Oracle, spending most of its time in the first round of grouping (A4). The main reason is that the grouping result here is large and occupies a lot of memory. SPL is developed in Java at present, while JVM is bad at memory management, which will frequently perform garbage collection and take a lot of time when memory is exhausted. But the database developed in C++ has no such problem.

IV.  Further Optimization

  When the groups function is used to group in SPL, if the grouping field is sequence number, it can be positioned directly with the @n option to avoid HASH calculation. In this example, the first round is grouped by O_CUSTKEY, which are integers in the data table, so the @n option can be used. However, the value of O_CUSTKEY in the example is large, that is, the number of groupings is large, occupying a lot of memory. When there are many parallel threads, if each thread holds a large group, the memory will overflow. In this case, the parallel number of groups@n should decrease.

  The SPL script is as follows: 


A

1

=now()

2

>filter="*special*accounts*"

3

=file("orders.ctx").open().cursor@m(O_CUSTKEY;!like(O_COMMENT,filter))

4

=A3.groups@n(O_CUSTKEY;count(1):c_count).select@m(c_count)

5

=A4.len()

6

=A4.cursor@m().groups@u(c_count;count(1):custdist)

7

=file("customer.ctx").open().cursor().skip()

8

=A6.insert(0,0,A7-A5)

9

=A6.sort@z(custdist,c_count)

10

return interval@ms(A1,now())

 

  Script execution time, Unit: seconds

Number of parallel

1

2

4

8

12

Oracle

316

175

89

48

37

SPL composite table

352

218

131

99

109

SPL composite table (@n)

228

137

95

67

57

  The test results show that the efficiency is indeed improved, and the performance exceeds Oracle in low parallel, while in high parallel, collecting garbage takes too much time because of the large memory occupancy, which makes the performance surpassed by Oracle.