Learn performance optimization skills from TPCH tests – Q2

 

I、 Query Requirement

Q2 queries the supplier with the minimum cost. In a given area, which supplier can supply the specified parts (parts of a certain type and size) at the lowest price, then the supplier can be chosen to order from.

The characteristic of Q2 is multi-table query operation with sorting, aggregation and sub-query. Query statements do not grammatically limit how many tuples are returned, but according to the TPC-H standard, only the first 100 rows of query results need to be returned (which usually relies on application).

 

II、 Oracle Execution

The query SQL written in Oracle are as follows:

select * from (

         select   /*+ parallel(n) */

                   s_acctbal,s_name,n_name,p_partkey,p_mfgr,s_address,s_phone,s_comment

         from part,supplier,partsupp,nation,region

         where

                   p_partkey = ps_partkey

                   and s_suppkey = ps_suppkey

                   and p_size = 25

                   and p_type like '%COPPER'

                   and s_nationkey = n_nationkey

                   and n_regionkey = r_regionkey

                   and r_name = 'ASIA'

                   and ps_supplycost = (

                            select

                                     min(ps_supplycost)

                            from

                                     partsupp,

                                     supplier,

                                     nation,

                                     region

                            where

                                     p_partkey = ps_partkey

                                     and s_suppkey = ps_suppkey

                                     and s_nationkey = n_nationkey

                                     and n_regionkey = r_regionkey

                                     and r_name = 'ASIA'

                   )

         order by

                   s_acctbal desc,n_name,s_name,p_partkey

)

where rownum <= 100;

/*+ parallel(n) */ is the parallel query syntax of Oracle, and n is the parallel number.

Script execution time, Unit: seconds

Number of parallel

1

2

4

8

12

Oracle

56

35

23

16

27

 

III、 SPL Optimization

Analyze this SQL carefully, if the sub-query

                            select

                                     *

                            from

                                     part,

                                     partsupp,

                                     supplier,

                                     nation,

                                     region

                            where

                                     p_partkey = ps_partkey

                                     and s_suppkey = ps_suppkey

                                     and s_nationkey = n_nationkey

                                     and n_regionkey = r_regionkey

                                     and r_name = 'ASIA'

                                     and p_size = 25

                                      and p_type like '%COPPER'

 

is regarded as a view V, the original query body can be rewritten as:

         select   /*+ parallel(n) */

                   s_acctbal,s_name,n_name,p_partkey,p_mfgr,s_address,s_phone,s_comment

         from V

         where

                   ps_supplycost = (

                            select

                                     min(ps_supplycost)

                            from

                                     V V1

                            where

                                     V.p_partkey = V1.p_partkey

                   )

In this way, the original query is transformed into a single table query, which is equivalent to finding some records in V, so that the ps_supplycost value of these records is minimized in all records with the same partkey value of the record. The essence of this operation is to group V by partkey and then aggregate each group so as to calculate the record with the smallest ps_supplycost in each group. However, this aggregation operation is not supported in SQL, which offers no other choice but a sub-query (even when transformed to a single table operation).

If the database optimization engine doesnt work well, the method described in this sub-query is strictly followed with traversing, it will lead to the complexity of N*N (N is the record number of V); even if the database optimization engine works well, it needs to first calculate the minimum value of ps_supplycost grouped by partkey to form an intermediate result set and index it, and then traverse V again, which contains a lot calculations.

A better way to solve this problem is to support the aggregation calculation of returning the record itself, which can be accomplished by one group aggregation. SPL has set and reference data types, and supports aggregation operation to aggregate records where the minimum value is located. If this idea can be implemented, the overall complexity will be much lower.

The SPL script is as follows:


A

1

=now()

2

>size=25

3

>type="COPPER"

4

>name="ASIA"

5

=file("region.btx").import@b().select(R_NAME==name).derive@o().keys@i(R_REGIONKEY)

6

=file("nation.btx").import@b().switch@i(N_REGIONKEY,A5).derive@o().keys@i(N_NATIONKEY)

7

=file("part.ctx").open().cursor@m(P_PARTKEY,P_MFGR;P_SIZE==size && pos@t(P_TYPE,type)).fetch().keys@im(P_PARTKEY)

8

=file("supplier.ctx").open().cursor@m(S_SUPPKEY,S_NAME,S_ADDRESS,S_NATIONKEY,S_PHONE,S_ACCTBAL,S_COMMENT;S_NATIONKEY:A6).fetch().keys@im(S_SUPPKEY)

9

=file("partsupp.ctx").open().cursor@m(PS_PARTKEY,PS_SUPPKEY,PS_SUPPLYCOST;PS_PARTKEY:A7,PS_SUPPKEY:A8)

10

=A9.groups(PS_PARTKEY;top@1(1;PS_SUPPLYCOST):rs).(rs)

11

=A10.new(PS_SUPPKEY.S_ACCTBAL,PS_SUPPKEY.S_NAME,PS_SUPPKEY.S_NATIONKEY.N_NAME,PS_PARTKEY.P_PARTKEY,PS_PARTKEY.P_MFGR,PS_SUPPKEY.S_ADDRESS,PS_SUPPKEY.S_PHONE,PS_SUPPKEY.S_COMMENT)

12

=A11.sort(S_ACCTBAL:-1,N_NAME,S_NAME,P_PARTKEY).to(100)

13

return interval@ms(A1,now())

What needs to be explained is that SPL is a bit lower than SQL, which does not have concepts of metadata or system-level table like SQL. Data access starts directly from the file, so the code for preparing data in SPL is slightly lengthy. In practice, it can be simplified by using SPL pre-defined whole process variables or pseudo table syntax to achieve the effect similar to SQL using data table directly. But this is not the focus of the article, the syntax of direct file access is adopted here in order to make it easier for readers to see the original flow of data.

A6-A9 of the code is used to define the cursor of view V. In A10, top function in groups is used to group while aggregating the record where the minimum value is located (rather than the minimum value itself).

The switch@i function in A6 filters out records that cannot be matched with foreign keys, and converts the matched joined fields into pointers for foreign key table records, so that the fields of foreign key table can be accessed directly in the form of.(dot). SPL executes JOIN operations in a different way than SQL; it can use pre-association to improve operation performance if the data can be loaded into memory beforehand. However, all the examples in this series assume that the data is fetched from external storage. In this question, there is no difference in JOIN performance between SPL and SQL expect for the way of writing. For detailed explanation please refer to the part about JOIN in SPL teaching plan.

In addition, the technique of using filtering conditions in cursor creation mentioned in Q1 is also used in A7. In A8 and A9, this technique is combined with the previous switch@i method (2nd segment of parameters) to implement a foreign key match when the cursor is created. Those that can not be matched are filtered out, their other fields are no longer read and the record is no longer generated. If they can be matched, the joined field is converted into pointer.

Script execution time, Unit: seconds

Number of parallel

1

2

4

8

12

Oracle

56

35

23

16

27

SPL composite table

20

14

8

5

4

 

The data of this question is not large, so the operating system can cache all the data into the memory after several times of execution. The columnar storage here is not the key point with negligible advantage it brings, and it is algorithm optimization that gives more performance advantages.           

It can also be seen from the table that the parallel effect of this grouping aggregation is also favorable.