Performance optimization case course: TPCH-Q22

 

select
    cntrycode,
    count(*) as numcust,
    sum(c_acctbal) as totacctbal
from (
    select
        substr(c_phone,1,2) as cntrycode,
        c_acctbal
    from
        customer
    where
        substr(c_phone,1,2) in ('11', '14', '15', '19', '20', '21', '23')
        and c_acctbal > (
            select
                avg(c_acctbal)
            from
                customer
            where
                c_acctbal > 0.00
                and substr(c_phone,1,2) in ('11', '14', '15', '19', '20', '21', '23')
        )
        and not exists (
            select
                *
            from
                orders
            where
                o_custkey = c_custkey
        )
    ) custsale
group by
    cntrycode
order by
    cntrycode;

The main query’s outer layer is a regular grouping and aggregation operation, and its inner layer contains two conditional subqueries.

The two subqueries, where the second one is preceded by not exists, correspond to the customer record sets that meet their respective condition. The final result is the difference set of the two sets.

1. Data storage

There is no special requirement for data tables, store them in order by primary key.

We can directly use orders.ctx and customer.ctx from Q3. Copy them to the main directory of this query.

2. General method

Calculation code:


A

1

=now()

2

=["11","14","15","19","20","21","23"]

3

=file("customer.ctx").open().cursor@m(C_CUSTKEY,C_PHONE,C_ACCTBAL;C_ACCTBAL>0.0 && A2.contain(left(C_PHONE,2))).fetch()

4

=A3.avg(C_ACCTBAL)

5

=A3.select@m(C_ACCTBAL>A4).derive@o().keys@im(C_CUSTKEY)

6

=file("orders.ctx").open().cursor@m(O_CUSTKEY;O_CUSTKEY:A5)

7

=A6.run(O_CUSTKEY.C_CUSTKEY=null).skip()

8

=A5.select@m(C_CUSTKEY)

9

=A8.groups(left(C_PHONE,2):cntrycode;count(1):numcust, sum(C_ACCTBAL):totacctbal)

10

=interval@ms(A1,now())

A5 is equivalent to selecting the records meeting condition specified in the first subquery. A6 associates with A5 and selects the records for the second subquery. A7 sets C_CUSTKEY values of these records selected in A6 as null, which means deleting them from A5 and is equivalent to achieving the not exists condition in the second subquery. A8 selects records satisfying the two conditional subqueries.

Test result:

Test items

Execution time (seconds)

General method

6

3. Data conversion

For this query, we need to use the dimension table primary key sequence-numberization method mentioned in previous articles. Copy orders_5.ctx and customer_5.ctx in Q3 to the main directory of this query.

Calculation code:


A

B

1

=now()

2

=["11","14","15","19","20","21","23"]

3

=file("customer_5.ctx").open()

=A3.cursor@m().skip().(null)

4

=A3.cursor@m(C_CUSTKEY,C_ACCTBAL,C_PHONE;C_ACCTBAL>0.0 && A2.contain(left(C_PHONE,2))).fetch().@m(B3(C_CUSTKEY)=~)

5

=B3.avg(C_ACCTBAL)

6

=B3.@m(if(C_ACCTBAL>A5,~,null))

7

=file("orders_5.ctx").open().cursor@m(O_CUSTKEY;A6(O_CUSTKEY)).run(A6(O_CUSTKEY)=null).skip()

8

=A6.select@m(~).groups@m(left(C_PHONE,2):cntrycode;count(1):numcust, sum(C_ACCTBAL):totacctbal)

9

=interval@ms(A1,now())

B3 and A6 are aligned sequence, and their values are null or records of customer table.

Test result:

Test items

Execution time (seconds)

General method

6

Data conversion

2

4. Column-wise computing

Calculation code:


A

B

1

=now()

2

=["11","14","15","19","20","21","23"]

3

=file("customer_5.ctx").open()

=A3.cursor@m().skip().(null)

4

=A3.cursor@mv(C_CUSTKEY,C_ACCTBAL,C_PHONE;C_ACCTBAL>0.0 && A2.contain(left(C_PHONE,2))).fetch().@m(B3(C_CUSTKEY)=~)

5

=B3.avg(C_ACCTBAL)

6

=B3.@m(if(C_ACCTBAL>A5,~,null))

7

=file("orders_5.ctx").open().cursor@mv(O_CUSTKEY;A6(O_CUSTKEY)).run(A6(O_CUSTKEY)=null).skip()

8

=A6.select(~).groups@m(left(C_PHONE,2):cntrycode;count(1):numcust, sum(C_ACCTBAL):totacctbal)

9

=interval@ms(A1,now())

Test result:

Test items

Execution time (seconds)

General method

6

Data conversion

2

Column-wise computing

1