Performance Optimization Exercises Using TPC-H

 

1 Preparing data

We use TPC-H tool to generate the original data of text file format. There are altogether 8 tables, with a total size of 100G (See table sizes below).

Then we generate SPL binary files for these text files. Each data table corresponds to a binary file.

Acomposite table has index information that always requires at least a minimum space utilization, so it is not suitable for storing small data tables. That’s why we store small data tables in row-wise storage bin files, which helps decrease the size of data stored and enables loading all data at one time. A large data table, on the other hand, is stored in a column-wise storage composite table that uses primary key as the dimension. When dumping data to the composite table, use sortx() function to sort the data table.

1.1 region/nation

Generate a bin file (Take region table as an example):


A

1

=file("region.tbl").import(; , "|").new(_1:R_REGIONKEY, _2:R_NAME, _3:R_COMMENT).sort(R_REGIONKEY)

2

=file("region.btx").export@b(A1)

There is no need to explicitly create data structure for a bin file, we just write data to it directly.

1.2 customer/supplier/part/orders

Generate a composite table (Take customer table as an example):


A

1

=file("customer.tbl").cursor(; , "|").new(_1:C_CUSTKEY, _2:C_NAME, _3:C_ADDRESS, _4:C_NATIONKEY, _5:C_PHONE, _6:C_ACCTBAL, _7:C_MKTSEGMENT, _8:C_COMMENT).sortx(C_CUSTKEY;15000000)

2

=file("customer.ctx").create(#C_CUSTKEY, C_NAME, C_ADDRESS, C_NATIONKEY, C_PHONE, C_ACCTBAL, C_MKTSEGMENT, C_COMMENT)

3

>A2.append(A1)

Creating a composite table requires listing data structure, specifying primary key field(s) (by preceding them with #), and sorting data by primary key. The second parameter in sortx() function is determined by memory size, and can be omitted without impact.

1.3 partsupp/lineitem

partsupp and lineitem are sub tables of part and orders respectively. To ensure synced segmentation between sub table and primary table, use @p option at composite table creation. Take lineitem table as an example:


A

1

=file("lineitem.tbl").cursor(; , "|").new(_1:L_ORDERKEY, _4:L_LINENUMBER, _2:L_PARTKEY, _3:L_SUPPKEY, _5:L_QUANTITY, _6:L_EXTENDEDPRICE,_7:L_DISCOUNT, _8:L_TAX, _9:L_RETURNFLAG, _10:L_LINESTATUS,_11:L_SHIPDATE, _12:L_COMMITDATE, _13:L_RECEIPTDATE,_14:L_SHIPINSTRUCT, _15:L_SHIPMODE, _16:L_COMMENT).sortx(L_ORDERKEY,L_LINENUMBER;4000000)

2

=file("lineitem.ctx").create@p(#L_ORDERKEY,#L_LINENUMBER,L_PARTKEY, L_SUPPKEY, L_QUANTITY, L_EXTENDEDPRICE,L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE, L_COMMITDATE, L_RECEIPTDATE,L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT)

3

>A2.append(A1)

A2 uses @p option to ensure that records having same L_ORDERKEY will be put into same segment, so that we can get result of parallel computation correctly.

2 Data table size

Below lists sizes of the original files and SPL files:

Table

Row count

tbl size

ctx/btxsize

region

5

1K

8K

nation

25

2K

10K

customer

15M

2.4G

1.3G

part

20M

2.4G

1.0G

supplier

1M

140M

0.1G

partsupp

80M

12.2G

4.9G

orders

150M

17.8G

7.2G

lineitem

600M

79.5G

29.4G

3 TPC-H exercises explained

Performance Optimization Exercises Using TPC-H – Q1

Performance Optimization Exercises Using TPC-H – Q2

Performance Optimization Exercises Using TPC-H – Q3

Performance Optimization Exercises Using TPC-H – Q4

Performance Optimization Exercises Using TPC-H – Q5

Performance Optimization Exercises Using TPC-H – Q6

Performance Optimization Exercises Using TPC-H – Q7

Performance Optimization Exercises Using TPC-H – Q8

Performance Optimization Exercises Using TPC-H – Q9

Performance Optimization Exercises Using TPC-H – Q10

Performance Optimization Exercises Using TPC-H – Q11

Performance Optimization Exercises Using TPC-H – Q12

Performance Optimization Exercises Using TPC-H – Q13

Performance Optimization Exercises Using TPC-H – Q14

Performance Optimization Exercises Using TPC-H – Q15

Performance Optimization Exercises Using TPC-H – Q16

Performance Optimization Exercises Using TPC-H – Q17

Performance Optimization Exercises Using TPC-H – Q18

Performance Optimization Exercises Using TPC-H – Q19

Performance Optimization Exercises Using TPC-H – Q20

Performance Optimization Exercises Using TPC-H – Q21

Performance Optimization Exercises Using TPC-H – Q22