SPL computing performance test series: TPCH

 

I. Test task

TPCH 100G.

TPCH is an international standard, the details of which will not be explained in this article.

It should be noted that although TPCH includes 22 queries, it still cannot fully reflect the response performance of the SUT (system under test) to actual business, for two main reasons:

1. The queries in TPCH are relatively conventional and do not involve order-related operations, and the stepwise operations are also fairly simple. In practice, however, there exist quires with performance bottleneck, and their complexity is usually much higher than the queries in TPCH, and a large number of order-related operations and stepwise operations are involved;

2. The queries in TPCH have been made public for a long time, and some databases may have been optimized specifically for these queries.

Of course, as an international standard, the test results of TPCH will also be informative.

II. Competitive technologies

We select the following products to make a comparation with SPL.

1. ClickHouse, which is said to be the fastest OLAP database in the world. ClickHouse 23 is to be tested.

2. Starrocks, which is claimed to be a faster OLAP database. Starrocks 2.5.2 is to be tested.

3. Oracle, which is widely used, and often used as a benchmark of database performance test. Oracle 19 is to be tested.

Since Oracle is not a professional OLAP database, and its performance indicators will not be as good as other products, the test results of Oracle are for reference only.

III. Test environment

One physical server with configurations as follows:

2 x Intel3014 CPUs, main frequency 1.7G, 12 cores in total

64G memory

SSD (Solid State Drive)

The size of the largest table in TPCH 100G is only about 70G, and is probably smaller than the physical memory of the machine after simple compression. In order to be able to test the computing ability of these products on external storage as well as their sensitivity to memory, we use virtual machines to limit the number of CPUs and the capacity of memory, and design two test environments based on the relatively common cloud VM specifications in the industry:

VM1: 8 CPUs, 32G memory

VM2: 4 CPUs, 16G memory

For Starrocks, at least two nodes, BE and FE, need to be installed. The BE that undertakes computing task is installed on one VM, while the FE that undertakes management task is installed on the physical machine to avoid affecting the test results.

For SPL, Clickhouse, and Oracle, we only need to install them on VM.

SPL, Clickhouse and Starrocks are tested on both VMs, while Oracle is tested only on VM1 since it is for reference only.

IV. Data preparation

The following conversions are made to the text data generated with TPCH tool:

1. Convert the single-field primary key of dimension tables and their corresponding foreign key of fact tables to sequence number (primary key of several dimension tables in TPCH are originally sequence number);

2. Sort every data table by primary key;

Import the converted data into database, and then

3. Convert the field value of the enumeration field string to sequence number, and convert the date type field to integer,

and finally generate composite tables of SPL.

Specific operation steps can be found at the material for study: Performance Optimization Exercises Using TPC-H.

Usually, a professional OLAP database will automatically execute step 3 above based on the metadata information to optimize data types, yet SPL has no metadata and lacks automatic optimization ability, so we need to pre-process with code. Nevertheless, the SPL code is still shorter than SQL code in most cases even if this additional step is attached.

V. Test process

1. SPL test

Two sets of scripts are written respectively in SPL Community Edition and SPL Enterprise Edition. One script is executed in parallel with 8 threads on VM1 and the other is executed in parallel with 4 threads on VM2.

SPL can use algorithms different from SQL. For more complex queries, SPL code is simpler and the computing complexity is lower, which make SPL more advantageous.

Since TPCH has 22 queries, if all queries were listed and explained one by one, it would take up too much space, so we provide a SPL code package for download and reading: SPL-TPCH.zip.

For more information about scripts, refer to Performance Optimization Exercises Using TPC-H

It should be noted that unlike the optimized code in the link above, the code in this test does not utilize any preloading action and all operations are performed after reading data from files on an ad hoc basis. Professional database with metadata information can preload some small data tables sometimes, and SPL without metadata can also process preloading manually (visit above link for details). However, considering the small memory scenario, this test does not utilize this technique.

2. SQL Test

Execute the SQL statement that comes with TPCH in Starrocks, Clickhouse and Oracle respectively. Since the SQL statement itself cannot describe low-complexity algorithms, it can only resort to the optimization engine of database.

We found through the monitoring tool that when testing Starrocks and Clickhouse on VM1 and VM2 respectively, 8 CPUs on VM1 and 4 CPUs on VM2 are all utilized, and the utilization rate is very high, indicating that both databases have the ability to fully utilize CPU resources. However, when testing Oracle (only on VM1), the utilization rate of 8 CPUs is only about 50%; only when Oracle runs on 4 CPUs, the utilization rate can reach over 90%, indicating that Oracle cannot fully utilize CPU resources.

During the test, if a certain query does not give a result after 10 minutes of running, we will terminate it and record “>600” in the test results.

Clickhouse’s SQL does not support referencing the fields of main query in subqueries, so several standard SQL statements in TPCH cannot be executed directly (reporting syntax errors), and need to be modified before executing them. Here below are the modified SQL statements:

Q2

select  s_acctbal, s_name,  n_name,  p.p_partkey,  p_mfgr,  s_address,  s_phone,  s_comment
from  part p,  partsupp,  supplier,  nation,  region, (
	      select  p_partkey, min(ps_supplycost) as supplycost
	      from  partsupp,  part, supplier,  nation,  region
	      where
	        p_partkey = ps_partkey
	        and s_suppkey = ps_suppkey
	        and s_nationkey = n_nationkey
	        and n_regionkey = r_regionkey
	        and r_name = 'ASIA'
	      group by p_partkey
	  ) pps
Where
	  p.p_partkey = pps.p_partkey
	  and ps_supplycost = pps.supplycost
  and p.p_partkey = ps_partkey
  and s_suppkey = ps_suppkey
  and p.p_size = 25
  and p.p_type like '%COPPER'
  and s_nationkey = n_nationkey
  and n_regionkey = r_regionkey
  and r_name = 'ASIA'
order by  s_acctbal desc, n_name,  s_name,  p.p_partkey
limit 100;

Q4

Select
o_orderpriority,
count(*) as order_count
from (
select
        distinct l_orderkey,o_orderpriority
from orders
join lineitem on
        l_orderkey = o_orderkey
where
        o_orderdate >= date '1995-10-01'
        and o_orderdate < date '1995-10-01' + interval '3' month
        and l_commitdate < l_receiptdate
)
group by
o_orderpriority
order by
o_orderpriority;

Q20

select s_name, s_address
from supplier, nation
where
  s_suppkey in (
select ps_suppkey
    from partsupp, (
        select l_partkey lpk, l_suppkey lsk, 0.5 * sum(l_quantity) as lq
        from lineitem
        where
           l_shipdate >= date '1995-01-01'
           and l_shipdate < date '1995-01-01' + interval '1' year
       group by l_partkey, l_suppkey
) li
Where
      ps_partkey = li.lpk
      and ps_suppkey = li.lsk
      and ps_partkey in (
        select p_partkey from part
        where p_name like 'bisque%'
      )
      and ps_availqty > li.lq
  )
  and s_nationkey = n_nationkey
  and n_name = 'CHINA'
order by s_name
limit 100;

Q21

Due to the difficulty in modifying this SQL statement, we give up this query.

Q22

Select
  cntrycode,
  count(*) as numcust,
  sum(c_acctbal) as totacctbal
from
  (
Select
      substr(c_phone, 1, 2) as cntrycode,
      c_acctbal
from
      customer
where
      substr(c_phone, 1, 2) in
        ('11', '14', '15', '19', '20', '21', '23')
      and c_acctbal > (
        select
          avg(c_acctbal)
        from
         customer
        where
          c_acctbal > 0.00
          and substr(c_phone, 1, 2) in
            ('11', '14', '15', '19', '20', '21', '23')
      )
      and c_custkey not in (
        select distinct o_custkey from orders
      )
  ) as custsale
group by
  cntrycode
order by
  cntrycode;

VI. Test results

VM1 (Unit: seconds)

TPCH Number

SPL Community Edition

SPL Enterprise Edition

Starrocks

Clickhouse

Oracle

Snowflake

1

29.9

9.7

14.0

15.4

114.3

7.3

2

2.7

1.3

0.6

17.3

1.9

3.0

3

17.8

8.8

9.8

Out-of-memory (OOM)

165.8

3.3

4

7.4

4.9

5.7

OOM

158.4

3.0

5

35.3

8.9

13.1

OOM

174.5

4.0

6

8.6

4.5

3.9

4.8

126.7

0.5

7

24.1

10.5

12.4

OOM

181.5

3.8

8

45.3

6.9

8.3

OOM

209.7

4.2

9

66.3

16.8

21.3

OOM

256.0

14.7

10

16.4

8.3

11.1

58.3

195.6

8.2

11

3.3

0.9

1.3

6.7

8.7

1.7

12

10.3

4.9

4.8

10.7

186.0

1.7

13

53.9

12.1

21.3

134.1

33.3

7.7

14

12.8

3.3

4.6

10.2

170.0

1.3

15

14.3

4.7

7.1

11.2

161.8

0.8

16

10.2

2.7

2.9

4.0

10.8

2.6

17

24.7

5.3

4.2

44.6

156.5

2.8

18

18.6

6.4

20.8

OOM

416.8

12.0

19

10.0

5.8

6.0

>600

144.1

2.2

20

14.3

5.2

5.2

31.2

171.0

1.5

21

25.5

11.9

14.5

Syntax error

360.7

9.0

22

11.0

2.5

1.9

8.4

37.7

1.7

Total

462.7

146.3

194.8

-

3441.8

96.5

Here we add a column of Snowflake's own test results (sourced from the statistical graph of a SIGMOD paper). The test values in this column may be inaccurate and are for reference. The test of Snowflake is performed on Snowflake Medium cluster that consists of four machines (8 cores, 16 threads) equipped with more advanced CPUs, and its hardware resources are equivalent to 4-8 times those used in this test.

VM2 (Unit: seconds)

TPCH Number

SPL Community Edition

SPL Enterprise Edition

Starrocks

Clickhouse

1

61.8

21.5

26.0

29.1

2

3.7

2.0

0.9

31.1

3

38.5

20.8

17.5

OOM

4

14.3

12.1

10.8

OOM

5

72.1

21.3

25.3

OOM

6

19.8

7.5

7.9

8.9

7

51.4

24.1

22.5

OOM

8

77.8

18.5

14.3

OOM

9

100.8

39.0

43.0

OOM

10

33.5

22.4

20.7

OOM

11

4.9

1.2

1.9

12.0

12

21.3

8.8

12.0

19.3

13

77.0

22.8

44.5

OOM

14

28.3

5.4

9.2

16.6

15

25.7

10.3

13.4

20.0

16

13.6

4.4

5.8

7.0

17

42.2

14.0

8.0

68.6

18

32.5

14.8

OOM

OOM

19

24.2

14.2

11.8

>600

20

30.2

15.5

9.6

47.0

21

52.2

20.9

28.8

Syntax error

22

21.8

3.5

3.5

11.7

Total

847.6

325.0

337.8

-

Note: one of the queries for testing Starrocks fails, and the total value doesn’t include this query.

VII. Comments on test results

1. Overall, SPL Enterprise Edition performs the best, followed by Starrocks, and then SPL Community Edition. Oracle's performance is far behind these professional computing products.

2. Clickhouse performs very poorly. Except the fact that Clickhouse does not support certain SQL statements mentioned earlier, it is unable to work out many queries, even its computing speed on some queries is slower than Oracle. Even if it can work out some queries, most of which are executed at a speed slower than SPL Enterprise Edition and Starrocks. When we test Clickhouse on VM2, more queries cannot be worked out. Therefore, the claim that Clickhouse is the fastest OLAP database in the world seems to fall significantly short of the reality. In short, there are few application scenarios for Clickhouse, and its speed is not fast even in the scenarios where Clickhouse can be used.

3. Starrocks performs well in SQL databases. Starrocks claims that it has the fastest SQL engine due to the adoption of CPU vector computing technology (SPL is not a SQL engine), which is confirmed at least from this test. However, OOM occurs when the capacity of memory is small, indicating that Starrocks has high requirements on hardware resources.

4. Both SPL Community and Enterprise editions can work out all queries under the environment with small memory, indicating that SPL has low requirements on hardware resources.

5. SPL Community Edition utilizes Java objects to store field value, which can achieve flexible generics effect, but it will take up a lot of memory space and needs to make a lot of judgments during the calculation. After adopting the low-complexity algorithms, the performance is much better than Oracle, yet there is still a gap with Starrocks. This shows that relying on engineering optimization alone can greatly improve computing performance.

6. SPL Enterprise Edition also implements vector-based calculation (generics abandoned), which is similar to Starrocks in this regard. Vector-based calculation together with low-complexity algorithms make its overall performance the best. However, because the features of CPU cannot be utilized directly when implementing in Java, and only the problems of too large objects and excessive judgments in SPL Community Edition are solved, so not every query can surpass Starrocks.

7. Conclusions

SPL and Starrocks are all outstanding computing engine. For less complex tasks like TPCH, SPL has a slight advantage.

Clickhouse is very poor in computing performance, and cannot be comparable to SPL and Starrocks.

Oracle is stable, but its computing performance is far from that of professional OLAP computing engine, and hence it is not suited to undertaking a big data computing task.