Is ClickHouse as Powerful as We Thought?

 

The open-source analytical database ClickHouse (CH) is hot recently and is said to be exceptionally fast for performing OLAP analyses. Many users suffering from performance problems are eager to have a try.

Does the DBMS really meet our expectations? The following performance test will tell us.

 

Our subjects for comparisons are CH, Oracle (Ora in the following sections), and the open-source esProc SPL. The benchmark is the internationally acknowledged TPC-H. The test is based on 8 tables, tasked with achieving computations defined by 22 SQL statements (Q1-Q22), and compares times spent in finishing the computations with CH, Ora, and SPL under the same hardware and software environments.

 

The test shows that:

 CH has great performance when performing traversal on a single table, better than Ora and SPL. Take TPC-H’s Q1 as an example, and the comparison result is as follows:

undefined 

                 Pic1:  Single-table traversal-based query performance comparison (Unit: second)

Q1 is a simple grouping & aggregation computation based on single table traversal. CH is the fastest, SPL ranks second, and Ora is the slowest for performing the computation.

The comparison result shows that CH’s column-oriented storage design is remarkable. SPL is not as fast as CH in performing the single table traversal though it also uses the column-oriented storage scheme. There are three reasons behind it. One is that SPL’s storage compression is not as good as CH’s. Another is that the Java-based SPL has weaker computational capability than CH written in C++. Lastly, SPL’s high-performance algorithms cannot be given full play in such a simple computing scenario. Ora becomes so slower because of its use of row-oriented storage, which is a heavy drag.

 

 Yet CH isn’t that good when handling more complicated scenarios. Q2, Q3, and Q7, for instance, involve complex queries, including multi-table joins and subqueries. Their performance comparisons are shown below:

 

undefined 

               Pic2: Complex multi-table join query performance comparison (Unit: second)

SPL is the fastest in handling these queries. CH and Ora are shown neck and neck when handling Q2, which involves a small amount of data and makes the advantage of columnar storage not obvious. CH is faster than Ora thanks to its columnar storage design in handling Q3, which involves a large amount of data. In handling Q7 that involves a large volume of data also as well as complex computations, Ora is better than CH.

The performance of CH dives as the computing logic increases. Even the advantageous columnar storage cannot improve the situation. The DBMS is overtaken by the row-oriented Ora. This shows that CH is very bad at optimizing algorithms, far poorer than Ora. SPL, in that case, shows excellent optimization techniques with algorithms and the advantage of columnar storage scheme. The result is that the Java-based SPL runs much faster than CH and Ora, both written in C++.

 

Q8 involves a highly complicated computation, where the subquery contains a multi-table join. No result is shown after CH runs of 2000 seconds, and it is probably that the application is stuck. It takes Ora 192 seconds and SPL 37 seconds to complete the computation. SPL is the winner. Like operator is added to Q8’s subquery to make the query of Q9. CH reports out-of-memory error directly, and Ora runs 234 seconds. SPL takes the lead with only 68 seconds of run time. As there are some complicated computations that CH cannot manage at all, an overall comparison becomes impossible.

 

So, CH is not as powerful as the legend says. It is quite good at handling single table queries, but at the same time, rather bad at dealing with complex queries. Though the system has excellent columnar storage, it exhibits very poor optimization algorithmic technique.

In short, you might expect too much if you want to improve performance, particularly try to solve complex query issues and even stored procedure performance problems using CH.

SPL is slower than CH in performing single table traversal directly, but it outstrips the latter in terms of performance in handling complex computations. Moreover, SPL can implement them using simple and concise code thanks to its rich data types and encapsulation of plenty of high-performance algorithms.

SPL is the best and wise choice in order to enhance the performance of complex computations (including stored procedures).

 

Appendix: Test environment

CPU: 2 Intel3014 processors of 1.7G MHz, with each CPU having 6 cores.

SSD: 1T with 561MB/s read and 523MB/s write.

Interface: SATA 6.0Gb/s.

RAM: 64G.

OS: Linux CentOS 7.

The size of TPC-H data set is 100G, and all tests use the 12-threads parallel processing.