Performance Optimization - 9.2 [Cluster] Multi-zone composite table of cluster



For conventional operations on data tables, coding with fork is somewhat cumbersome. SPL therefore also provides the cluster table and cluster cursor to simplify the code, though the situation is a little more complicated than on a single machine.

Let’s review the concept of the multi-zone composite table introduced in Chapter 2. To make it easy to delete old data, a multi-zone composite table can be composed of multiple physical files, i.e., zones, and each physical file has a number, i.e., a zone number.

The zones of a multi-zone composite table can also be distributed across the nodes of a cluster. Let’s start with a simple situation: the number of zones of the multi-zone composite table equals the number of nodes in the cluster, and each node holds one zone.

|   | A |
|---|---|
| 1 | ["192.168.0.101:8281","192.168.0.102:8281",…,"192.168.0.104:8281"] |
| 2 | =file@w("orders.ctx":to(4),A1) |
| 3 | =A2.create(…;(day(dt)-1)%4+1) |
| 4 | =A3.append(…) |

Use file@w() to create a writable cluster file and make the four zones correspond to the nodes one by one; that is, the i-th zone file will be generated on the i-th node. Then create a cluster composite table and append data. Note that a zone expression is needed for the multi-zone composite table (the same as for a single machine); here (day(dt)-1)%4+1 distributes records to the four zones by date.

When calculating, apart from the difference in creating files, the syntax is basically the same as in single-machine mode:

|   | A |
|---|---|
| 1 | ["192.168.0.101:8281","192.168.0.102:8281",…,"192.168.0.104:8281"] |
| 2 | =file("orders.ctx":[1,2,3,4],A1) |
| 3 | =A2.open() |
| 4 | =A3.cursor@m(area,amount;dt>=arg1 && dt<arg2;4) |
| 5 | =A4.groups(area;sum(amount)) |

When creating the cluster file in A2, the zone numbers need to be written out, because a multi-zone composite table is allowed to use only some of its zones; SPL will search for the corresponding zone on each node according to certain rules. The composite table and cursor created from a cluster file are called **cluster table** and **cluster cursor** respectively, and the related operations can then be performed on them.

Let’s review the multi-machine parallel computing framework described in the previous section: each node calculates separately and then sends its result to the master computer for aggregation, with no data exchange between the nodes during the calculation. When this framework runs on the zone mechanism of the cluster table, each node only needs to process the data of its own zone, without relying on other nodes, which is almost the same as a single-machine operation. As a result, many operations can be transferred from a single machine to a cluster almost unchanged, and with the cluster table syntax, the code is also very similar to single-machine code.
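
For comparison, a minimal single-machine sketch of the same aggregation (same file name and parameters as the example above):

|   | A |
|---|---|
| 1 | =file("orders.ctx").open() |
| 2 | =A1.cursor@m(area,amount;dt>=arg1 && dt<arg2;4) |
| 3 | =A2.groups(area;sum(amount)) |

Only the file-opening step differs; the cursor and grouping code carry over unchanged.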

Here we briefly explain the working principles of these common operations, providing only minimal sketches rather than detailed examples.

Search without an index can use this framework and syntax directly, as in the following sketch.
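
A minimal sketch (the field id and the search value 123456 are illustrative); each node scans its own zone and returns the matching records to the master computer:

|   | A |
|---|---|
| 1 | ["192.168.0.101:8281","192.168.0.102:8281",…,"192.168.0.104:8281"] |
| 2 | =file("orders.ctx":[1,2,3,4],A1).open() |
| 3 | =A2.cursor(;id==123456).fetch() |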

There are two methods to handle indexes. The simple method is to create a separate index on each node, in which case this framework can still be used: every node searches independently and returns its result to the master computer for aggregation. However, this method has to access all nodes on every search, which consumes more resources. The complex method is to create a multi-zone index that is ordered across zones by the search key; the zone holding the relevant index entries can then be located immediately from the search value, and from there the location of the target value is found. For an exact search (the result set has only one member), at most two nodes are involved (the one holding the index zone and the one holding the target value), which consumes fewer resources. For a single task there is almost no performance difference between the two methods, but in high-concurrency scenarios that demand extreme performance, the latter has the advantage.
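
A minimal sketch of the simple method, assuming the single-machine index() and icursor() functions carry over to cluster tables as described; the index name id_idx and the search value are hypothetical:

|   | A |
|---|---|
| 1 | ["192.168.0.101:8281","192.168.0.102:8281",…,"192.168.0.104:8281"] |
| 2 | =file("orders.ctx":[1,2,3,4],A1).open() |
| 3 | =A2.index(id_idx;id) |
| 4 | =A2.icursor(;id==123456;id_idx).fetch() |

Each node builds and searches the index of its own zone, and the master computer aggregates the returned results.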

This framework and syntax can also be used for filtering and conventional small grouping; the cluster cursor example at the beginning of this section (A4 filters by date and A5 performs a small grouping by area) is exactly such a case.

When performing ordered grouping (and other kinds of ordered traversal), care must be taken that records with the same grouping key value are not assigned to different zones. This requirement is usually easy to satisfy, as shown in the sketch below.
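
For example, suppose detail records are to be traversed in order by a user field; in this sketch the table detail.ctx and field uid are hypothetical, and the distribution expression uid%4+1 keeps all records of one uid in one zone, so each node can perform the ordered grouping locally:

|   | A |
|---|---|
| 1 | ["192.168.0.101:8281","192.168.0.102:8281",…,"192.168.0.104:8281"] |
| 2 | =file@w("detail.ctx":to(4),A1) |
| 3 | =A2.create(#uid,…;uid%4+1) |
| 4 | =A3.append(…) |
| 5 | =file("detail.ctx":[1,2,3,4],A1).open().cursor().group(uid) |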

Under this framework, using the sorting algorithm for big grouping (and big sorting) is simpler. After each node groups (sorts) its own data, the partial results must stay ordered by the grouping keys (sorting expression) when they are aggregated to the master computer, which then performs the final merge. The hash method for big grouping can also be used on each node, but the results still have to be transmitted to the master computer for merging and still need a certain order; the final merge can be implemented by sorting by hash value (and by grouping key when hash values are equal), but that is relatively troublesome. The disadvantage of this framework is that the final merge falls entirely on the master computer, whose computing power and network capacity may become a bottleneck.
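
A minimal sketch of big grouping on a cluster cursor, assuming groupx behaves as in single-machine external grouping (the grouping field cid is illustrative); each node groups its own zone, and the master computer merges the ordered partial results:

|   | A |
|---|---|
| 1 | ["192.168.0.101:8281","192.168.0.102:8281",…,"192.168.0.104:8281"] |
| 2 | =file("orders.ctx":[1,2,3,4],A1).open() |
| 3 | =A2.cursor(cid,amount).groupx(cid;sum(amount)) |
| 4 | =A3.fetch() |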

For the above single-table operations, it is relatively easy to implement distributed computing. Except for transmitting the nodes’ results to the master computer at the end, the nodes are independent of each other for most of the operation time, with no data transmission between them. Expanding the cluster therefore does not increase the network burden and effectively shares the computing workload.

Distributed databases usually adopt this method as well when performing single-table operations on a cluster, so such operations do not cause excessive network transmission between nodes. However, some databases adopt the hash method for big grouping, in which case data is transmitted between nodes (records with the same hash value are sent to the same node for local grouping). The advantage is that the nodes share the aggregation workload (equivalent to the final merge in the above framework); the disadvantage is that the large amount of network transmission limits the cluster size: beyond a certain point, adding more nodes will no longer improve computing performance.

As for join operations, different situations have to be discussed separately. Homo-dimension association and primary-sub association are relatively simple: as long as the data is properly distributed, i.e., records with the same primary key value in the associated tables are assigned to the same zone number, the associated data is guaranteed to be in the same zone (and thus on the same node), and no data transmission between nodes is needed. The required distribution depends only on the primary key, which is similar to requiring the data to be ordered by the primary key, and can easily be achieved while sorting and appending data. Once the data is properly distributed, the operation code is the same as on a single machine, as in the sketch below.
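
A minimal sketch under these assumptions: orders.ctx and a hypothetical sub table details.ctx are both distributed by the same expression on the primary key id, so the records of one id always sit in the same zone, and joinx is the ordered merge join used for primary-sub association on a single machine:

|   | A |
|---|---|
| 1 | ["192.168.0.101:8281","192.168.0.102:8281",…,"192.168.0.104:8281"] |
| 2 | =file("orders.ctx":[1,2,3,4],A1).open().cursor(id,amount) |
| 3 | =file("details.ctx":[1,2,3,4],A1).open().cursor(id,qty) |
| 4 | =joinx(A2:o,id;A3:d,id) |
| 5 | =A4.groups(…) |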

Since the dimension table in a foreign key association needs to be accessed randomly, the situation is more complicated. We will discuss it in the next two sections.

The database does not distinguish between the various situations of join operations. A common way is to extend the single-machine hash method to the cluster: each node transmits its local data to all nodes according to the hash values (forming buffer files) to ensure that associated data lands on the same node, and then each node performs a single-machine join. The data transmission in this algorithm incurs a large amount of network traffic and again limits the cluster size; the performance gain from sharing the computing workload among multiple machines can be completely offset by the performance loss caused by transmission, and beyond that point adding more nodes will not improve performance. Join operations have always been a challenging problem for distributed databases.

