Performance Optimization - 4.8 [Traversal technology] Redundant grouping key


Sometimes, there are some redundant fields in data table for the purpose of convenient processing. For example, there may be both customer number and customer name in the order table, however, the customer name can be determined by customer number, in this case, the customer name belongs to redundant information. This kind of data structure is not uncommon in the wide table on the server of multi-dimension analysis.

If we want to do grouping and aggregating by customers, the customer number is usually used as grouping key because there may be same customer name. Furthermore, if we hope to list the customer name in result set, we need to write customer name in grouping keys. Indeed, this writing way is often used in SQL.

However, this writing way will increase the amount of computation. In the process of grouping, the calculation and comparison of grouping key values account for a significant ratio in amount of computation. With one more grouping key, the calculation of hash value and comparison will be much more complex, resulting in a decrease in computing performance.

SPL allows such redundant grouping keys to be written in the aggregation parameters as aggregation value, as a result, these unnecessary calculations can be avoided.

1 =file(“orders.btx”).cursor@b()
2 =A1.groups(cust_id;cust_name,sum(amount))

If the aggregation parameter does not have an aggregation function, it will be considered as a redundant grouping key to take any one from the current grouped subset as the aggregation value. Here, cust_name will not participate in the calculation of grouping keys, but it can still be calculated correctly in the result set.

The main purpose of designing redundant fields in the data warehouse is to avoid the difficult-to-optimize join operation, so space is sacrificed for time. In the following chapters, we will discuss how to efficiently achieve this type of join in SPL without having to generate a wide table with redundant fields (fewer fields in the traversed table will increase computing performance). The processing of redundant grouping keys is rarely involved in SPL practices, generally, it is only used in the operation for migrating and replicating traditional data warehouse.