Performance Optimization - 8.6 [Multi-dimensional analysis] In-memory flag change

 


Flag data may change over time; for example, customers may be re-flagged once a month. To query the flags as they stood in a given past month, we need to keep the flag information for every time point. In multidimensional analysis this multiplies the data volume and consumes a lot of storage space. The cost matters little on external storage, but it is a problem for high-speed in-memory queries, because memory may be too small to hold the data of all time points.

If a large share of the flags changes each time, there is nothing we can do. Usually, however, only a few flags change, far fewer than the total amount of data at any single time point. In that case we can store the initial state plus the changes made at each time point, and quickly compute the data for any given time point from them.
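The idea of keeping one initial state plus small per-period changes can be sketched in Python (not SPL) as follows. This is only an illustration under assumed data shapes: `initial` maps a customer id to its set of flag numbers, and each entry of `changes` maps an id to the flag numbers toggled in that period. The name `flags_at` and the sample data are hypothetical.

```python
from typing import Dict, List, Set

def flags_at(initial: Dict[int, Set[int]],
             changes: List[Dict[int, Set[int]]],
             period: int) -> Dict[int, Set[int]]:
    """Return the flag state after the first `period` change sets are applied."""
    # Copy the initial state so the stored snapshot is never modified.
    state = {cid: set(tags) for cid, tags in initial.items()}
    for delta in changes[:period]:
        for cid, toggled in delta.items():
            # A toggled flag is removed if present and added if absent.
            state.setdefault(cid, set()).symmetric_difference_update(toggled)
    return state

# Hypothetical sample data: three customers, two monthly change sets.
initial = {1: {2, 5}, 2: {1}, 3: set()}
changes = [
    {1: {5}},          # month 1: customer 1 loses flag 5
    {2: {3}, 3: {1}},  # month 2: customer 2 gains flag 3, customer 3 gains flag 1
]

month2 = flags_at(initial, changes, 2)
```

Only the snapshot and the small change sets are kept in memory; the state of any period is rebuilt on demand by replaying the changes up to that period.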

Since a flag has only two values, the new value can be computed correctly as long as we record which flag changed; the direction of the change (true to false, or false to true) does not need to be stored. With the bit dimension, we work out which bits changed and then perform a simple XOR operation with the original value.
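A minimal Python sketch of this XOR step, assuming flags are numbered from 1, packed 16 to a word, and that flag n occupies bit (n-1) % 16 counted from the low-order end (the bit order is an assumption, as are the names `change_masks` and `apply_changes`):

```python
from typing import Dict, List

BITS_PER_WORD = 16  # assumed word size, matching the 16 used in the SPL code

def change_masks(changed_flags: List[int]) -> Dict[int, int]:
    """Map word index -> XOR mask for a list of 1-based changed flag numbers."""
    masks: Dict[int, int] = {}
    for n in changed_flags:
        word, bit = divmod(n - 1, BITS_PER_WORD)
        masks[word] = masks.get(word, 0) | (1 << bit)
    return masks

def apply_changes(words: List[int], changed_flags: List[int]) -> List[int]:
    """Return a copy of `words` with every changed flag flipped."""
    out = list(words)
    for word, mask in change_masks(changed_flags).items():
        out[word] ^= mask  # XOR flips exactly the changed bits
    return out

original = [0b0000000000000101, 0]       # flags 1 and 3 are set
updated = apply_changes(original, [3, 18])
```

Because XOR is its own inverse, applying the same change list a second time restores the original words, which is why the direction of each change does not have to be recorded.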

     A                                     B
1    =file("T_new.ctx").open().import()
2    =file("T_change.btx").import()
3    for join@1m(A1,id;A2,id)              =A3.#2.tags.(~-1).group(~\16)
4                                          =B3.(~.sum(shift(1,-(~%16))))
5                                          =B3.pselect@a(~!=0)
6                                          =B5.("bits"/~/"=xor(bits"/~/","/B4(~)/")")
7                                          >A3.#1.run(${B6.concat@c()})

Both the original data table and the change information use the id field as the primary key and are stored in id order. In A3, the @m option of join() specifies that the ordered merge algorithm is used for the in-memory association. After the change information is aligned with the original data, the XOR calculation is performed in the loop body. B3:B6 work like the code in the previous section (8.5, Flag bit dimension): they determine which flag bit dimensions need to change and build the corresponding calculation expressions. Because few flags change each time, converting the change information to bits for storage may occupy more space, so it is better to store the set of changed flag sequence numbers directly.
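For readers less familiar with SPL, the ordered merge and the per-record update can be approximated in Python as below. This is a sketch under assumptions, not a translation of the SPL code: records are dictionaries with an `id` and a hypothetical `bits` list of 16-bit words, change entries carry `id` and `tags` (1-based flag numbers), and both lists are already sorted by `id`.

```python
from typing import Dict, List

def merge_apply(records: List[Dict], changes: List[Dict],
                bits_per_word: int = 16) -> List[Dict]:
    """Walk two id-sorted lists in step and flip the changed flags in place."""
    i = j = 0
    while i < len(records) and j < len(changes):
        rid, cid = records[i]["id"], changes[j]["id"]
        if rid < cid:
            i += 1          # record has no change entry
        elif rid > cid:
            j += 1          # change entry has no matching record
        else:
            for n in changes[j]["tags"]:
                word, bit = divmod(n - 1, bits_per_word)
                records[i]["bits"][word] ^= 1 << bit
            i += 1
            j += 1
    return records

# Hypothetical sample data, both lists sorted by id.
records = [
    {"id": 1, "bits": [5, 0]},
    {"id": 2, "bits": [0, 0]},
    {"id": 4, "bits": [1, 1]},
]
changes = [
    {"id": 2, "tags": [1, 17]},
    {"id": 3, "tags": [2]},
    {"id": 4, "tags": [1]},
]
merge_apply(records, changes)
```

Since both inputs are ordered by id, a single forward pass is enough and no hash table or lookup index is needed. The change entries keep plain flag numbers rather than bit words, in line with the storage advice above.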

