Performance Optimization - 8.6 [Multi-dimensional analysis] In-memory flag change

 

Flag data may change over time. For example, customers may be re-flagged once a month. To query the flags as of a given month in the past, we would need to keep the flag data for every time point. However, the data volume in multidimensional analysis can be very large, so keeping every snapshot takes up a lot of storage space. This is not a problem when computing on external storage, but it is a problem for high-speed in-memory queries, because the limited memory may not be able to hold the data for all time points.

If most of the flag data changes each time, there is nothing we can do. Often, however, only a small part changes, far less than the full data at any one time point. In that case, we can keep the initial state together with the changes made at each time point, and quickly compute the data as of any time point from them.
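The idea of keeping an initial state plus per-period changes can be sketched as follows. This is a minimal Python illustration, not SPL; the customer ids, flag numbers, and the `state_at` helper are hypothetical and only show how a past state is rebuilt by replaying changes.

```python
# Initial snapshot: customer id -> set of flag numbers that are currently true.
initial = {1: {1, 5}, 2: {3}, 3: set()}

# One change log per month: customer id -> flag numbers that flipped that month.
changes = [
    {1: {5}},               # month 1: customer 1 loses flag 5
    {2: {3, 7}, 3: {2}},    # month 2: customer 2 loses 3 and gains 7, customer 3 gains 2
]

def state_at(month):
    """Return the flag sets after replaying the first `month` change logs."""
    # Copy the snapshot so the initial state is never modified.
    state = {cid: set(flags) for cid, flags in initial.items()}
    for log in changes[:month]:
        for cid, flipped in log.items():
            # Symmetric difference toggles every flipped flag.
            state[cid] ^= flipped
    return state
```

Only the snapshot and the small change logs are held in memory; any month's state is derived on demand, at the cost of replaying the logs up to that month.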

Since a flag has only two values, recording which flags changed is enough to compute the new values correctly; there is no need to record whether a flag went from true to false or from false to true. With the bit dimension, we work out which bits changed and then apply a simple XOR to the original value.
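To make the XOR step concrete, here is a small Python sketch (again, not SPL). It assumes 16 flags per integer field and that flag number n maps to bit (n-1)%16 of field (n-1)\16, mirroring the `group(~\16)` and `shift(1,-(~%16))` expressions used in the grid code; the function names are hypothetical.

```python
BITS = 16  # assumed number of flags packed into each integer field

def masks_for(changed_flags):
    """Map 1-based flag numbers to {field index: XOR mask}."""
    masks = {}
    for n in changed_flags:
        field, bit = divmod(n - 1, BITS)
        # Accumulate one mask per field; each changed flag sets one bit.
        masks[field] = masks.get(field, 0) | (1 << bit)
    return masks

def apply_changes(fields, changed_flags):
    """Return a new list of bit fields with the changed flags toggled."""
    result = list(fields)
    for field, mask in masks_for(changed_flags).items():
        result[field] ^= mask  # XOR flips exactly the masked bits
    return result
```

Because XOR is its own inverse, applying the same change set twice restores the original value, which is why the direction of each change does not need to be stored.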

  | A                                  | B
1 | =file("T_new.ctx").open().import() |
2 | =file("T_change.btx").import()     |
3 | for join@1m(A1,id;A2,id)           | =A3.#2.tags.(~-1).group(~\16)
4 |                                    | =B3.(~.sum(shift(1,-(~%16))))
5 |                                    | =B3.pselect@a(~!=0)
6 |                                    | =B5.("bits"/~/"=xor(bits"/~/","/B4(~)/")")
7 |                                    | >A3.#1.run(${B6.concat@c()})

Both the original data table and the change table use the id field as the primary key and are stored in id order. In A3, the @m option of join@1m() performs the in-memory association with the ordered merge algorithm. Once the change records are aligned with the original records, the XOR is applied in the loop body. B3:B6 work as in the previous section: they determine which flag bit dimension fields need to change and build the calculation expression for each. Because few flags change each time, converting the changes to bit values for storage may take up more space, so it is better to store the set of sequence numbers of the changed flags directly (the tags field used in B3).
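The overall flow, an ordered merge of the two id-sorted tables followed by an XOR per matched record, can be sketched in Python as below. This is an illustration of the technique under stated assumptions rather than a translation of the SPL grid: the record layout (`[id, fields]` and `(id, flag_numbers)`), the 16-bit packing, and the function names are all hypothetical.

```python
BITS = 16  # assumed number of flags packed into each integer field

def toggle(fields, flag_numbers):
    """Flip the given 1-based flag numbers in a list of integer bit fields."""
    for n in flag_numbers:
        field, bit = divmod(n - 1, BITS)
        fields[field] ^= 1 << bit

def merge_apply(base, changes):
    """Apply changes to base in a single pass; both must be sorted by id.

    base:    list of [id, fields], where fields is a list of ints updated in place
    changes: list of (id, flag_numbers)
    """
    i = j = 0
    while i < len(base) and j < len(changes):
        base_id, change_id = base[i][0], changes[j][0]
        if base_id < change_id:
            i += 1      # no change recorded for this record
        elif base_id > change_id:
            j += 1      # change refers to an id that is not in base
        else:
            toggle(base[i][1], changes[j][1])
            i += 1
            j += 1
    return base
```

Since both inputs are already ordered by id, each is scanned once and no hash table or sort is needed, which is the benefit of an ordered merge over a general join.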

