Performance Optimization - 8.2 [Multi-dimensional analysis] Time period pre-aggregation

 

Pre-aggregation can also speed up statistics over a time period, provided a few techniques are applied.

If the original data table stores data day by day, we can pre-aggregate the data by month. When statistics over a time period are needed, we first read the whole months covered by the period from the pre-aggregated data and re-aggregate them. We then read, from the original data table, the days at both ends of the period that do not make up a whole month, and aggregate them together with the monthly result to obtain the query target. In this way, the amount of calculation for statistics over a long time period can be reduced by ten times or even more.
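The procedure above can be sketched in Python as follows. This is a minimal illustration rather than SPL's implementation: the names `next_month`, `build_month_cube` and `range_sum` are hypothetical, the daily table is a plain dictionary with made-up amounts, and the aggregate is assumed to be additive (a sum), so that monthly partial results can simply be added to the edge days.

```python
from collections import defaultdict
from datetime import date, timedelta

def next_month(d):
    """First day of the month after the one containing d."""
    return date(d.year + (d.month == 12), d.month % 12 + 1, 1)

def build_month_cube(daily):
    """Pre-aggregate {day: amount} into {first day of month: amount}."""
    cube = defaultdict(int)
    for day, amount in daily.items():
        cube[day.replace(day=1)] += amount
    return dict(cube)

def range_sum(daily, month_cube, start, end):
    """Sum amounts over the inclusive range [start, end]."""
    total = 0
    d = start
    while d <= end:
        month_end = next_month(d) - timedelta(days=1)
        if d.day == 1 and month_end <= end:
            # The whole month lies inside the range: one pre-aggregated row.
            total += month_cube.get(d, 0)
            d = next_month(d)
        else:
            # Edge day that is not part of a whole month: read the daily table.
            total += daily.get(d, 0)
            d += timedelta(days=1)
    return total

# Synthetic daily amounts for 2020 (made-up values, for illustration only).
daily = {date(2020, 1, 1) + timedelta(days=i): i % 7 + 1 for i in range(366)}
month_cube = build_month_cube(daily)

start, end = date(2020, 1, 22), date(2020, 9, 8)
fast = range_sum(daily, month_cube, start, end)
direct = sum(v for day, v in daily.items() if start <= day <= end)
print(fast == direct)
```

The final comparison checks that the mixed monthly and daily path returns the same value as a direct scan of the daily table. Aggregates that are not additive, such as a distinct count, would need a different kind of pre-aggregated data.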

For instance, suppose we want to query a certain statistical value over the interval from January 22 to September 8, and the data has been pre-aggregated by month in advance. We first calculate the aggregate value for February to August from the pre-aggregated data, and then use the original data table to calculate the aggregate values for January 22 to January 31 and for September 1 to September 8. The amount of calculation involved is 7 (Feb. to Aug.) + 10 (Jan. 22 - 31) + 8 (Sep. 1 - 8) = 25. If the aggregation were performed entirely on the original data table, the amount of calculation would be about 230 (the number of days from Jan. 22 to Sep. 8: 230 in a common year, 231 in a leap year). The amount of calculation is therefore reduced roughly ninefold.
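The arithmetic in this example can be checked for any date range with a short Python sketch. The function name `units_read` is hypothetical, and "units" here means rows read, assuming one row per day in the original table and one row per month in the pre-aggregated data (the `area` dimension is ignored).

```python
from datetime import date, timedelta

def units_read(start, end):
    """Return (edge_days, whole_months, total_days) for the inclusive range [start, end]."""
    total_days = (end - start).days + 1

    # Index months as year * 12 + (month - 1) so they can be compared across years.
    # First whole month: the month of `start` if it begins on day 1, else the next one.
    first = start.year * 12 + (start.month - 1) + (0 if start.day == 1 else 1)
    # Last whole month: the month of `end` if `end` is its last day, else the previous one.
    end_is_month_end = (end + timedelta(days=1)).day == 1
    last = end.year * 12 + (end.month - 1) - (0 if end_is_month_end else 1)

    whole_months = max(0, last - first + 1)
    month_days = 0
    if whole_months:
        first_day = date(first // 12, first % 12 + 1, 1)
        day_after_last = date((last + 1) // 12, (last + 1) % 12 + 1, 1)
        month_days = (day_after_last - first_day).days

    # Days not covered by a whole month must be read from the daily table.
    return total_days - month_days, whole_months, total_days

edge_days, months, days = units_read(date(2020, 1, 22), date(2020, 9, 8))
print(edge_days + months, "rows with monthly pre-aggregation versus", days, "rows without")
```

For the year 2020, the call returns 18 edge days and 7 whole months, 25 rows in total, against 231 days when the daily table is scanned directly.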

The original data table mentioned here can itself be finer-grained pre-aggregated data.

SPL implements this method through a condition parameter of the cgroups() function:

A
1 =file("orders.ctx").open()
2 =A1.cuboid(file("day.cube"),dt,area;sum(amount))
3 =A1.cuboid(file("month.cube"),month@y(dt),area;sum(amount))
4 =A1.cgroups(area;sum(amount);dt>=date(2020,1,22)&&dt<=date(2020,9,8);file("day.cube"),file("month.cube"))

When cgroups() finds both a time period condition and higher-level pre-aggregated data, SPL uses this method to reduce the amount of calculation. In this example, SPL reads the corresponding data from the pre-aggregated files month.cube and day.cube respectively, and then aggregates it.

Time period pre-aggregation is essentially a technique for solving the slicing (dicing) problem.

