Performance Optimization - 5.4 [Traversal technology] First-half ordered grouping

 

Performance Optimization - 5.3 [Traversal technology] Program cursor

For the transaction table example taken above, we now want to change its order in a way that the transaction records of account are changed to be sorted by the day of the week of the occurrence date, that is, put together all the transactions occurred on Sunday, put together all the transactions occurred on Monday, Tuesday… Essentially, it changes the sorting method of original table, yet it belongs to big sorting operation. The big sorting algorithm will generate the buffer file, and the performance will not be good.

However, we found that the original data table has been ordered by account, it can be considered as a “half ordered” table, moreover, the amount of unordered data in each (ordered) account is not large. In this case, we can use the program cursor mentioned above to perform the big sorting.

A B C
1 func =file(“trades.btx”).cursor@b()
2 for B1;id return B2.sort(day@w(dt))
3 return cursor@c(A1)

Fetching the data of each account followed by sorting the “second-half”, the program cursor can collect the results and return the desired big sorting result.

Of course, this method can also be used for grouping. For example, if we want to calculate the total transaction amount of each account on the day of each week, we just need to change the above A3 to:

return cursor@c(A1).group(id,day@w(dt);sum(amount))

It can also be handled in the subprogram, in this way, the amount of returned data will be less:

A B C
1 func =file(“trades.btx”).cursor@b()
2 for B1;id return B2.groups(id,day@w(dt);sum(amount))
3 return cursor@c(A1)

This situation is not uncommon in reality. SPL directly provides an option at the group() function:

A
1 =file(“trades.btx”).cursor@b()
2 =A1.group@q(id;day@w(dt);sum(amount))

Note that group@q()has three groups of parameters. The first represents the ordered grouping keys, the second represents the unordered grouping keys, and the third is aggregating expressions. By means of the three group of parameters together with the @q option, SPL will know that it should firstly use the ordered cursor to fetch the data with same id, and then group by day@w(dt).

This code can also be used directly to sort.

A
1 =file(“trades.btx”).cursor@b()
2 =A1.group@qs(id;day@w(dt))

If the @s option is added, it means that only sorting is performed without grouping, that is, sort the data with the same id by day@w(dt).


Performance Optimization - 5.5 [Traversal technology] Second-half ordered grouping
Performance Optimization - Preface