# Performance Optimization - 5.5 [Traversal technology] Second-half ordered grouping

We have handled the “first-half ordered” situation, then is there something we can take advantage of in the “second-half ordered” situation?

Still, we take the account transaction table as an example, where the data under each account is sorted by date, now we want to count the total transaction amount under all dates. Since the whole table is not ordered by date, the ordered grouping algorithm cannot be used. Yet, because the number of dates in a few years is not large, it still belongs to a small grouping, which can be easily written with groups():

A
2 =A1.groups(dt;sum(amount))

However, this code does not take advantage of the characteristic that the data in the account is ordered by date, and remains a conventional hash grouping mechanism.

Let’s take this example to explain how to take advantage of “second-half ordered” situation.

Assuming that we use the ordered grouping mechanism to implement the “second-half ordered” situation. In this way, when traversing the records, we only need to compare the grouping key with the current last grouped record to determine whether to generate a new group or accumulate this record. As long as the dt is in order in the same account, errors will not occur, and the calculation and comparison of hash values can also decrease. When traversing to the next account, the dt will start resorting, and the dt of new record may be smaller than that of the current last grouped record. Once this phenomenon is found, we need to compare this dt with the first record of the current grouped records to find an appropriate position where we can either accumulate this record or insert a record, and then traverse the next transaction record. There is no need to compare once again the record that has been compared in grouped record table because it is also in order, and it will be compared with the grouped record table from the head again when the next account is traversed.

As a result, when the accounts are traversed one by one, this grouped record table will be compared again and again from the head to the tail. Every time a record is traversed, the ordered grouping algorithm only compares it once, but many times of comparison may be needed before this algorithm can find an appropriate position. However, as long as the grouped record is compared, it will not be compared again until the next account is traversed.

As a whole, the comparison times will certainly be more than that of ordered grouping, but if most of the data are in order, the occurrence proportion of comparison from the head (when traversing to the next account) will be relatively less (within the same account), and the increased comparison times will not be too many, which may be more advantageous than hash grouping.

Here comes a key problem that it may involve the insertion to grouped record table. The conventional sequential insertion is a very complex action. To maintain order, it needs to move all the following members backward to make room, and the complexity of this operation will offset all the advantages of taking advantages of order. Thus, we need to use the linked list to maintain the grouped record table, in this way, only the property of member before and after the insertion point is changed when inserting, and a large number of members will not be moved. We can’t do binary search after using the linked list, yet the grouped record table is originally compared in order, therefore it has little impact here.

SPL also makes this mechanism an option, which can be used directly:

A
2 =A1.groups@h(dt;sum(amount))

Executing the groups@h() will not use the conventional hash method but the linked list mechanism mentioned above.

Big grouping can also perform this kind of second-half ordered grouping with a similar algorithm. The difference is that when the grouped record table in memory is large enough to a certain extent, it will be written out to the buffer file, and then the process will start again, finally, these buffer files will be merged. Generally, the process is similar to that of sorting big grouping, and it is also self-adaptively to big grouping and small grouping.

A