Performance Optimization - 1.3 [In-memory search] Position index

 

Sometimes we want to find the position of the target value in the table sequence, not the target value itself. If the table sequence is out of order for the to-be-searched key (TBS key), binary search cannot be used to improve the performance. After sorting the data according to the TBS key in advance, binary search can be used, but the position information of the target value in the original table sequence will be lost, so we can’t complete our search task.

For example, a login system records the login time and user ID of a group of users, sorted by login time. Now we want to find out after the user with the specified ID logged in, for how long there is no other user who logs in.

If we can find the position of the login record of the user with the specified ID in the data table, we can easily complete the task as long as we find the next record to calculate the time difference. However, the original data is not sorted by user ID and we can only use sequential search, which will be very slow. If we sort the data by user ID, the position information will be lost.

Using position index can solve this problem:

A B
1 =10000.sort(rand()) =20000.sort(rand()).to(10000)
2 =A1.new(elapse@s(now(),-B1(#)):tm,~:id, … )
3 =A2.psort(id) =A2(A3)
4 =B3.pselect@b(id:1234) =A2.calc(A3(A4),tm[1]-tm)

This code is slightly convoluted and needs to be understood carefully.

When B3 obtains the ordered record sequence by user ID, maintain a position sequence returned by psort()in A3. In this way, after A4 uses binary search to find the position of the target value in B3, use A3 to calculate the position of the target value in the original table sequence A2, and then use positioning calculation to realize the task. The search process (A4 and B4) uses one binary search and two sequence number positioning searches. The overall complexity is equivalent to binary search, which is much faster than the sequential search directly on the original set.