Performance Optimization - 1.3 [In-memory search] Position index
Performance Optimization - 1.2 [In-memory search] Sequence number positioning
Sometimes we want to find the position of the target value in the table sequence, not the target value itself. If the table sequence is out of order for the to-be-searched key (TBS key), binary search cannot be used to improve the performance. After sorting the data according to the TBS key in advance, binary search can be used, but the position information of the target value in the original table sequence will be lost, so we can’t complete our search task.
For example, a login system records the login time and user ID of a group of users, sorted by login time. Now we want to find out after the user with the specified ID logged in, for how long there is no other user who logs in.
If we can find the position of the login record of the user with the specified ID in the data table, we can easily complete the task as long as we find the next record to calculate the time difference. However, the original data is not sorted by user ID and we can only use sequential search, which will be very slow. If we sort the data by user ID, the position information will be lost.
Using position index can solve this problem:
A | B | |
---|---|---|
1 | =10000.sort(rand()) | =20000.sort(rand()).to(10000) |
2 | =A1.new(elapse@s(now(),-B1(#)):tm,~:id, … ) | |
3 | =A2.psort(id) | =A2(A3) |
4 | =B3.pselect@b(id:1234) | =A2.calc(A3(A4),tm[1]-tm) |
This code is slightly convoluted and needs to be understood carefully.
When B3 obtains the ordered record sequence by user ID, maintain a position sequence returned by psort()in A3. In this way, after A4 uses binary search to find the position of the target value in B3, use A3 to calculate the position of the target value in the original table sequence A2, and then use positioning calculation to realize the task. The search process (A4 and B4) uses one binary search and two sequence number positioning searches. The overall complexity is equivalent to binary search, which is much faster than the sequential search directly on the original set.
Performance Optimization - 1.4 [In-memory search] Hash index
Performance Optimization - Preface
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/2bkGwqTj
Youtube 👉 https://www.youtube.com/@esProc_SPL