"[url] Sometimes the value of the to-be-searched key is just the sequence number (i.e., position) .."

mars RaqForum 25 No.
261 View • 2 Years ago

Performance Optimization - 1.2 [In-memory search] Sequence number positioning

Performance Optimization - 1.1 [In-memory search] Binary search

Sometimes the value of the to-be-searched key is just the sequence number (i.e., position) of the target value in the table sequence, or it is easy to calculate the sequence number of the target value through the search value. At this time, the sequence number positioning method can be used.

	A
1	=10000.new(~:id, …)
2	=A1(1234)
3	=10000.new(string(10000+~-1):id, … )
4	=A3(int(“12345”)-10000+1)
5	=10000.new(date(1999,12,31)+~):dt, …)
6	=A5(now()-date(1999,12,31))
7	=A5.groups(month@y(dt):ym; … )
8	=month&y(now())
9	=A7((A8\100-2000)*12+A8%100)

The id field of A1 itself is the sequence number value, and you can directly use the search value as the position to find the target value. The id field of A3 is also a string converted from continuous numbers, and the sequence number should be reversely calculated when searching. Sequence number positioning is often used for date related search. Dates or months are usually continuous values, which are easy to correspond to sequence number through some operation. The dt field of A5 is an ordered date, and the number of days from the start date is the sequence number of the target value. A7 takes the month and year as the sequence number, and the calculation will be slightly more complicated.

No comparison is required to locate the sequence number. The sequence number is calculated only once, and the complexity of this search is O(1).

In A5 or A7 above, we ensure that each sequence number has a corresponding date when generating data, but there may be a vacant date in the actual business data, if we still use the sequence number positioning, dislocation will happen.

	A
1	=10000.new(date(2000,1,1)+rand(10000)):dt, … )
2	=A1.groups(dt; …)
3	=A2(now()-date(1999,12,31))
4	=A2.align@b(10000,dt-date(1999,12,31))
5	=A4(now()-date(1999,12,31))

The dates generated by A1 cannot guarantee to cover every date of the corresponding period, so A2 may have missing dates, A3 may make an error by using the sequence number positioning, and the returned data may not be today’s data. After using align()in A4, we can ensure that the record of a date must fall in the position of the corresponding sequence number, and A5 can use sequence number positioning to find it. For the missing date, align() will fill in a null in the corresponding position. In this way, if the target value corresponding to the search value does not exist, null will be returned, and whether the target value can be found can be judged. The @b option of align() means that binary search will be used to find the position during alignment, which will make the alignment faster.

Similar to the sorting mentioned earlier, alignment is also a slower action compared with sequence number positioning. You can’t align for a temporary search and then use sequence number positioning. Only if you need to repeatedly search for many times can you align first, so that the subsequent searches can obtain high performance.

After aligning the searched records by sequence number, we can use the efficient sequence number positioning. It seems that it is OK as long as we can find a way to calculate the search value into sequence number.

Of course not. China’s citizen ID number, for example, is made up of a series of numbers, which can be understood as a natural number. It is naturally a serial number. But we cannot locate the ID number by aligning it to the natural number and then use the sequence number positioning.

Sequence number positioning requires a sequence of not less than the largest sequence number to store all the searched records and the null values to fill in the vacancy. If the ID number is converted directly into a natural number, the sequence number range will be so large that the current computer memory cannot bear (10¹⁷, the ID number is 18 digits, the last check digit is not counted). The number of sequence numbers to be filled with null is much more than the number of sequence numbers with records. In this case, the sequence number positioning cannot be directly used.

Performance Optimization - 1.3 [In-memory search] Position index
Performance Optimization - Preface

SPL Official Website 👉 http://www.scudata.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProc

SPL Learning Material 👉 http://c.scudata.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/ydhVnFH9

Youtube 👉 https://www.youtube.com/@esProc_SPL

performance optimization(62)

eBook