Performance Optimization - 1.1 [In-memory search] Binary search

 

Performance Optimization - Preface

The table sequence T in memory has a field K, given the search value k, find the record with the value of field K in T as k. Field K is called the to-be-searched key, and the found record is called the target value or target record.

Conventional sequential search will traverse each member of T in turn for comparison. If there are N records in T, the average comparison number is N/2 and the complexity is O(N). If T is just ordered for K (sequence T.(K) or T.(-K) is an increasing sequence), binary search can be used.

Before introducing the algorithm further, let’s explain: most of the algorithms discussed in this book are general for table sequence and record sequence, and even valid for sequence. But sometimes it is relatively convenient to use a certain term to describe it. If you understand the principle, you can extend it to other similar scenarios. When applying these algorithms, it is not necessary to strictly abide by the premises and scenarios described in the book, but to understand the principle and then confirm whether the facing scenario is suitable for the algorithm, and sometimes make a few modifications to the algorithm.

Binary search is a common algorithm. Taking T.(K) in ascending order as an example, the basic logic is as follows:

A B C
1 =1 =T.len()
2 for =(A1+B1)\2 =T(B2)
3 if k==C2.K return C2
4 else if k>C2.K >A1=B2+1
5 else >B1=B2-1
6 if A1>B1 return null

Compare k with the middle member of T, if they are not equal, exclude half of the members according to the comparison result, and continue to perform this action in the subset composed of the remaining half of the members until the target value is found or the remaining subset is empty. Because half of the members can be excluded each time, you can find the target value (or confirm that it cannot be found) by comparing log2N times at most. The complexity is O(logN). When N is large, it will have a very obvious advantage over sequential search. When N is small, because the process of binary search is relatively complex, it may be slower than sequential search. Usually, when N>=8, binary search will have an advantage.

If T.(K) has duplicate values, there may be multiple target values. After finding one target value using binary search, you need to compare more records continuously forward and backward. When the average number of repetitions is M, each search will make an average of M more comparisons, and the total complexity is O(logN+M). Usually, there are only occasional duplicate values, that is, M≪log2N is satisfied, and the complexity can still be considered as O(logN). If there are many duplicate values, M cannot be ignored when analyzing complexity. The complexity of sequential search is always O(N) whether there are duplicate values or not. When there are duplicate values, the complexity of binary search is generally still lower, but in practice, a larger N is needed to show the advantage over sequential search.

SPL provides the binary search option, which can be used directly:

A B
1 =10000.new(string(~,“00000”):id, ~\3:K).keys(id)
2 =A1.find@b(“01234”)
3 =A1.select@b(K:1234) =A1.select@b(K-1234)
4 =A1.pselect@b(K:3210) =A1.pselect@b(K-3210)

find@b()searches according to the primary key, find one and return. select@b() will handle the situation as if there are duplicate values, pselect@b() only returns one by default, but it will find the first one, and duplicate values are also considered.

It should be noted that logical expressions cannot be used in the parameters of select@b()and pselect@b(), but a colon separator or expressions that return a numeric value should be used, because logical expressions can only calculate out true and false values, and cannot determine the search direction of the next step in case of inequality; A numerical expression of 0 indicates that it is found, and <0 and >0 are used to indicate the search direction of the next step.

You can use SPL to compare the performance difference between binary search and sequential search.

It should also be reminded that the use of binary search requires order in advance. Sorting is a very slow action, and you can’t sort first just to temporarily use a binary search, it will only be even slower. Usually, we need to search in a table sequence many times. In this case, we can sort the table sequence first, and then the following multiple searches can get better performance.


Performance Optimization - 1.2 [In-memory search] Sequence number positioning
Performance Optimization - Preface