Performance Optimization - 3.5 [Search in external storage] Index preloading

 


The index of a large data set is often very large itself, so it has to be organized into multiple levels. Each search must read the index level by level before it can locate the target value. External storage access is complex: even when the operating system cache avoids a physical hard disk read, the data still has to be copied in memory and turned into objects. Reading and parsing these index segments becomes the main cost of an index search.

Of course, a single search takes very little time. On a modern computer it completes within milliseconds, which is generally imperceptible. Over hundreds or thousands of searches, however, the delay becomes noticeable.

When processing index searches, a database generally caches the index segments it has read automatically. If they are needed again within a short time, they are not read again, which effectively reduces the search delay. But if new search values keep hitting index segments that have not been accessed before, the searches will seem somewhat slow. This tends to happen just after the system starts.
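The cold-start effect described above can be illustrated with a small Python simulation. This is only a sketch under stated assumptions, not how any database or SPL implements its cache: the class `CachedIndex`, the fan-out of 4 and the depth of 3 are all hypothetical values chosen to keep the numbers small.

```python
FANOUT = 4   # keys per segment; tiny on purpose (hypothetical value)
LEVELS = 3   # index depth for the demo (hypothetical value)


class CachedIndex:
    """Toy multi-level index that counts simulated segment reads."""

    def __init__(self):
        self.cache = set()   # (level, segment_no) pairs already in memory

    def segment_path(self, key):
        # One segment per level, from the top level down to the leaf level.
        path = []
        for level in range(LEVELS):
            keys_per_segment = FANOUT ** (LEVELS - level)
            path.append((level, key // keys_per_segment))
        return path

    def search(self, key):
        """Return how many segments had to be read from storage."""
        reads = 0
        for segment in self.segment_path(key):
            if segment not in self.cache:
                self.cache.add(segment)   # simulate reading and caching it
                reads += 1
        return reads


index = CachedIndex()
cold = index.search(5)       # nothing cached yet: every level is read
warm = index.search(6)       # same segments as key 5: no reads at all
partial = index.search(60)   # shares only the top segment with key 5
print(cold, warm, partial)   # 3 0 2
```

The first search pays for every level, a nearby key pays nothing, and a distant key pays for every level except the shared top one.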

SPL also caches index segments automatically, and in addition provides a way to preload some index segments actively. They are loaded when the system starts, so subsequent searches are fast from the outset instead of only after those index segments have been accessed.

A
1 =file("data.ctx").open()
2 =file("data.idx")
3 =A1.index@3(A2)
4 =A1.icursor(…;ID==123456).fetch()

@3 means that the first three levels of index segments are preloaded into memory.

The index of an SPL composite table has four levels. Each level covers 1K key values, so four levels can support up to 1T records. The first three levels may occupy several gigabytes of space. Once they are loaded, a composite table with no more than 1G records needs no further index segment reads, and a composite table with more data needs only one more index segment read per search.

If there is not enough memory, you can use @2 instead to load only the first two levels.
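The level arithmetic above can be checked with a short Python sketch. It assumes that "1K" means 1024 and that a table uses only as many index levels as its record count requires; the function names `levels_needed` and `reads_left` are hypothetical and are not part of SPL.

```python
FANOUT = 1024      # "1K" key values per level, read as 1024 (assumption)
TOTAL_LEVELS = 4   # levels in a composite table index, per the text


def levels_needed(records):
    """Smallest number of levels whose capacity covers `records`."""
    levels, capacity = 1, FANOUT
    while capacity < records and levels < TOTAL_LEVELS:
        capacity *= FANOUT
        levels += 1
    return levels


def reads_left(records, preloaded_levels):
    """Index levels still read from storage per search after preloading."""
    return max(0, levels_needed(records) - preloaded_levels)


G = FANOUT ** 3   # 1G records
T = FANOUT ** 4   # 1T records, the capacity of four levels

print(T)                  # 1099511627776
print(reads_left(G, 3))   # 0: three preloaded levels cover 1G records
print(reads_left(T, 3))   # 1: one more level is read per search
print(reads_left(T, 2))   # 2: with @2, two levels remain to be read
```

Under these assumptions, dropping from @3 to @2 trades memory for one extra level read on every search.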

The preloaded index segments stay in memory for the lifetime of the process and are automatically used by other threads of the same process.
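The idea of loading once per process and sharing across threads can be sketched in Python as follows. This is an illustration only, not SPL's mechanism: `PRELOADED`, `preload` and `search` are hypothetical names, and the "segments" are a placeholder dictionary rather than real index data.

```python
import threading

PRELOADED = {}   # process-wide store, visible to every thread
load_count = 0   # how many times segments were "read from storage"


def preload(index_name, levels):
    """Stand-in for loading the top index levels once at startup."""
    global load_count
    load_count += 1
    PRELOADED[index_name] = {"levels": levels}


def search(index_name, out, slot):
    # Each thread reuses the already loaded segments; nothing is reloaded.
    out[slot] = PRELOADED[index_name]


preload("data.idx", 3)   # done once, when the process starts

results = [None] * 4
threads = [threading.Thread(target=search, args=("data.idx", results, i))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

shared = all(r is results[0] for r in results)
print(load_count, shared)   # 1 True
```

All four threads see the same in-memory object, and the load step runs exactly once.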

