Performance Optimization - 3.4 [Search in external storage] Row-based storage and index with values

 

Performance Optimization - 3.3 [Search in external storage] Sorting index

As mentioned in the previous chapter, columnar storage is a common means to improve performance. However, for most search tasks, columnar storage will lead to worse performance.

Even if it has been stored in order, the usual columnar storage can be regarded as unable to perform binary search that does not depend on the index. The reason is the same as the difficulty of columnar storage segmentation mentioned before, that is, the synchronization of each column cannot be guaranteed. The columnar storage using double increment segmentation mechanism can perform binary search (that is what the composite table does), but it also needs to read the data from the data blocks of each field separately, and the performance will not be very good.

Index can also be established on the columnar storage data table to avoid traversal, but it is very troublesome. Theoretically, if we want to record the physical positions of each field in the index, the index size will be much larger than the index of row-based storage, and may even be as large as the original data table (because each field has a physical position, the amount of data in the index is the same as the original data, only the data type is simple). Moreover, when reading, it is also necessary to read in the data areas of each field, and the hard disk has a minimum reading unit, which will cause the total reading amount of each column to far exceed that of row-based storage, and the result is that the search performance is very poor.

Columnar storage is more suitable for data traversal. For scenarios that require high-performance search, try not to use columnar storage.

SPL adopts the double increment segmentation mechanism, and can quickly find each field value in columnar storage according to the record sequence number, so what SPL actually stores in the index is not the physical position of each field value, but the sequence number of the whole record. In this way, the size of the index can be much smaller and has little difference from the row-based storage. However, the data blocks of each field still need to be read separately, and the performance still cannot catch up with the index of row-based storage.

By default, the SPL composite table uses columnar storage, but also provides row-based storage mode. You can use option @r to specify when creating:

A
1 =file(“data.ctx”).create@r(ID,…)

For scenarios with high requirements for both traversal and search, we can only trade space for time. The data is stored twice redundantly. The columnar storage is used for traversal and the row-based storage is used for search.

SPL also provides a kind of index with values. When establishing the index, other field values can be copied at the same time. The original composite table can continue to use columnar storage for traversal, while the index itself has stored field values and used row-based storage. Generally, the original composite table is no longer accessed during search, which can obtain better performance, but it will also occupy more storage space.

A
1 =file(“data.ctx”).open()
2 =file(“data.idx”)
3 =A1.index(A2;ID;…)
4 =A1.icursor(…;ID==123456;A2).fetch()
5 =A1.icursor(…;ID>=123456 && ID<=654321;A2)

When creating an index in A2, copy the fields to be referenced in the… part of the parameter, and it can copy these field values in the index. When getting the target value later, as long as the involved fields are in this part, it is not necessary to read the original composite table. You can compare the performance and spatial differences between valued indexes and ordinary indexes.


Performance Optimization - 3.5 [Search in external storage] Index preloading
Performance Optimization - Preface