Performance Optimization - 2.7 [Dataset in external storage] Data update

So far we have only discussed how to append data to an external storage data table, not how to modify it.

We have been trying to store data as compactly and continuously as possible in order to reduce hard disk usage and reading time. Once data is stored compactly, it cannot be updated in place: because of the various compression encodings, values of the same data type do not necessarily occupy the same number of bytes, so the modified data may not fit the space occupied by the original data. If it is larger, it cannot be written back to the original location; if it is smaller, it leaves empty space. Either case makes the data no longer continuous and compact, and insertion and deletion are even worse. Columnar storage exacerbates this problem, since the data blocks of every field face it.

To cope with frequent modifications, OLTP databases usually adopt uncompressed row-based storage, which is not suitable for OLAP services dominated by reading and computing. Even so, many discontinuous data blocks may still appear after repeated writes, causing access performance to drop sharply.

Data modifiability and high-performance computing are inherently contradictory. It is impossible to achieve high-performance operation on a huge amount of data while also allowing frequent, large-scale modifications.

This book focuses on high-performance computing, so we give up support for frequent, large-scale modifications.

Both text files and bin files can be regarded as not supporting modification (modifying them means rewriting the whole file), while composite tables can support a small amount of modification.
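For instance, here is a minimal sketch of what "modification" amounts to for a bin file, namely rewriting the whole file (the file name data.btx and the fields ID and amount are hypothetical, and the data is assumed to be small enough to load into memory):

A	B
1	=file("data.btx").import@b()	/ read the whole bin file into memory
2	>A1.select(ID==100).run(amount=200)	/ modify the target records in memory
3	>file("data.btx").export@b(A1)	/ rewrite the entire file with the modified data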

When the amount of modification is small, SPL writes the modified (including inserted and deleted) data to a separate data area called the supplementary area. When reading, the data in the supplementary area is merged with the normal data, so that its existence is imperceptible during access. To keep this merge fast, however, the amount of data in the supplementary area must not be too large.

To modify data, there must be a primary key. SPL requires the primary key of a composite table to be ordered (which is why an insert action exists; relational databases do not distinguish between insert and append).

A	B
1	=file("data.ctx").open()	
2	=10.new(rand(1000):ID,…)	>A1.update(A2)
3	=10.new(rand(10000):ID,…)	>A1.delete(A3)

Because large volumes of modification are not allowed, SPL only accepts an in-memory record sequence as the parameter of the modification and deletion actions, rather than a cursor as in the case of appending. When modifying, if a primary key value does not exist in the composite table, the record will be inserted.
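As a minimal sketch of this upsert behavior (the fields ID and amount are hypothetical, and ID is assumed to be the ordered primary key of the composite table):

A	B
1	=file("data.ctx").open()	
2	=create(ID,amount).record([100,50,999999,80])	/ two in-memory records: (100,50) and (999999,80)
3	>A1.update(A2)	/ ID 100 is overwritten if it already exists; ID 999999 is inserted if it does not
4	=A1.cursor().fetch()	/ an ordinary read returns the merged result; the supplementary area stays invisible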

After many rounds of modification, the supplementary area will grow larger and larger, and its impact on performance will increase. Therefore, it needs to be cleaned up regularly: merge it into the original data area to keep the storage compact and continuous, and then remove it.

A	B
1	=file("data.ctx").open()	
2	if day(now())==1	>A1.reset()

reset() merges the supplementary area into the normal storage area and rewrites part of the data (starting from the smallest primary key value in the supplementary area; the data before it is not changed) so as to keep the whole composite table stored compactly and continuously.

For larger amounts of modified data, a multi-zone composite table can also be used to implement real-time updates. Similar to the real-time appending method mentioned earlier, the data is first divided into multiple layers of tables, which are then merged layer by layer during the real-time update process. This is somewhat more complex than appending alone, especially when deletion is involved. For specific details, refer to the relevant routines on the SPL forum.

