How to Make High-Concurrency Account Queries Fast?
A high-concurrency account query aims to quickly retrieve the several to several thousands of detailed records relative to a specified account from the historical data containing tens of millions of or hundreds of millions of records. The computation simply consists of data retrieval, yet we demand the sub-second response time, or even quicker.
Databases can locate records of a specified account according to the index in a split second, but there are apparent delays for highly concurrent operations. That is because they cannot ensure that records of the same account are physically stored continuously – probably disordered instead. Hard disks have the smallest reading units and will retrieve a lot of irrelevant information when reading discontinuous data, dragging down the query speed. When each query in the highly concurrent accesses is laggard, the overall performance becomes really low even though the data amount for each account is small.
The key to solving the problem is to sort data by account to ensure data of the same account is stored continuously, so that, for each query, the block of data read from the hard disk will contain almost all the target values and performance is greatly boosted.
The trouble is that most databases find hard to ensure a certain physical order for data stored in them. So, the optimal algorithm can hardly be implemented.
It is convenient to achieve the algorithm using the open-source esProc SPL. Code is as follows:
1. Create a row-wise stored, ordered file to make sure the physical data is stored continuously:
file("T.ctx").create@r(#id,tdate,amt,…).append@i(file("T-original.ctx").cursor().sortx(id))
Remember that you should use the row-wise storage scheme rather than the column-wise one that stores data column by column and where fields of one account are located in different columns. So, the latter will still have the problem of reading discontinuous data from the hard disk. The problem becomes particularly serious with a high-concurrency workload.
esProc supports both row-wise and column-wise storage schemes. Users can choose the more appropriate one as needed.
2. Perform the high-concurrency account query:
T.icursor(id,tdate,amt,...;id==10100 && tdate>=date("2021-01-10") && tdate<date("2021-01-25") && …,index_id).fetch()
Under the same hardware environment, SPL is 26 times faster than Oracle for implementing the optimization strategy.
3. To take a step further, we can redundantly store the field values to be queried, whose size is small, in the index so that we do not need to retrieve data from the original table, enhancing the performance. Create an index containing field values:
file("T.ctx").index(index_id;id;tdate,amt,…)
The subsequent computations are the same.
Find more implementation methods in SQL Performance Enhancement: Queries on Highly Concurrent Accounts
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL
Chinese version