Performance Optimization Skills: Multi-layer Serial Bytes
Abstract
Serial byte is a data type exclusive to SPL, which is suitable to replace the key values with multiple discontinuous layers such as ID number, contract number, product number, organization code, etc. Serial bytes are quick in locating records and usually used to optimize in-memory index query and foreign key associative calculation. Click Performance Optimization Skills: Multi-layer Serial Bytes to learn more details.
In-memory index query
cardNormal.btx is an ID number information table in the format of bin file with a million records, in which there are many fields: cardNo (ID number, primary key), name, gender, province, email, mobile, and address. The structure and data of cardK.btx and cardNormal,btx are exactly the same except that the cardNo fields are serial bytes.
This case will perform millions of index queries on cardNormal.btx and cardK.btx respectively to compare the performance of both.
Among those fields, cardNo is the simplified ID number whose format is shown in the table below:
Bits |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
11 |
12 |
13 |
14 |
15 |
16 |
Rules |
Administrative division |
Birthday |
Serial number |
Check digit |
||||||||||||
Details |
Province |
Region |
Year |
Month |
Date |
Serial number |
Check digit |
|||||||||
Example |
1 |
0 |
0 |
3 |
1 |
9 |
8 |
0 |
1 |
2 |
1 |
3 |
0 |
2 |
3 |
x |
Province and region: both values range from 1 to 10.
Birthday: the values range from “1980-06-01” to “1981-01-01”.
Serial number: the values range from 1 to 100.
Check digit: the redundant check digits are calculated based on the first 15 bits and the values range from 0 to 10 where x indicates 10.
Following the steps below, we can convert the above ID number to serial bytes:
1. Province and region: one layer of serial byte only supports integers from 1 to 255. So convert the province and region into integers as the first and second layers of the serial byte respectively.
2. Birthday: the performance will be better if the serial bytes start from 1. So the number of days between the birthday and 1980-06-01 is calculated and taken as the third layer of the serial byte.
3. Serial number: convert the number to an integer and take it as the forth layer of the serial byte.
4. Check digit: it is redundant data which can be deleted.
The detailed script of converting the above data is:
k(int(mid(cardNo,1,2)), int(mid(cardNo,3,2)), interval("1980-06-01",date(mid(cardNo,5,8),"yyyyMMdd")), int(mid(cardNo,13,3)) ) |
Then run index query one million times on cardNormal.btx and cardK.btx respectively as follows.
A |
B |
C |
|
1 |
=cardNormal=file("d:\\temp\\cardNormal.btx").import@b().keys(cardNo).index() |
/import cardNormal.btx in memory |
|
2 |
=paramList=cardNormal.(cardNo).sort(rand()).to(100000) |
/take 10,000 ID numbers randomly |
|
3 |
|||
4 |
=now() |
||
5 |
for 100 |
=paramList.(cardNormal.find(~)) |
/query the records one million times |
6 |
=interval@ms(A4,now()) |
/time of string keys: 5537ms |
|
7 |
|||
8 |
=cardK=file("d:\\temp\\cardk.btx").import@b().keys(cardNo).index@s() |
/import cardk.btx in memory |
|
9 |
=paramListK=paramList.(k(int(mid(~,1,2)), int(mid(~,3,2)), interval("1980-06-01",date(mid(~,5,8),"yyyyMMdd")), int(mid(~,13,3)) )) |
/convert string parameters to serial byte parameters |
|
10 |
=now() |
||
11 |
for 100 |
=paramListK.(cardK.find(~)) |
/query the records one million times |
12 |
=interval@ms(A10,now()) |
/time of serial bytes: 1977ms |
A8: create in-memory HASH index on serial bytes, and @s option of the function should be used.
The query takes 5,547 milliseconds when indexing on string keys, while it takes only 1,977 milliseconds when indexing on serial bytes, which is obviously faster.
Foreign key associative query
taxNormal.btx is a tax return information table in the format of bin file with 10 million records, in which there are many fields: serial (primary key, serial number), cardNo (foreign key, ID number), tax (amount of tax return), area, declaretype, unit, declareTime, and network. The structure and data of taxK.btx and taxNormal.Btx are exactly the same except that cardNo fields are serial bytes.
The algorithms of this case are:
1. Perform foreign key associative calculation on taxNormal.btx and cardNormal.btx, avoiding the effects of OS caching and hard disk IO, and only the time consumed during the association actions is counted.
2. Perform foreign associative calculation on taxK.btx and cardK.btx, avoiding the effects of OS caching and hard disk IO, and only the time consumed during the association actions is counted.
3. Compare the differences between 1 and 2.
The detailed script is:
A |
B |
C |
|
1 |
=file("d:\\temp\\taxNormal.btx").cursor@b() |
for A1,10000 |
/open the tax return table, pre-traverse the cursor |
2 |
=A1.reset() |
/reset the cursor at the starting point |
|
3 |
=file("d:\\temp\\cardNormal.btx").import@b().keys(cardNo).index() |
/open the ID number table |
|
4 |
=now() |
||
5 |
for A1,10000 |
/traverse the cursor formally |
|
6 |
=interval@ms(A4,now()) |
=A1.reset() |
/time of traversing cursor: 3748ms |
7 |
=A1.switch(cardNo,A3:cardNo) |
for A1,10000 |
/create association, pre-traverse the cursor |
8 |
=A1.reset() |
/reset the cursor at the starting point |
|
9 |
=now() |
||
10 |
for A7,10000 |
/traverse the associated cursor formally |
|
11 |
=interval@ms(A9,now()) |
=A11-A6 |
/time of associating: 7553ms |
12 |
/above: string key association; below: serial byte association |
||
13 |
=file("d:\\temp\\taxK.btx").cursor@b() |
for A13,10000 |
/open the tax return table, pre-traverse the cursor |
14 |
=A13.reset() |
/reset the cursor at the starting point |
|
15 |
=file("d:\\temp\\cardK.btx").import@b().keys(cardNo).index@s() |
/open the ID number table |
|
16 |
=now() |
||
17 |
for A13,10000 |
/traverse the cursor formally |
|
18 |
=interval@ms(A16,now()) |
=A13.reset() |
/time of traversing cursor: 2884ms |
19 |
=A13.switch(cardNo,A15:cardNo) |
for A19,10000 |
/create association, pre-traverse the cursor |
20 |
=A19.reset() |
/reset the cursor at the starting point |
|
21 |
=now() |
||
22 |
for A19,10000 |
/traverse the associated cursor formally |
|
23 |
=interval@ms(A21,now()) |
=A23-A18 |
/time of associating: 966ms |
B2, B7, B13, B19: pre-traverse the cursor to avoid the effects of OS caching. And the reset function is used to reset the cursor after traversing in order to perform traversal again.
B11, B23: extract the traversal time from the total time to get the time consumed in association.
It can be observed that the association action takes 7,553 milliseconds when performing foreign key associative calculation with ordinary keys; while it takes only 966 milliseconds when performing the same operation with serial bytes, which is significantly faster. Besides, it takes less time to fetch records using serial bytes than using strings (all other fields and row numbers of records are the same), which demonstrates that the serial byte performs better in generating object.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL
Chinese version