Performance Optimization Skills: In-Memory Associative Calculation
Abstract
Associative actions may influence the performance badly. While SPL supports in-memory pre-association, which speeds up associative actions and thus improves the performance. In order to clarify the impact of associative actions on the performance, we’ll design a set of Oracle associated tables and a non-associated wide table to perform the same operations on both of them.
First of all, perform the same operations on both Oracle associated table and the non-associated wide table.
The structure and relations of the associated tables are as follows:
Data volume: call record table (millions of records), user table (one hundred thousand of records), telecom branch table (ten thousand of records), telecom agent table (ten thousand of records).
Calculation aim: get the total cost of telecommunication, i.e., the sum of the average cost of both the telecom branches and the telecom agents which correspond to all outbound and inbound calls respectively.
Import the associated result to another table to generate a non-associated wide table:
callRecordWide |
SERIALNUMBER |
CHARGE |
OUTBRANCHOUTCOST |
INBRANCHINCOST |
OUTAGENTOUTCOST |
INAGENTINCOST |
The following SPL script is used to demonstrate the impact of associative actions on the performance:
A |
B |
|
1 |
=connect("orcl") |
|
2 |
=now() |
|
3 |
for 10 |
=A1.query("select sum(outBranch.outCost+inBranch.inCost+outAgent.outCost+inAgent.inCost) from callRecord,callUser outUser,callUser inUser,telecomBranch outBranch,telecomBranch inBranch,telecomAgent outAgent,telecomAgent inAgent where callRecord.outID=outUser.userID and callRecord.inID=inUser.userID and outUser.branchID=outBranch.branchID and outUser.agentID=outAgent.agentID and inUser.branchID=inBranch.branchID and inUser.agentID=inAgent.agentID") |
4 |
=interval@ms(A2,now()) |
/Oracle associated table 25802ms |
5 |
||
6 |
=now() |
|
7 |
for 10 |
=A1.query("select sum(outBranchOutCost+inBranchInCost+outAgentOutCost+inAgentInCost) from callRecordWide") |
8 |
=interval@ms(A6,now()) |
/oracle wide table 2055ms |
9 |
=A1.close() |
As we can see, the associated operation is 12.6 times (25802/2055) slower than non-associated operation, which may seriously affect the calculation performance.
SPL supports the pre-association to improve the performance of associative actions. First load the data in memory, and the code is:
A |
B |
|
1 |
=connect("orcl") |
|
2 |
=A1.query("select * from telecomAgent").keys(AGENTID) |
|
3 |
=A1.query("select * from telecomBranch").keys(BRANCHID) |
|
4 |
=A1.query("select * from callUser").keys(USERID) |
|
5 |
=A1.query("select * from callRecord").keys(SERIALNUMBER) |
|
6 |
=A1.switch(AGENTID,A2:AGENTID; BRANCHID,A3:BRANCHID) |
|
7 |
=A5.switch(OUTID,A14:USERID; INID,A4:USERID) |
|
8 |
=env(callRecord,A7) |
/global variable: pre-association |
The switch function can replace the field values with record references to perform pre-association.
Then the fields of other tables can be directly referred in the subsequent business algorithms, thus improving the calculation performance:
=callRecord.sum(OUTID.BRANCHID.OUTCOST+INID.BRANCHID.INCOST +OUTID.AGENTID.OUTCOST+INID.AGENTID.INCOST) |
To visualize the performance improvement of pre-association, we’ll compare the SPL pre-association with the wide table in the following section.
A |
B |
|
11 |
=connect("orcl") |
|
12 |
=A11.query("select * from telecomAgent").keys(AGENTID) |
|
13 |
=A11.query("select * from telecomBranch").keys(BRANCHID) |
|
14 |
=A11.query("select * from callUser").keys(USERID) |
|
15 |
=A11.query("select * from callRecord").keys(SERIALNUMBER) |
|
16 |
=A14.switch(AGENTID,A12:AGENTID; BRANCHID,A13:BRANCHID) |
|
17 |
=A15.switch(OUTID,A14:USERID; INID,A14:USERID) |
|
18 |
=env(callRecord,A17) |
/global variable: pre-association |
19 |
=A11.query@s("select * from callRecordWide").keys(SERIALNUMBER) |
|
20 |
=env(callRecordWide,A19) |
/global variable: wide table |
21 |
||
22 |
=now() |
|
23 |
for 10 |
=callRecord.sum(OUTID.BRANCHID.OUTCOST +INID.BRANCHID.INCOST +OUTID.AGENTID.OUTCOST+INID.AGENTID.INCOST) |
24 |
=interval@ms(A22,now()) |
/SPL pre-association 13272ms |
25 |
||
26 |
=now() |
|
27 |
for 10 |
=callRecordWide.sum(OUTBRANCHOUTCOST +INBRANCHINCOST+OUTAGENTOUTCOST +INAGENTINCOST) |
28 |
=interval@ms(A26,now()) |
/SPL wide table2210ms |
As shown above, pre-association is 6 times (13272/2210) slower than the wide table, which is already a relatively substantial improvement compared to the 12.6 times between the associated table and the wide table. When performing on the wide table, the performance of SPL is almost as quick as ORACLE (2210:2055); however, when there is association operation, the speed of pre-association in SPL is significantly faster than the temporary association in ORACLE (13272:25802).
It is also worth noting that although the above algorithm uses the wide table for comparison, it does not mean that wide table can replace the associated table. In fact, the wide table usually occupies a lot of space and brings about some maintenance difficulties in creating and synchronizing, which is rarely used in practice. On the contrary, pre-assoiation creates associations through references, which does not create new tables or waste storage space and there is no need to synchronize the data either.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL
Chinese version