Performance Optimization - 4.5 [Traversal technology] Multi-cursor
Using fork makes parallel computing flexible, but the code is still somewhat cumbersome, especially for the very common case of computing statistics over a single data table. Note also that when the results returned by the threads are aggregated a second time, the aggregate function may have to change (for example, per-thread counts must be combined with sum, not count). To address this, SPL provides a simpler multi-cursor syntax that generates parallel cursors directly.
 | A |
---|---|
1 | =file("orders.txt") |
2 | =A1.cursor@tm(area,amount;4) |
3 | =A2.groups(area;sum(amount):amount) |
4 | =A1.cursor@tm(area,amount;4) |
5 | =A4.select(amount>=50).groups(area;count(1):quantity) |
The @m option creates a parallel multi-cursor (split into 4 parts in this example), which is used in the same way as the single cursor described earlier. SPL automatically runs the computation in parallel, aggregates the per-thread results a second time, and selects the correct function for that second round.
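
To make the two rounds concrete, here is a minimal Python sketch of what a multi-cursor aggregation has to do conceptually. It is not SPL's implementation: the sample rows, the `split` helper and the `parallel_groups` function are all hypothetical names introduced for illustration. Each worker aggregates one segment, and the partial results are then merged by addition, which is why a first-round count becomes a second-round sum.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample rows standing in for orders.txt: (area, amount)
ORDERS = [("East", 30), ("West", 80), ("East", 120), ("North", 50),
          ("West", 20), ("East", 60), ("North", 10), ("West", 90)]

def split(rows, n):
    """Cut rows into n contiguous segments of near-equal size."""
    step, extra = divmod(len(rows), n)
    parts, start = [], 0
    for i in range(n):
        end = start + step + (1 if i < extra else 0)
        parts.append(rows[start:end])
        start = end
    return parts

def partial_sum(rows):
    """First round on one segment: sum(amount) per area."""
    acc = Counter()
    for area, amount in rows:
        acc[area] += amount
    return acc

def partial_count(rows):
    """First round on one segment: count rows with amount >= 50 per area."""
    acc = Counter()
    for area, amount in rows:
        if amount >= 50:
            acc[area] += 1
    return acc

def parallel_groups(rows, first_round, n=4):
    """Run first_round on n segments in parallel, then merge the partials."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        partials = list(pool.map(first_round, split(rows, n)))
    total = Counter()
    for p in partials:
        total.update(p)  # second round: partial sums AND partial counts are added
    return dict(total)

amount_by_area = parallel_groups(ORDERS, partial_sum)
quantity_by_area = parallel_groups(ORDERS, partial_count)
```

Note that the merge step never counts the partial results; counting them would return the number of segments that contain each area rather than the number of orders.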
On the multi-cursor, channels can also be used to implement multipurpose traversal:
 | A | B |
---|---|---|
1 | =file("orders.txt").cursor@tm(area,amount;4) | |
2 | cursor A1 | =A2.groups(area;sum(amount):amount) |
3 | cursor | =A3.select(amount>=50).groups(area;count(1):quantity) |
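
The idea behind multipurpose traversal can be sketched in Python as follows. This is only an illustration of the principle (one pass over the data feeding two independent results), not of how SPL channels are implemented; the sample rows and the `traverse_once` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for orders.txt: (area, amount)
ORDERS = [("East", 30), ("West", 80), ("East", 120), ("North", 50),
          ("West", 20), ("East", 60)]

def traverse_once(rows):
    """Compute two grouped results while reading the data a single time."""
    amount = defaultdict(int)
    quantity = defaultdict(int)
    for area, amt in rows:        # the source is read exactly once
        amount[area] += amt       # result 1: sum(amount) per area
        if amt >= 50:             # result 2: filter, then count per area
            quantity[area] += 1
    return dict(amount), dict(quantity)

# An iterator can be consumed only once, so both results come from one pass
amount_by_area, quantity_by_area = traverse_once(iter(ORDERS))
```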
Cursors over multiple files can also be combined into a multi-cursor for parallel computing:
 | A |
---|---|
1 | =12.(file("orders"/~/".txt").cursor@t(area,amount)) |
2 | =A1.mcursor() |
3 | =A2.select(amount>=50).groups(area;count(1):quantity) |
In addition, an in-memory multi-cursor can be created on an in-memory table sequence, so that parallel processing also improves the performance of in-memory computation:
 | A |
---|---|
1 | =file("orders.txt").import@t() |
2 | =A1.cursor@m(4) |
3 | =A2.groups(area;sum(amount):amount) |
When the CPU is strong but the hard drive is weak, a single cursor can also be converted into a multi-cursor. Data is then fetched by a single thread, which avoids concurrent access to the hard disk, while the computation runs on multiple threads to exploit multiple CPUs. This suits data whose parsing is CPU-intensive, such as text files.
 | A |
---|---|
1 | =file("orders.txt").cursor@t(area,amount) |
2 | =A1.mcursor(4) |
3 | =A2.groups(area;sum(amount):amount) |
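
The single-reader, multi-worker pattern described above can be sketched in Python as follows. This is a hedged illustration of the general pattern, not of SPL internals: the sample text, `read_batches`, `parse_and_sum` and `single_reader_multi_worker` are hypothetical names. One thread pulls raw lines sequentially, and a pool of workers does the parsing and partial aggregation.

```python
import io
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

# Hypothetical tab-separated text standing in for orders.txt
TEXT = "area\tamount\nEast\t30\nWest\t80\nEast\t120\nNorth\t50\nWest\t20\nEast\t60\n"

def read_batches(stream, batch_size):
    """Single reader: pull raw lines sequentially, in batches."""
    next(stream, None)  # skip the header line
    while True:
        batch = list(islice(stream, batch_size))
        if not batch:
            return
        yield batch

def parse_and_sum(lines):
    """Worker: parse raw lines (the CPU-heavy part) and aggregate them."""
    acc = Counter()
    for line in lines:
        area, amount = line.rstrip("\n").split("\t")
        acc[area] += int(amount)
    return acc

def single_reader_multi_worker(stream, workers=4, batch_size=2):
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() consumes read_batches in the calling thread, so reads stay sequential
        for partial in pool.map(parse_and_sum, read_batches(stream, batch_size)):
            total.update(partial)  # second round: add the partial sums
    return dict(total)

result = single_reader_multi_worker(io.StringIO(TEXT))
```

In CPython, threads do not speed up pure-Python parsing because of the global interpreter lock, so this sketch shows only the structure; a process pool would be needed for a real gain.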
Aggregation on a multi-cursor requires a second round: the results computed by each thread must be aggregated again, and the logic of this second round may differ from the first. As the examples above show, SPL already implements this for common operations, so a multi-cursor can simply be treated as a single cursor. For uncommon operations, however, you have to write the second round of aggregation yourself.
 | A | B |
---|---|---|
1 | =file("orders.txt").cursor@tm(area,amount;4) | |
2 | fork A1 | return A2.groups(area;count(1):C) |
3 | =A2.conj().groups(area;sum(C):C) | |
This example uses count only for illustration (count is in fact already handled by SPL). SPL also provides a simplified fork syntax for the multi-cursor, which runs each part of the multi-cursor in its own thread.
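
As a further illustration of a second round whose logic differs from the first, consider an average, sketched here in Python. Average is chosen by us as a textbook case; this section does not say whether SPL handles it automatically. The segments and the `first_round` and `second_round` functions are hypothetical. Each thread must return a sum and a count, and the second round adds both before dividing.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical segments, one per thread: rows of (area, amount)
SEGMENTS = [
    [("East", 10), ("East", 20), ("East", 30)],
    [("East", 100), ("West", 40)],
]

def first_round(rows):
    """Per-thread result: area -> [sum, count]."""
    acc = defaultdict(lambda: [0, 0])
    for area, amount in rows:
        acc[area][0] += amount
        acc[area][1] += 1
    return dict(acc)

def second_round(partials):
    """Add the partial sums and counts, then divide once at the end."""
    merged = defaultdict(lambda: [0, 0])
    for part in partials:
        for area, (s, c) in part.items():
            merged[area][0] += s
            merged[area][1] += c
    return {area: s / c for area, (s, c) in merged.items()}

with ThreadPoolExecutor(max_workers=len(SEGMENTS)) as pool:
    partials = list(pool.map(first_round, SEGMENTS))

avg_by_area = second_round(partials)

# Averaging the per-segment averages weights the segments wrongly
naive_east = sum(p["East"][0] / p["East"][1] for p in partials) / len(partials)
```

Here the correct average for East is 40.0, whereas the average of the per-segment averages is 60.0, because the two segments hold different numbers of East rows.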