Performance Optimization - 4.5 [Traversal technology] Multi-cursor

 

Using fork makes parallel computation flexible, but the code is still somewhat cumbersome. In particular, even for the very common case of aggregating a single data table, we must remember that the function may have to change (for example, from count to sum) when the results returned by the threads are aggregated again. Fortunately, SPL provides a simpler multi-cursor syntax that generates parallel cursors directly.

A
1 =file("orders.txt")
2 =A1.cursor@tm(area,amount;4)
3 =A2.groups(area;sum(amount):amount)
4 =A1.cursor@tm(area,amount;4)
5 =A4.select(amount>=50).groups(area;count(1):quantity)

The @m option creates a parallel multi-cursor (here split into 4 parts). It is used in the same way as a single cursor: SPL automatically handles the parallel execution and the re-aggregation of the results, and correctly chooses the function to be used in the second round.
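To make the idea of an automatic second round concrete, here is a minimal Python sketch, not SPL and not a description of SPL's internals. It assumes a small in-memory list in place of orders.txt, and the names ORDERS, FIRST_ROUND, SECOND_ROUND, split, partial_groups and parallel_groups are all hypothetical. Each segment is aggregated by its own thread, and the partial results are merged with a function that may differ from the first-round one (count is merged with sum).

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample rows standing in for orders.txt: (area, amount)
ORDERS = [("East", 120), ("West", 40), ("East", 30), ("North", 75),
          ("West", 90), ("East", 60), ("North", 20), ("West", 55)]

# Function applied inside each thread, and the function used to merge
# the per-thread results. Note that count is merged with sum.
FIRST_ROUND = {"sum": sum, "count": len, "max": max, "min": min}
SECOND_ROUND = {"sum": sum, "count": sum, "max": max, "min": min}

def split(rows, n):
    """Cut rows into n contiguous segments of nearly equal size."""
    size, extra = divmod(len(rows), n)
    segments, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        segments.append(rows[start:end])
        start = end
    return segments

def partial_groups(rows, func):
    """First round: aggregate one segment by area."""
    buckets = defaultdict(list)
    for area, amount in rows:
        buckets[area].append(amount)
    return {area: FIRST_ROUND[func](vals) for area, vals in buckets.items()}

def parallel_groups(rows, func, n=4):
    """Run the first round on n segments in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        partials = list(pool.map(lambda seg: partial_groups(seg, func),
                                 split(rows, n)))
    merged = defaultdict(list)
    for part in partials:
        for area, value in part.items():
            merged[area].append(value)
    # Second round: merge the partial results with the matching function.
    return {area: SECOND_ROUND[func](vals) for area, vals in merged.items()}

totals = parallel_groups(ORDERS, "sum")
big_orders = parallel_groups([r for r in ORDERS if r[1] >= 50], "count")
```

The caller only names the aggregate it wants. The choice of the merge function is hidden in the lookup table, which is the convenience the multi-cursor offers over hand-written fork code.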

Channels can also be used on a multi-cursor to implement multipurpose traversal:

  A                                             B
1 =file("orders.txt").cursor@tm(area,amount;4)
2 cursor A1                                     =A2.groups(area;sum(amount):amount)
3 cursor                                        =A3.select(amount>=50).groups(area;count(1):quantity)
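The following Python sketch illustrates what multipurpose traversal means: the data is read once, and every row is pushed to several independent consumers. It is an analogy only and does not reproduce SPL's channel implementation. The class Channel, the function traverse_once and the sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for orders.txt: (area, amount)
ORDERS = [("East", 120), ("West", 40), ("East", 30), ("North", 75),
          ("West", 90), ("East", 60), ("North", 20), ("West", 55)]

class Channel:
    """One consumer of the traversal: an optional filter plus a grouped sum."""

    def __init__(self, keep, value):
        self.keep = keep      # row filter, e.g. amount >= 50
        self.value = value    # what to add per row: the amount, or 1 for a count
        self.result = defaultdict(int)

    def push(self, area, amount):
        if self.keep(amount):
            self.result[area] += self.value(amount)

def traverse_once(rows, channels):
    """Read the rows a single time and feed every row to every channel."""
    for area, amount in rows:
        for ch in channels:
            ch.push(area, amount)
    return [dict(ch.result) for ch in channels]

totals, big_counts = traverse_once(ORDERS, [
    Channel(lambda a: True, lambda a: a),      # like groups(area; sum(amount))
    Channel(lambda a: a >= 50, lambda a: 1),   # like select(amount>=50).groups(area; count(1))
])
```

Both results come out of one pass over the rows, so the cost of reading the source is paid only once.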

The cursors of multiple files can also be combined into a multi-cursor for parallel computation:

A
1 =12.(file("orders"/~/".txt").cursor@t(area,amount))
2 =A1.mcursor()
3 =A2.select(amount>=50).groups(area;count(1):quantity)

In addition, an in-memory multi-cursor can be created on an in-memory table sequence, so that parallel computation also improves the performance of in-memory operations.

A
1 =file("orders.txt").import@t()
2 =A1.cursor@m(4)
3 =A2.groups(area;sum(amount):amount)

For scenarios with high-performance CPUs and a low-speed hard disk, we can convert a single cursor into a multi-cursor. With this conversion, the data is fetched by a single thread, which avoids concurrent access to the hard disk, while the calculation is done by multiple threads, which exploits multiple CPUs. This suits data whose parsing is CPU-intensive, such as text files.

A
1 =file("orders.txt").cursor@t(area,amount)
2 =A1.mcursor(4)
3 =A2.groups(area;sum(amount):amount)
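The single-reader, multi-worker arrangement described above can be sketched in Python as a producer and several consumers. This is an illustration of the structure only, not of how mcursor is implemented. The text content, the batch size and the function names are assumptions made for the example.

```python
import queue
import threading
from collections import defaultdict

# Hypothetical tab-separated content standing in for orders.txt (with a header line).
TEXT = "area\tamount\nEast\t120\nWest\t40\nEast\t30\nNorth\t75\nWest\t90\n"

def read_lines(text, out, n_workers, batch_size=2):
    """Single reader: hand out raw lines in batches, then one stop marker per worker."""
    lines = text.splitlines()[1:]  # skip the header line
    for i in range(0, len(lines), batch_size):
        out.put(lines[i:i + batch_size])
    for _ in range(n_workers):
        out.put(None)

def parse_and_sum(inp, results, lock):
    """Worker: parse raw lines (the CPU-heavy part) and keep a partial grouped sum."""
    partial = defaultdict(int)
    while True:
        batch = inp.get()
        if batch is None:
            break
        for line in batch:
            area, amount = line.split("\t")
            partial[area] += int(amount)
    with lock:
        results.append(partial)

def single_reader_sum(text, n_workers=4):
    q = queue.Queue(maxsize=8)
    results, lock = [], threading.Lock()
    workers = [threading.Thread(target=parse_and_sum, args=(q, results, lock))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    read_lines(text, q, n_workers)  # only the calling thread touches the source
    for w in workers:
        w.join()
    total = defaultdict(int)
    for partial in results:  # second round: add up the partial sums
        for area, value in partial.items():
            total[area] += value
    return dict(total)

result = single_reader_sum(TEXT)
```

Only one thread reads the source, so the disk sees sequential access, while parsing and aggregation are spread over the workers. In CPython the global interpreter lock limits the actual speed-up of such threads, so the sketch shows the division of labor rather than a benchmarkable gain.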

An aggregation on a multi-cursor needs a second round, because the results calculated by each thread must be aggregated again, and the calculation logic of the second round may differ from that of the first. The examples above show that SPL has already implemented this for common operations, so a multi-cursor can simply be treated as a single cursor. For uncommon operations, however, you need to do the second round yourself.

  A                                             B
1 =file("orders.txt").cursor@tm(area,amount;4)
2 fork A1                                       return A2.groups(area;count(1):C)
3 =A2.conj().groups(area;sum(C):C)

Here count serves only as an example (in fact, SPL already handles count itself). As the code shows, SPL also provides a simplified fork syntax for multi-cursors, which runs each thread of the multi-cursor in parallel.
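Why the second round can need its own logic is easiest to see with an average. The Python sketch below is a general illustration and makes no claim about which functions SPL handles automatically. The segments and function names are hypothetical. Each thread returns a sum and a count per group, and the second round divides only after merging. Reusing the first-round function on the per-thread averages would give a wrong answer.

```python
from collections import defaultdict

# Hypothetical per-thread segments of (area, amount) rows.
SEGMENTS = [
    [("East", 10), ("East", 20), ("East", 30)],
    [("East", 100)],
]

def first_round(rows):
    """Per thread: keep [sum, count] per area rather than the average itself."""
    acc = defaultdict(lambda: [0, 0])
    for area, amount in rows:
        acc[area][0] += amount
        acc[area][1] += 1
    return acc

def second_round(partials):
    """Merge: add sums and counts separately, and divide only at the end."""
    acc = defaultdict(lambda: [0, 0])
    for part in partials:
        for area, (s, c) in part.items():
            acc[area][0] += s
            acc[area][1] += c
    return {area: s / c for area, (s, c) in acc.items()}

def naive_average_of_averages(partials):
    """What goes wrong if the second round averages the per-thread averages."""
    per_thread = defaultdict(list)
    for part in partials:
        for area, (s, c) in part.items():
            per_thread[area].append(s / c)
    return {area: sum(v) / len(v) for area, v in per_thread.items()}

partials = [first_round(seg) for seg in SEGMENTS]
correct = second_round(partials)             # (10 + 20 + 30 + 100) / 4
wrong = naive_average_of_averages(partials)  # (20 + 100) / 2
```

The two results differ because the segments hold different numbers of rows, which is the usual situation when a file is split among threads.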