Performance Optimization - 4.4 [Traversal technology] Load from database in parallel

 

Performance Optimization - 4.3 [Traversal technology] Parallel traversal

Sometimes, we need to read data from database for complex operations, but we often find that the time to read data is very long even if the workload of database is not heavy. This is mainly due to the poor performance of database’s access driver. Using parallel technology can effectively alleviate this problem.

A B
1 fork to(4) =connect(…)
2 =range(MinID,MaxID+1,A1:4)
3 =B1.query("select * from T where id>=? and id<?,B2(1),B2(2))
4 >B1.close()
5 =A1.conj()

This code will fetch the records whose id values are between MinID and MaxID in parallel from T table. The conditional interval is split into multiple segments, with each thread corresponding to one segment. Note that the database needs to be connected separately in each thread, otherwise the database will force the requests of the same connection to be executed serially, and the parallel logic fails.

Database does not have an interface to implement segmented splitting, so you can only use where to split. You can use a more reasonable splitting scheme based on your understanding on the data characteristics. Since where computing will consume database resources, parallel fetching can only be performed when the database resources are sufficient (slow fetching is due to slow driver rather than slow database computing speed). Too complex where computing can also lead to a decrease in parallel fetching effect.

If it needs to use a database cursor to handle large amounts of data, you need to process the data read out in each thread. Only in this way can the parallel effect be achieved. You cannot return a delayed cursor in a thread without substantial computation.

A B
1 fork to(4) =connect(…)
2 =range(MinID,MaxID+1,A1:4)
3 =B1.cursor("select * from T where id>=? and id<?,B2(1),B2(2))
4 =B3.groups(…)
5 =A1.conj().groups(…)

Parallel scheme can also be used when fetching the data with multiple different SQLs:

A B
1 =[“select …”,“select…”,…]
2 fork A1 =connect(…)
3 =B2.query(A2)
4 >B2.close()

When using multiple SQLs, you can select a greater number of parallel threads. Since different SQLs have different execution speeds, load balancing can still be achieved even if there are no more threads physically. When a thread is assigned to SQL that executes faster, more tasks can be executed to avoid thread waiting.


Performance Optimization - 4.5 [Traversal technology] Multi-cursor
Performance Optimization - Preface