Performance Optimization - 4.4 [Traversal technology] Load from database in parallel


Sometimes we need to read data from the database to execute complex operations, however, we often find the time to read data is very long when database is not heavily loaded. This is mainly due to poorer performance of database’s access driver; this problem can be effectively alleviated by using the parallel technology.

1 fork to(4) =connect(…)
2 =range(MinID,MaxID+1,A1:4)
3 =B1.query("select * from T where id>=? and id<?,B2(1),B2(2))
4 >B1.close()
5 =A1.conj()

This code will fetch the records in T table whose id value is between MinID and MaxID in parallel. The conditional interval is divided into multiple segments, and each thread takes one of segments. Note that the database needs to be connected separately in each thread, otherwise, database will force the requests of same connection to be executed serially, as a result, the parallel logic fails.

Database does not have an interface to achieve segmentation split, you can only use where to split, and you can use a more appropriate split scheme based on your understanding on the data characteristics. Since where computing will consume database resources, the parallel data-fetching can be used only when database resources are sufficient (slow data-fetching is due to slow driver rather than the slow database computing). Too complex where computing will also make the effect of parallel data fetching worse.

If the database cursor is needed due to the large amount of fetched data, only when the read data is processed in each thread can the parallel effect be achieved, and you cannot return the delayed cursor in the thread without substantive calculation.

1 fork to(4) =connect(…)
2 =range(MinID,MaxID+1,A1:4)
3 =B1.cursor("select * from T where id>=? and id<?,B2(1),B2(2))
4 =B3.groups(…)
5 =A1.conj().groups(…)

Parallel schemes can also be used while data-fetching with multiple different SQLs:

1 =[“select …”,“select…”,…]
2 fork A1 =connect(…)
3 =B2.query(A2)
4 >B2.close()

For multiple SQLs, some bigger parallel thread numbers can be used. Since different SQL has different execution speeds, the load can also be balanced even if there is no more number of threads physically. When the threads are assigned to SQLs having faster execution speed, more tasks can be executed to avoid thread waiting.