Performance Optimization - Postscript

 


This book contains 60 sections describing dozens of basic high-performance algorithms and storage schemes for big structured data. These methods have been applied flexibly in many real scenarios. Compared with SQL on a conventional relational database, algorithms implemented in SPL can often improve performance several times over, and sometimes by a factor of a hundred.

As a programming language, SPL is not inherently faster than SQL. It runs faster in practice because it can implement high-performance algorithms that SQL cannot. If a computing task already uses the best algorithm when written in SQL, rewriting it in SPL will not make it faster. However, many high-performance algorithms are hard to implement in SQL. As long as the code is somewhat complex, it is almost certain that we can find places where changing the storage scheme or the algorithm will improve performance. In the more than ten real scenarios we have worked on, we found such places every time, and rewriting in SPL effectively improved overall performance. Of course, coding in C/C++ or Java may achieve even higher performance, but the development efficiency is too low.
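To illustrate that the gain comes from the algorithm rather than the language, here is a minimal Python sketch. It is neither SPL nor an example from this book; the task (finding the 3 smallest values), the function names and the random data are all assumptions made for illustration. One version sorts the whole data set and then truncates it, while the other keeps only a small heap of candidates during a single pass.

```python
import heapq
import random


def top_n_by_full_sort(values, n):
    """Sort the whole data set, then keep the first n items: O(m log m)."""
    return sorted(values)[:n]


def top_n_by_heap(values, n):
    """Single pass that keeps only a small heap of candidates: O(m log n)."""
    return heapq.nsmallest(n, values)


# Hypothetical data: 100,000 random numbers with a fixed seed.
random.seed(0)
data = [random.random() for _ in range(100_000)]

# Both approaches return the same answer; only the amount of work differs.
assert top_n_by_full_sort(data, 3) == top_n_by_heap(data, 3)
```

Both functions run in the same language and return the same result, so any difference in running time is due to the algorithm alone. If the first version were already the best available algorithm for a task, porting it to another language would not help.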

It is worth noting once again that each algorithm in this book suits its own scenarios, and some algorithms cannot be used together because they conflict with each other. The algorithm therefore has to be chosen according to the actual situation. We have emphasized throughout that one should first fully understand the task's objectives and the data's characteristics, and then design an optimization scheme according to actual conditions. Moreover, real big data computing tasks do not map one-to-one onto the algorithms presented in this book. They are composite tasks, so these basic algorithms must be combined appropriately to handle them. Consequently, it is more important to learn the analysis methods than the individual algorithms.

Due to space limitations, this book does not include every example and test. If you are interested in more test examples and code, you can find them on Raqforum.

