How to Achieve T+0 Query and Analysis?

The T+0 query refers to real-time data query and analysis where the latest data is available in real-time. There are non-real-time T+1 and T+N queries. A T+0 query can be achieved using the business database. But as data accumulates, transactions become slow. The general solution is to divide data in the database into two parts and use one part to create a separate, special database for performing queries and analysis. By doing this, T+0 queries will involve cross-database calculations (because the cold data is stored in the new database and the hot data is generated in the original business database). Both the database’s own synchronization mechanism (such as OGG) and the database’s cross-database query tool (such as DBLink) are not satisfactory because they have many restrictions and poor performance, and provide weak support for heterogeneous databases. Besides, Java heavily relies on the database at the application side to achieve computations due to difficult coding issues, resulting in huge amounts of historical data being overstocked in the database. This further brings in low IO utilization and poor performance.

esProc SPL can completely resolve all those issues. esProc is an open-source data computing engine equipped with rich class libraries and agile SPL syntax for processing data fast and efficiently. It supports various data sources, and more importantly, provides cross-data-source computational capability to achieve mixed T+0 queries based on the database specifically used for data analysis and the business database.


A


1

=cold=db1.cursor(“select * from orders where   odate

/Retrieve the cold historical data from analysis database

2

=hot=db2.cursor(“select * from orders where   odate>=?”,date(now()))

/Retrieve the required hot data from business database

3

=[cold,hot].conjx()


4

=A3.groups(area,customer;sum(amout):amout)


To take a step further to approach the database capacity and performance problems, we can store the cold historical data in a file system to improve IO performance and computational efficiency. It is faster to write historical data to a file. Then we perform the mixed computing between the file (cold historical data) and the business database (current hot data) to achieve high-performance T+0 queries. The SPL code is simple. We just need to modify the first line of the above database-based script:


A


1

=cold=file(“/his/orders.btx”).cursor@b()

/Retrieve the cold historical data from the file


The rest of the code is completely the same


Boasting concise syntax and high performance, SPL beats SQL in terms of algorithm implementation and computational performance. SPL is interpreted before the execution that enables convenient operations for hot deployment/hot switching. The SPL code can be used as embedded JDBC to be integrated into the application without adjusting the application structure. The use is simple and flexible.