How to easily consolidate data from different sources in Java?

To other questions:
How to easily handle text file calculations in Java?
What should a Java programmer do when it is too difficult to write complex SQL?
What is the lightweight Java library to read and write Excel dynamically?
What would be a dynamic and flexible way to transform Json to Java object?
How to perform SQL-like queries on MongoDB in Java?
What should I do when it is difficult to implement code with Java Stream?

Solution: esProc – the professional computational package for Java

imagepng

esProc is a class library dedicated to Java-based calculations and aims to simplify Java code. SPL is a scripting language based on the esProc computing package. It can be deployed together with Java programs and understood as a stored procedure outside the database. Its usage is the same as calling a stored procedure in a Java program. It is passed to the Java program for execution through the JDBC interface, realizing step-by-step structured computing, return the ResultSet object.

Same type data sources

Some databases support cross-database queries but generally require the same type of database by mapping a data table in another database or using a special computing gateway. SPL submits SQL statements to various databases through multiple threads executed parallel by these databases. Their respective calculation results are returned and then summarized and processed and provided to the application program.

A telecommunications company uses the table userService to store user service information, which is stored in multiple databases by region, and the data is summarized as follows using SPL:


A

B

1

[mysql1,mysql2,mysql3,mysql4]

2 fork A1 =connect(A2)
3
=B2.query@x("select product_no,sum(allDuration) sallDuration,sum(allTimes) sallTimes,sum(localDuration) slocalDuration ,sum(localTimes) slocalTimes from userService where I0419=? group by product_no", argType)
4 =A2.conj()
5 =A4.groups(product_no;sum(sallDuration):ad,sum(sallTimes):at,sum(slocalDuration):ld,sum(slocalTimes):lt)

The statement fork starts four threads in parallel, and each thread fetches data from the corresponding database and returns the grouped and summarized results to the main thread. Finally, the main thread merges the calculation results of the sub-threads, performs grouping and summarization again, and returns the final result set. It involves parallel calculations, merging, grouping, and other structured data calculations. If it is done directly in Java, the code is very lengthy.

This block of code can be debugged or executed in esProc IDE, or stored as a script file (like condition.dfx) for invocation from a Java program through the JDBC interface. Below is the code for invocation:

  package Test;
  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.ResultSet;
  import java.sql.Statement;
  public class test1 {
      public static void main(String[]   args)throws Exception {
          Class.forName("com.esproc.jdbc.InternalDriver");
          Connection connection   =DriverManager.getConnection("jdbc:esproc:local://");
          Statement statement =   connection.createStatement();
          ResultSet result =   statement.executeQuery("call condition.dfx");
          printResult(result);
          if(connection != null)   connection.close();
      }

…

}

For details on integration with Java programs, please refer to How to Call an SPL Script in Java

If the data table is the same, but the database type is different, you can execute the same SQL statement through SPL:


A

1

>sql="SELECT *   FROM sales WHERE WEEKOFYEAR(orderdate) = 35"

2

>dbname=["oracle","mysql"]

3

>dbtype=["ORACLE","MYSQL"]

4

=dbname.(connect@l(~))

5

=dbtype.(sql.sqltranslate(~))

6

=A4.conj(~.query@x(A5(#)))

If the amount of data is large, SPL also has a cursor reading method. You can bind calculations on the cursor, such as grouping by the Dept field of the Employees table, summarizing the Amount field of the Orders table, and implementing association and grouping calculations on the cursor. Thus, large data calculations can be done using small memory.


A

1

=orcl.cursor("select EId, Dept from   Employees order by EId")

2

=mysql1.cursor("select SellerId, Amount from   Orders order by SellerId")

3

=joinx(A2:O,SellerId; A1:E,EId)

4

=A3.groups(E.Dept;sum(O.Amount))

Different types of data sources

Data often comes from a variety of heterogeneous data sources, such as relational databases (oracle, db2, MySQL), nosql databases (MongoDB), HTTP data sources, Hadoop (hive, hdfs), and even excel or text files. Usually, it is necessary to gather all the data into one database for calculation, which will increase a lot of workloads. However, if SPL is used to achieve heterogeneous data association, not only can the existing data storage process remain unchanged, but the development workload is also very small. Furthermore, because SPL has a wealth of structured algorithms, it is easy to implement related queries and subsequent calculations and is often simpler than SQL.

For example, the sales data is stored in MongoDB and the salesperson information table from the DB2 database. Therefore, the calculation example is as follows:

A
1 >hrdb=connect(“db22”)
2 =hrdb.query(“select * from employee where state=?”,state)
3 =mongodb(“mongo://localhost:27017/test?user=test&password=test”)
4 =A3.find(“orders”,,“{_id:0}”).fetch()
5 =A4.switch@i(SELLERID,A2:EID)
6 =A5.new(ORDERID,CLIENT,SELLERID.NAME:SELLERNAME,AMOUNT,ORDERDATE)
7 >hrdb.close()
8 >A3.close()
9 result A6

For another example, the text is associated with JSON, sales.txt is a structured text separated by tabs, and city.json is an unstructured JSON string. There is a foreign key relationship between the second column of sales.txt and part of the text in city.json. The two files are connected as a two-dimensional table.

imagepng

The SPL code is implemented as follows:

A
1 =json(file(“/workspace/city.json”).read())
2 =A1.new(name,#1.(#1):desc,(firstblank=pos(desc," "),left(desc,firstblank-1)):key,right(desc,len(desc)-firstblank):value)
3 =file(“/workspace/sales.txt”).import@t()
4 =join(A3:sales,#2;A2:city,key)
5 =A4.switch@i(SELLERID,A2:EID)
6 =A4.new(sales.OrderID,sales.Client,sales.Amount,sales.OrderDate,city.name,city.value)

More SPL applications

Refer to Use SPL in applications