* Perform Join Queries over Different Databases


DBlink enables you to directly access objects in another database, of the same type. To access a database of different structure, you need to either change the deployment architecture or write a lot of complicated code. Both are difficult to implement.

The seemingly easy solution is using ETL. The process transfers data in one database and then you can write a SQL join query in the familiar way. But this needs to change the original deployment architecture by imposing an ETL layer and a data mart layer that requires an admin to maintain timer and scheduler services or extra coding work to judge “if data is ETLed”. To make matters worse, the ETL way turns a real-time computation into a non-real-time computation, which may trigger a user experience crisis.

It appears that microservices are the once-for-all solution. The technique replaces tightly-coupled ODBC/JDBC interface with loosely-coupled http interface that makes it possible to transmit data all in JSON format or XML format. This has a long list of jobs, such as writing data service interface for every table and client side interface for each algorithm, adopting unfamiliar frames and class libraries and handling privilege security issues outside of the database. Furthermore, real computations need to be hardcoded.

You can also achieve this in high-level languages, like Java. First, write SQL to retrieve data respectively from databases of different structures, convert them to two classes and then perform a join between two lists. That’s the manual replacement of SQL in essence. You can make it for simple merges, but it’s too difficult for a join. We haven’t taken functional algorithms, such as subqueries and post-join grouping and aggregation into account yet. In short, it’s cost-ineffective to do SQL work with Java that lacks structured class library.

But I have to say, there IS a solution that requires no change of deployment architecture and very few development sources. It is esProc’s Structured Process Language. SPL accesses any database table in a uniform data structure, which gets rid of the different-structure problem. The language offers a rich class library for processing structured data, making it convenient to implement a join query and the subsequent function algorithms, more convenient than SQL’s way. esProc offers JDBC/ODBC interface to let programmers achieve a join of database of different structures by introducing the driver jars and without rewriting the code.

For example, to access the orders table in Oracle database via esProc JDBC in Java and perform a join between it and MySQL’s customer table and group and summarize the joining result, SPL does it in this way:


ResultSet rs = con.executeQuery(“orcl.query(/“select * from orders/”).join(cust,my.query(/“select * from customer/”):custid,city).groups(city;sum(amount))”);

More explanations about multidatabase joins can be found in Query Sharded Databases.

To embed esProc JDBC into a Java program, refer to How to Call an SPL Script in Java.

About esProc installation, free license download and relevant documentation, see Getting Started with esProc.