How to Perform Table Joins between MongoDB and MySQL

Key words: Table joins  MongoDB  MySQL

Dealing with cross-database computations involving tables of different structures is hard and complicated. The usual method is to load data sources to one data warehouse using ETL and then compute them separately or perform a mixed computation using the data warehouse’s computing ability. This is inefficient because:

1. A special data warehouse to which data is ETLed is needed. This adds workload.

2. It isn’t real-time but can only do the dump and load at regular time.

3. The IO operation of dump and load reduces performance and increases database space usage that requires more storage expansion.

Cross-database computations would become simple if there was a computing engine independent of the database. Actually there is. esProc has a full set of set operation functions and agile syntax, which makes it fully independent. It can perform structured data processing directly over a retrieved data source.

One example: order is a MongoDB collection, custom is a MySQL table. order’s O_CUSTKEY field is a logical foreign key pointing to custom’s C_CUSTKEY field. The goal is to sum the orders amounts in a certain time period by regions. Below are samples of the data sources:

MongoDB collection order:

undefined

MySQL table custom:

undefined

Final result:

undefined

esProc gets it done with a 5-line script:


A

1

=mongo_open("mongodb://localhost:27017/test?user=root&password=sa")

2

=mongo_shell@x(A1,"order.find()").fetch()

3

=Mysql.query("select   C_CUSTKEY,C_REGION from custom")

4

>A2.switch(O_CUSTKEY,A3:C_CUSTKEY)

5

=A2.groups(O_CUSTKEY.C_REGION:C_REGION;sum(O_TOTALPRICE):AMOUNT)

esProc SPL is efficient because:

1. It retrieves source data to process it directly without the need of establishing a special database. The performance is good.

2. It enables a slim computing structure by cutting down database IO, which in turn gets rid of the intermediate tables and reduces workload.

3. It retrieves data in real time, maintaining good data consistency.

4. It has consistent computing interface to deal with various types of data source.

With the help of esProc SPL, we can conveniently handle structured data processing in MongoDB. For more examples, read esProc-driven MongoDB Hackers, Simplifying MongoDB Data Association, How SPL Assists MongoDB Calculation, MongDB Collection Joins for Reporting and MongoDB Join MySQL for Reporting.

esProc is integration-friendly. Read How to Call an SPL Script in Java to see how we can easily embedded an SPL script into a Java program.

Read Getting Started with esProc to download and install esProc, get a license for free and find related documentation.