MongoDB JOIN MySQL for Reporting Tools
In many cases of report development, the original data comes from different types of data source, such as MongoDB and MySQL. Generally it’s difficult to deal with the heterogeneous data sources with a reporting tool alone. JasperReport and Birt offers virtual data source or table join in their business editions or advanced editions, but there are no similar features in free editions. Those features have limitations yet. The joining result can’t be manipulated by SQL-like structured operations.
SPL (Structured Process Language) has great support of heterogeneous data sources, which makes esProc a powerful computing engine. An SPL script is integration-friendly. You can process this type of report source with esProc for various reporting tools. We’ll illustrate this through an example.
emp is a MongoDB collection. cities is a MySQL table that consists of two fields – CityID and CityName. emp’s CityID field is treated as a foreign key logically, which points to CityID field of cities table. The task is to find employees from emp by a specified time period and replace CityID with CityName. Below is part of the sources:
emp collection:
EID |
Dept |
CityID |
Name |
Gender |
Salary |
Birthday |
10 |
R&D |
199 |
Ryan |
M |
13000 |
1976-03-12 |
100 |
Sales |
Jacob |
M |
5000 |
1978-02-12 |
|
101 |
Sales |
56 |
Michael |
M |
6500 |
1984-03-29 |
102 |
Sales |
46 |
Christian |
M |
12000 |
1972-07-25 |
103 |
Marketing |
34 |
Madison |
F |
5000 |
1976-07-11 |
104 |
Marketing |
6 |
Sarah |
F |
8000 |
1982-11-17 |
105 |
Marketing |
4 |
Tyler |
M |
6500 |
1978-04-08 |
106 |
Marketing |
6 |
Emily |
F |
7000 |
1975-12-05 |
107 |
Marketing |
2 |
Madison |
F |
5000 |
1981-09-29 |
cities table:
CityID |
CityName |
Population |
StateId |
1 |
New York |
8084316 |
32 |
2 |
Los Angeles |
3798981 |
5 |
3 |
Chicago |
2886251 |
13 |
4 |
Houston |
2009834 |
43 |
5 |
Philadelphia |
1492231 |
38 |
6 |
Phoenix |
1371960 |
3 |
7 |
San Diego |
1259532 |
5 |
SPL script:
A |
|
1 |
=mongo_open("mongodb://localhost:27017/test?user=test&password=test") |
2 |
=mongo_shell(A1,"emp.find({'$and':[{'Birthday':{'$gte':'"+string(begin)+"'}},{'Birthday':{'$lte':'"+string(end)+"'}}]},{_id:0})").fetch() |
3 |
=mongo_close(A1) |
4 |
=myDB1.query("select * from cities") |
5 |
=A2.switch(CityID,A4: CityID) |
6 |
=A5.new(EID,Dept,CityID.CityName:CityName,Name,Gender) |
7 |
result A6 |
A1: Connect to MongoDB. The connection string syntax is mongo://ip:port/db?arg=value&….
A2: find() function gets all fields of documents (except for _id) in a certain time period from emp collection and returns them as a cursor. begin and end are conditional parameters coming outside of the report. They indicate the start time and end time of Birthday. An SPL cursor can be retrieved and processed in batches. That avoids memory overflow due to a data volume exceeding memory capacity. Here the data volume is relatively small, so we can fetch all records from the cursor at one time. Below is A2’s result:
A3: Close MongoDB connection.
A4: Perform a SQL query to retrieve data from the MySQL table. myDB1 is the database name. Below is the configuration window:
The MySQL database is connected via esProc JDBC, which supports connection to any type of database. A JDBC-type data source can be automatically or manually connected or closed. Here’s an auto connect/close.
query() function retrieves data using a SQL statement. Here’s the result:
A5: Switch over A2’s CityID field with A4’s corresponding records. The operation is similar to a left join. Below is A2 after the switchover (Both A2 and A5 points to the same two-dimensional table):
You can click the hyperlinks in blue under CityID field to view the records:
To perform an inner join, just use @i option in switch() function: A2.switch@i(CityID,A4). The result will be like this:
A6: Get desired fields from the joining result to create a new two-dimensional table using new() function. “CityID.CityName:CityName” gets CityName field from the records identified by CityID field and renames it CityName (Because a reporting tool can’t identify a field name like CityID.CityName).
The switchover operation enables accessing an object in an associated table directly. This makes the access intuitive and simple. It is especially convenient when it is a multi-table, multilevel association. Below is the data the report building needs:
A7: Return A6’s two-dimensional table to the reporting table.
SPL offers a JDBC interface to be identified as database by a reporting tool for integration.
Take JasperReport as an example. Here’s the report template:
Then we need to define two report parameters – Pbegin and Pend, which correspond to parameter begin and parameter end in the SPL script. Now you can preview the report:
A reporting tool calls an SPL script in the same way of calling a stored procedure. We can save the SPL script as mongodbJoin2.dfx, which can be called from JasperReport’s designer using "mongodbJoin2 $P{pbegin},$P{pend}"
In brief, SPL retrieves data from the MongoDB collection and gets records from a MySQL table, joins them up through a switchover operation to replace a field in the collection with the corresponding records in the table, creates a new table sequence based on the joining result set using new() function, and then returns the result set to the reporting tool for presentation. The participation of SPL’s data association makes processing heterogeneous data sources simple and easy. And the integration is simple, too.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL