Joining MongoDB Collections for Reporting Tools
In many cases of report development, the original data comes from multiple data sources, such as two MongoDB collections. It’s difficult to deal with this type of source data with a reporting tool alone. JasperReport and Birt offers virtual data source or table join in their business editions or advanced editions, but there are no similar features in free editions. Those features only support joining two sources, instead of more. Besides, designed as visual interface, they can’t manipulate the joining result with SQL-like structured operations.
SPL (Structured Process Language) has great support of multiple data sources, which makes esProc a powerful computing engine. An SPL script is integration-friendly. You can process this type of report source with esProc for various reporting tools. We’ll illustrate this through an example.
sales and emp are two MongoDB collections. sale’s SellerID field is treated as a foreign key logically, which points to EId field of emp table. The task is to find orders within a specified time period from sales and LEFT JOIN the two collections. Below is part of the source data:
sales collection:
OrderID |
Client |
SellerId |
Amount |
OrderDate |
26 |
TAS |
1 |
2142.4 |
2009-08-05 |
27 |
HP |
13 |
538.6 |
2010-08-06 |
28 |
DY |
11 |
307.2 |
2010-08-07 |
29 |
WVF |
19 |
420 |
2010-08-08 |
30 |
UFS |
18 |
1200.8 |
2009-08-09 |
31 |
SWFR |
15 |
1488.8 |
2010-08-12 |
32 |
JFS |
3 |
468 |
2009-08-13 |
emp collection:
EId |
State |
Dept |
Name |
Gender |
Salary |
Birthday |
2 |
New York |
Finance |
Ashley |
F |
11001 |
1980-07-19 |
3 |
New Mexico |
Sales |
Rachel |
F |
9000 |
1970-12-17 |
4 |
Texas |
HR |
Emily |
F |
7000 |
1985-03-07 |
5 |
Texas |
R&D |
Ashley |
F |
16000 |
1975-05-13 |
6 |
California |
Sales |
Matthew |
M |
11000 |
1984-07-07 |
7 |
Illinois |
Sales |
Alexis |
F |
9000 |
1972-08-16 |
SPL script:
A | |
1 | =mongo_open("mongodb://localhost:27017/test?user=root&password=sa") |
2 | =mongo_shell(A1,"sales.find({$and:[{OrderDate:{$gte:'"+string(begin)+"'}}, {OrderDate:{ $lte:'"+string(end)+"'}} ] },{_id:0} )").fetch() |
3 | =mongo_shell(A1,"emp.find(,,"{_id:0}").fetch() |
4 | >mongo_close(A1) |
5 | =join@1(A2:sales,SellerId;A3:emp,EId) |
6 | =A5.new(sales.OrderID:OrderID,sales.Client:Client,sales.Amount:Amount, sales.OrderDate:OrderDate,emp.Name:Name,emp.Dept:Dept,emp.Gender:Gender) |
7 | result A6 |
A1: Connect to MongoDB. The connection string syntax is mongo://ip:port/db?arg=value&….
A2: find() function gets all fields of documents (except for _id) in a certain time period from sales collection and returns them as a cursor. begin and end are conditional parameters coming outside of the report. They indicate the start time and end time of OrderDate.
Since the function returns data as a cursor instead of retrieving data directly into the memory, it can handle a large volume of data. The cursor can be further manipulated by skip() function, sort() function, and conj() function. Data won’t be retrieved until it is handed over to fetch() function, groups() function or the for statement. In our example, the cursor data will be retrieved by fetch() function. Suppose we need to get documents from 2009-01-01 to 2009-12-31, then A2’s result is as follows:
A3: find() function retrieves all fields (except for _id) in documents from carInfo collection and returns them as a cursor. Here’s the result:
A4: Close MongoDB connection.
A5: Left join A2 and A3 through the joining fields, which are A2’s SellerId and A3’s EId. For the sake of convenience, we name the two parts of the joined result set sales and emp. join() function performs a join and @1 option enables a left join. The result is shown below (the left part):
Some SellerIds in sales don’t have corresponding records in emp. You can use @f to perform a full join. The absence of any option means an inner join.
A6: Execute the join and get desired fields from the joining result to form a two-dimensional table using new() function. “sales.OrderID:OrderID” gets sales.OrderID field from A5 and rename it OrderID (because the reporting tool can’t identify field name like sales.OrderID). Below is the data for report development:
A7: Return A6’s two dimensional table to the reporting tool.
SPL offers a JDBC interface to be identified as database by a reporting tool for integration. Take JasperReport as an example. Here’s the report template:
Then we need to define two report parameters – Pbegin and Pend, which correspond to parameter begin and parameter end in the SPL script. Now you can preview the report:
A reporting tool calls an SPL script in the same way of calling a stored procedure. We can save the SPL script as mongodbJoin.dfx, which can be called from JasperReport’s designer using “mongodbJoin $P{pbegin},$P{pend}”.
In short, SPL retrieves data from the MongoDB collections, joins them up through a join operation, creates a new table sequence based on the joining result set using new() function, and then returns the result set to the reporting tool for presentation. The participation of SPL’s data association makes processing MongoDB data source simple and easy. And the integration is simple, too.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL