Joining MongoDB Collections for Reporting Tools

 

  In many cases of report development, the original data comes from multiple data sources, such as two MongoDB collections. It’s difficult to deal with this type of source data with a reporting tool alone. JasperReport and Birt offers virtual data source or table join in their business editions or advanced editions, but there are no similar features in free editions. Those features only support joining two sources, instead of more. Besides, designed as visual interface, they can’t manipulate the joining result with SQL-like structured operations.

  SPL (Structured Process Language) has great support of multiple data sources, which makes esProc a powerful computing engine. An SPL script is integration-friendly. You can process this type of report source with esProc for various reporting tools. We’ll illustrate this through an example.

  sales and emp are two MongoDB collections. sale’s SellerID field is treated as a foreign key logically, which points to EId field of emp table. The task is to find orders within a specified time period from sales and LEFT JOIN the two collections. Below is part of the source data:

  sales collection:

OrderID

Client

SellerId

Amount

OrderDate

26

TAS

1

2142.4

2009-08-05

27

HP

13

538.6

2010-08-06

28

DY

11

307.2

2010-08-07

29

WVF

19

420

2010-08-08

30

UFS

18

1200.8

2009-08-09

31

SWFR

15

1488.8

2010-08-12

32

JFS

3

468

2009-08-13

  emp collection:   

EId

State

Dept

Name

Gender

Salary

Birthday

2

New York

Finance

Ashley

F

11001

1980-07-19

3

New Mexico

Sales

Rachel

F

9000

1970-12-17

4

Texas

HR

Emily

F

7000

1985-03-07

5

Texas

R&D

Ashley

F

16000

1975-05-13

6

California

Sales

Matthew

M

11000

1984-07-07

7

Illinois

Sales

Alexis

F

9000

1972-08-16

 

  SPL script:

A
1 =mongo_open("mongodb://localhost:27017/test?user=root&password=sa")
2 =mongo_shell(A1,"sales.find({$and:[{OrderDate:{$gte:'"+string(begin)+"'}}, {OrderDate:{ $lte:'"+string(end)+"'}} ] },{_id:0} )").fetch()
3 =mongo_shell(A1,"emp.find(,,"{_id:0}").fetch()
4 >mongo_close(A1)
5 =join@1(A2:sales,SellerId;A3:emp,EId)
6 =A5.new(sales.OrderID:OrderID,sales.Client:Client,sales.Amount:Amount, sales.OrderDate:OrderDate,emp.Name:Name,emp.Dept:Dept,emp.Gender:Gender)
7 result A6

  A1: Connect to MongoDB. The connection string syntax is mongo://ip:port/db?arg=value&….

  A2: find() function gets all fields of documents (except for _id) in a certain time period from sales collection and returns them as a cursor. begin and end are conditional parameters coming outside of the report. They indicate the start time and end time of OrderDate.

  Since the function returns data as a cursor instead of retrieving data directly into the memory, it can handle a large volume of data. The cursor can be further manipulated by skip() function, sort() function, and conj() function. Data won’t be retrieved until it is handed over to fetch() function, groups() function or the for statement. In our example, the cursor data will be retrieved by fetch() function. Suppose we need to get documents from 2009-01-01 to 2009-12-31, then A2’s result is as follows:

   undefined

  A3: find() function retrieves all fields (except for _id) in documents from carInfo collection and returns them as a cursor. Here’s the result:

  undefined

  A4: Close MongoDB connection.

  A5: Left join A2 and A3 through the joining fields, which are A2’s SellerId and A3’s EId. For the sake of convenience, we name the two parts of the joined result set sales and emp. join() function performs a join and @1 option enables a left join. The result is shown below (the left part):

  undefined

  Some SellerIds in sales don’t have corresponding records in emp. You can use @f to perform a full join. The absence of any option means an inner join.

  A6: Execute the join and get desired fields from the joining result to form a two-dimensional table using new() function. “sales.OrderID:OrderID” gets sales.OrderID field from A5 and rename it OrderID (because the reporting tool can’t identify field name like sales.OrderID). Below is the data for report development:

  undefined

  A7: Return A6’s two dimensional table to the reporting tool.

SPL offers a JDBC interface to be identified as database by a reporting tool for integration. Take JasperReport as an example. Here’s the report template:

  undefined

  Then we need to define two report parameters – Pbegin and Pend, which correspond to parameter begin and parameter end in the SPL script. Now you can preview the report:

  undefined

  A reporting tool calls an SPL script in the same way of calling a stored procedure. We can save the SPL script as mongodbJoin.dfx, which can be called from JasperReport’s designer using “mongodbJoin $P{pbegin},$P{pend}”.

 

  In short, SPL retrieves data from the MongoDB collections, joins them up through a join operation, creates a new table sequence based on the joining result set using new() function, and then returns the result set to the reporting tool for presentation. The participation of SPL’s data association makes processing MongoDB data source simple and easy. And the integration is simple, too.