MongoDB JOIN MySQL for Reporting Tools

 

  In many cases of report development, the original data comes from different types of data source, such as MongoDB and MySQL. Generally it’s difficult to deal with the heterogeneous data sources with a reporting tool alone. JasperReport and Birt offers virtual data source or table join in their business editions or advanced editions, but there are no similar features in free editions. Those features have limitations yet. The joining result can’t be manipulated by SQL-like structured operations.

  SPL (Structured Process Language) has great support of heterogeneous data sources, which makes esProc a powerful computing engine. An SPL script is integration-friendly. You can process this type of report source with esProc for various reporting tools. We’ll illustrate this through an example.

  emp is a MongoDB collection. cities is a MySQL table that consists of two fields – CityID and CityName. emp’s CityID field is treated as a foreign key logically, which points to CityID field of cities table. The task is to find employees from emp by a specified time period and replace CityID with CityName. Below is part of the sources:

  emp collection:

EID

Dept

CityID

Name

Gender

Salary

Birthday

10

R&D

199

Ryan

M

13000

1976-03-12

100

Sales


Jacob

M

5000

1978-02-12

101

Sales

56

Michael

M

6500

1984-03-29

102

Sales

46

Christian

M

12000

1972-07-25

103

Marketing

34

Madison

F

5000

1976-07-11

104

Marketing

6

Sarah

F

8000

1982-11-17

105

Marketing

4

Tyler

M

6500

1978-04-08

106

Marketing

6

Emily

F

7000

1975-12-05

107

Marketing

2

Madison

F

5000

1981-09-29

  cities table:

CityID

CityName

Population

StateId

1

New   York

8084316

32

2

Los   Angeles

3798981

5

3

Chicago

2886251

13

4

Houston

2009834

43

5

Philadelphia

1492231

38

6

Phoenix

1371960

3

7

San   Diego

1259532

5

 

  SPL script:


A

1

=mongo_open("mongodb://localhost:27017/test?user=test&password=test")

2

=mongo_shell(A1,"emp.find({'$and':[{'Birthday':{'$gte':'"+string(begin)+"'}},{'Birthday':{'$lte':'"+string(end)+"'}}]},{_id:0})").fetch()

3

=mongo_close(A1)

4

=myDB1.query("select * from cities")

5

=A2.switch(CityID,A4: CityID)

6

=A5.new(EID,Dept,CityID.CityName:CityName,Name,Gender)

7

result A6

  A1: Connect to MongoDB. The connection string syntax is mongo://ip:port/db?arg=value&….

  A2: find() function gets all fields of documents (except for _id) in a certain time period from emp collection and returns them as a cursor. begin and end are conditional parameters coming outside of the report. They indicate the start time and end time of Birthday. An SPL cursor can be retrieved and processed in batches. That avoids memory overflow due to a data volume exceeding memory capacity. Here the data volume is relatively small, so we can fetch all records from the cursor at one time. Below is A2’s result:

   undefined

  A3: Close MongoDB connection.

  A4: Perform a SQL query to retrieve data from the MySQL table. myDB1 is the database name. Below is the configuration window:

  undefined

  The MySQL database is connected via esProc JDBC, which supports connection to any type of database. A JDBC-type data source can be automatically or manually connected or closed. Here’s an auto connect/close.

  query() function retrieves data using a SQL statement. Here’s the result:

  undefined

  A5: Switch over A2’s CityID field with A4’s corresponding records. The operation is similar to a left join. Below is A2 after the switchover (Both A2 and A5 points to the same two-dimensional table):

  undefined

  You can click the hyperlinks in blue under CityID field to view the records:

  undefined

  To perform an inner join, just use @i option in switch() function: A2.switch@i(CityID,A4). The result will be like this:

  undefined

  A6:  Get desired fields from the joining result to create a new two-dimensional table using new() function. “CityID.CityName:CityName” gets CityName field from the records identified by CityID field and renames it CityName (Because a reporting tool can’t identify a field name like CityID.CityName).

  The switchover operation enables accessing an object in an associated table directly. This makes the access intuitive and simple. It is especially convenient when it is a multi-table, multilevel association. Below is the data the report building needs:

  undefined

  A7: Return A6’s two-dimensional table to the reporting table.

  SPL offers a JDBC interface to be identified as database by a reporting tool for integration.

  Take JasperReport as an example. Here’s the report template:

  undefined

  Then we need to define two report parameters – Pbegin and Pend, which correspond to parameter begin and parameter end in the SPL script. Now you can preview the report:

  undefined

  A reporting tool calls an SPL script in the same way of calling a stored procedure. We can save the SPL script as mongodbJoin2.dfx, which can be called from JasperReport’s designer using "mongodbJoin2 $P{pbegin},$P{pend}"

 

  In brief, SPL retrieves data from the MongoDB collection and gets records from a MySQL table, joins them up through a switchover operation to replace a field in the collection with the corresponding records in the table, creates a new table sequence based on the joining result set using new() function, and then returns the result set to the reporting tool for presentation. The participation of SPL’s data association makes processing heterogeneous data sources simple and easy. And the integration is simple, too.