Performing Joins over MongoDB with SPL
The MongoDB versions before MongoDB 3.2 don’t support collection joins in a direct way and coders have to resort hardcoding. Though it added $lookup interface beginning from MongoDB 3.2, the coding is still not straightforward. Here we illustrate how esProc SPL handles MongoDB data joins efficiently, including inner join, left join, full join and subdocument join through an example.
Logically the relationship between categories and rules are the main collection and the sub collection. They are associated by cat field. Task: perform a left join to get the title field, regex field and cat field from rules and the corresponding data in categories’s path field. Below is part of the source data:
categories |
rules |
{ "_id" : ObjectId("551d1c61a82a701368ec49ad"), |
{"_id" : ObjectId("551d1c7da82a701368ec49af"), |
SPL script:
A |
B |
|
1 |
=mongo_open("mongodb://localhost:27017/local?user=test&password=test") |
|
2 |
=mongo_shell(A1,"categories.find({},{_id:0}).sort({cat:1})") |
=mongo_shell(A1,"rules.find(,{_id:0})") |
3 |
=joinx@1(B2,cat;A2,cat) |
|
4 |
=A3.new(_1.title,_1.regex,_1.cat,_2.path) |
|
5 |
=A4.fetch() |
|
6 |
=mongo_close(A1) |
A1: Connect to MongoDB using the connection string mongodb://ip:port/db?arg=value&….
A2, B2: find() function retrieves and sorts data in the MongoDB collections and return a cursor. The left table A2 should be ordered by cat.
A3: joinx() function joins the two tables via join field. @1 option enables a left join.
A4: et desired fields from A3. _1 and _2 represent the two cursors respectively.
A5: Fetch data from A4’s cursor:
_1.title |
_1.regex |
_1.cat |
_2.path |
braveheart |
^.*braveheart.*$ |
Films Anglais |
W:\videos |
Titanic |
^.*Titanic.*$ |
Films Anglais |
W:\videos |
The Avengers |
^.*TheAvengers.*$ |
Films Anglais |
W:\videos |
someSEO |
^.*someSEO.*$ |
SEO Anglais |
|
agriculture |
^.*agriculture.*$ |
Transaction Anglais |
W:\Tran |
industry |
^.*industry.*$ |
Transaction Anglais |
W:\Tran |
If the final result set is too large to be loaded into the memory at a time, you can use export() function to write A4’s result to a target file.
A6: Close connection to MongoDB.
About more complicated joins over MongoDB data, see Simplifying MongoDB Data Association.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL
Chinese version