Performing Joins over MongoDB with SPL

The MongoDB versions before MongoDB 3.2 dont support collection joins in a direct way and coders have to resort hardcoding. Though it added $lookup interface beginning from MongoDB 3.2, the coding is still not straightforward. Here we illustrate how esProc SPL handles MongoDB data joins efficiently, including inner join, left join, full join and subdocument join through an example.

Logically the relationship between categories and rules are the main collection and the sub collection. They are associated by cat field. Task: perform a left join to get the title field, regex field and cat field from rules and the corresponding data in categoriess path field. Below is part of the source data:

categories

rules

{ "_id" : ObjectId("551d1c61a82a701368ec49ad"),
   "cat" : "Transaction Anglais",
   "path" : "W:\\Tran"}
{"_id" :  ObjectId("551d1c61a82a701368ec49ae"),
   "cat" : "Films Anglais",
   "path" : "W:\\videos"}

{"_id" :  ObjectId("551d1c7da82a701368ec49af"),
   "title" : "braveheart",
   "regex" : "^.*braveheart.*$",
   "cat" : "Films Anglais"}
{"_id" : ObjectId("551d1c7da82a701368ec49b0"),
   "title" : "Titanic",
   "regex" : "^.*Titanic.*$",
   "cat" : "Films Anglais"}
{"_id" :  ObjectId("551d1c7da82a701368ec49b1"),
   "title" : "The Avengers",
   "regex" : "^.*TheAvengers.*$",
   "cat" : "Films Anglais"}
{"_id" :  ObjectId("551d1c7da82a701368ec49b2"),
   "title" : "someSEO",
   "regex" : "^.*someSEO.*$",
   "cat" : "SEO Anglais"}
{"_id" :  ObjectId("551d1c7da82a701368ec49b3"),
   "title" : "agriculture",
   "regex" : "^.*agriculture.*$",
   "cat" : "Transaction Anglais"}
{"_id" :  ObjectId("551d1c7da82a701368ec49b4"),
   "title" : "industry",
   "regex" : "^.*industry.*$",
   "cat" : "Transaction Anglais"
 }

SPL script:


A

B

1

=mongo_open("mongodb://localhost:27017/local?user=test&password=test")

2

=mongo_shell(A1,"categories.find({},{_id:0}).sort({cat:1})")

=mongo_shell(A1,"rules.find(,{_id:0})")

3

=joinx@1(B2,cat;A2,cat)


4

=A3.new(_1.title,_1.regex,_1.cat,_2.path)


5

=A4.fetch()


6

=mongo_close(A1)


A1: Connect to MongoDB using the connection string mongodb://ip:port/db?arg=value&….
A2, B2: find() function retrieves and sorts data in the MongoDB collections and return a cursor. The left table A2 should be ordered by cat.
A3: joinx() function joins the two tables via join field. @1 option enables a left join.
A4: et desired fields from A3. _1 and _2 represent the two cursors respectively.
A5: Fetch data from A4s cursor:

_1.title

_1.regex

_1.cat

_2.path

braveheart

^.*braveheart.*$

Films Anglais

W:\videos

Titanic

^.*Titanic.*$

Films Anglais

W:\videos

The Avengers

^.*TheAvengers.*$

Films Anglais

W:\videos

someSEO

^.*someSEO.*$

SEO Anglais


agriculture

^.*agriculture.*$

Transaction Anglais

W:\Tran

industry

^.*industry.*$

Transaction Anglais

W:\Tran

 

If the final result set is too large to be loaded into the memory at a time, you can use export() function to write A4s result to a target file.
A6: Close connection to MongoDB.

About more complicated joins over MongoDB data, see Simplifying MongoDB Data Association.