" With $lookup operator, MongoDB supports the most basic data association between multiple table .."

pjf RaqForum 22 No.
1 Reply • 1,054 View • 5 Years ago

Simplifying MongoDB Data Association

With $lookup operator, MongoDB supports the most basic data association between multiple tables. When data association tasks are complicated, the Mongo shell script becomes complicated. With SPL’s discreteness and ease of use (SPL is the abbreviation of Structured Process Language developed by Raqsoft), esProc can help simplify data association in MongoDB. There are more related articles in Raqsoft Community (https://c.raqsoft.com/).

MongoDB is a document-oriented NoSQL database operated on a distributed file system. It stores documents in BSON format, which descripts entity attributes in a more natural and intuitive manner. Since Version 3.2, MongoDB has improved its query capability by supporting data association via $lookup. The real-world business scenarios, however, are complicated and the computing problems are knotty. This results in tortuous scripting process.

In this article, we’ll explain how to simplify association of MongoDB data in SPL.

1. Association with embedded structures (1)
2. Association with embedded structures (2)
3. Association with embedded structures (3)
4. Association with selected fields
5. Multi-table association
6. Associated query by array values
7. Calling esProc dfx script in Java

1. Association with embedded structures (1)

This involves two associated tables. One of table A’s field is associated with the sub-documents in table B, and the to-be-returned data is contained in the sub-documents. For example, the childs field of childsgroup table contains embedded arrays, in which the name item and mobile item need to be returned.

Test data:

history:

_id	id	History	child_id
1	001	today worked	ch001
2	002	Working	ch004
3	003	now working	ch009

childsgroup:

_id	groupid	name	childs
1	g001	group1	{"id":"ch001","info":{"name":"a",mobile:1111}}, {"id":"ch002","info":{"name":"b",mobile:2222}}
2	g002	group1	{"id":"ch004","info":{"name":"c",mobile:3333}}, {"id":"ch009","info":{"name":"d",mobile:4444}}

The child_id field in history table associates with childs.id in childsgroup table. Below is the desired joining result:
{
“_id” : ObjectId(“5bab2ae8ab2f1bdb4f434bc3”),
“id” : “001”,
“history” : “today worked”,
“child_id” : “ch001”,
“childInfo” :
{
“name” : “a”,
“mobile” : 1111
}
………………
}

MongoDB script:

db.history.aggregate([
{$lookup: {
from: "childsgroup",
let: {child_id: "$child_id"},
pipeline: [
{$match: { $expr: { $in: [ "$$child_id", "$childs.id"] } } },
{$unwind: "$childs"},
{$match: { $expr: { $eq: [ "$childs.id", "$$child_id"] } } },
{$replaceRoot: { newRoot: "$childs.info"} }
],
as: "childInfo"
}},
{"$unwind": "$childInfo"}
])

The script uses five operators, including lookup, pipeline, match, unwind and replaceRoot, to handle the computation. It’s complicated and difficult to write.

SPL script (childsgroup.dfx)

	A
1	=mongo_open("mongodb://127.0.0.1:27017/raqdb")
2	=mongo_shell(A1,"history.find()").fetch()
3	=mongo_shell(A1,"childsgroup.find()").fetch()
4	=A3.conj(childs)
5	=A2.join(child_id,A4:id,info)
6	>A1.close()

Associated query result:

_id	id	history	child_id	info
1	001	today worked	ch001	[a,1111]
2	002	working	ch004	[c,3333]
3	003	now working	ch009	[d,4444]

SPL script explanation:
A1: Connect to MongoDB database.
A2: Read in data from history table.
A3: Read in data from childsgroup table.
A4: Concatenate the array values in childsgroup table’s childs field into a table sequence.
A5: Perform an associated query between history table an A4’s childs table via the associated fields child_id and id and append an info field to generate a new table sequence.
A6: Close the database connection.

The SPL script is simple, clear and efficient.

2. Association with embedded structures (2)

There are two associated tables. Table A is associated with table B’s embedded documents. The task is to combine table A’s data under the embedded documents. For example, txtPost table’s Comment field contains array values; combine comment_content field of txtComment table under the arrays.

txtComment:

_ID	comment_no	comment_content
1	143	test test
2	140	math

txtPost:

_ID	post_no	Comment
1	48	[{"comment_no" : 143, "comment_group" : 1} ]
2	47	[{"comment_no" : 140, "comment_group" : 2}， {"comment_no" : 143, "comment_group" : 3} ]

Desired result:

_ID	post_no	Comment
1	48	[{"comment_no" : 143, "comment_group" : 1，"comment_content" : "test test"} ]
2	47	[{"comment_no" : 140, "comment_group" : 2，"comment_content" : "math"}， {"comment_no" : 143, "comment_group" : 3，"comment_content" : "test test"} ]

MongoDB script:

db.getCollection("txtPost").aggregate([
{"$unwind": "$comment"},
{ "$lookup": {
"from": "txtComment",
"localField": "comment.comment_no",
"foreignField": "comment_no",
"as": "comment.comment_content"
}},
{"$unwind": "$comment.comment_content"},
{"$addFields": { "comment.comment_content":"$comment.comment_content.comment_content"}},
{ "$group": {
"_id": "$_id",
'post_no':{"$first": "$post_no"},
"comment": {"$push": "$comment"}
}},

]).pretty()

Split txtPost records according to members of Comment arrays into more records, perform an associated query between them with records of txtComment, store the result as an array and split the array into records, move comment_content values under Comment field, and group and concatenate the records.

SPL script:

	A
1	=mongo_open("mongodb://127.0.0.1:27017/raqdb")
2	=mongo_shell(A1,"txtPost.find()").fetch()
3	=mongo_shell(A1,"txtComment.find()").fetch()
4	=A2.conj(comment.derive(A2.post_no:pno))
5	=A4.join(comment_no,A3:comment_no,comment_content:Content)
6	=A5.group(pno;~:comment)
7	>A1.close()

Associated query result:

pno	Comment
47	[[140, 2，47, …]，[143, 3，47, …] ]
48	[[143, 1，48, …]]

SPL script explanation:
A1：Connect to MongoDB database.
A2：Read data from txtPost table.
A3：Read data from txComment table.
A4：Create a table sequence with A2’s Comment field and post_no field, which is renamed pno.
A5：Associate A4’s table sequence with A3’s table sequence via comment_no and append comment_content field, which is renamed Content.
A6：Group records by pno and return the current table; ~ represents the current record.
A7：Close the database connection.

Both scripts transform the embedded data into data stored in columns and rows, perform association, and then group the associated records. But SPL script is clearer and more concise.

3. Association with embedded structures (3)

This involves two associated tables. Table A is associated with the sub-documents in table B, and the to-be-returned information is contained in the sub-records. For example, product field in collection2 contains array values and the association result needs to include fields in isCompleted array in collection2.

Test data:

collection1:

{
   _id: '5bc2e44a106342152cd83e97',
   description:
   {
     status: 'Good',
     machine: 'X'
   },
   order: 'A',
   lot: '1'
};

collection2：

{
   _id: '5bc2e44a106342152cd83e80',
   isCompleted: false,
   serialNo: '1',
   batchNo: '2',
   <!-- note the subdocuments here-->   
   product: [ 
     { order: 'A', lot: '1' },
     { order: 'A', lot: '2' }
   ]
}

Desired result:

{
   _id: 5bc2e44a106342152cd83e97,
   description:
   {
      status: 'Good',
      machine: 'X',
   },
   order: 'A',
   lot: '1' ,
   isCompleted: false,
   serialNo: '1',
   batchNo: '2'
}

MongoDB script:

db.collection1.aggregate([{
$lookup: {
from: "collection2",
let: {order: "$order", lot: "$lot"},
pipeline: [{
$match: {
$expr:{$in: [ { order: "$$order", lot: "$$lot"}, "$product"] }
}
}],
as: "isCompleted"
}
}, {
$addFields: {
"isCompleted": {$arrayElemAt: [ "$isCompleted", 0] }
}
}, {
$addFields: { // add the required fields to the top level structure
"isCompleted": "$isCompleted.isCompleted",
"serialNo": "$isCompleted.serialNo",
"batchNo": "$isCompleted.batchNo"
}
}])

$lookup performs an associated query between the two tables. The first $addFields gets the first field from isCompleted array, and the second $addFields adds the other required fields.

SPL script:

	A
1	=mongo_open("mongodb://127.0.0.1:27017/raqdb")
2	=mongo_shell(A1,"collection1.find()").fetch()
3	=mongo_shell(A1,"collection2.find()").fetch()
4	=A3.conj(A2.select(order:A3.product.order,lot:A3.product.lot).derive(A3.serialNo:sno,A3.batchNo:bno))
5	>A1.close()

SPL script explanation:
A1：Connect to MongoDB database.
A2：Read data from collection1 table.
A3：Read data from collection2 table.
A4：Query A2’s table according to order field and lot field, add A3’s serialNo field and batchNo field, and return the associated table sequence.
A5：Close the database connection.

Both scripts get the desired result. SPL script is clear in selecting required data from the embedded documents and concatenating them into a new table sequence.

4. Association with selected fields

Create a new table through association and with selected fields.

collection1:

user1	user2	income
1	2	0.56
1	3	0.26

collection2:

user1	user2	output
1	2	0.3
1	3	0.4
2	3	0.5

Desired result：

user1	user2	income	output
1	2	0.56	0.3
1	3	0.26	0.4

MongoDB script:

db.c1.aggregate([
{ "$lookup": {
"from": "c2",
"localField": "user1",
"foreignField": "user1",
"as": "collection2_doc"
}},
{"$unwind": "$collection2_doc"},
{ "$redact": {
"$cond": [
{"$eq": [ "$user2", "$collection2_doc.user2"] },
"$$KEEP",
"$$PRUNE"
]
}},
{ "$project": {
"user1": 1,
"user2": 1,
"income": "$income",
"output": "$collection2_doc. output"
}}
]).pretty()

$lookup performs an associated query. $redact traverses records to perform conditional handling. $project selects the desired fields.

SPL script:

	A
1	=mongo_open("mongodb://127.0.0.1:27017/raqdb")
2	=mongo_shell(A1,"c1.find()").fetch()
3	=mongo_shell(A1,"c2.find()").fetch()
4	=A2.join(user1:user2,A3:user1:user2,output)
5	>A1.close()

SPL script explanation:
A1: Connect to MongoDB database.
A2: Read data from collection1 table.
A3: Read data from collection2 table.
A4: Join the two tables by user1 and user2, append A3’s output field and return a new table sequence.
A5: Close the database connection.

Both MongDB script and SPL script achieve the computing goal. The latter combines some of fields in two tables via a join into a new table, which is similar to relational database computation.

5. Multi-table association

This is the association between more than two tables to get a merged table.

Doc1:

_id	firstName	lastName
U001	shubham	verma

Doc2:

_id	userId	address	mob
2	U001	Gurgaon	9876543200

Doc3:

_id	userId	fbURLs	twitterURLs
3	U001	http://www.facebook.com	http://www.twitter.com

Merged result:

{
     "_id" : ObjectId("5901a4c63541b7d5d3293766"),
     "firstName" : "shubham",
     "lastName" : "verma",

     "address" : {
         "address" : "Gurgaon"
     },
     "social" : {
         "fbURLs" : "http://www.facebook.com",
         "twitterURLs" : "http://www.twitter.com"
     }
}

MongoDB script:

db.doc1.aggregate([
{$match: { _id: ObjectId("5901a4c63541b7d5d3293766") } },
{
$lookup:
{
from: "doc2",
localField: "_id",
foreignField: "userId",
as: "address"
}
},
{
$unwind: "$address"
},
{
$project: {
"address._id": 0,
"address.userId": 0,
"address.mob": 0
}
},
{
$lookup:
{
from: "doc3",
localField: "_id",
foreignField: "userId",
as: "social"
}
},
{
$unwind: "$social"
},

{
$project: {
"social._id": 0,
"social.userId": 0
}
}
]).pretty();

There can be different MongoDB scripts to do this and various merged results in MongoDB thanks to the database’s unique data structure.

SPL script:

	A
1	=mongo_open("mongodb://127.0.0.1:27017/raqdb")
2	=mongo_shell(A1,"doc1.find()").fetch()
3	=mongo_shell(A1,"doc2.find()").fetch()
4	=mongo_shell(A1,"doc3.find()").fetch()
5	=A2.join(_id,A3:userId,address,mob)
6	=A5.join(_id,A4:userId,fbURLs,twitterURLs)
7	>A1.close()

This SPL script is similar to that in the previous example, with only one more joined table. Every join adds a new field to form a merged table.

Obviously, the SPL script is concise and coherent.

6. Associated query by array values

Get records from a table according to the array field in its associated table and generate a new table with certain fields.

Test data:

users:

_id	Name	workouts
1000	xxx	[2,4,6]
1002	yyy	[1,3,5]

workouts:

_id	Date	Book
1	1/1/2001	Othello
2	2/2/2001	A Midsummer Night's Dream
3	3/3/2001	The Old Man and the Sea
4	4/4/2001	GULLIVER’S TRAVELS
5	5/5/2001	Pickwick Papers
6	6/6/2001	The Red and the Black

Desired result:

Name	_id	Date	Book
xxx	2	2/2/2001	A Midsummer Night's Dream
xxx	4	4/4/2001	GULLIVER’S TRAVELS
xxx	6	6/6/2001	The Red and the Black
yyy	1	1/1/2001	Othello
yyy	3	3/3/2001	The Old Man and the Sea
yyy	5	5/5/2001	Pickwick Papers

MongoDB script:

db.users.aggregate([
{ "$lookup": {
"from" : "workouts",
"localField" : "workouts",
"foreignField" : "_id",
"as" : "workoutDocumentsArray"
}},
{$project: { _id:0,workouts:0} } ,
{"$unwind": "$workoutDocumentsArray"},

{"$replaceRoot": { "newRoot": { $mergeObjects: [ "$$ROOT", "$workoutDocumentsArray"] } } },
{$project: { workoutDocumentsArray: 0} }
]).pretty()

Put the result of associated query between users and workouts into an array, split the array and move the embedded document to the top level, and remove the unwanted fields.

SPL script (users.dfx):

	A
1	=mongo_open("mongodb://127.0.0.1:27017/raqdb")
2	=mongo_shell(A1,"users.find()").fetch()
3	=mongo_shell(A1,"workouts.find()").fetch()
4	=A2.conj(A3.select(A2.workouts^~.array(_id)!=[]).derive(A2.name))
5	>A1.close()

SPL script explanation:
A1: Connect to MongoDB database.
A2: Read data from users table.
A3: Read data from workouts table.
A4: Get records from workouts table where _id field is included in workouts field of A2’s table sequence, add name field to the records, and return the concatenated table sequence.
A5: Close the database connection.

The operation is considered normal only if the matching result isn’t null. So _id field is converted into a sequence to make the comparison convenient in A4.

The SPL script is more integration-friendly, more flexible and much simpler.

7. Calling esProc dfx script in Java

The result of executing an SPL script can be easily integrated by a Java application. esProc provides JDBC driver to be accessed withJava stored procedure. That is the same as calling a SQL stored procedure in Java. (For esProc JDBC deployment, refer to 1.9 esProc JDBC in esProc Tutorial)

Below is the process of calling an SPL script from Java:

    public void testUsers(){
        Connection con = null;
        com.esproc.jdbc.InternalCStatement st;
        try{
            //Establish database connection
            Class.forName("com.esproc.jdbc.InternalDriver");
            con= DriverManager.getConnection("jdbc:esproc:local://");
            // Call the stored procedure, in which user is the dfx file’s name
            st =(com. esproc.jdbc.InternalCStatement)con.prepareCall("call users ()");
            // Execute the stored procedure
            st.execute();
            //Get result set
            ResultSet rs = st.getResultSet();

            ......

            catch(Exception e){
                System.out.println(e);
            }

This is a standard JDBC invocation, which is easy to embed an SPL script into a Java application. esProc also supports the ODBC driver, which makes it easy to be integrated with languages that use ODBC.

Compared with relational databases, MongoDB stores data in a more complicated, more flexible way, has powerful and adaptable query syntax, and provides various functions and rich cooperative possibilities between them. It’s not easy to be proficient in MongoDB. esProc, however, features discreteness and ease of use to become the NoSQL database’s right-hand and can help it play to its full potential.

SPL Official Website 👉 http://www.scudata.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProc

SPL Learning Material 👉 http://c.scudata.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/ydhVnFH9

Youtube 👉 https://www.youtube.com/@esProc_SPL

esProc

pjf • 1,054 View • 5 Years ago

Simplifying MongoDB Data Association

1. Association with embedded structures (1)

2. Association with embedded structures (2)

3. Association with embedded structures (3)

4. Association with selected fields

5. Multi-table association

6. Associated query by array values

7. Calling esProc dfx script in Java

ToC