Simplifying MongoDB Data Association

 

  With $lookup operator, MongoDB supports the most basic data association between multiple tables. When data association tasks are complicated, the Mongo shell script becomes complicated. With SPL’s discreteness and ease of use (SPL is the abbreviation of Structured Process Language developed by Raqsoft), esProc can help simplify data association in MongoDB. There are more related articles in Raqsoft Community (http://c.raqsoft.com/).

  MongoDB is a document-oriented NoSQL database operated on a distributed file system. It stores documents in BSON format, which descripts entity attributes in a more natural and intuitive manner. Since Version 3.2, MongoDB has improved its query capability by supporting data association via $lookup. The real-world business scenarios, however, are complicated and the computing problems are knotty. This results in tortuous scripting process.

  In this article, we’ll explain how to simplify association of MongoDB data in SPL.

  1. Association with embedded structures (1)
  2. Association with embedded structures (2)
  3. Association with embedded structures (3)
  4. Association with selected fields
  5. Multi-table association
  6. Associated query by array values
  7. Calling esProc dfx script in Java

1. Association with embedded structures (1)

  This involves two associated tables. One of table A’s field is associated with the sub-documents in table B, and the to-be-returned data is contained in the sub-documents. For example, the childs field of childsgroup table contains embedded arrays, in which the name item and mobile item need to be returned.

Test data:

history:

_id id History child_id
1 001 today worked ch001
2 002 Working ch004
3 003 now working ch009

childsgroup:

_id groupid name childs
1 g001 group1 {"id":"ch001","info":{"name":"a",mobile:1111}},
{"id":"ch002","info":{"name":"b",mobile:2222}}
2 g002 group1 {"id":"ch004","info":{"name":"c",mobile:3333}},
{"id":"ch009","info":{"name":"d",mobile:4444}}

  The child_id field in history table associates with childs.id in childsgroup table. Below is the desired joining result:
  {
    “_id” : ObjectId(“5bab2ae8ab2f1bdb4f434bc3”),
    “id” : “001”,
    “history” : “today worked”,
    “child_id” : “ch001”,
    “childInfo” :
    {
      “name” : “a”,
      “mobile” : 1111
    }
    ………………
  }

MongoDB script:

db.history.aggregate([
  {$lookup: {
    from: "childsgroup",
    let: {child_id: "$child_id"},
    pipeline: [
      {$match: { $expr: { $in: [ "$$child_id", "$childs.id"] } } },
      {$unwind: "$childs"},
      {$match: { $expr: { $eq: [ "$childs.id", "$$child_id"] } } },
      {$replaceRoot: { newRoot: "$childs.info"} }
      ],
      as: "childInfo"
    }},
  {"$unwind": "$childInfo"}
])

  The script uses five operators, including lookup, pipeline, match, unwind and replaceRoot, to handle the computation. It’s complicated and difficult to write.

SPL script (childsgroup.dfx)

A
1 =mongo_open("mongodb://127.0.0.1:27017/raqdb")
2 =mongo_shell(A1,"history.find()").fetch()
3 =mongo_shell(A1,"childsgroup.find()").fetch()
4 =A3.conj(childs)
5 =A2.join(child_id,A4:id,info)
6 >A1.close()

Associated query result:

_id id history child_id info
1 001 today worked ch001 [a,1111]
2 002 working ch004 [c,3333]
3 003 now working ch009 [d,4444]

SPL script explanation:
  A1: Connect to MongoDB database.
  A2: Read in data from history table.
  A3: Read in data from childsgroup table.
  A4: Concatenate the array values in childsgroup table’s childs field into a table sequence.
  A5: Perform an associated query between history table an A4’s childs table via the associated fields child_id and id and append an info field to generate a new table sequence.
  A6: Close the database connection.

  The SPL script is simple, clear and efficient.

2. Association with embedded structures (2)

  There are two associated tables. Table A is associated with table B’s embedded documents. The task is to combine table A’s data under the embedded documents. For example, txtPost table’s Comment field contains array values; combine comment_content field of txtComment table under the arrays.

txtComment:

_ID comment_no comment_content
1 143 test test
2 140 math

txtPost:

_ID post_no Comment
1 48 [{"comment_no" : 143, "comment_group" : 1} ]
2 47 [{"comment_no" : 140, "comment_group" : 2},
 {"comment_no" : 143, "comment_group" : 3} ]

Desired result:

_ID post_no Comment
1 48 [{"comment_no" : 143, "comment_group" : 1,"comment_content" : "test test"} ]
2 47 [{"comment_no" : 140, "comment_group" : 2,"comment_content" : "math"},
 {"comment_no" : 143, "comment_group" : 3,"comment_content" : "test test"} ]

MongoDB script:

db.getCollection("txtPost").aggregate([
  {"$unwind": "$comment"},
  { "$lookup": {
    "from": "txtComment",
    "localField": "comment.comment_no",
    "foreignField": "comment_no",
    "as": "comment.comment_content"
  }},
  {"$unwind": "$comment.comment_content"},
  {"$addFields": { "comment.comment_content":"$comment.comment_content.comment_content"}},
  { "$group": {
    "_id": "$_id",
    'post_no':{"$first": "$post_no"},
    "comment": {"$push": "$comment"}
  }},
  
]).pretty()

  Split txtPost records according to members of Comment arrays into more records, perform an associated query between them with records of txtComment, store the result as an array and split the array into records, move comment_content values under Comment field, and group and concatenate the records.

SPL script:

A
1 =mongo_open("mongodb://127.0.0.1:27017/raqdb")
2 =mongo_shell(A1,"txtPost.find()").fetch()
3 =mongo_shell(A1,"txtComment.find()").fetch()
4 =A2.conj(comment.derive(A2.post_no:pno))
5 =A4.join(comment_no,A3:comment_no,comment_content:Content)
6 =A5.group(pno;~:comment)
7 >A1.close()

Associated query result:

pno Comment
47 [[140, 2,47, …],[143, 3,47, …] ]
48 [[143, 1,48, …]]

SPL script explanation:
  A1:Connect to MongoDB database.
  A2:Read data from txtPost table.
  A3:Read data from txComment table.
  A4:Create a table sequence with A2’s Comment field and post_no field, which is renamed pno.
  A5:Associate A4’s table sequence with A3’s table sequence via comment_no and append comment_content field, which is renamed Content.
  A6:Group records by pno and return the current table; ~ represents the current record.
  A7:Close the database connection.

  Both scripts transform the embedded data into data stored in columns and rows, perform association, and then group the associated records. But SPL script is clearer and more concise.

3. Association with embedded structures (3)

  This involves two associated tables. Table A is associated with the sub-documents in table B, and the to-be-returned information is contained in the sub-records. For example, product field in collection2 contains array values and the association result needs to include fields in isCompleted array in collection2.

Test data:

collection1:

{
   _id: '5bc2e44a106342152cd83e97',
   description:
   {
     status: 'Good',
     machine: 'X'
   },
   order: 'A',
   lot: '1'
};

collection2:

{
   _id: '5bc2e44a106342152cd83e80',
   isCompleted: false,
   serialNo: '1',
   batchNo: '2',
   <!-- note the subdocuments here-->   
   product: [ 
     { order: 'A', lot: '1' },
     { order: 'A', lot: '2' }
   ]
}

Desired result:

{
   _id: 5bc2e44a106342152cd83e97,
   description:
   {
      status: 'Good',
      machine: 'X',
   },
   order: 'A',
   lot: '1' ,
   isCompleted: false,
   serialNo: '1',
   batchNo: '2'
}

MongoDB script:

db.collection1.aggregate([{
  $lookup: {
    from: "collection2",
    let: {order: "$order", lot: "$lot"},
    pipeline: [{
      $match: {
      $expr:{$in: [ { order: "$$order", lot: "$$lot"}, "$product"] }
      }
      }],
      as: "isCompleted"
      }
    }, {
      $addFields: {
      "isCompleted": {$arrayElemAt: [ "$isCompleted", 0] }
      }
    }, {
      $addFields: { // add the required fields to the top level structure
      "isCompleted": "$isCompleted.isCompleted",
      "serialNo": "$isCompleted.serialNo",
      "batchNo": "$isCompleted.batchNo"
    }
}])

  $lookup performs an associated query between the two tables. The first $addFields gets the first field from isCompleted array, and the second $addFields adds the other required fields.

SPL script:

A
1 =mongo_open("mongodb://127.0.0.1:27017/raqdb")
2 =mongo_shell(A1,"collection1.find()").fetch()
3 =mongo_shell(A1,"collection2.find()").fetch()
4 =A3.conj(A2.select(order:A3.product.order,lot:A3.product.lot).derive(A3.serialNo:sno,A3.batchNo:bno))
5 >A1.close()

SPL script explanation:
  A1:Connect to MongoDB database.
  A2:Read data from collection1 table.
  A3:Read data from collection2 table.
  A4:Query A2’s table according to order field and lot field, add A3’s serialNo field and batchNo field, and return the associated table sequence.
  A5:Close the database connection.

  Both scripts get the desired result. SPL script is clear in selecting required data from the embedded documents and concatenating them into a new table sequence.

4. Association with selected fields

  Create a new table through association and with selected fields.

collection1:

user1 user2 income
1 2 0.56
1 3 0.26

collection2:

user1 user2 output
1 2 0.3
1 3 0.4
2 3 0.5

Desired result:

user1 user2 income output
1 2 0.56 0.3
1 3 0.26 0.4

MongoDB script:

db.c1.aggregate([
  { "$lookup": {
  "from": "c2",
    "localField": "user1",
    "foreignField": "user1",
    "as": "collection2_doc"
  }},
  {"$unwind": "$collection2_doc"},
  { "$redact": {
    "$cond": [
      {"$eq": [ "$user2", "$collection2_doc.user2"] },
      "$$KEEP",
      "$$PRUNE"
    ]
  }},
  { "$project": {
    "user1": 1,
    "user2": 1,
    "income": "$income",
    "output": "$collection2_doc. output"
  }}
]).pretty()

  $lookup performs an associated query. $redact traverses records to perform conditional handling. $project selects the desired fields.

SPL script:

A
1 =mongo_open("mongodb://127.0.0.1:27017/raqdb")
2 =mongo_shell(A1,"c1.find()").fetch()
3 =mongo_shell(A1,"c2.find()").fetch()
4 =A2.join(user1:user2,A3:user1:user2,output)
5 >A1.close()

SPL script explanation:
  A1: Connect to MongoDB database.
  A2: Read data from collection1 table.
  A3: Read data from collection2 table.
  A4: Join the two tables by user1 and user2, append A3’s output field and return a new table sequence.
  A5: Close the database connection.

  Both MongDB script and SPL script achieve the computing goal. The latter combines some of fields in two tables via a join into a new table, which is similar to relational database computation.

5. Multi-table association

  This is the association between more than two tables to get a merged table.

Doc1:

_id firstName lastName
U001 shubham verma

Doc2:

_id userId address mob
2 U001 Gurgaon 9876543200

Doc3:

_id userId fbURLs twitterURLs
3 U001 http://www.facebook.com http://www.twitter.com

Merged result:

{
     "_id" : ObjectId("5901a4c63541b7d5d3293766"),
     "firstName" : "shubham",
     "lastName" : "verma",

     "address" : {
         "address" : "Gurgaon"
     },
     "social" : {
         "fbURLs" : "http://www.facebook.com",
         "twitterURLs" : "http://www.twitter.com"
     }
}

MongoDB script:

db.doc1.aggregate([
  {$match: { _id: ObjectId("5901a4c63541b7d5d3293766") } },
  {
    $lookup:
    {
      from: "doc2",
      localField: "_id",
      foreignField: "userId",
      as: "address"
    }
  },
  {
    $unwind: "$address"
  },
  {
    $project: {
      "address._id": 0,
      "address.userId": 0,
      "address.mob": 0
    }
  },
  {
    $lookup:
    {
      from: "doc3",
      localField: "_id",
      foreignField: "userId",
      as: "social"
    }
  },
  {
    $unwind: "$social"
  },
  
  {
    $project: {
      "social._id": 0,
      "social.userId": 0
    }
  }
]).pretty();

  There can be different MongoDB scripts to do this and various merged results in MongoDB thanks to the database’s unique data structure.

SPL script:

A
1 =mongo_open("mongodb://127.0.0.1:27017/raqdb")
2 =mongo_shell(A1,"doc1.find()").fetch()
3 =mongo_shell(A1,"doc2.find()").fetch()
4 =mongo_shell(A1,"doc3.find()").fetch()
5 =A2.join(_id,A3:userId,address,mob)
6 =A5.join(_id,A4:userId,fbURLs,twitterURLs)
7 >A1.close()

  This SPL script is similar to that in the previous example, with only one more joined table. Every join adds a new field to form a merged table.

  Obviously, the SPL script is concise and coherent.

6. Associated query by array values

  Get records from a table according to the array field in its associated table and generate a new table with certain fields.

Test data:

users:

_id Name workouts
1000 xxx [2,4,6]
1002 yyy [1,3,5]

workouts:

_id Date Book
1 1/1/2001 Othello
2 2/2/2001 A Midsummer Night's Dream
3 3/3/2001 The Old Man and the Sea
4 4/4/2001 GULLIVER’S TRAVELS
5 5/5/2001 Pickwick Papers
6 6/6/2001 The Red and the Black

Desired result:

Name _id Date Book
xxx 2 2/2/2001 A Midsummer Night's Dream
xxx 4 4/4/2001 GULLIVER’S TRAVELS
xxx 6 6/6/2001 The Red and the Black
yyy 1 1/1/2001 Othello
yyy 3 3/3/2001 The Old Man and the Sea
yyy 5 5/5/2001 Pickwick Papers

MongoDB script:

db.users.aggregate([
  { "$lookup": {
    "from" : "workouts",
    "localField" : "workouts",
    "foreignField" : "_id",
    "as" : "workoutDocumentsArray"
  }},
  {$project: { _id:0,workouts:0} } ,
  {"$unwind": "$workoutDocumentsArray"},
  
  {"$replaceRoot": { "newRoot": { $mergeObjects: [ "$$ROOT", "$workoutDocumentsArray"] } } },
  {$project: { workoutDocumentsArray: 0} }
]).pretty()

  Put the result of associated query between users and workouts into an array, split the array and move the embedded document to the top level, and remove the unwanted fields.

SPL script (users.dfx):

A
1 =mongo_open("mongodb://127.0.0.1:27017/raqdb")
2 =mongo_shell(A1,"users.find()").fetch()
3 =mongo_shell(A1,"workouts.find()").fetch()
4 =A2.conj(A3.select(A2.workouts^~.array(_id)!=[]).derive(A2.name))
5 >A1.close()

SPL script explanation:
  A1: Connect to MongoDB database.
  A2: Read data from users table.
  A3: Read data from workouts table.
  A4: Get records from workouts table where _id field is included in workouts field of A2’s table sequence, add name field to the records, and return the concatenated table sequence.
  A5: Close the database connection.

  The operation is considered normal only if the matching result isn’t null. So _id field is converted into a sequence to make the comparison convenient in A4.

  The SPL script is more integration-friendly, more flexible and much simpler.

7. Calling esProc dfx script in Java

  The result of executing an SPL script can be easily integrated by a Java application. esProc provides JDBC driver to be accessed withJava stored procedure. That is the same as calling a SQL stored procedure in Java. (For esProc JDBC deployment, refer to 1.9 esProc JDBC in esProc Tutorial)

  Below is the process of calling an SPL script from Java:

    public void testUsers(){
        Connection con = null;
        com.esproc.jdbc.InternalCStatement st;
        try{
            //Establish database connection
            Class.forName("com.esproc.jdbc.InternalDriver");
            con= DriverManager.getConnection("jdbc:esproc:local://");
            // Call the stored procedure, in which user is the dfx file’s name
            st =(com. esproc.jdbc.InternalCStatement)con.prepareCall("call users ()");
            // Execute the stored procedure
            st.execute();
            //Get result set
            ResultSet rs = st.getResultSet();

            ......

            catch(Exception e){
                System.out.println(e);
            }

  This is a standard JDBC invocation, which is easy to embed an SPL script into a Java application. esProc also supports the ODBC driver, which makes it easy to be integrated with languages that use ODBC.

  Compared with relational databases, MongoDB stores data in a more complicated, more flexible way, has powerful and adaptable query syntax, and provides various functions and rich cooperative possibilities between them. It’s not easy to be proficient in MongoDB. esProc, however, features discreteness and ease of use to become the NoSQL database’s right-hand and can help it play to its full potential.