Merging MongDB Subdocuments

 

MongoDB stores data in JSON-like BSON format. BSON enables a flexible, versatile and powerful storage model, yet at the same time compromise the data query efficiency.

To make up for this weakness, esProc SPL offers sufficient interfaces to make MongoDB query as convenient as the SQL-like database queries. Here we take a collection with nested subdocument as an example to illustrate how SPL handles MongoDB queries.

Sample data from Collection C1:

{
 "_id" :  ObjectId("55014006e4b0333c9531043e"),
 "acls" : {
       "append" : {
           "users" :  [ObjectId("54f5bfb0336a15084785c393") ],
           "groups" : [ ]
        },
       "edit" : {
           "groups" : [ ],
           "users" : [
              ObjectId("54f5bfb0336a15084785c392")
            ]
        },
        "fullControl" : {
           "users" : [ ],
           "groups" : [ ]
        },
       "read" : {
           "users" : [ObjectId("54f5bfb0336a15084785c392"),
           ObjectId("54f5bfb0336a15084785c398")],
           "groups" : [ ]
        }
    },
   name: "ABC"
}

{
    "_id" :  ObjectId("55014006e4b0333c9531043f"),
    "acls" : {
        "append" : {
           "users" : [ObjectId("54f5bfb0336a15084785c365")   ],
           "groups" : [ ]
        },
        "edit" : {
            "groups" : [ ],
            "users" : [
               ObjectId("54f5bfb0336a15084785c392")
            ]
        },
        "fullControl" : {
           "users" : [ ],
           "groups" : [ ]
        },
        "read" : {
           "users" : [ObjectId("54f5bfb0336a15084785c392"),  
           ObjectId("54f5bfb0336a15084785c370")],
           "groups" : [ ]
        }
     },
     name: "ABC"
}

Task: Group the collection by name field. Each group contains all distinct users fields in a subdocument under a name. Below is the expected result:

{
result : [
    {
        _id: "ABC",
        readUsers : [
           ObjectId("54f5bfb0336a15084785c393"),
           ObjectId("54f5bfb0336a15084785c392"),
           ObjectId("54f5bfb0336a15084785c398"),
           ObjectId("54f5bfb0336a15084785c365"),
           ObjectId("54f5bfb0336a15084785c370")
        ]
        }
]
}

SPL script:


A

B

1

=mongo_open("mongodb://localhost:27017/local?user=test&password=test")

2

=mongo_shell(A1,"c1.find(,{_id:0};{name:1})")

3

for A2;name

=A3.(acls.read.users|acls.append.users|acls.edit.users|acls.fullControl.users)

4


=B3.new(A3.name:_id,B3.union().id():readUsers)

5


=@|B4.group@1(~._id,~.readUsers)

6

=mongo_close(A1)


 

A1: Connect to MongoDB database. The connection strings format is mongo://ip:port/db?arg=value&….

A2: find() function retrieves documents from collection c1, sorts them and return result as a cursor. As no filtering condition is specified, it gets all fields except _id and sorts them by name.

A3: Fetch a set of documents with same name field value from A2s cursor each time and do this iteratively. A3s working scope covers an indented area from B3 to B5, within which A3 references a loop variable.

B3: Get all users fields from the current group of documents:

 

undefined 

 

B4: Union all users in subdocuments in the current group.

B5: Remove duplicate records from each group in B4 and append it to B5. group@1 function deletes the duplicate records. Below is B5s result:

 

undefined 

 

If the final result set is too large to be wholly loaded into the memory, use export@j function to convert B4s table sequence into JSON strings to append them to a target text file.

A6: Close MongoDB connection.

MongoDB boasts a lightweight, efficient storage model. Its cooperation with esProc can make data processing more effectively and extensively.