SPL: Access MongoDB

 

MongoDB is a non-relational database storing data in BSON format, which is similar to JSON format, and it provides a full set of commands to process data. SPL, the lightweight computation engine language, supports nested data structure and is very easy to load JSON data. Therefore, we can extend the computational abilities of MongoDB with SPL. SPL provides the mongo_shell function, through which we can execute Mongo commands to read and write Mongo data.

 

Open/close Mongo connection

 

Similar to the JDBC connection of relational database, the method of paired “open/close” is also used in SPL to connect to MongoDB.

 

mongo_open(connectionString), please refer to the official website for detailed parameters: Connection String.

 

mongo_close(mongoConn), mongoConn is the connection to be closed.

 

Sample: A1 creates the connection, and A3 closes the connection after some reading, writing, and calculation operations are done in the middle steps.

 


A

1

=mongo_open("mongodb://127.0.0.1:27017/mymongo")

2

……

3

=mongo_close(A1)

 

Execute Mongo command

 

Mongo defines hundreds of commands including querying, updating data, managing database users, etc. All the commands are inputted and outputted in a specific JSON format.

 

mongo_shell(inputJson), inputJson is the input of the command.

 

Sample:


A

1

=mongo_open("mongodb://127.0.0.1:27017/mymongo")

2

=mongo_shell(A1,"{'find':'orders',filter:{OrderID: { $gte: 50}},batchSize:10}")

3

=A2.cursor.firstBatch

4

=mongo_shell(A1,"{'getMore':"+string(A2.cursor.id)+",batchSize:20}")

5

=create(OrderID,Client,SellerId,Amount,OrderDate)

6

>A5.insert(0,26,"TAS",1,2142.4,"2009-08-05")

7

>A5.insert(0,28,"DSGC",21,2125.4,"2009-09-05")

8

=mongo_shell(A1,"{'insert':'orders',documents:"+json(A5)+"}")

9

=mongo_close(A1)

 

A2 queries the data that satisfy OrdersID>50 from the “orders” table and returns the first 10 rows of data. The result is a nested SPL table sequence which is identical to the JSON outputted by the “find” command. The following compares the JSON format and SPL table sequence structure returned by the “find” command:

{
         "cursor": {
                  "firstBatch": [{
                          "_id": 61 adf78694c8ba530702472e,
                          "OrderID": 84,
                          "Client": "GC",
                          "SellerId": 1,
                          "Amount": 88.5,
                          "OrderDate": "2009-10-16"
                          }
                         ,......other 9 rows data ……
                  ],
                  "id": 8676451393605378424,
                  "ns": "mymongo.orders"
         },
         "ok": 1
}

undefined 

A3 is the data table sequence firstBatch, the third layer of A2.

 

A4 executes the getMore command to fetch another 20 rows of data using the cursor ID (A2.cursor.id) in the result table sequence of A2.

 

A5 creates an SPL table sequence in the same structure as “orders” table. A6 and A7 initialize two pieces of data to the table sequence of A5.

undefined 

A8 uses the “insert” command to update A5 table sequence in the “orders” table of mango. And the SPL function json(A5) converts A5 table sequence to json strings, thus concatenating them to a complete “insert” command JSON string. “insert” command is executed to return {"n":2,"ok":1}, indicting two rows of data are successfully added to the “orders” table of Mongo.

 

We can see that the value of A8 is still the table sequence equivalent to the JSON result.

 

Return all query results at once

 

A complete query usually gets all the data through combining “find” command and multiple subsequent “getMore” commands. The mongo_shell function returns all the data directly with the @d option.


A

1

=mongo_open("mongodb://127.0.0.1:27017/mymongo")

2

=mongo_shell@d(A1,"{'find':'orders',filter:{OrderID:   {$gte: 50}},batchSize:10}")

3

=mongo_close(A1)

A2 directly gets the data of “first ‘find’ + N ‘getMore’”, and encapsulates the getMore command executed N times inside the mongo_shell function which automatically loops through until all the data are obtained:

undefined 

 

Return query results with SPL cursor

 

Returning big result set as a whole may lead to memory overflow. In this case, we can add @c option along with @d on the mongo_shell to return the SPL cursor, and then use the fetch function to retrieve data gradually:


A

1

=mongo_open("mongodb://127.0.0.1:27017/mymongo")

2

=mongo_shell@dc(A1,"{'find':'orders',filter:{OrderID: { $gte: 50}},batchSize:10}")

3

=if(A2==null, null, A2.fetch(5))

4

=if(A3==null, null, A2.fetch(10))

5

=mongo_close(A1)

 

A2 is executed to get the SPL cursor, A3 fetches the 1st to 5th data, and

A4 fetches the 6th to 15th data.

 

Please note that @dc cannot return SPL cursor if A2 is not the “mongo” command for querying data, and we need to identify whether A2 is null when fetching data in A3 and A4.