Compare Members of Same Positions for MongoDB


A MongoDB script is limited when it comes to implementing complicated algorithms and logic, so certain computing problems are difficult to handle with it alone. In many cases we need to retrieve the data and process it in the main application. Yet set-oriented operations are still inconvenient in high-level languages like Java. esProc SPL (Structured Process Language) can make all of this much easier. Let's look at an example.

Here's a test collection in a MongoDB database:


> db.test.find({},{"_id":0})

{"value" : NumberLong(112937552) }

{"value" : NumberLong(715634640) }

{"value" : NumberLong(487229712) }

{"value" : NumberLong(79198330) }

{"value" : NumberLong(440998943) }

{"value" : NumberLong(93148782) }

{"value" : NumberLong(553008873) }

{"value" : NumberLong(336369168) }

{"value" : NumberLong(369669461) }

The collection contains multiple documents, each holding a numeric value that is treated as a string of digits. The requirement: compare each numeric string with all the others to find its maximum number of equals and unequals. If the numeric string in the 1st document contains the digit 1, and the string in the nth document also contains the digit 1, then the former gets one equal; if the former contains the digit 1 but the latter doesn't, then the former gets one unequal. The counts are taken per pair of strings, and the largest counts across all pairs are kept.
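To make the definition concrete, here is a minimal Python sketch (not SPL, and not the article's script) that counts equals and unequals for a single pair of strings. The function name count_pair is hypothetical, and the sketch assumes that repeated digits in the first string are counted only once, as step C4 of the script describes.

```python
def count_pair(current, other):
    """Return (equals, unequals) for `current` measured against `other`.

    Assumption: each distinct digit of `current` is counted once.
    """
    # Unique digits of the first string, kept in first-seen order.
    digits = list(dict.fromkeys(current))
    # A digit is an "equal" if it occurs anywhere in the other string.
    same = sum(1 for d in digits if d in other)
    diff = len(digits) - same
    return same, diff


# First two values from the test collection.
print(count_pair("112937552", "715634640"))  # (4, 2)
```

The first string has the distinct digits 1, 2, 9, 3, 7 and 5. Of these, 1, 3, 7 and 5 occur in the second string, while 2 and 9 do not, which gives 4 equals and 2 unequals for this pair.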

Below is the SPL script that implements the logic:

A4: for A3
B5: for A3
C5: if #B5==#A4

[remaining cells of the script grid not shown]

A1: Connect to MongoDB. The IP and port are localhost:27017; database name is test; both user and password are test.

A2: Retrieve the test collection from the MongoDB database using the find() function and return the data as a cursor. There is no filtering condition, and the _id field won't be retrieved. An SPL cursor can be fetched and processed in batches, which avoids memory overflow when the data volume exceeds memory capacity. Here the data volume is relatively small, so we can fetch all records from the cursor at once.

A3: Add a same column and a diff column to A2 to store the maximum equals and maximum unequals respectively, and convert each value to a string.

A4: Loop over A3 to perform operations in B4-D10.

B4: Get the value in the current round of loop.

C4: Split the value into a sequence of individual characters and remove duplicates.

B5: An inner loop over A3 to perform operations in C5-D10.

C5: If the inner loop is at the same position as the outer loop, both loops point to the same record, so the comparison is skipped and the program moves on to the next round of the inner loop.

C6: Get the value in the inner loop.

C7: Define the variables same and diff to store equals and unequals respectively. Both start at 0.

C8: Loop over the sequence of unique characters from the outer loop (C4) and look for each one in the inner loop's value (C6). Add 1 to same if the character is found and add 1 to diff if not.

C9, C10: Compare same and diff with A4's same column and diff column, and assign the larger values to the columns.
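For readers who want to trace the steps outside esProc, the following is a rough Python analogue of A3 through C10. It is a sketch of the described logic, not a translation of the SPL script: the function name max_equals_unequals and the use of plain dictionaries for records are assumptions made for illustration, and the values are typed in by hand rather than read from MongoDB.

```python
def max_equals_unequals(values):
    # A3: one record per value, stored as a string, with same/diff set to 0.
    rows = [{"value": str(v), "same": 0, "diff": 0} for v in values]

    for i, outer in enumerate(rows):                  # A4: outer loop
        # C4: unique characters of the outer value, in first-seen order.
        digits = list(dict.fromkeys(outer["value"]))
        for j, inner in enumerate(rows):              # B5: inner loop
            if i == j:                                # C5: skip the record itself
                continue
            same = diff = 0                           # C7: per-pair counters
            for d in digits:                          # C8: look up each character
                if d in inner["value"]:
                    same += 1
                else:
                    diff += 1
            # C9, C10: keep the largest counts seen so far.
            outer["same"] = max(outer["same"], same)
            outer["diff"] = max(outer["diff"], diff)
    return rows


values = [112937552, 715634640, 487229712, 79198330, 440998943,
          93148782, 553008873, 336369168, 369669461]
rows = max_equals_unequals(values)
for row in rows:
    print(row["value"], row["same"], row["diff"])
```

The counters are reset for every pair, and only the running maxima are written back to the record, so each record ends up with its best and worst match across all other records.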

Here's the final result:



In summary, SPL retrieves the data, splits each numeric string into a sequence of unique characters, and runs an outer loop and an inner loop to check each character against every other value, which gives the maximum number of equals and unequals. Handling computations that involve set-oriented operations is clear, simple and efficient in SPL.


Note: To access MongoDB in esProc, you need to put the necessary Java driver (like mongo-java-driver-3.9.1.jar) into the external library directory (extLib\MongoCli) in esProc.


esProc doesn't confine computations to its IDE; it can also be integrated with a Java application. To do this, you just need to add one more line of code, return A3, to output the result as a data set to the Java application. Similarly, to call an SPL script from Java to access MongoDB, you should first put the MongoDB Java driver into the Java application's classpath.