* Perform Distinct Count over MongoDB Nested Subdocuments

Nested structures are common in MongoDB data. The database uses nested subdocuments to represent the one-to-many relationship. But, when you need to group nested subdocuments and perform distinct, you should first expand the nested structure to flat structure before using $group to perform aggregation.
Below is data in fund collection. The task is to group documents by id and get the number of unique fundcode under shares and trade fields respectively.

{
       "id" : 1,
     "shares" : [
       {
        "fundcode" :  "000001",
        "lastshares" : 1230.2,
          "agencyno" :  "260",
        "netno" :   "260"
         },
       {
        "fundcode" :  "000002",
        "lastshares" : 213124,
        "agencyno" :  "469",
        "netno" :  "001"
       },
       {
        "fundcode" :  "000001",
        "lastshares" : 10000.8,
        "agencyno" :   "469",
        "netno" :  "002"
       }
     ],
     "trade" : [
       {
         "fundcode" :  "000001",
         "c_date" :   "20180412",
 
 

"agencyno" :  "260",
          "netno" :  "260",
  "bk_tradetype" : "122",
        "confirmbalance" :  1230.2,
        },
       {
        "fundcode" :  "000002",
        "c_date" : "20180506",
        "agencyno" :  "469",
        "netno" :  "001",
        "bk_tradetype" :  "122",
        "confirmbalance" :  213124,
       },
       {
        "fundcode" :  "000003",
        "c_date" : "20190502",
        "agencyno" :  "469",
        "netno" :  "002",
        "bk_tradetype" :  "122",
          "confirmbalance" :  10000.8,
        "netvalue" : 1,
        }

]
  }
   …….

The MongoDB way of doing this: Use $group to group documents by id, extract fundcode under each document, use $unwind to split documents by shares and trade to convert to flat structure, use $addToSet to perform distinct during the second grouping, and then count fundcode under shares and trade respectively. The whole process is complicated.

All can be easy with esProc SPL. You can rearrange the trade subdocument into a table sequence, group records and perform distinct and count. It’s much easier.
Download esProc installation package
HERE.

Implementation directions:
1. Write SPL script fund.dfx in esProc:

A

B

1

=mongo_open("mongodb://localhost:27017/local")

/ Connect to MongoDB database

2

=mongo_shell(A1,"fund.find(,     {_id: 0})").fetch()

/ Perform conditional   filtering on fund collection

3

=A2.trade.conj(if (#==1,   t=~.fname(), t=t^~.fname())).id()

/ Get intersection of fields   under trade subdocument

4

=A2.run(    trade=trade.new( ${A3.(A3(#)).concat@c()}))

/ Rearrange data of common   fields into a table sequence

5

=A2.group(id;     ~.conj(shares.(fundcode)).id().count(): shares,~.conj(trade.(fundcode)).id().count():trade)

/ Group records by id and   perform distinct count on fundcode udner shares and trade fields

6

>A1.close()

/ Close database connection

2. Debug and execute the script. Below is A3’s value:

A3

Member

agencyno

bk_tradetype

……

3. Execute the script to return the final result:

A5

id

shares

trade

1.0

4

6

2.0

3

3

Since the fields under subdocument trade have different structures, we convert them into a table sequence for the convenience of subsequent processing.
esProc offers JDBC interface, so you can easily integrate the script into a Java program:

public static void doWork() {
  Connection con = null;
  java.sql.Statement st;

try{
    Class.forName("com.esproc.jdbc.InternalDriver");
    con = DriverManager.getConnection("jdbc:esproc:local://");
     // Call script fund.dfx
     st=con.createStatement();
     ResultSet rst = st.executeQuery("call fund");
     System.out.println(rst);
   }catch(Exception e){
     System.out.println(e);
   }finally{
    // Close database connection
    if (con!=null) {
      try {
        con.close();
      }catch(Exception e) {
         System.out.println(e);
       }
    }
  }
}

When fields under a certain nested subdocument have different structures, we can dynamically get the desired fields using union, intersection or other operation as needed. An easier alternative is to use a user-defined field to format data into a table sequence. The latter method makes it more convenient to further process data. Read How to Call an SPL Script in Java to learn more about integration of esProc SPL script into a Java program.