Export MongoDB as a CSV File

 

MongoDB is used for storing unstructured data. It is especially great at storing JSON data. But users who are accustomed to database tables and who need structured data are probably want to convert the unstructured data to structured data for their computations. However, its not very convenient to perform a standard conversion with the mainstream languages. In this article, well illustrate how we can do this efficiently in esProc SPL (Structured Process Language) for the NoSQL database through an example.

Heres part of collection carInfo:

 

{

  "_id" : ObjectId("5518f6f8a82a704fe4216a43"),

  "id" : "No1",

  "cars" : {

    "name" : "Putin",

    "car" : ["porche", "bmw"]

  }

}

 

{

  "_id" : ObjectId("5518f745a82a704fe4216a44"),

  "id" : "No2",

  "cars" : {

    "name" : "jack",

    "car" : ["Toyota", "Jetta", "Audi"]

  }

}

……

 

Now you need to export the collection as a CSV file in the following format:

undefined 

 

 

Below is the SPL script:


A

1

=mongo_open("mongodb://localhost:27017/local?user=test&password=test")

2

=mongo_shell(A1,"carInfo.find(,{_id:0})")

3

=A2.conj((t=~,~.cars.car.new(t.id:id, t.cars.name:name, ~:car)))

4

=file("D:\\data.csv").export@t(A3;",")

5

>mongo_close(A1)

 

A1: Connect to MongoDB; the connection string syntax is mongo://ip:port/db?arg=value&.

A2: Use find() function to retrieve all fields except for _id from collection carInfo without any filtering condition and return result as a cursor.

A3: Get desired fields, concatenate into a structured two-dimensional table and return it as a cursor. The ~ represents each of A2s document; cars.car field is split into two rows of the table sequence; conj() function joins up rows vertically.

A4: Export A3s table as a CSV file. @t option enables an export with column headers. esProc computing engine manages buffers automatically. It retrieves a batch of records from A3s cursor into the memory for processing.

A5: Close the MongoDB connection.

 

If you want to customize the data retrieval buffer, just use the following SPL script:


A

B

1

=mongo_open("mongodb://localhost:27017/local?user=test&password=test")

2

=mongo_shell(A1,"carInfo.find(,{_id:0})")

3

for A2,1000

=A3.cars.car.new(A3.id:id, A3.cars.name:name, ~:car)

4


=file("D:\\data.csv").export@ta(B3;",")

5

>mongo_close(A1)


A3: Loop through A2s cursor to retrieve 1000 records each time into the memory. The for statements working range is the indented B3-B4, within which A3 references the loop variable to do the job. Below is A3s result:
undefined
B3: Convert the current batch of records into a structured two-dimensional table:

undefined 

B4: Export the current batch processing result set into the specified CSV file. @a option enables appending.

 

In a nutshell, SPL retrieves fields as a cursor, splits each document into rows by car field and concatenate them into a table sequence and appends each table sequence into a specified file. The whole process is clear, concise and efficient.