"The multi-layer structure of JSON is more complex than the two-dimensional structure, and it is .."

Hoo RaqForum 19 No.
1 Reply • 225 View • 2 Years ago

From JsonPath to SPL

The multi-layer structure of JSON is more complex than the two-dimensional structure, and it is very difficult to calculate. The early class libraries can only parse JSON without computing power; Although JsonPath provides the original computing language, its computing power is weak; SPL is a professional computing language that supports various basic calculations, simplifies the calculations of multi-layer JSON and complex calculation objectives, and provides a variety of data source interfaces and JDBC integration interface.

The multi-layer structure of JSON can store rich information, coupled with small volume and high transmission efficiency, it is widely used in microservices, inter program communication, configuration files and other scenarios. However, the format of multi-layer structure is more complex than two-dimensional structure, and it is very difficult to calculate, which puts forward higher requirements for JSON class libraries. Among them, class libraries such as Gson\Jackson do not support JSON computing language, but only provide functions to parse JSON strings into JAVA\C# objects. These class libraries have no computing power. Even to implement the simplest conditional query, a lot of code needs to be written, the development efficiency is low and the practicability is poor.

The good news is that JsonPath appears.

Different from the class libraries mentioned earlier, JsonPath follows the XPath syntax and provides the original JSON calculation language. It can query qualified nodes with expressions and support some aggregation calculations. Here are a few examples.

The file data.json stores employee records and employee’s orders. Part of the data is as follows:

[  {"EId":2,"State":"NewYork  ","Dept":"Finance","Name":"Ashley","Gender":"F",

"Salary":11000,"Birthday":"1980-07-19",

"Orders":[]

},

{"EId":3,"State":"New   Mexico","Dept":"Sales","Name":"Rachel","Gender":"F",

"Salary":9000,"Birthday":"1970-12-17",

"Orders":[

{"OrderID":32,"Client":"JFS","SellerId":3,"Amount":468.0,"OrderDate":"2009-08-13"},

…

{"OrderID":99,"Client":"RA","SellerId":3,"Amount":1731.2,"OrderDate":"2009-11-05"}

]

},

…

]

Conditional query: find all orders of employee Rachel. The JsonPath code is as follows:

File file = new File("D:\\json\\data.json");

Long fileLength = file.length();

byte[] fileContent = new byte[fileLength.intValue()];

FileInputStream in = new FileInputStream(file);

in.read(fileContent);

in.close();

String JsonStr= new String(fileContent, "UTF-8")

Object document = Configuration.defaultConfiguration().jsonProvider().parse(JsonStr);

ArrayList l=JsonPath.read(document,   "$[?(@.Name=='Rachel')].Orders");

The above code first reads the strings from the file, then turns it into a JSON object, and finally performs the query. The specific query expression is $[?(@.Name=='Rachel')].Orders, where $ represents the root node, that is, employee records (including order field), Orders represents the order records of the lower layer (that is, order field), and the upper and lower layers are separated by dot. Query conditions, such as[?(…)], can follow the nodes in each layer. Please refer to the official website for specific usage.

Combined query: find out all orders with prices ranging from 1000 to 2000 and customer name including business. The key codes are as follows:

……//The JSON object generation process is omitted

l=JsonPath.read(document, "$\[*\].Orders\[?((@.Amount>1000 && @.Amount<2000) && @.Client =~ /.*?business.*?/i )\]");

(@.Amount>1000 && @.Amount<2000) is the interval query condition, @.Client =~ /.*?business.*?/i is the query condition of regular expression，&& is the logical operator “and” (|| is “or”).

Aggregation calculation: calculate the total amount of all orders. The key codes are as follows:

……

Double d=JsonPath.read(document, "$.sum($\[*\].Orders\[*\].Amount)");

sum in the code is the function to get the sum. Similar functions include average, max, min and count.

It can be seen from these examples that the syntax of JsonPath is intuitive and easy to understand. It can easily access the multi-layer structure with dots, perform conditional query with relatively short code, and carry out simple aggregation calculation. From Gson\Jackson to JsonPath, we have achieved a breakthrough in computing power from scratch, which is due to the JSON computing language.

It does not necessarily mean that JsonPath has strong computing power. In fact, JsonPath's JSON computing language is still relatively primitive and its computing power is very weak. JsonPath only supports the two simplest calculations, query and aggregation, and does not support most other basic calculations, and is far from arbitrary and free calculations. To implement most basic calculations, JsonPath still needs to hard code.

Take group aggregation as an example: group all orders by customer and calculate the order amount of each group. The key codes are as follows:

……

ArrayList orders=JsonPath.read(document,   "$[*].Orders[*]");

Comparator<HashMap> comparator = new   Comparator<HashMap>() {

    public int   compare(HashMap record1, HashMap record2) {

        if   (!record1.get("Client").equals(record2.get("Client"))) {

              return   ((String)record1.get("Client")).compareTo((String)record2.get("Client"));

        } else {

              return   ((Integer)record1.get("OrderID")).compareTo((Integer)record2.get("OrderID"));

        }

    }

};

Collections.sort(orders, comparator);

ArrayList<HashMap> result=new   ArrayList<HashMap>();

HashMap currentGroup=(HashMap)orders.get(0);

double sumValue=(double) currentGroup.get("Amount");

for(int i = 1;i < orders.size(); i ++){

    HashMap   thisRecord=(HashMap)orders.get(i);

      if(thisRecord.get("Client").equals(currentGroup.get("Client"))){

          sumValue=sumValue+(double)thisRecord.get("Amount");

    }else{

        HashMap   newGroup=new HashMap();

          newGroup.put(currentGroup.get("Client"),sumValue);

          result.add(newGroup);

        currentGroup=thisRecord;

          sumValue=(double) currentGroup.get("Amount");

    }

}

System.out.println(result);

The above code first gets the order list with JsonPath, then sorts the order list by client, gets the first item as the initial value of the current group, and then loops through the remaining orders in turn. If the client of the current order remains unchanged compared with the current group, the amount of the current order will be accumulated to the current group; If the client changes, the current group has finished aggregation.

JsonPath's computing power is very weak. It does not support group aggregation. It can only hard code to achieve most calculations, which requires programmers to control all details. The code is lengthy and error prone. If you change a grouping field or aggregation field, you need to modify multiple codes. If you group or aggregate multiple fields, you need to modify a lot of codes, which makes it difficult to write universal codes. In addition to group aggregation, JsonPath does not support basic calculations such as renaming, sorting, de duplication, association calculation, set calculation, Cartesian product, merge calculation, window function, ordered calculation, etc. JsonPath also does not support the mechanism of splitting large calculation objectives into basic calculations, such as sub query, multi-step calculations, etc. In fact, for most calculations, JsonPath has to hard code.

In addition to its computing power, JsonPath has another problem, that is, it does not have its own data source interface. Even the simplest file JSON needs to hard code. JSON generally comes from http Restful, and some special ones come from MongoDB or ElasticSearch. Data can only be obtained from these interfaces by introducing third-party class libraries or hard coding, which leads to a complex framework, increased instability and reduced development efficiency.

JsonPath's JSON computing power is very weak, essentially because its computing language is too primitive. To improve JSON computing power, we must use a more professional computing language.

esProc SPL is a better choice.

esProc SPL is an open-source structured data / semi-structured data computing language, which provides rich class libraries and concise syntax. It can realize all basic calculations with short code, split large calculation objectives into basic calculations, support a variety of data source interfaces, and provide JDBC integration interface.

For the same condition query, SPL code is as follows:

	A
1	=json(file("d:\\json\\data.json").read())
2	=A1.select(Name=="Rachel").Orders

A1 code reads the strings from the file and converts it to a table sequence. A table sequence is a general structured / semi-structured data object, and JSON is a kind of semi-structured data. The select function in A2 gets the employee records that meet the condition. Orders represents the order field (order list) of the records, and the upper and lower layers are separated by dot.

For the same combined query, the SPL code is as follows:


1	…. // The JSON object (table sequence) generation process is omitted
2	=A1.conj(Orders)
3	=A2.select((Amount>1000 && Amount<=2000) && like@c(Client,"business"))

A2 concatenates the orders of all employees, A3 performs conditional query, and the like function is used to find the string, @c indicating that it is not case sensitive. Multi step calculation is used here to make the code logic clearer. A2 and A3 can also be combined into one statement.

For the same aggregation calculation, the SPL code is as follows:

	A
1	….
2	=A1.conj(Orders).sum(Amount)

sum in the code is the function to get the sum. Similar functions include avg\sum\min\count.

This code can be debugged / executed in SPL's IDE, or can be saved as a script file (such as getSum.dfx). It can be called in JAVA through JDBC interface. The specific code is as follows:

package Test;
  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.ResultSet;
  import java.sql.Statement;
  public class test1 {
      public static void main(String[]   args)throws Exception {
          Class.forName("com.esproc.jdbc.InternalDriver");
          Connection connection   =DriverManager.getConnection("jdbc:esproc:local://");
          Statement statement =   connection.createStatement();
          ResultSet result =   statement.executeQuery("call getSum()");
          printResult(result);
          if(connection != null)   connection.close();
      }

…

}

The above usage is similar to that of stored procedures. In fact, SPL also supports the usage similar to SQL, that is, the SPL code is directly embedded in Java without a script file. The code is as follows:

…

ResultSet result =   statement.executeQuery("=json(file(\\"D:\\\data\\\data.json\\").read()).conj(Orders).sum(Amount)");

…

SPL provides rich library functions and supports various basic calculations. The above query and aggregation are only part of them. More basic calculations are as follows:

	A
1	….
2	=A1.conj(Orders).groups(Client;sum(Amount))
3	=A1.groups(State,Gender;avg(Salary),count(1))
45	=A1.new(Name,Gender,Dept,Orders.OrderID,Orders.Client,Orders.Client,Orders.SellerId,Orders.Amount,Orders.OrderDate)
6	=A1.sort(Salary)
7	=A1.id(State)

A2: Group aggregation

A3: Multi-fields group aggregation

A4: Association

A6: Sorting

A7: Deduplication

SPL has strong computing power and can often simplify the calculations of multi-layer JSON. For example, the runners field of the file JSONstr.json is a sub document. The sub document has three fields: horseId, ownerColours and trainer. The trainer contains the subordinate field trainerId, and ownerColours is a comma separated array. Part of the data is as follows:

[

   {

      "race": {

            "raceId":"1.33.1141109.2",

            "meetingId":"1.33.1141109"

      },

      ...

      "numberOfRunners": 2,

      "runners": [

        {       "horseId":"1.00387464",

                "trainer": {

                    "trainerId":"1.00034060"

                },

            "ownerColours":"Maroon,pink,dark blue."

          },

          {     "horseId":"1.00373620",

                "trainer": {

                    "trainerId":"1.00010997"

                },

            "ownerColours":"Black,Maroon,green,pink."

          }

      ]

   },

...

]

Now, we need to group by trainerId and count the number of members of ownerColours in each group. We can use the following SPL script.

	A
1	=json(file("/workspace/JSONstr.json").read())
2	=A1(1).runners
3	=A2.groups(trainer.trainerId; ownerColours.array().count():times)

SPL has strong computing power and can often simplify complex JSON calculations. For example, we can get the daily duty records sorted by time through http restful. Part of the data is as follows:

[{"Date":"2018-03-01","name":"Emily"},

{"Date":"2018-03-02","name":"Emily"},

{"Date":"2018-03-04","name":"Emily"},

{"Date":"2018-03-04","name":"Johnson"},

{"Date":"2018-04-05","name":"Ashley"},

{"Date":"2018-03-06","name":"Emily"},

{"Date":"2018-03-07","name":"Emily"}

……]

A person will usually be on duty for several continuous working days, and then someone else. Now we need to calculate the continuous duty state of each person in turn, and output the results as a two-dimensional table. Part of the results are as follows:

name	begin	end
Emily	2018-03-01	2018-03-03
Johnson	2018-03-04	2018-03-04
Ashley	2018-03-05	2018-03-05
Emily	2018-03-06	2018-03-07
…	…	…

To obtain the above results, we should first do an ordered grouping by name, that is, if the names of several consecutive records are the same, these records are divided into the same group until the name changes. Note that this grouping may divide the same employee into multiple groups, such as Emily. After grouping, get the first and last records of each group according to the date, that is, the required start date and end date of duty. This involves calculations that are relatively difficult, such as ordered grouping, calculation after grouping (i.e., window function), getting values by position, etc. The common computing language will be very cumbersome to code, and it is much easier to use SPL. The code is as follows:

	A
1	=json(httpfile("http://127.0.0.1:6868/api/getDuty").read())
2	=duty.group@o(name)
3	=A2.new(name,~.m(1).date:begin,~.m(-1).date:end)

In addition to strong computing power, SPL also provides rich data source interfaces. In addition to file and Restful mentioned above, it also supports MongoDB, ElasticSearch, etc. For details, please refer to the official website.

From Gson\Jackson to JsonPath, JSON computing language has grown from scratch, and from JsonPath to SPL, JSON computing power has grown from weak to strong. Every qualitative leap has driven the significant improvement of development efficiency.

SPL Official Website 👉 http://www.scudata.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProc

SPL Learning Material 👉 http://c.scudata.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/ydhVnFH9

Youtube 👉 https://www.youtube.com/@esProc_SPL

esProc