From XPath to SPL

 

Multi-layer XML is difficult to compute over, and XPath's computing power falls well short. It was adequate for early Internet applications with modest computing needs, but today's needs are increasingly complex and changeable and call for SPL, a more powerful XML computing language.

XML expresses data flexibly and is cross-platform, so it is widely used in web data exchange and data services such as WebService. Its multi-layer structure, however, is complex and inconvenient to compute over, which is where an XML computing language comes in. In the second year after the W3C proposed the XML standard, the first XML computing language, XPath (XQuery), came into being. XPath significantly improved development efficiency for XML computing, quickly became popular among developers, and was soon built into the mainstream XML parsing libraries, such as XOM / Xerces-J / Jdom / Dom4J.

Let's take a look at two examples to experience the computing power of XPath in those years.

File Employees_Orders.xml stores a batch of employee information and multiple orders belonging to employees. Part of the data is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<xml>
<row>
    <EId>2</EId>
    <State>"New York"</State>
    <Dept>"Finance"</Dept>
    <Name>"Ashley"</Name>
    <Gender>"F"</Gender>
    <Salary>11000</Salary>
    <Birthday>"1980-07-19"</Birthday>
    <Orders>[]</Orders>
</row>
<row>
    <EId>3</EId>
    <State>"New Mexico"</State>
    <Dept>"Sales"</Dept>
    <Name>"Rachel"</Name>
    <Gender>"F"</Gender>
    <Salary>9000</Salary>
    <Birthday>"1970-12-17"</Birthday>
    <Orders>
        <OrderID>32</OrderID>
        <Client>"JFS"</Client>
        <SellerId>3</SellerId>
        <Amount>468.0</Amount>
        <OrderDate>"2009-08-13"</OrderDate>
    </Orders>
    <Orders>
        <OrderID>39</OrderID>
        <Client>"NR"</Client>
        <SellerId>3</SellerId>
        <Amount>3016.0</Amount>
        <OrderDate>"2010-08-21"</OrderDate>
    </Orders>
</row>
…
</xml>

Conditional query: using the XPath support in the Dom4J parsing library, find all orders in this XML file whose amount is greater than 1000 and no more than 3000 and whose client name contains "bro". The key code is as follows:

…

         SAXReader saxReader = SAXReader.createDefault();
         // read(String) takes a system ID, so the file is addressed with a file URL
         Document doc = saxReader.read("file:///D:/xml/Employees_Orders.xml");
         // /xml/row/Orders is the query range; the predicate in [] is the query condition
         List<Node> list = doc.selectNodes("/xml/row/Orders[Amount>1000 and Amount<=3000 and contains(Client,'bro')]");
         int i = 0;
         System.out.println("--------------count of the current resultSet=" + list.size());
         for (Node n : list) {
             // read each child element of the matched order as text
             String OrderID = n.selectSingleNode("./OrderID").getText();
             String Client = n.selectSingleNode("./Client").getText();
             String SellerId = n.selectSingleNode("./SellerId").getText();
             String Amount = n.selectSingleNode("./Amount").getText();
             String OrderDate = n.selectSingleNode("./OrderDate").getText();
             System.out.println(++i + ":" + OrderID + "\t" + Client + "\t" + SellerId + "\t" + Amount + "\t" + OrderDate);
         }

In the XPath code above, /xml/row/Orders is the query range, and Amount>1000 and Amount<=3000 and contains(Client,'bro') is the query condition. XPath functions fall into four categories: mathematical functions such as abs and floor, string functions such as compare and substring, date functions such as year-from-date and timezone-from-time, and aggregation functions, which are discussed below.
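As a rough, self-contained variant of the query above, the sketch below runs the same XPath expression with the JDK's built-in javax.xml.xpath API instead of Dom4J. The class name, the method name and the inline sample data (including the client "Ambrose") are made up for illustration and are not part of Employees_Orders.xml.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathConditionSketch {
    // Made-up sample in the same shape as Employees_Orders.xml (values simplified).
    static final String SAMPLE =
        "<xml><row><EId>3</EId>"
      + "<Orders><OrderID>32</OrderID><Client>JFS</Client><Amount>468.0</Amount></Orders>"
      + "<Orders><OrderID>39</OrderID><Client>Ambrose</Client><Amount>3000.0</Amount></Orders>"
      + "<Orders><OrderID>40</OrderID><Client>Ambrose</Client><Amount>1000.0</Amount></Orders>"
      + "</row></xml>";

    // Returns the OrderID of every order that satisfies the condition.
    static List<String> matchingOrderIds(String xml) {
        String expr = "/xml/row/Orders[Amount>1000 and Amount<=3000 and contains(Client,'bro')]";
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList orders = (NodeList) xpath.evaluate(
                expr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < orders.getLength(); i++) {
                Element order = (Element) orders.item(i);
                ids.add(order.getElementsByTagName("OrderID").item(0).getTextContent());
            }
            return ids;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(matchingOrderIds(SAMPLE));
    }
}
```

Note that contains is case-sensitive, so a client named "Brooks" would not match 'bro' here.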

Aggregation calculation: calculate the total order amount for this XML file. The key code is as follows:

…

list=doc.selectNodes("sum(/xml/row/Orders/Amount)");

Object sumResult=list.get(0);

System.out.println((Double)sumResult);

The code uses the aggregation function sum. XPath has four other similar functions: count, max, min and avg.
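The sketch below shows aggregation with the JDK's javax.xml.xpath API, under the assumption that only the standard library is available. That API implements XPath 1.0, whose function library includes sum and count but not max, min or avg, so the average is derived here as sum div count. The class and method names are illustrative, and the sample holds only the two amounts from the data excerpt above.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class XPathAggregateSketch {
    // The two order amounts from the excerpt, in a stripped-down document.
    static final String SAMPLE =
        "<xml><row>"
      + "<Orders><Amount>468.0</Amount></Orders>"
      + "<Orders><Amount>3016.0</Amount></Orders>"
      + "</row></xml>";

    // Evaluates an XPath expression whose result is a number.
    static double number(String xml, String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Double) xpath.evaluate(
                expr, new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String amounts = "/xml/row/Orders/Amount";
        System.out.println("sum   = " + number(SAMPLE, "sum(" + amounts + ")"));
        System.out.println("count = " + number(SAMPLE, "count(" + amounts + ")"));
        // XPath 1.0 has no avg(), so divide the sum by the count.
        System.out.println("avg   = " + number(SAMPLE, "sum(" + amounts + ") div count(" + amounts + ")"));
    }
}
```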

From these two examples we can see the advantages of XPath in XML computing: the code is short and intuitive, the multi-layer structure is easy to reach through slash-separated paths, conditional query and aggregate calculation are reasonably well supported, and some library functions are provided.

In the early days of Internet applications, when computing needs were modest, these advantages made XPath very popular with developers. As computing needs have grown more diverse and complex, however, its shortcomings have gradually come to light.

Lack of computing power is XPath's most serious shortcoming. As mentioned earlier, XPath supports conditional query and aggregation; put differently, it supports only these two simplest kinds of calculation and leaves out many other routine ones, such as sorting, merging, deduplication, group aggregation, association calculation, and calculation after grouping (including window functions). In addition, although the XPath function library looks large, only the five aggregate functions actually perform calculation, which is very few. And because XPath supports neither subqueries nor step-by-step calculation, it is powerless against more complex computing goals. In practice, for the XML computing needs of recent years, XPath plays only an auxiliary role, and most of the calculation has to be hard-coded.
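To give a feel for what such hard coding looks like, here is a minimal sketch of a group aggregation (total order amount per year) in which XPath only fetches the order nodes and Java does the grouping. It uses the JDK's javax.xml.xpath API; the class and method names are illustrative, and the third order in the sample is made up so that one year has more than one order.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class GroupByYearSketch {
    // Orders 32 and 39 follow the excerpt; order 99 is made up for illustration.
    static final String SAMPLE =
        "<xml><row>"
      + "<Orders><OrderID>32</OrderID><Amount>468.0</Amount><OrderDate>2009-08-13</OrderDate></Orders>"
      + "<Orders><OrderID>39</OrderID><Amount>3016.0</Amount><OrderDate>2010-08-21</OrderDate></Orders>"
      + "<Orders><OrderID>99</OrderID><Amount>100.0</Amount><OrderDate>2010-01-05</OrderDate></Orders>"
      + "</row></xml>";

    // XPath selects the orders; the grouping and summing are plain Java.
    static Map<Integer, Double> totalByYear(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList orders = (NodeList) xpath.evaluate(
                "/xml/row/Orders", new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            Map<Integer, Double> totals = new TreeMap<>();
            for (int i = 0; i < orders.getLength(); i++) {
                Node order = orders.item(i);
                // The first four characters of OrderDate are the year.
                int year = Integer.parseInt(xpath.evaluate("substring(OrderDate,1,4)", order));
                double amount = (Double) xpath.evaluate("number(Amount)", order, XPathConstants.NUMBER);
                totals.merge(year, amount, Double::sum);
            }
            return totals;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(totalByYear(SAMPLE));
    }
}
```

Sorting, deduplication and joins follow the same pattern: XPath hands back nodes, and the rest is ordinary collection code.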

Beyond computing power, XPath also has few data source interfaces. It offers only a file data source interface and does not support WebService / HTTP, which are the main sources of XML.
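One way to picture the extra work: with the JDK's javax.xml.xpath API, the caller has to open the HTTP connection and pass the resulting stream to the XPath evaluator. The sketch below assumes that setup; the class and method names are illustrative, and main uses an in-memory stream so that it runs without a network. sumAmountsFromUrl shows where a real (here unspecified) URL would go.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class StreamSourceSketch {
    // Stripped-down stand-in for an XML response body.
    static final String SAMPLE =
        "<xml><row>"
      + "<Orders><Amount>468.0</Amount></Orders>"
      + "<Orders><Amount>3016.0</Amount></Orders>"
      + "</row></xml>";

    // XPath itself only sees a stream; where the stream comes from is the caller's job.
    static double sumAmounts(InputStream in) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Double) xpath.evaluate(
                "sum(/xml/row/Orders/Amount)", new InputSource(in), XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hand-written glue for an HTTP source: open the URL, then reuse sumAmounts.
    static double sumAmountsFromUrl(String url) {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return sumAmounts(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(SAMPLE.getBytes(StandardCharsets.UTF_8));
        System.out.println(sumAmounts(in));
    }
}
```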

Today's computing needs are becoming more and more diverse and complex. Must developers put up with XPath's insufficient computing power, or is there an XML computing language with stronger computing power?

esProc SPL is a better choice.

esProc SPL is a professional computing language for structured and semi-structured data with a rich set of built-in computing functions. It can express all routine calculations in short code, split a large computing goal into several small steps, work with multiple data source interfaces, and be integrated through a JDBC driver. SPL computes over all kinds of data sources, XML included, with a unified syntax and data structure.

For the previous conditional query, you only need the following SPL code:


     A
1    =xml(file("D:\\xml\\Employees_Orders.xml").read(),"xml/row")
2    =A1.conj(Orders)
3    =A2.select(Amount>1000 && Amount<=3000 && like@c(Client,"*bro*"))

The above code first reads the XML as a multi-layer table sequence object, then combines all orders with the conj function, and then completes the conditional query with the select function.

This code can be debugged and executed in SPL's IDE, or saved as a script file (such as condition.dfx) and called from Java through SPL's JDBC interface. The code is as follows:

package Test;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
public class test1 {
    public static void main(String[] args) throws Exception {
        Class.forName("com.esproc.jdbc.InternalDriver");
        Connection connection = DriverManager.getConnection("jdbc:esproc:local://");
        Statement statement = connection.createStatement();
        ResultSet result = statement.executeQuery("call condition()");
        printResult(result);
        if (connection != null) connection.close();
    }

…

}

Let’s look at a few more examples. Aggregate calculation:

=A2.sum(Amount)

Sorting:

=A1.sort(Dept,-Salary)

Group aggregation:

=A2.groups(year(OrderDate);sum(Amount))

Association calculation:

=A1.new(Name,Gender,Dept,Orders.OrderID,Orders.Client,Orders.SellerId,Orders.Amount,Orders.OrderDate)

As the code above shows, SPL has stronger computing power: it covers the common calculations in short, easy-to-understand code and is loosely coupled when integrated with Java. In particular, the table sequence natively supports multi-layer data and expresses multi-layer relations intuitively with dots, which makes it especially suitable for XML.

This extra power often simplifies calculations over multi-layer XML. For example, the file book1.xml stores book information; each author node has two attributes, the author's name and nationality, and some books have multiple authors. Part of the data is as follows:

<?xml version="1.0"?>
<library>
    <book category="COOKING">
        <title>Everyday Italian</title>
        <author name="Giada De Laurentiis" country="it"/>
        <year>2005</year>
        <info>Hello Italian!</info>
    </book>
    <book category="CHILDREN">
        <title>Harry Potter</title>
        <author name="J K. Rowling" country="uk"/>
        <year>2005</year>
        <info>Hello Potter!</info>
    </book>
    <book category="WEB">
        <title>XQuery Kick Start</title>
        <author name="James McGovern" country="us"/>
        <author name="Per Bothner" country="us"/>
        <year>2005</year>
        <info>Hello XQuery</info>
    </book>
    <book category="WEB">
        <title>Learning XML</title>
        <author name="Erik T. Ray" country="us"/>
        <year>2003</year>
        <info>Hello XML!</info>
    </book>
</library>

Organize the XML into a structured two-dimensional table in which the author field takes the format "author name[nationality]"; if a book has multiple authors, separate them with commas. Then query the table for the books from 2005. The result should be as follows:

title               category   year   Author                               info
Everyday Italian    COOKING    2005   Giada De Laurentiis[it]              Hello Italian!
Harry Potter        CHILDREN   2005   J K. Rowling[uk]                     Hello Potter!
XQuery Kick Start   WEB        2005   James McGovern[us],Per Bothner[us]   Hello XQuery
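To make the target concrete, here is a hand-coded sketch of the same task using only the JDK's XML APIs. The class and method names are illustrative, the sample keeps just two of the books from book1.xml, and each output row is a tab-separated string rather than a real table object.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class BookTableSketch {
    // Two books taken from the book1.xml excerpt.
    static final String SAMPLE =
        "<library>"
      + "<book category=\"WEB\"><title>XQuery Kick Start</title>"
      + "<author name=\"James McGovern\" country=\"us\"/>"
      + "<author name=\"Per Bothner\" country=\"us\"/>"
      + "<year>2005</year><info>Hello XQuery</info></book>"
      + "<book category=\"WEB\"><title>Learning XML</title>"
      + "<author name=\"Erik T. Ray\" country=\"us\"/>"
      + "<year>2003</year><info>Hello XML!</info></book>"
      + "</library>";

    // Builds one row per book of the given year: title, category, year, authors, info.
    static List<String> rowsForYear(String xml, int year) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList books = (NodeList) xpath.evaluate(
                "/library/book[year=" + year + "]",
                new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> rows = new ArrayList<>();
            for (int i = 0; i < books.getLength(); i++) {
                Element book = (Element) books.item(i);
                // Join every author as name[country], separated by commas.
                NodeList authors = (NodeList) xpath.evaluate("author", book, XPathConstants.NODESET);
                StringJoiner joined = new StringJoiner(",");
                for (int j = 0; j < authors.getLength(); j++) {
                    Element author = (Element) authors.item(j);
                    joined.add(author.getAttribute("name") + "[" + author.getAttribute("country") + "]");
                }
                rows.add(String.join("\t",
                    xpath.evaluate("title", book),
                    book.getAttribute("category"),
                    xpath.evaluate("year", book),
                    joined.toString(),
                    xpath.evaluate("info", book)));
            }
            return rows;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        rowsForYear(SAMPLE, 2005).forEach(System.out::println);
    }
}
```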

This task is somewhat difficult, but SPL simplifies it considerably. The code is as follows:


     A
1    =file("D:\\xml\\book1.xml")
2    =xml@s(A1.read(),"library/book").library
3    =A2.new(category,book.field("year").ifn():year,book.field("title").ifn():title,book.field("lang").ifn():lang,book.field("info").ifn():info,book.field("name").select(~).concat@c():name,book.field("country").select(~).concat(","):country)
4    =A3.new(title,category,year,(lang,name.array().(~+"[")++country.array().(~+"]")).concat@c():author,info)
5    =A4.select(year==2005)

In addition to stronger computing power, SPL also supports WebService / HTTP and other data sources. For example, get the interface description of weather forecast from WebService, query the list of provinces according to the interface description, and return XML results:


     A
1    =ws_client("http://www.webxml.com.cn/WebServices/WeatherWebService.asmx?wsdl")
2    =ws_call(A1,"WeatherWebService":"WeatherWebServiceSoap":"getSupportProvince")

As the first XML computing language, XPath made a breakthrough contribution, but its lack of computing power has always been a fatal weakness. With today's increasingly changeable computing requirements, only SPL, a more powerful XML computing language, can continue to deliver high development efficiency.