Which open source package is best for Java to parse and process XML?

Java has long shipped with built-in SAX and DOM implementations, so no external library is needed just to parse XML. Declarative languages such as XPath and XQuery were later introduced to simplify XML processing, and they handle conditional queries well. But conditional queries are only a small part of data computing. Real work also involves sorting, deduplication, grouping, and aggregation. If the goal is to compute over XML data, Open-esProc is more convenient. Note that Open-esProc differs from typical Java packages: it wraps its data types and calculation methods in a scripting language called SPL, and the Java program calls the SPL script and receives a ResultSet object.
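
To make the XPath point concrete, here is a minimal sketch of a conditional query that uses only the JDK's built-in DOM and XPath APIs. The inline XML, the class name XPathOrderQuery, and the method matchingOrderIds are made up for illustration. The filter keeps orders whose Amount is between 1000 and 3000 and whose Client contains "bro", ignoring case.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathOrderQuery {
    // Made-up sample data: employees (row) with nested Orders elements.
    static final String SAMPLE =
          "<xml>"
        + "<row><EId>3</EId><Name>Rachel</Name>"
        + "<Orders><OrderID>32</OrderID><Client>JFS</Client><Amount>468.0</Amount></Orders>"
        + "<Orders><OrderID>39</OrderID><Client>Brooks</Client><Amount>2016.0</Amount></Orders>"
        + "</row>"
        + "<row><EId>4</EId><Name>Emily</Name>"
        + "<Orders><OrderID>41</OrderID><Client>Ambro</Client><Amount>5000.0</Amount></Orders>"
        + "</row>"
        + "</xml>";

    // Returns the OrderID of every order that passes the filter.
    static List<String> matchingOrderIds(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // XPath 1.0 has no lower-case(), so translate() folds the case first.
        String expr =
              "/xml/row/Orders[Amount > 1000 and Amount <= 3000 and "
            + "contains(translate(Client, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', "
            + "'abcdefghijklmnopqrstuvwxyz'), 'bro')]/OrderID";
        NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            ids.add(nodes.item(i).getTextContent());
        }
        return ids;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(matchingOrderIds(SAMPLE)); // prints [39]
    }
}
```

The filter itself fits in one expression, but anything beyond filtering (sorting, grouping, aggregation) has to be written by hand around the returned NodeList.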

Here is a simple example of how to use Open-esProc. The file Employees_Orders.xml stores employee records, each with the orders that belong to that employee. Part of the data is shown below. The task is to find all orders whose amount is between 1000 and 3000 and whose client name contains the string "bro".

<?xml version="1.0" encoding="UTF-8"?>
<xml>
<row>
         <EId>2</EId>
         <State>"New York"</State>
         <Dept>"Finance"</Dept>
         <Name>"Ashley"</Name>
         <Gender>"F"</Gender>
         <Salary>11000</Salary>
         <Birthday>"1980-07-19"</Birthday>
         <Orders>[]</Orders>
</row>
<row>
         <EId>3</EId>
         <State>"New Mexico"</State>
         <Dept>"Sales"</Dept>
         <Name>"Rachel"</Name>
         <Gender>"F"</Gender>
         <Salary>9000</Salary>
         <Birthday>"1970-12-17"</Birthday>
         <Orders>
                  <OrderID>32</OrderID>
                  <Client>"JFS"</Client>
                  <SellerId>3</SellerId>
                  <Amount>468.0</Amount>
                  <OrderDate>"2009-08-13"</OrderDate>
         </Orders>
         <Orders>
                  <OrderID>39</OrderID>
                  <Client>"NR"</Client>
                  <SellerId>3</SellerId>
                  <Amount>3016.0</Amount>
                  <OrderDate>"2010-08-21"</OrderDate>
         </Orders>
</row>
…
</xml>

Query with SPL as follows:


     A
1    =xml(file("D:\\xml\\Employees_Orders.xml").read(),"xml/row")
2    =A1.conj(Orders)
3    =A2.select(Amount>1000 && Amount<=3000 && like@c(Client,"*bro*"))

The SPL code above first reads the XML into a multi-layer table sequence, then uses the conj function to concatenate the orders of all employees, and finally uses the select function to apply the conditional query.

The SPL script can be debugged and run in the esProc IDE and saved as a script file (for example, condition.dfx). A Java program can then call it through esProc's JDBC interface, which makes integration straightforward. The code is as follows:

package Test;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class test1 {
    public static void main(String[] args) throws Exception {
        // Load the esProc JDBC driver
        Class.forName("com.esproc.jdbc.InternalDriver");
        // try-with-resources closes the result set, statement, and connection,
        // even if the query throws
        try (Connection connection = DriverManager.getConnection("jdbc:esproc:local://");
             Statement statement = connection.createStatement();
             // Call the saved SPL script condition.dfx like a stored procedure
             ResultSet result = statement.executeQuery("call condition()")) {
            printResult(result);
        }
    }
…
}

Similarly, esProc can implement grouping and aggregation. The code is as follows:

=A2.groups(year(OrderDate);sum(Amount))

Or an association calculation:

=A1.new(Name,Gender,Dept,Orders.OrderID,Orders.Client,Orders.SellerId,Orders.Amount,Orders.OrderDate)

As the code above shows, SPL is expressive: it covers common calculations in short, readable code. It also decouples the XML calculations from the Java program, so when requirements change, only the SPL script needs to change. esProc's table sequence type supports multi-layer data and an intuitive dot operator, so an association calculation can read values directly from nested layers, which keeps the code concise. For more calculation examples, refer to XML data parsing and calculation.
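
For contrast with the one-line groups call, here is a rough sketch of the same per-year total written against the JDK's built-in DOM API. The class name OrdersByYear, the helpers childText and totalByYear, and the inline sample are assumptions made for illustration. The first two orders reuse values from the sample file (without the surrounding quotes) and the third is made up. Employees with an empty Orders element, as in the sample file, would need an extra guard that is left out here.

```java
import java.io.StringReader;
import java.time.LocalDate;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class OrdersByYear {
    // Illustrative sample; order 40 is made up.
    static final String SAMPLE =
          "<xml><row>"
        + "<Orders><OrderID>32</OrderID><Amount>468.0</Amount>"
        + "<OrderDate>2009-08-13</OrderDate></Orders>"
        + "<Orders><OrderID>39</OrderID><Amount>3016.0</Amount>"
        + "<OrderDate>2010-08-21</OrderDate></Orders>"
        + "<Orders><OrderID>40</OrderID><Amount>100.0</Amount>"
        + "<OrderDate>2010-09-01</OrderDate></Orders>"
        + "</row></xml>";

    // Text of the first descendant element with the given tag name.
    static String childText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    // Sums Amount per order year; TreeMap keeps the years sorted.
    static Map<Integer, Double> totalByYear(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList orders = doc.getElementsByTagName("Orders");
        Map<Integer, Double> totals = new TreeMap<>();
        for (int i = 0; i < orders.getLength(); i++) {
            Element order = (Element) orders.item(i);
            int year = LocalDate.parse(childText(order, "OrderDate")).getYear();
            double amount = Double.parseDouble(childText(order, "Amount"));
            totals.merge(year, amount, Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(totalByYear(SAMPLE)); // prints {2009=468.0, 2010=3116.0}
    }
}
```

The parsing, type conversion, and accumulation all live in the Java program here, so a change to the grouping key or the aggregate means recompiling rather than editing a script.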