How Reporting Tools Access JSON/XML Data

JSON/XML is multilevel structured data, for which most reporting tools have no direct supports and a related third party, like JsonPath/Dom4J, is required. Yet those third-party class libraries have insufficient computing abilities, producing roundabout and lengthy code, and are difficult to integrate. Some reporting tools encapsulate them, making a user-friendly data retrieval wizard and getting rid of the hassle of integration. Yet computing problems still exist, and are passed to reporting tools. The latter focuses on data presentation, giving only adequate abilities to manage simple computations and turning to user-defined data sets for handling complicated ones. The code is still long and complicated.

A better choice is the open-source, Java-based class library SPL. SPL has convenient analytic functions for parsing JSON/XML, powerful computational capabilities for processing multilevel data, and easy to integrate JDBC drive.

SPL has built-in JSON/XML parsing functions that can retrieve data conveniently. Suppose we have a multilevel JSON file that stores employee records where one field stores orders records, and are trying to get all employee records in SPL:


A

1

=json(file("d:/Orders.json").read())

2

=json(A1)

And get all orders records:


A

1

=json(file("d:/Orders.json").read())

2

=json(A1).conj()

The way of parsing XML is basically the same. The only difference is that the name of node path containing the record should be specified. To retrieve all orders records from an XML file having the same structure as the above JSON file, for instance, SPL has the following code:


A

1

=xml(file("d:/Orders.xml").read(),"xml/row")

2

=json(A1).conj()

 

SPL has a wealth of built-in functions, enabling it to compute the multilevel data with simple and intuitive syntax, without the need of any complicated hardcoding. To perform a query on orders data according to interval-style conditions, for instance, we have the following SPL code, where p_start and p_end are parameters:


A

2

…// Get all orders records

3

=A2.select(Amount>p_start && Amount<=p_end)

SPL syntax for data computations applies to various data sources. The code for handling XML and JSON is only different in the data retrieval phase and the rest is same. So, we won’t explain their handling separately.

 

SPL provides JDBC driver to be conveniently integrated by reporting tools. Take the above SPL code as an example, we can first save it as a script file (intervalQuery.splx) and then invoke it in the reporting tool as we do with a stored procedure. Assume we are using JasperReport and the code can be as follows:

call intervalQuery($P{paramStart}, $P{paramEnd})

More examples:


A

B

2

…// Retrieve employee records


3

=A2.conj().select(Amount>1000 && like(Client,\"*S*\"))

/Fuzzy query

4

=A2.conj(Orders).groups(Client;sum(Amount))

/Multilevel data summarization

5

=A2.groups(State,Gender;avg(Salary),count(1))

/Single-level data summarization

6

=A2.sort(Salary)

/Sort

7

=A2.id(State)

/Distinct

8

=A2.new(Name,Gender,Dept,Orders.OrderID,Orders.Client,Orders.Client,Orders.SellerId,Orders.Amount,Orders.OrderDate)

/Join

 

SPL is able to retrieve JSON/XML from various sources, including local files, through LAN, FTP, HDFS, WebService, Restful, and so on. To read JSON strings from Restful and perform the above query on orders records, for instance:


A

1

=httpfile("http://127.0.0.1:8086/api/getOrders").read()

2

=json(A1).conj()

3

=A2.select(Amount>p_start && Amount<=p_end)

SPL also supports standard SQL syntax in order to help beginners adapt. The syntax is suitable for handling computing scenarios involving short and simple code. We can create a SQL data set and perform the above query based on interval-style conditions with the following SQL statement:

$select * from {json(file("d:/Orders.json").read()).conj()}
where Amount>$P{paramStart} and Amount<=$P{paramEnd}

Below is a join between two JSON files:

$select e.name, s.orderdate, s.amount from {json(file("d:/sales.json").read())} s left join { json(file("d:/employees.json").read())} e on s.sellerid= e.eid

More examples can be found in Examples of SQL Queries on Files.

SPL has remarkable computational capabilities to simplify computations with complex logic, including stepwise computations, order-based computations and post-grouping computations. It can deal with many computations that are hard to handle in SQL effortlessly. Here is an example. We are trying to find the first n big customers whose sales amount occupies at least half of the total and sort them by amount in descending order.


A

B

1

 / Retrieve data

2

=A1.sort(amount:-1)

/ Sort data by amount in descending order

3

=A2.cumulate(amount)

/ Get the sequence of cumulative totals

4

=A3.m(-1)/2

/ The last cumulative total is the final total

5

=A3.pselect(~>=A4)

/ Get position of record where the cumulative total reaches at least half of the total

6

=A2(to(A5))

/ Get values by position

SPL provides professional IDE equipped with a complete set of debugging functionalities and lets users to observe the result of each step with grid-style coding, making it particularly suited to implementing algorithms with complex logic.

undefined 

SPL also supports many more data sources, including file sources, such as txt, csv and xls, and various NoSQLs, such as MongoDB, Hadoop, Redis, ElasticSearch, SalesForce and Cassandra, as well as `mixed computations between any types of data sources or databases.