Can We Execute SQL on XLS in Java?

An indirect way is through a Java embedded database. We first load xls to it and then execute SQL. The advantage is that it makes full use of SQL’s computing ability, and disadvantages include extremely low real-time capability and a complicated loading process that involves checking if the same table name already exists, deleting and creating a table, parsing the file, importing data, creating the index, and so on. Moreover, data loading would fail if the source file has an irregular format. The direct way is to invoke the xls-based JDBC driver in Java, such as XLSJDBC, xlSQL and CDATA Excel JDBC. The merits are high overall efficiency and simple framework, and the demerit is bad computing ability, which does not support even the set-oriented calculations, subquery and joins.

By contrast, esProc SPL is the desired Java open-source library. It has great computational capabilities and can execute SQL directly on xls files.

SPL has simple basic uses. sOrders.xlsx is the tab-separated orders table, and we are trying to perform a conditional query on it. Below is the SPL code:

…
Class.forName("com.esproc.jdbc.InternalDriver");
Connection connection =DriverManager.getConnection("jdbc:esproc:local://");
Statement statement = connection.createStatement();
String str="$select * from d:/sOrder.xlsx where Client like'%S%'or (Amount>1000 and Amount<=2000)";
ResultSet result = statement.executeQuery(str);
…

SPL supports common SQL syntax to handle daily computational work. Below are some examples:

#sort  
$select * from sales.xls order by Client,Amont desc

#distinct  
$ select distinct(sellerid) from sales.xls       

#group by…having  
$select year(orderdate) y,sum(amount) s from sales.xls group by year(orderdate) having sum(amount)>=2000000

#join  
$select e.name, s.orderdate, s.amount from sales.xls  s left join employee.csv e on s.sellerid= e.eid

SPL also supports advanced SQL syntax for handling difficult computations that involve, for instance, set-oriented calculations, case when, with and nested subquery. More examples can be found in Examples of SQL Queries over Files.

 

SPL provides extension functions to conveniently handle xls files of irregular formats that most SQL tools find hard to manage. For example:

\# To import a headerless file where the detail data begins directly from   the first row:

$select * from {file("D:/Orders.xlsx").xlsimport()}

\# To import a file by skipping the first two rows containing the title:

$select * from {file("D:/Orders.xlsx").xlsimport@t(;,3)}

\# To import data of the specific sheet named "sheet3":

$select * from   {file("D:/Orders.xlsx").xlsimport@t(;"sheet3")}

The extension functions can parse files of more complex formats, such as importing rows from Mth to Nth and opening a file using password to name a few.

Sometimes the computing logic is so complicated that it is hard to achieve even in the stored procedure. Yet, SPL can implement them effortlessly with the computing capabilities enabled by its extension functions. For example, the dates when goods are moved out and moved in are usually not continuous, and we are trying to get the inventories of a number of products in a specified continuous time period. To do this, we write code in SPL and save it as a script file. Playing the same role as the stored procedure, the script file can loose the coupling of Java code and does not need to be recompiled whenever there are any modifications.


A

1

=T("d:/inout.xls").group(product;~.align(A2,date):g)

2

=periods(argBeginDate,argEndDate)

3

=A2.news(g;A1.product,A2(#):date,ifn(in,0):in, ifn(out,0):out,   stock[-1]+in-out:stock)

4

$select * from {A3}

Then we call the script file from Java as we call a stored procedure:

…

Class.forName("com.esproc.jdbc.InternalDriver");  
Connection conn =DriverManager.getConnection("jdbc:esproc:local://");  
CallableStatement statement = conn.prepareCall("{call getStock(?, ?)}");  
statement.setObject(1, "2020-01-01");  
statement.setObject(2, "2020-01-31");  
statement.execute();  
…

Don’t worry if you have script files containing complex computing logics, SPL provides a professional IDE with a complete set of debugging functionalities and letting users to observe the intermediate result of each step in the form of a table, as shown below:

Besides the remarkable computational capabilities, SPL also supports diverse/multiple source mixed computing, involving csv, XML, JSON, all types of NoSQL databases and relational databases, as well as the handling of big xls files that exceed the memory space. There are more you can find out on SPL Official Website.