Can We Execute SQL on TXT/CSV Files in Java?

Some file JDBC drivers, such as simoc csvjdbc and xiao321 csvjdb, enable the direct execution of SQL statements on txt/csv files. The issue is that they have too weak computing capabilities and thus little uses. Java embedded databases, such as HSQLDB, Derby and H2, as well as SQLite, a C-language library equipped with JDBC driver, have some computing capabilities, but they cannot execute SQL statements directly on files. Instead, they require that data be loaded into the database. Data loading is an extremely process. It involves checking if the same table name already exists, deleting and creating a table, parsing the file, importing data and creating index, and so on. Besides, they cannot load files of uncommon formats into the database. The most intolerable aspect of data loading is that it is extremely slow, seriously dragging the overall performance down.

The best choice is esProc SPL. The Java open-source library boasts outstanding computing capabilities and support executing SQL directly on files.

The uses of SPL are simple and easy to learn. For example, we are trying to perform conditional query on tab-separated orders table Orders.txt:

…
Class.forName("com.esproc.jdbc.InternalDriver");
Connection connection =DriverManager.getConnection("jdbc:esproc:local://");
Statement statement = connection.createStatement();
String str="$select * from d:/sOrder.txt where Client like'%S%'or (Amount>1000 and Amount<=2000)";
ResultSet result = statement.executeQuery(str);
…

SPL supports common SQL syntax. Users can use it to achieve various daily data analysis tasks. Here are some examples:

sort

$select * from sales.txt order by Client,Amont desc

distinct

$select distinct(sellerid) from sales.csv 

group by…having

$select year(orderdate) y,sum(amount) s from sales.csv group by year(orderdate) having sum(amount)>=2000000

join

$select e.name, s.orderdate, s.amount from sales.txt s left join employee.xlsx e on s.sellerid= e.eid

SPL also supports advanced SQL syntax for handling complex and difficult computations that involve, for instance, case when, with and nested subquery. More examples can be found in Examples of SQL Queries over Files.

Generally, it is hard to parse files of uncommon formats with SQL tools. SPL offers extension functions to do the parsing effortlessly. To parse a file separated by double vertical lines ("||"), for instance, SPL is purely simple:

$select * from {file("sep.txt").import@t(;,"||")}

You can also conveniently parse data types of non-default formats that SQL tools find hard to handle with SPL extension functions. To parse a date value in the format "01-01-2012" instead of commonly used format "2012-01-01", SPL generates simple code as follows:

$select * from {file("style.txt").import@t(orderid,client,sellerid,amount,orderdate: date: "dd-MM-yyyy")}

You can also use extension functions to parse files of more complex formats.

Sometimes the computing logic is so complicated that it is hard to achieve even in database stored procedure. Yet, SPL can implement the algorithms effortlessly thanks to the computing capabilities enabled by its extension functions. For example, the dates when goods are moved out and moved in are usually not continuous, and we are trying to get the inventories of a number of products in a specified continuous time period. To do this, we write code in SPL and save it as a script file. Playing the same role as the stored procedure, the script file can reduce the couplingness of Java code and does not need to be recompiled whenever there are any modifications.


A

1

=T("d:/inout.txt").group(product;~.align(A2,date):g)

2

=periods(argBeginDate,argEndDate)

3

=A2.news(g;A1.product,A2(#):date,ifn(in,0):in, ifn(out,0):out, stock[-1]+in-out:stock)

4

$select * from {A3}

Then we call the script file from Java as we call a stored procedure:

…
Class.forName("com.esproc.jdbc.InternalDriver");  
Connection conn =DriverManager.getConnection("jdbc:esproc:local://");  
CallableStatement statement = conn.prepareCall("{call getStock(?, ?)}");  
statement.setObject(1, "2020-01-01");  
statement.setObject(2, "2020-01-31");  
statement.execute();  
...

Catering to script files containing complex computing logics, SPL provides a professional IDE with a complete set of debugging functionalities and letting users to observe the intermediate result of each step in the form of a table, as shown below:

undefined 

SPL supports various data sources, including Excel files, XML data, and JSON data, as well as high-performance big file processing and diverse source mixed computing between, such as, file and database/NoSQL database/RESTful. You can find relative discussions in our series of essays.