Is There Any Java Open-source Library Like SQLite?

Among the open-source libraries that can be embedded into and invoked by a Java program and that have the data computing ability, HSQLDB, Derby and H2 are relational databases. They share same issues with SQLite. Extremely inconvenient data loading procedure, which involves checking if the same table name already exists, creating a table, loading the source file, parsing fields, inserting records, and creating index, etc., inability to handle files of uncommon formats and special data sources, and crude JSON processing functionality. Speaking of JSON processing, an increasingly important computing functionality in recent years, these open-source libraries are not even as good as the embedded databases though the latter is as rudimentary. Tablesaw, Joinery and Morpheus are DataFrame-style libraries. Compared with databases, they are convenient when reading and computing the multi-level JSON data (semi-structured data), but perform far worse when computing two-dimensional structured data.

As a better alternative, esProc SPL, the open-source embedded Java library, has powerful abilities to handle two-dimensional structured data and multilevel JSON data, supports diverse data sources, and can deal with text files of uncommon formats well.

SPL has JDBC driver to be invoked by a Java program. The method is easy to learn. Here’s an example. We are trying to sort the tab-separated Orders.txt by Client column in ascending order and by Amount column in descending order. Below is the SPL code:

…
Class.forName("com.esproc.jdbc.InternalDriver");
Connection connection =DriverManager.getConnection("jdbc:esproc:local://");
Statement statement = connection.createStatement();
String str="=T(\"D:/Orders.txt\").sort(Client,- Amount)";
ResultSet result = statement.executeQuery(str);
…

The whole procedure of data loading with an embedded database is replaced by a single T function in SPL. This is really convenient.

Considering that most database programmers have been accustomed to SQL, SPL specifically offers support of SQL syntax. The above sorting operation can thus be written as follows:

str="$select * from d:/Orders.txt order by Client, Amount desc"

There are hundreds of basic computing functions in SPL. Users can use them to handle most common computations directly. Here are some of them besides sort():

// Conditional query

str="=T(\\"D:/Orders.txt\\").select(Amount>1000 && Amount<=3000 && like(Client,\\"\*S\*\\"))";

//Grouping & aggregation

str ="=T(\\"D:/Orders.txt\\").groups(year(OrderDate);sum(Amount))";

//Join

str ="=join(T (\"D:/Orders.txt\"):O,SellerId; T(\"D:/data/Employees.txt\"):E,EId).new(O.OrderID,O.Client,O.SellerId,O.Amount,O.OrderDate,   E.Name,E.Gender,E.Dept)";

The processed data can be permanently stored as a btx file, which is performant and compact. To write two text files of same structure into one btx file and remove the duplicates, for instance, we have the following SPL code:


A

B

1

=[T("d:/orders1.csv"), T("d:/orders2.csv")].merge@u()

/Merge records of the two files

2

file("d:/fast.btx").export@b(A1)

 /Write A1’s result to a btx file

 

 

The btx file is used as an ordinary file:

str="=T(\"D:/fast.btx\").sort(Client,-Amount)"
str="$select * from d:/fast.btx order by Client, Amount desc"

We can also pre-sort a btx file to obtain higher performance, such as the use of binary search. Read Bin Files to learn more about btx files.

 

An SPL algorithm can be saved as a separate file independent of the Java program to reduce the couplingness. For example, we are trying to find the first n big customers whose orders amount takes up at least half of the total and sort records of them by amount in descending order. First, we save the SPL script as a file (bigCustomer.dfx):


A

B

1

= T("D:/data/sales.csv").sort(amount:-1)

Retrieve records and sort them in descending order

2

=A1.cumulate(amount)

Get the sequence of cumulative amounts

3

=A2.m(-1)/2

The final cumulative amount is the total

4

=A2.pselect(~>=A3)

Get the position of the records where the amount takes up at least half of the total

5

=A1(to(A4))

Get the records according to A4’s position

Then we call the script file from Java in the way of calling a stored procedure:

…
Class.forName("com.esproc.jdbc.InternalDriver");
Connection connection =DriverManager.getConnection("jdbc:esproc:local://");
Statement statement = connection.createStatement();
ResultSet result = statement.executeQuery("call bigCustomer ()");
...

 

Sometimes the computing logic is so complicated that it is hard to phrase even in database stored procedure. Such scenarios include post-grouping computations, order-based computations and set-oriented operations. SPL, with its expressive syntax and rich library functions, can significantly simplify the computing logic. To calculate the largest number of days when a given stock rises in a row, for example, SPL only needs two lines of code:


A

B

1

=T("d:/AAPL.xlsx")

Retrieve records from the Excel file, reading the first row as column headers

2

=a=0,A1.max(a=if(price>price[-1],a+1,0))

Count the longest consecutive rising days

Some computations have long code or involve branches and loops. SPL offers the special IDE for convenient trace and debug, as well as for observing result of each step, as shown below:

undefined 

SPL is convenient in dealing with text files of uncommon formats. A simple task, such as reading data as double-bars-separated (||), SPL has the following statement:

=file("D:/Orders.txt").import@t(;,"||")

Read SPL: reading, writing, and parsing of unstructured text to learn more about handling text files with complicated formats.

 

SPL supports various data sources, including text files, Excel files, relational databases, NoSQL databases, WebService, and RESTful data. For example, RESTful returns a multilevel JSON file describing multiple employees and the multiple orders of each employee, and we are trying to find all orders whose amounts are between 500 to 2000 and where client names contain characters bro. The SPL code is as follows:


A

1

=json(httpfile("http://127.0.0.1:6868/api/getEmpOrders").read())

2

=A1.conj(Orders)

3

=A2.select(Amount>500 &&   Amount<=2000 && like@c(Client,"*bro*"))

As a powerful open-source embedded Java computing library, esProc SPL outperforms databases in computing structured data, is more convenient than DataFrame-style libraries in processing multilevel data, and outshines both in data source support.