esProc SPL, the challenger of SQLite

 

Many small and micro applications also require certain data processing and computing abilities. Integrating a database will make the application too heavy. In this case, SQLite is a good choice for its simple framework, easy-integration, ability of storing data persistently and, it provides SQL to implement calculation.

However, for some more complex scenarios, SQLite still falls short.

The shortcomings of SQLite in the face of complex scenarios

Data source support

SQLite is like a database, and can provide good support for its own database files, but sometimes applications need to process other forms of data, such as text files, Excel, other databases, and data from the web like Restful. SQLite only supports the reading of csv files and does not support other data sources unless hard-coding. Moreover, although SQLite supports csv files, the process is cumbersome.Specifically, it needs to create a database in command line first, then use thecreatecommand to create a structured table, next use theimportcommand to import data, and finally query data in SQL.

In addition to conventional structured data, modern applications often encounter the data of complex formats such as Json and XML. SQLite has the ability to calculate Json strings, but does not support reading multi-layer data sources directly, including Json files/RESTful, so it has to hard code or resort to third-party libraries to piece together aninsertstatement to insert into the data table. As a result, the code is very cumbersome. SQLite cannot calculate XML strings, let alone read XML files/WebServices.

Sometimes, applications need to write data as the files of general format to export, transfer and exchange, and sometimes applications need to actively write data to other data sources. However, SQLite can only persist data to its own database file and cannot directly write to external data sources, including basic csv files.

Complex calculation

Since SQLite uses SQL statements to compute, both the advantages and disadvantages of SQL are inherited. SQL is close to natural language, making it easy to learn and implement simple calculations. However, SQL is not good at complex calculations, often causing cumbersome and difficult to understand code.

Even for less complex calculations, it is not easy to implement in SQL. For example, calculate the top 3 orders by sales for each customer:

select * from (select *, row_number() over (partition by Client order by Amount desc) as row_number from Orders) where row_number<=3

Since this calculation task is to compute the in-group top N records, it needs to use window function to generate an in-group pseudo sequence number column first, and then filter on the pseudo column, the code thus appears complicated.
For complex calculations, SQL code will be longer and more difficult to understand. For example, calculate the maximum consecutive days that a stock keeps rising:

select max(continuousdays)
from (
    select count(*) continuousdays
from (
        select sum(risingflag) over (order by day) norisingdays
        from (
            select day, case when price>lag(price) over (order by day) then 0 else 1 end risingflag
            from tbl
        )
) group by norisingdays
)

Since it is difficult to directly express the concept of consecutive rising in SQL, we have to take an indirect way, that is, calculate the number of consecutive rising days through cumulative number of days that the stock does not rise. This method requires strong skill, making it difficult to code and understand. Moreover, it is difficult to debug SQL, resulting in the maintenance difficulty.
Another example, find out the top n customers whose cumulative sales account for half of the total sales, and sort them by sales in descending order:

with A as
(select client,amount,row_number() over (order by amount) ranknumber
from sales)
select client,amount
from (select client,amount,sum(amount) over (order by ranknumber) acc
     from A)
where acc>(select sum(amount)/2 from sales)
order by amount des

It is difficult to handle the customer who is exactly in the critical position in SQL, so we have to adopt an indirect way, that is, first calculate the cumulative sales in ascending order, and then find out the customers whose cumulative sales is not in the second half of calculation result. This method also requires strong skill, and the code is long and difficult to debug.

In addition, SQLite doesn’t provide rich date and string functions. For example: SQLite lacks the function of getting the date before or after a specified number of quarters, and the function of getting the date after N workdays, etc., which limit SQLite and make it unsuitable for scenarios with complex computing requirements.

Flow processing

SQL itself lacks flow processing capability, so the database will resort to stored procedures to achieve a complete business logic. However, since SQLite does not support stored procedures, it cannot directly achieve a complete business logic, and has to resort to main application. To be specific, first convert SQL’s data objects to the data objects in application (such as Java’s resultSet/List<EntityBean>), then use the for/ifstatements of main application to process the flow, and finally convert back to SQL’s data objects. As a result, the code is very cumbersome. When the business logic is very complex, it needs to convert between SQL object and main application object many times, which is more troublesome and related SOL code will not be shown here.

Use esProc SPL to solve the difficulties of SQLite

To provide data processing and computing abilities for small- and micro-Java applications, there is a better choice: esProc SPL.

esProc SPL is an open-source data processing engine in Java, and simple in framework and easy to integrate. In addition, esProc SPL is able to store data persistently, and has sufficient computing ability. Such characteristics are similar to those of SQLite.

Simple framework: there is no need to configure server, nodes and cluster. As long as the SPL jars are imported, SPL can be deployed in Java environment.

SPL provides JDBC driver, making it easy to integrate into Java applications. For simple query tasks, the coding complexity is similar to SQL.

Class.forName("com.esproc.jdbc.InternalDriver");
Connection conn =DriverManager.getConnection("jdbc:esproc:local://");
Statement statement = conn.createStatement();
ResultSet result = statement.executeQuery("=T(\"D:/Orders.csv\").select(Amount>1000 && like(Client,\"*s*\"))");

SPL supports data persistence and can save data to its own data format (bin file). For example, add records in batches:


A

1

=create(OrderID,Client,SellerID,Amount,OrderDate)

2

=A1.record([201,"HDR",9,2100.0,date("2021-01-01"),

202,"IBM",9,1900,date("2021-01-02"),

203,"APPLE",4,1900,date("2021-01-03")])

3

=file("d:/Orders.btx").export@ab(A2)

For the export@abin A3, @ameans appending, @bmeans bin file format.

In addition to the ability to directly persist data, SPL also allows us to process the in-memory table sequence (SPL’s structured data object, similar to SQL’s result set), and then overwrite it to bin file. The specific approach is to change export@abto export@b. Although the performance of this method is not as good as that of SQLite, the overwriting speed is usually acceptable as the amount of data involved in small and micro application is generally small.

The composite table is another own data format of SPL, and supports inserting, deleting and updating records at high performance, and is suitable for high-performance computing with large data amount (this is not the point of this article).

Besides own data formats, SPL can also save data to csv files. We just need to change A3to:

file("d:/Orders.csv").export@tc(A2)

SPL has enough computing ability, and supports various SQL-style calculations, including calculation after grouping (window function):


A

B

1

=Orders.new(Client,Amount)

// Select some of fields

2

=Orders.select(Amount>1000 && like(Client,\"*s*\"))

// Fuzzy query

3

= Orders.sort(Client,-Amount)

// Sorting

4

= Orders.id(Client)

// Distinct

5

=Orders.groups(year(OrderDate):y,Client;sum(Amount):amt).select(amt>3000)

// Grouping and aggregating

6

=[Orders.select(Amount>3000),A1.select(year(OrderDate)==2009)].union()

// Union

7

=Orders.groups(year(OrderDate):y,Client;sum(Amount):amt).select(like(Client,\"*s*\"))

// Subquery

8

=A5.derive(amt/amt[-1]-1: rate)

// Cross-row calculation

SPL provides basic SQL syntax, for example, grouping and aggregating:

$select year(OrderDate) y,month(OrderDate) m, sum(Amount) s,count(1) c from {Orders}
Where Amount>=? and Amount<? ;arg1,arg2

Besides these basic abilities, SPL also overcomes various shortcomings of SQLite. SPL fully supports various data sources and has stronger computing ability and is convenient in flow processing, making it possible to handle more complex application scenarios.

Data source support

It only takes one step to read csv files in SPL: embed the following SPL code in Java:

T("d:/Orders.csv").select(Amount>2000 && Amount<=3000)

The function Tcan read both the bin file and csv file, and generate a table sequence. When importing data in SPL, the data type is parsed automatically, and there is no need to specify manually. The whole process does not require extra code, which is much more convenient than SQLite.

If the format of csv file is not regular, we can use the importfunction to specify the separator, field type, number of skipped lines, and process the escape character, quotation marks, parentheses, etc. Thus, SPL has much more functions than SQLite.

SPL has built-in multiple data source interfaces, including tsv, xls, Json, XML, RESTful, WebService and other databases, and even supports special data sources such as Elasticsearch and MongoDB.

All these data sources can be accessed directly, so it is very convenient. For other data sources not listed above, SPL provides interface specification. Just exporting data as the structured data objects of SPL according to the specification, subsequent calculations would be ok.

SPL can directly parse multi-layer data sources. For example, read and calculate Json file:

json(file("d:/xml/emp_orders.json").read()).select(Amount>2000 && Amount<=3000)
json(httpfile("http://127.0.0.1:6868/api/orders").read()).select(Amount>2000 && Amount<=3000)

XML file:


A

1

=file("d:/xml/emp_orders.xml").read()

2

=xml(A1,"xml/row")

3

=A2.select(Amount>1000 && Amount<=2000 && like@c(Client,"*business*"))

WebService:


A

1

=ws_client("http://127.0.0.1:6868/ws/RQWebService.asmx?wsdl")

2

=ws_call(A1,"RQWebService":"RQWebServiceSoap":"getEmp_orders")

3

=A2.select(Amount>1000 && Amount<=2000 && like@c(Client,"*business*"))

SPL’s table sequence supports multi-layer structured data, and is easier to express Json/XML than the two-dimensional structure of SQL database table. Moreover, the calculation code is also simpler. This part of content is not the focus of this article, so we skip it over.

Cross-source computing

SPL has good openness and can directly calculate multiple data sources; we can perform cross-source computing between these data sources and SPL bin file. For example, the following code is to perform an inner join on a bin file and a csv file first, and then group and aggregate:

join(T("d:/Orders.btx"):o,SellerId; T("d:/Emp.csv"):e,EId).groups(e.Dept;sum(o.Amont))

Cross-source computing between external data sources can also be easily performed in SPL. For example, the following code is to left join on csv file and RESTful:

join@1(json(httpfile("http://127.0.0.1:6868/api/orders").read()):o,SellerId; T("d:/Emp.csv"):e,EId)

It is easier to read in multi-step form:


A

1

=Orders=json(httpfile("http://127.0.0.1:6868/api/orders").read())

2

=Employees=T("d:/Emp.csv")

3

=join@1(Orders:o,SellerId;Employees:e,EId)

In short, we can implement cross-source computing in SPL only, without resorting to Java or command line, the SPL code is short and easy to understand, and the development efficiency of SPL is much higher than that of SQL.

Persisting data to any data source

In addition to supporting the persistence of data to its own formats, SPL also supports the persistence of data to other data sources. Likewise, SPL uses table sequence as medium. For example:

file("d:/Orders.csv").export@t(A2)			//csv file
file("d:/Orders.xlsx").xlsexport@t(A2)			//xls file
file("d:/Orders.json").write(json(A2))			//json file

In particular, SPL supports the persistence of data to any database. Let's take Oracle as an example:


A

B

1

=connect("orcl")

/Connect to external oracle

2

=T=A1.query("select * from salesR where SellerID=?",10)

/Batch query, table sequence T

3

=NT=T.derive()

/Copy and generate new table sequence NT

4

=NT.field("SELLERID",9)

/Batch modify new table sequence

5

=A1.update(NT:T,sales;ORDERID)

/Persist the modification to database

Persisting data to database still uses table sequence as medium, whichhas very obvious advantages: the function updatecan automatically compare the table sequence before and after modifications (inserting, deleting and updating), making it easy to persist batch data.

Computing ability

SPL supports ordered computing, set computing, stepwise computing and association computing. By means of these abilities, complex structured data computation can be simplified.

For a simple example: find out the top 3 orders by sales for each customer:

Orders.group(Client).(~.top(3;Amount))

ThisSPL code is intuitive, it first groups by Client, then calculates the TopNof each group (represented by the symbol ~). The reason why SPL code is simple is apparently because SPLdirectly supports the top function that SQL doesn’t provide, butthe essential reason is that SPL has real row number field, in other words, SPL supports the ordered set. Another reason is that the set orientation operations in SPL is more thorough, which can implement real grouping (grouping only, not aggregating). In this way, the in-group data can be calculated intuitively.

For more complex calculations, it is also not difficult to implement in SPL. For example, calculate the maximum consecutive days that a stock keeps rising:


A

1

=tbl.sort(day)

2

=t=0,A1.max(t=if(price>price[-1],t+1,0))

It is easy to express consecutive rising concept in SPL, which takes only two steps: sort by date, then traverse the records. If it is found the stock price rises, increment the counter by 1. In this code, both the loop function maxand the ordered set are utilized, and [-1] represents the previous record, which is the representation method of relative position, and price[-1]represents the stock price of the previous trading day, which is more intuitive than shifting the whole row (lag function).

Another example: find out the top n customers whose cumulative sales account for half of the total sales:


A

B

2

=sales.sort(amount:-1)

/Sort records by sales in descending order (can be done in SQL)

3

=A2.cumulate(amount)

/Get a sequence of cumulative amounts

4

=A3.m(-1)/2

/Calculate the final accumulative amount, i.e., the total

5

=A3.pselect(~>=A4)

/Get position where the amount reaches half of the total

6

=A2(to(A5))

/Get records by position

The set orientation in SPL is more thorough, as it can use variables to express the set conveniently, and continue to use variables to reference the set in the subsequent calculation, therefore, SPL is particularly suitable for multi-step calculation. Dividing a big problem into multiple small steps can easily achieve complex calculation objective, and the code is not only short, but easy to understand. In addition, multi-step calculation naturally supports debugging, which virtually improves development efficiency.

This code utilizes SPL’s abilities of ordered computing, set computing and stepwise computing. By means of these abilities, SPL is able to implement both the simple and complex calculations very well. Moreover, SPL supports discrete records, and can use the dot notation to reference the associated table intuitively, thereby simplifying complex association calculations.

SPL also offers richer date and string functions, far exceeding conventional database in terms of number and functionality.

It is worth mentioning that in order to further improve development efficiency, SPL invents unique function syntax.

Flow processing

SPL comes with flow control statements. Using such statements together with the table sequence objects built in SPL can conveniently achieve complete business logic.

Branch structure:


A

B

2


3

if T.AMOUNT>10000

=T.BONUS=T.AMOUNT*0.05

4

else if T.AMOUNT>=5000 && T.AMOUNT<10000

=T.BONUS=T.AMOUNT*0.03

5

else if T.AMOUNT>=2000 && T.AMOUNT<5000

=T.BONUS=T.AMOUNT*0.02

Loop structure:


A

B

1

=db=connect("db")


2

=T=db.query@x("select * from sales where SellerID=? order by OrderDate",9)

3

for T

=A3.BONUS=A3.BONUS+A3.AMOUNT*0.01

4


=A3.CLIENT=CONCAT(LEFT(A3.CLIENT,4), "co.,ltd.")

5


In addition to the above code, SPL has also more flow processing functions for structured data, which can further improve the development efficiency. For example, taking a batch of records rather than one record in each round of loop; loop one round when the value of a certain field changes.

The above business logic can be saved as script file, and placed outside the application, and called as a stored procedure:

Class.forName("com.esproc.jdbc.InternalDriver");
Connection conn =DriverManager.getConnection("jdbc:esproc:local://");
CallableStatement statement = conn.prepareCall("{call queryOrders()}");
statement.execute();

SPL is an interpreted code, and can run directly after modification without compiling or restarting application, making it possible to effectively reduce the maintenance cost. The external SPL script can not only effectively reduce the system coupling, but also has the characteristics of hot swap. In contrast, SQLite doesn’t support stored procedures, so the business logic cannot be placed outside main application, resulting in tight coupling and poor application framework.
SPL is significantly better than SQLite in Java application. But for non-Java applications, it becomes a bit troublesome. In that case, we have to resort to independent ODBC or HTTP server, which will make the framework heavy and reduce the integration. It should be noted that SPL can run normally on android, which belongs to the Java system, but SPL doesn’t support iOS till now due to the absence of a relative mature JVM environment on it.