Is There Any Simple and Lightweight In-memory Database Technology?

 

HANA, SPARK and Redis are commonly used in-memory databases, but they all have complex and heavy frameworks, compromising their ranges of applications. The popular simple and lightweight in-memory technology is SQLite, which is nimble and simple-framed and can be directly embedded in a Java program. Its disadvantages include lack of independent services and support for stored procedures, unstable environment and slow execution, weak computing abilities, and complicated data loading process for external data use.

The technology has better and more powerful embodiment – esProc SPL.

The open-source Java class library has simple and agile framework, enabling it to be embedded into Java code directly and have independent services. SPL offers various performance optimization techniques, and boasts outstanding computational capabilities, including the direct access and use of external data.

 

SPL’s framework is simple, being able to operate without having to run independently. Users just need to import the SPL jars to embed it into the Java code for computation. First, load external data, such as data in Oracle, to the memory:

A

1

=connect("orcl")

2

=A1.cursor@x("select OrderID,Client,SellerID,OrderDate,Amount from orders order by OrderID")

3

=A2.memory(OrderID).index()

4

>env(orders,A3)

Then perform in-memory computations using the simple and intuitive SPL syntax.

A

1

=Orders.select(Amount>=arg1 && Amount<arg2)

2

=A1.groups(year(OrderDate):y,month(OrderDate):m; sum(Amount):s,count(1):c)

SPL offers JDBC driver to be conveniently integrated in a Java program. The size of data is generally large and data loading is usually performed once and for all at the start of the application. Now store the above loading process as a SPL script file and invoke it in Java code using the way of calling a stored procedure:

Class.forName("com.esproc.jdbc.InternalDriver");
Connection conn =DriverManager.getConnection("jdbc:esproc:local://");
CallableStatement statement = conn.prepareCall("{call init()}");
statement.execute();
...

In-memory computations are usually executed repeatedly because of high concurrency. The external script file is used more often for a long piece of code that needs not to be compiled after any changes. A short piece of code, on the other hand, can be combined into one statement, as a SQL one, to be written within the Java program:

Class.forName("com.esproc.jdbc.InternalDriver");
Connection conn =DriverManager.getConnection("jdbc:esproc:local://");
Statement statement = conn.createStatement();
String arg1="1000";
String arg2="2000"
ResultSet result = statement.executeQuery(=Orders.select(Amount>="+arg1+" && Amount<"+arg2+").groups(year(OrderDate):y,month(OrderDate):m; sum(Amount):s,count(1):c)");
...

SPL offers a wealth of functions to carry off in-memory computation effortlessly. For example:

A

B

1

=Orders.find(arg_OrderIDList)

//Multi-key-value-based search

2

=Orders.select(Amount>1000 && like(Client,\"*S*\"))

//Fuzzy query

3

= Orders.sort(Client,-Amount)

//Sort

4

= Orders.id(Client)

//Distinct

5

=join(Orders:O,SellerId;Employees:E,EId).new(O.OrderID, O.Client,O.Amount,E.Name,E.Gender,E.Dept)

//Join

SPL supports standard SQL syntax, including a lot of string functions and date functions, join operations, set-oriented computations and subqueries. The previous computation, for instance, can be rewritten as the following SQL statement:

$select year(OrderDate) y,month(OrderDate) m, sum(Amount) s,count(1) c
from {Orders}
Where Amount>=? and Amount<? ;arg1,arg2

 

SPL’s agile framework enables both embedded computations and independent services. This is suitable for applications having demand for high stability. It is worth mentioning that users just need to mark “service-side execution” without changing the SPL code and integration mode:

Class.forName("com.esproc.jdbc.InternalDriver");
Connection conn =DriverManager.getConnection("jdbc:esproc:local://onlyServer=true");
Statement statement = conn.createStatement();
String arg1="1000";
String arg2="2000"
ResultSet result = statement.executeQuery(=Orders.select@m(Amount>="+arg1+" && Amount<"+arg2+").groups(year(OrderDate):y,month(OrderDate):m; sum(Amount):s,count(1):c)");
...

 

SPL possesses outstanding computational capabilities to simplify computations with complex logic, including stepwise computations, order-based computations and post-grouping computations. SPL can effortlessly deal with computations that are hard to handle in SQL and stored procedures. Here is an example. We are trying to find the first n big customers whose sales amount occupies at least half of the total and sort them by amount in descending order:

A

B

2

=sales.sort(amount:-1)

/Sort by amount in descending order

3

=A2.cumulate(amount)

/Get the sequence of cumulative totals

4

=A3.m(-1)/2

/ The last cumulative total is the final total

5

=A3.pselect(~>=A4)

/Get position of record where the cumulative total reaches at least half of the total

6

=A2(to(A5))

/ Get values by position

 

In addition to regular primary key and index-based methods, SPL offers a lot of high-performance data structures and algorithms, achieving much better performance than SQL-based in-memory databases and occupying less memory space. It takes a single machine in SPL to accomplish certain computations that need HANA/Spark clusters generally.

SPL supports parallel processing to boost performance by making use of the multi-core CPU. For example:

A

1

=Orders.select@m(Amount>=arg1 && Amount<arg2)

2

=A1.groups(year(OrderDate):y,month(OrderDate):m; sum(Amount):s,count(1):c)

SPL supports pointer-style data reuse to remarkably reduce memory usage. SPL data engages in computations in the form of pointers, enabling one table to be repeatedly used in stepwise algorithms and different algorithms. SQL, however, needs to copy desired records for each time of computation.

SPL supports using the pre-join technique to increase computing performance. SQL achieves the technique by copying records, and that consumes too much memory. SPL implements it through the pointer-style data reuse that uses little memory during the loading phase. It uses the point "." to reference a related field value during an in-memory computation, which is convenient and intuitive:

=callRecord.sum(OUTID.BRANCHID.OUTCOST+INID.BRANCHID.INCOST+OUTID.AGENTID.OUTCOST+INID.AGENTID.INCOST)

SPL supports memory compression that allows loading more data into the memory without modifying the code.

 

SPL supports various external data sources, including RDBs, files like txt\csv\xls, NoSQL databases such as MongoDB, Hadoop, Redis, ElasticSearch, Kafka and Cassandra, and multilevel data such as WebService, XML, Restful and Json, avoiding troubles of data loading and format conversion. To load an HDSF file to the memory, for instance:

A

1

=hdfs_open(;"hdfs://192.168.0.8:9000")

2

=hdfs_file(A1,"/user/Orders.csv":"GBK")

3

=A2.cursor@t()

4

=hdfs_close(A1)

5

=A3.memory(OrderID).index()

6

>env(orders,A5)

It is convenient to achieve mixed computations involving data stored both on storage device and in memory in SPL, enabling the language to compute a large volume of data that cannot fit into the memory together with a memory table. Suppose we have the primary table orders that is already loaded to the memory and a large detail table orderdetail that is stored in a text file, and we are trying to associate the two tables. We have the following SPL code:

A

1

=file("orderdetail.txt").cursor@t()

2

=orders.cursor()

3

=join(A1:detail,orderid; A2:main,orderid)

4

=A3.groups(year(main.orderdate):y;sum(detail.amount):s)