Can Java Stream API replace SQL?

Java SE 8 introduced the Stream API, a style of collection processing quite different from the loops that preceded it. With just a few operations such as filter, map, reduce, and iterate, you can write concise and expressive data processing queries. For this reason, many programmers try to replace SQL with Stream. In practice, however, Stream is far less specialized for structured data computing than SQL.

When the collection members are simple data types (integer, floating-point, string, date), Stream handles collection calculations easily. In structured computing, however, the data object is not a simple value but a record (a Map, an entity, or a record object). Once the data becomes records, Stream is much less convenient. It has no direct support for associative calculations (joins), so they must be hard-coded, and the resulting code is lengthy, logically complex, and hard to write, even for professional Java programmers.
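To make the simple-type case concrete, here is a minimal sketch of the filter, map, and reduce style on a list of integers. The class and method names are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;

class SimpleStreamDemo {
    // Sum of the squares of the even numbers, using filter, map, and reduce.
    static int sumOfEvenSquares(List<Integer> numbers) {
        return numbers.stream()
                .filter(n -> n % 2 == 0)   // keep even values
                .map(n -> n * n)           // square each one
                .reduce(0, Integer::sum);  // add them up
    }

    public static void main(String[] args) {
        // 2*2 + 4*4 + 6*6 = 56
        System.out.println(sumOfEvenSquares(Arrays.asList(1, 2, 3, 4, 5, 6)));
    }
}
```

With plain values the whole computation fits in one pipeline, because no field access, key lookup, or join is involved.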

For the many programmers accustomed to manipulating data with SQL, the Open-esProc computing package offers the benefits of SQL without a database and makes up for the shortcomings of Stream. Using it resembles running SQL against a data set: the program calls an encapsulated SPL calculation script. Take the task "find the classes whose average English score is below 70". In most databases the query is "select CLASS, avg(English) as avg_En from students_scores group by CLASS having avg(English)<70". In SPL it is written as follows.


A
1 =file("E:/txt/Students_scores.csv").import@tc()
2 =A1.groups(CLASS;avg(English):avg_En)
3 =A2.select(avg_En<70)
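For comparison, the same group-and-filter query can be sketched with Stream alone. This is only an illustration under stated assumptions: the `Score` class is a hypothetical stand-in for one row of the CSV file, the rows are assumed to be already loaded into a list, and the sample values in `main` are made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class LowEnglishClasses {
    // Hypothetical record type standing in for one row of Students_scores.csv.
    static final class Score {
        final String clazz;
        final double english;
        Score(String clazz, double english) {
            this.clazz = clazz;
            this.english = english;
        }
    }

    // Rough equivalent of:
    // select CLASS, avg(English) ... group by CLASS having avg(English) < threshold
    static Map<String, Double> classesBelow(List<Score> scores, double threshold) {
        // "group by" step: average English score per class, sorted by class name
        Map<String, Double> avgByClass = scores.stream()
                .collect(Collectors.groupingBy(s -> s.clazz, TreeMap::new,
                        Collectors.averagingDouble(s -> s.english)));
        // "having" step: drop groups whose average is not below the threshold
        avgByClass.values().removeIf(avg -> avg >= threshold);
        return avgByClass;
    }

    public static void main(String[] args) {
        List<Score> sample = Arrays.asList(            // made-up sample rows
                new Score("Class one", 60), new Score("Class one", 70),
                new Score("Class two", 80), new Score("Class two", 90));
        System.out.println(classesBelow(sample, 70));
    }
}
```

Loading and parsing the file is left out here, which is part of what the single `import@tc()` call covers in the SPL script.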

It is not complicated to accomplish this task with Stream directly, but the SPL version is simpler and easier to learn. Store the script file (for example, condition.dfx) alongside the Java application and call it from Java through the JDBC interface, much as you would call a stored procedure. Because the SPL calculation stays independent of the Java code, it is easy to change when requirements change.

…
 ResultSet result = statement.executeQuery("call condition.dfx");
…

SPL can also be used without a script file by embedding the expression directly in the Java code, much like an inline SQL statement.

…
 ResultSet result = statement.executeQuery(
     "=file(\"E:/txt/Students_scores.csv\").import@tc()"
     + ".groups(CLASS;avg(English):avg_En).select(avg_En<70)");
…

SPL also provides a way to query data with SQL syntax, which is convenient for programmers who already know SQL. For example, suppose state, department, and employee information is stored in three text files (the same query works if the files are replaced with three SPL table sequence objects), and we want to find the employees in New York whose manager is in California.


A
1 $select e.NAME as ENAME
  from E:/txt/EMPLOYEE.txt as e
  join E:/txt/DEPARTMENT.txt as d on e.DEPT=d.NAME
  join E:/txt/EMPLOYEE.txt as emp on d.MANAGER=emp.EID
  where e.STATE='New York' and emp.STATE='California'
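To show what the same association looks like when hard-coded with Stream, here is a minimal sketch. It assumes the two tables are already loaded into lists and that the join keys (EID and department NAME) are unique. The `Employee` and `Department` classes and the sample rows are hypothetical; only the field names follow the SQL above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class ManagerStateJoin {
    // Hypothetical row types; field names mirror the columns used in the SQL.
    static final class Employee {
        final int eid; final String name; final String dept; final String state;
        Employee(int eid, String name, String dept, String state) {
            this.eid = eid; this.name = name; this.dept = dept; this.state = state;
        }
    }
    static final class Department {
        final String name; final int manager;
        Department(String name, int manager) { this.name = name; this.manager = manager; }
    }

    static List<String> newYorkersWithCalifornianManager(
            List<Employee> employees, List<Department> departments) {
        // Build lookup maps by hand; toMap throws if a key is duplicated.
        Map<Integer, Employee> byEid = employees.stream()
                .collect(Collectors.toMap(e -> e.eid, Function.identity()));
        Map<String, Department> deptByName = departments.stream()
                .collect(Collectors.toMap(d -> d.name, Function.identity()));

        return employees.stream()
                .filter(e -> "New York".equals(e.state))
                .filter(e -> {
                    // employee -> department -> manager, skipping unmatched rows
                    Department d = deptByName.get(e.dept);
                    Employee mgr = (d == null) ? null : byEid.get(d.manager);
                    return mgr != null && "California".equals(mgr.state);
                })
                .map(e -> e.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Employee> emps = Arrays.asList(           // made-up sample rows
                new Employee(1, "Alice", "Sales", "California"),
                new Employee(2, "Bob", "Sales", "New York"),
                new Employee(3, "Carol", "R&D", "New York"));
        List<Department> depts = Arrays.asList(
                new Department("Sales", 1), new Department("R&D", 3));
        System.out.println(newYorkersWithCalifornianManager(emps, depts));
    }
}
```

The two joins in the SQL turn into two hand-built maps and a multi-line filter, which illustrates why associative calculations in Stream tend to grow long.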

When the data volume is large, memory becomes the bottleneck. SPL can read data through cursors, which, like database cursors, fetch rows in small batches. Calculations such as sorting, association, and grouping are attached to the cursor, so big data can be processed with a small amount of memory.


A
1 =file("E:/txt/Employees.txt").cursor@t().sortx(EId)
2 =file("E:/txt/Orders.txt").cursor@t().sortx(SellerId)
3 =joinx(A2:O,SellerId; A1:E,EId)
4 =A3.groups(E.Dept;sum(O.Amount))
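In plain Java, the closest everyday counterpart to a cursor is a lazily read stream of lines. The sketch below is a deliberately simplified illustration, not an equivalent of the script above: it performs no external sort and no join, and only sums order amounts per seller while reading rows one at a time. The two-column, tab-separated layout (SellerId, Amount) is an assumption made for the example.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class StreamingAmountBySeller {
    // Reads tab-separated rows lazily; only the running totals stay in memory.
    // Assumed layout: header row, then "SellerId<TAB>Amount" on each line.
    static Map<String, Double> totalBySeller(BufferedReader reader) {
        return reader.lines()
                .skip(1)                        // skip the header row
                .map(line -> line.split("\t"))  // split into fields
                .collect(Collectors.groupingBy(f -> f[0], TreeMap::new,
                        Collectors.summingDouble(f -> Double.parseDouble(f[1]))));
    }

    public static void main(String[] args) {
        // Made-up sample data; a real program would wrap a file reader instead.
        String data = "SellerId\tAmount\n1\t100.5\n2\t20\n1\t9.5\n";
        System.out.println(totalBySeller(new BufferedReader(new StringReader(data))));
    }
}
```

Sorting or joining data that does not fit in memory would require hand-written external sort and merge logic on top of this, which is the part that `sortx` and `joinx` cover in SPL.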

Big data processing often benefits from parallel computing: each thread processes one segment of the data, and the per-thread results are merged at the end.

A
1 =file("E:/txt/user_info_reg.csv").cursor@tcm(;4)
2 =A1.groups(id_province;count(~):cnt)

Parallel speedup is easy to use in SPL. The @m option enables parallel computing, and the parameter 4 requests 4-way parallelism. Compared with the single-threaded code, the only difference is one extra cursor option and one parameter.
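For reference, a parallel grouped count can be sketched in Stream as follows, assuming the id_province values are already in memory as a list of strings. Note one difference: `parallelStream()` takes its degree of parallelism from the JVM's common fork/join pool rather than from an explicit parameter such as 4. The class name and sample values are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class ParallelProvinceCount {
    // Counts occurrences of each province; partial maps built by the worker
    // threads are merged by the collector into one sorted result.
    static Map<String, Long> countByProvince(List<String> provinces) {
        return provinces.parallelStream()
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new,
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(           // made-up sample values
                "Hebei", "Hunan", "Hebei", "Guangdong", "Hebei", "Hunan");
        System.out.println(countByProvince(sample));
    }
}
```

Here too the switch from sequential to parallel is a single call, but it applies to data already held in a collection; splitting a large file into segments for parallel reading is not part of this sketch.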

Using SPL can greatly simplify structured data calculations in Java programs. Further examples cover the following topics:

Loop operations

Accessing members of data set by sequence numbers

Locate operations on ordered sets

Alignment operations between ordered sets

TopN operations

Existence checking

Membership test

Unconventional aggregation

Alignment grouping

Select operation

More calculation examples: Use SPL in applications