Then let’s talk about discreteness, and what are the advantages of Java over SQL?

We discussed the advantages of SQL over Java, which is its set-oriented feature. Now let’s take a look at what advantages Java has over SQL.

The code in Java may seem long and messy, but upon closer examination, one will find that the computational logic it describes is not difficult, and it basically follows a step-by-step approach to implement business goals. That is to say, Java is tedious to write, not difficult to think.

But SQL is different. It is not difficult to understand the technical significance of each subquery, but it is difficult for you to understand what it really wants to do and how it serves the ultimate business goals. That is to say, SQL is written more concisely, but the difficulty of thinking is even greater.

Why is this?

We have previously discussed a topic of “Simple SQL Statements Only Exist in Coursebooks and Training Courses”, which pointed out that SQL has problems such as incomplete set orientation and lack of support for order. These problems, as well as other problems with SQL, all have a common root cause, which leads to that although SQL is less cumbersome than Java, its difficulty is more significant.


Let’s start with an example to calculate the median of a sequence.

We will not use structured data here, and only use a simple array, otherwise it will hit the soft spot of Java, and if the Java code is too long, it will cover up the key issue.

SQL:

WITH TN AS (SELECT ROWNUMBER() OVER (ORDER BY V) RN,V FROM T),
   N AS (SELECT COUNT(*) N FROM T)
SELECT AVERAGE(V) FROM TN,N
WHERE RN>=N.N/2 AND RN<=N.N/2+1

Java:

Array.sort(v);
return (v[(v.length-1)/2] + v[v.length/2])/2;

When structured data and Lambda syntax are not involved, Java often appears more concise than SQL. Moreover, a careful understanding of the calculation process of these two pieces of code reveals that Java code is not only concise but also more efficient.

On the surface, the difficulty with SQL is that it cannot directly retrieve members using sequence numbers, and it does not have the concept of sequence numbers, so it is necessary to create a sequence number column, and a window function is also used here, otherwise the sequence is difficult to create. Java, on the other hand, can easily retrieve members from an array using sequence numbers for calculation.


The root cause here lies in the difference in data models between Java and SQL.

The data in high level languages such as Java is based on some atomic data that cannot be further split, such as numbers and strings. Atomic data can form more complex data such as sets and records, which are also some kinds of data, and can also be combined to form larger sets and records. The data that constitutes sets and records is not attached to them and can exist independently and participate in calculations. Naturally, it provides the operation of disassembling members from sets and records (using sequence numbers and fields to retrieve members). This free form of data organization is called discreteness. After supporting discreteness, it is easy to construct a set of sets and records where field values are records.

SQL treats tables (i.e. sets of records) as atomic data, which is not composed of more basic atomic data. SQL data also lacks composability, and the set of sets and record of records do not exist in SQL. In SQL, records must be attached to tables and cannot exist independently, which means members must be attached to sets. Disassembling set members is meaningless because there is no corresponding data type to carry them. The so-called disassembly actually is conducted by the filtering operation like WHERE, which seems a bit convoluted. Filtering operation is essentially calculating subsets, and the result is still a table (set), and a single record is essentially a table with only one record (set with only one member). This data organization method of SQL is very inflexible and lacks discreteness.

Almost all high-level languages naturally support discreteness, while SQL does not.


Discreteness is a natural characteristic, and things naturally develop from simplicity to complexity, which is in line with people’s natural thinking. The business logic is not entirely focused on the overall set, but there are many operations that target specific set members or separated data outside the set. The lack of discreteness will increase the difficulty of operations on these data that are not the overall set, which means there will be “detours”.

Take a simple example of structured data: for example, to list employees whose age and income are both greater than John, and to write SQL naturally, it would be as follows:

WITH A AS (SELECT * FROM employee WHERE name=...)
SELECT * FROM employee WHERE age>A.age AND salary>A.salary

But unfortunately, this SQL is illegal, and the latter half needs to be written using JOIN:

SELECT T.* FROM employee T,A WHERE T.age>A.age AND T.salary>A.salary

A bit convoluted, right? This can be used to understand the difference between tables and records in SQL, as there is no data type that can hold records.

Java’s support for set operations on structured data is not good, so we will use SPL with better set characteristics to implement the same operation:

a = employee.select@1(name==...)
employee.select( age>a.age && salary>a.salary )

This is natural thinking.


Set orientation is a syntactic form that corresponds to the complexity of the code; Discreteness is a data model that corresponds to the difficulty of the code; The code written in Java without set orientation is very complex, while the code written in SQL without discreteness may not be very long, but it will be very convoluted and more difficult.

The difficulty of SQL is almost always caused by the lack of discreteness. Without a set of sets, SQL cannot maintain grouped subsets during grouping and must force aggregation, and SQL’s set-orientation is not thorough. There are no separate records and sets composed of them, and only foreign keys can be used to represent the association relationships between data. The code is cumbersome and difficult to understand, and the computational performance is poor. SQL, which lacks discreteness, cannot use intuitive reference mechanisms to describe associations. Specifically, without the support of discreteness, it is difficult for SQL to describe ordered calculation. Ordered calculation is a typical combination of discreteness and set: the order of members is only meaningful when they are in a set, this requires set; During ordered calculation, it requires to differentiate each member from adjacent members, and this emphasizes discreteness.

Due to space limitations, we will not delve into these issues in detail here. We will discuss them one by one in the future.

We need a language that combines set orientation and discreteness, while having both the advantages of SQL and Java. Well, that’s why we have invented esProc SPL, and also why the theoretical system of SPL is called discrete datasets.