How to handle set operations easily in Java

  As a Java programmer, I need to write a lot of code to achieve SQL-style set operations. I always wondered if there was an independent tool in which I write a SQL-like script, call it directly from a Java application and return the result set to Java until I encountered esProc. It’s awesome. The Java-based product is SPL (Structured Process Language) driven. The language generates concise and intuitive yet effective and efficient script that is integration-friendly. A set-based language as SQL claims, it provides limited support for performing order-based set operations and thus generates difficult to understand code. SPL, however, is based on discrete data set model. This enables it to handle order-based set operations effortlessly. Now let me show you how SPL manipulates set operations for Java through examples.

Basic uses

Concatenation

  Example1: Calculate the number of days in overlapping time periods.

MySQL8:

with recursive t(start,end) as (select date'2010-01-07',date'2010-01-9'
    union all select date'2010-01-15',date'2010-01-16'
    union all select date'2010-01-07',date'2010-01-12'
    union all select date'2010-01-08',date'2010-01-11'),
t1(d,end) as (select start,end from t
    union all select d+1,end from t1 where d
select count(distinct d) from t1;

  Note: First get dates within each time period and then count the number of different dates.

SPL script:


A

1

=connect("mysql")

2

=A1.query@x("select date'2010-01-07'start,date'2010-01-9'end   union all select date'2010-01-15',date'2010-01-16'union all select date'2010-01-07',date'2010-01-12'union   all select date'2010-01-08',date'2010-01-11'")

3

=A2.(periods(start,end))

4

=A3.conj()

5

=A4.icount()

  A3: Generate a sequence of dates for each time period in A2 from start to end.
  A4: Concatenate sequences in A3.
  A5: Count the number of different dates in A4.
  undefined

  Save the SPL script as SumSet.dfx (for integration into the Java application).

Difference

  Example1: Find countries where both English-speaking population and French-speaking population exceed 5% of the whole population.

MySQL8:

with t1(lang) as (select 'English' union all select 'French')
    select name from world.country c
    where not exists(select * from t1 where lang not in (select language from world.countrylanguage 
        where percentage>=5 and countrycode=c.code
    )
);

  Note: The SQL just shows how it gets empty difference using two not operators.

SPL script:


A

1

=connect("mysql")

2

=A1.query("select CountryCode,Name,Language,Percentage   from world.countrylanguage cl join world.country c on cl.countrycode=c.code   where percentage>5")

3

=A2.group(CountryCode)

4

=A3.select(["English","French"]\~.(Language)==[])  

5

=A4.new(~.Name:name)

  A4: Find groups that the difference between them and [“English”,”French”] is empty, that is, getting groups where languages include English and French.
  undefined

  Save the SPL script as DifferenceSet.dfx (for integration into the Java application).

Intersection

  Example 1: Get codes of countries where the English-speaking population, French-speaking population and Spanish-speaking population respectively exceed 0.3%, 0.2% and 0.1%.

MySQL8:

with t1 as (select countrycode from world.countrylanguage where language='English' and percentage>0.3),
     t2 as (select countrycode from world.countrylanguage where language='French' and percentage>0.2),
     t3 as (select countrycode from world.countrylanguage where language='Spanish' and percentage>0.1)
select countrycode
from t1 join t2 using(countrycode) join t3 using(countrycode);

  Note: The above SQL is just an example of calculating intersection of sets.

SPL script:


A

1

=connect("mysql")

2

[English,French,Spanish]

3

[0.3,0.2,0.1]

4

=A2.(A1.query@i("select   countrycode from world.countrylanguage where language=? and percentage>?",~,A3(#)))  

5

>A1.close()

6

=A4.isect()

  A4: Respectively get codes of countries where the English-speaking population exceeds 0.3%, where French-speaking population is over 0.2% and where Spanish-speaking population takes up over 0.1% and return each as a sequence.
  A6: Calculate intersection of A4’s sequences.
  undefined

  Save the SPL script as IntersectionSet.dfx (for integration into the Java application).

Java invocation

  It’s convenient to embed an SPL script into a Java application. Java will load the script via esProc JDBC in the manner of calling a stored procedure. Take the SumSet.dfx as an example:

    ...
    Connection con = null;
    Class.forName("com.esproc.jdbc.InternalDriver");
    con= DriverManager.getConnection("jdbc:esproc:local://");
    // Call stored procedure, where SumSet is the dfx file name
    st =(com. esproc.jdbc.InternalCStatement)con.prepareCall("call SumSet()");
    //Execute stored procedure
    st.execute();
    //Get result set
    ResultSet rs = st.getResultSet();
    ...

  Just change the file name to call another dfx file. See How to Call an SPL Script in Java to learn details. esProc also supports the ODBC style integration through same process.

Advanced uses

Retrieve data by sequence numbers of records

  Example 1: Get transaction information on the 3rd day and the third-to-last day in 2017 for stock 600036.

MySQL8:

with t as (select *, row_number() over(order by tdate) rn from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31')
    select tdate,open,close,volume from t where rn=3
union all
    select tdate,open,close,volume from t where rn=(select max(rn)-2 from t);

SPL script:


A

1

=connect("mysql")

2

=A1.query@x("select *   from stktrade where sid='600036'and tdate between'2017-01-01'and  '2017-12-31'order by tdate")

3

=A2(3)|A2.m(-3)

  A3: Calculate concatenation of the 3rd record and the third-to-last record.
  undefined


  Example 2: Calculate average closing price in the past 20 days for stock 600036.

MySQL8:

with t as (select *, row_number() over(order by tdate desc) rn from stktrade where sid='600036')
select avg(close) avg20 from t where rn<=20;

SPL script:


A

1

=connect("mysql")

2

=A1.query@x("select *   from stktrade where sid='600036'order by tdate")

3

=A2.m(-20:)

4

=A3.avg(close)

  A2: Sort transaction records of stock 600036 by transaction dates.
  A3: Get records from the 20th-to-last to the last.
  A4: Calculate average closing price over A3’s records.
  undefined

Find sequence number of records meeting the specified condition

  Example1: Find how many days stock 600036 uses to reach the closing price of 25 CNY.

MySQL8:

with t as (select *, row_number() over(order by tdate) rn from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31')
select min(rn) from t where close>=25;

SPL script:


A

1

=connect("mysql")

2

=A1.query@x("select * from   stktrade where sid='600036'and tdate between'2017-01-01'and'2017-12-31'order   by tdate")

3

=A2.pselect(close>=25)

  A3: Find the position of the record where the closing price reaches 25 CNY for the first time from back to front.
  undefined


  Example 2: Calculate the growth rate in 2017 for stock 000651 (Take delisting into consideration).

MySQL8:

with t as (select * from stktrade where sid='000651'),
     t1(d) as (select max(tdate) from t where tdate<'2017-01-01'),
     t2(d) as (select max(tdate) from t where tdate<'2018-01-01')
select s2.close/s1.close-1 rise
from (select * from t,t1 where tdate=d) s1,
     (select * from t,t2 where tdate=d) s2;

SPL script:


A

1

=connect("mysql")

2

=A1.query@x("select * from   stktrade where sid='000651'and tdate<'2018-01-01'order by tdate")

3

=A2.pselect@z(tdate <   date("2017-01-01"))

4

=A2(A3).close

5

=A2.m(-1).close

6

=A5/A4-1

  A2: Sort records by transaction dates in descending order.
  A3: Find the sequence number of the record on the last transaction date before 2017-01-01.
  A4: Calculate the last closing price in 2016.
  A5: Calculate the last closing price in 2017. A2.m(-1) gets the last record, that is, the record on the last transaction date in 2017.
  undefined


  Example 3: Get transaction records where the volume is greater than 25 million in 2017 and their daily growth rate for stock 300469.

MySQL8:

with t as (select *, row_number() over(order by tdate) rn
    from stktrade where sid='300469' and tdate<=date '2017-12-31'),
t1 as (select * from t where tdate>=date'2017-01-01' and volume>=2500000)
    select t1.tdate, t1.close, t.volume, t1.close/t.close-1 rise
    from t1 join t on t1.rn=t.rn+1;

SPL script:


A

1

=connect("mysql")

2

=A1.query@x("select *   from stktrade where sid='300469' and tdate<= date'2017-12-31'order   by tdate")

3

=A2.pselect@a(tdate>=date("2017-01-01")   && volume>2500000)

4

=A3.new(A2(~).tdate:tdate,   A2(~).close:close, A2(~).volume:volume, A2(~).close/A2(~-1).close-1:rise)

  A3: Find sequence numbers of records where the transaction volume is greater than 25 million in 2017.
  A4: Calculate date, closing price, transaction volume and growth rate according to sequence numbers.
  undefined

Get sequence number of the record holding the max/min value

  Example 1: Find how many transaction dates between the first lowest price and the first highest price in 2017 for stock 600036.

MySQL8:

with t as (select *, row_number() over(order by tdate) rn from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31'),
     t1 as (select * from t where close=(select min(close) from t)),
     t2 as (select * from t where close=(select max(close) from t))
select abs(cast(min(t1.rn) as signed)-cast(min(t2.rn) as signed)) inteval
from t1,t2;

SPL script:


A

1

=connect("mysql")

2

=A1.query@x("select * from   stktrade where sid='600036'and tdate between'2017-01-01'and'2017-12-31'order   by tdate")

3

=A2.pmax(close)

4

=A2.pmin(close)

5

=abs(A3-A4)

  A3: Find the sequence number of the highest closing price from back to front.
  A4: Find the sequence number of the lowest closing price from front to back.
  undefined


  Example 2: Find how many days between the last lowest closing price and the last highest closing prices in 2017 for stock 600036.

MySQL8:

with t as (select *, row_number() over(order by tdate) rn from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31'),
     t1 as (select * from t where close=(select min(close) from t)),
     t2 as (select * from t where close=(select max(close) from t))
select abs(cast(max(t1.rn) as signed)-cast(max(t2.rn) as signed)) inteval
from t1,t2;

SPL script:


A

1

=connect("mysql")

2

=A1.query@x("select *   from stktrade where sid='600036'and tdate between'2017-01-01'and'2017-12-31' order   by tdate")

3

=A2.pmax@z(close)

4

=A2.pmin@z(close)

5

=abs(A3-A4)

  A3: Find the sequence number of the highest closing price from back to front.
  A4: Find the sequence number of the lowest closing price from back to front.
  undefined

Alignment between ordered sets

  Example 1: Calculate daily relative return of stock 399006 contrasted to stock 399001 between Mar. 3 -6, 2018.

MySQL8:

with t1 as (select *,close/lag(close) over(order by tdate) rise from stktrade where sid='399006' and tdate between '2018-03-05' and '2018-03-08'),
     t2 as (select *, close/lag(close) over(order by tdate) rise from stktrade where sid='399001' and tdate between '2018-03-05' and '2018-03-08')
select t1.rise-t2.rise
from t1 join t2 using(tdate)
where t1.rise is not null;

SPL script:


A

1

=connect("mysql")

2

=["399006","399001"].(A1.query("select   * from stktrade where sid=? and tdate between'2018-03-05'and'2018-03-08'",~))  

3

>A1.close()

4

=A2.(~.calc(to(2,4),close/close[-1]))  

5

=A4(1)--A4(2)

  A2: Get transaction records between Mar. 5 -8, 2018 for stock 399006 and stock 399001.
  A4: Calculate daily growth rate from the 2nd record to the 4th record in A2’s two table sequences respectively, that is, the daily growth rate in Mar. 5 -8, 2018 for stock 399006 and stock 399001.
  A5: Perform alignment subtraction to get the daily relative return.
  undefined

SPL strengths

  • A universal tool

  Data computing in Java involves a mountain of work that generates unreusable, lengthy code. And most of the time the source data even doesn’t come from a database. SPL equalizes all types of source to process data in a uniform and convenient way.

  • Free basic features

  Raqsoft offers esProc trial license every month for users to perform temporary and ad hoc data analytics for free. There is also esProc Basic Edition that can be integrated with a Java application and deployed on servers for permanent use.

  • Solid technical support

  You can find solutions to general problems in our Documentation. For users with esProc Basic Edition, you can ask any questions in Raqsoft Community, where we offer free technical supports.