Implement grouping easily in Java with embeded SPL

 

Introduction to problems

  It’s tedious to implement grouping and aggregation like groupBy in SQL in Java codes. Usually we first need to indicate the data structure (Java entity class), then iterate through the Java set, and finally add it to a subset according to grouping conditions. It’s simpler in Java 8 with Lambda(stream),but aggregation is usually needed after grouping, and we still need to write the aggregation functions like sum (), count (*), topN ()seperately. These are still the most common grouping and aggregation operations, in case of alignment grouping, enumeration grouping, multiple grouping and other unconventional grouping plus other aggregation functions (FIRST, LAST…), the code becomes very lengthy and not universal. If only there could be a middleware specially designed for this kind of calculation, using a similar SQL script to describe the algorithm, calling the script directly in Java and returning the result set, that will be perfect. EsProc and SPL script are just the product.Now we’ll illustrate how to use it by some examples.

SPL Implementation

Conventional grouping

  Everyone’s overtime record is kept in the duty.xlsx file:

workday name
2016-02-05 Ashley
2016-02-08 Ashley
2016-02-09 Ashley
2016-02-10 Johnson
2016-02-11 Johnson
2016-02-12 Johnson
2016-02-15 Ashley
2016-02-16 Ashley
2016-02-17 Ashley

  Summarize the number of days on duty for each person:

A
1 =file("/Users/test/duty.xlsx").importxls@tx()
2 =A1.groups(name;count(name):count)

  001jpg

  Save the script file CountName.dfx(will be used when embeded in Java).

TopN of each group

  Take the overtime records of each month, each person and the first three days.

A
1 =file("/Users/test/duty.xlsx").importxls@tx()
2 =A1.group(month(workday):mon,name;~.top(3):top3)

  002jpg

  Save the script file RecMonTop3.dfx(will be used when embeded in Java).

Java Incocation

  SPL is very convenient to be embedded in Java applications. It can be loaded by calling stored procedure method through JDBC. We’ll illustrate the invocation by the example CountName.dfx.

    ... 
    Connection con = null; 
    Class.forName("com.esproc.jdbc.InternalDriver"); 
    con= DriverManager.getConnection("jdbc:esproc:local://"); 
    //Call stored procedure,CountName is the dfx file name 
    st =(com. esproc.jdbc.InternalCStatement)con.prepareCall("call CountName()"); 
    //Execute stored procedure 
    st.execute(); 
    //Get the result set 
    ResultSet rs = st.getResultSet(); 
    ...

  It’s the same with RecMonTop3.dfx, and we only need to call RecMonTop3.dfx instead. We can also return two result sets at the same time. Only partial Java codes are used here to explain roughly how to embed SPL. For detailed steps, please refer to How to Call an SPL Script in Java. SPL also supports ODBC drivers and can be integrated into ODBC-enabled languages. The embedding process is similar.

Some examples

  In fact, there are many kinds of data grouping, such as alignment grouping, enumeration grouping and multiple grouping. Here we give some examples.

SPL alignment grouping

  Example 1:List in order the number of countries using Chinese, English and French as official languages.

  MySQL8:
    with t(name,ord) as (select 'Chinese',1
    union all select 'English',2
    union all select 'French',3)
    select t.name, count(countrycode) cnt
    from t left join world.countrylanguage s on t.name=s.language
    where s.isofficial='T'
    group by name,ord
    order by ord;

  Note: Character set of the table and database should be consistent.

  (1) show variables like ‘character_set_connection’. View the current session character set
  (2) show create table world.countrylanguage. View the character set of the table
  (3) set character_set_connection=[Character set]Update the current session character set

  esProc SPL:

A
1 =connect("mysql")
2 =A1.query@x("select * from world.countrylanguage where isofficial='T'")
3 Chinese,English,French]
4 =A2.align@a(A3,Language)
5 =A4.new(A3(#):name, ~.len():cnt)

  A1: Connect the database
  A2: Query the records of all official languages
  A3: the specified languages
  A4: Align all records to the relative position in A3 by Language
  A5: Create a table with the fields: language, the number of countries using the language as official languages
  003jpg

  Example 2:List in order the number of countries using Chinese, English, French and other languages as official languages.

  MySQL8:
    with t(name,ord) as (select 'Chinese',1 union all select 'English',2
        union all select 'French',3 union all select 'Other', 4),
            s(name, cnt) as (
                select language, count(countrycode) cnt
                from world.countrylanguage s
                where s.isofficial='T' and language in ('Chinese','English','French')
                group by language
        union all
            select 'Other', count(distinct countrycode) cnt
            from world.countrylanguage s
            where isofficial='T' and language not in ('Chinese','English','French')
        )

    select t.name, s.cnt
    from t left join s using (name)
    order by t.ord;

  esProc SPL:

A
1 =connect("mysql")
2 =A1.query@x("select * from world.countrylanguage where isofficial='T'")
3 [Chinese,English,French,Other]
4 =A2.align@an(A3.to(3),Language)
5 =A4.new(A3(#):name, if(#<=3,~.len(), ~.icount(CountryCode)):cnt)

  A4: Align all records to the relative position in A3.to(3) by Language, and add a group to store non-aligned records.
  A5: Calculate the number of countries in A4
  004jpg

SPL Enumeration grouping

  Example 1:List in order the number of different types of cities.

  MySQL8:
    with t as (select * from world.city where CountryCode='CHN'),
    segment(class,start,end) as (select 'tiny', 0, 200000
        union all select 'small',?? 200000, 1000000
        union all select 'medium', 1000000, 2000000
        union all select 'big', 2000000, 100000000
    )

    select class, count(1) cnt
    from segment s join t on t.population>=s.start and t.population<s.end
    group by class, start
    order by start;

  esProc SPL:

A
1 =connect("mysql")
2 =A1.query@x("select * from world.city where CountryCode='CHN'")
3 =${string([20,100,200,10000].(~*10000).("?<"/~))}
4 [tiny,small,medium,big]
5 =A2.enum(A3,Population)
6 =A5.new(A4(#):class, ~.len():cnt)

  A3: ${…}Macro replacement.The result of the expression in brackets is used as the new expression to calculate. The result is the sequence [“?<200000”,“?<1000000”,“?<2000000”,“?<100000000”].
  A5: For each record in A2, find the first condition in A3 and add it to the corresponding group.
  005jpg

  Example 2:List the number of large cities in East China, the number of large cities in other areas, and the number of Non-large cities.

  MySQL8:
    with t as (select * from world.city where CountryCode='CHN')
        select 'East&Big' class, count(*) cnt
        from t
        where population>=2000000
            and district in ('Shanghai','Jiangshu', 'Shandong','Zhejiang','Anhui','Jiangxi')
    union all
        select 'Other&Big', count(*)
        from t
        where population>=2000000
            and district not in ('Shanghai','Jiangshu','Shandong','Zhejiang','Anhui','Jiangxi')
    union all
        select 'Not Big', count(*)
        from t
        where population<2000000;

  esProc SPL:

A
1 =connect("mysql")
2 =A1.query@x("select * from world.city where CountryCode='CHN'")
3 [Shanghai,Jiangshu, Shandong,Zhejiang,Anhui,Jiangxi]
4 [?(1)>=2000000 && A3.contain(?(2)), ?(1)>=2000000 && !A3.contain(?(2))]
5 [East&Big,Other&Big, Not Big]
6 =A2.enum@n(A4, [Population,District])
7 =A6.new(A5(#):class, A6(#).len():cnt)

  A5: enum@n stores records that do not meet all the conditions in A4 in the last additional group.
  006jpg

  Example 3:List the number of large cities in all regions, the number of large cities in East China and the number of Non-large cities.

  MySQL8:
    with t as (select * from world.city where CountryCode='CHN')
        select 'Big' class, count(*) cnt
        from t
        where population>=2000000
    union all
        select 'East&Big' class, count(*) cnt
        from t
        where population>=2000000
        and district in ('Shanghai','Jiangshu','Shandong','Zhejiang','Anhui','Jiangxi')
    union all
        select 'Not Big' class, count(*) cnt
        from t
        where population<2000000;

  esProc SPL:

A
1 =connect("mysql")
2 =A1.query@x("select * from world.city where CountryCode='CHN'")
3 [Shanghai,Jiangshu, Shandong,Zhejiang,Anhui,Jiangxi]
4 [?(1)>=2000000, ?(1)>=2000000 && A3.contain(?(2))]
5 [Big, East&Big, Not Big]
6 =A2.enum@rn(A4, [Population,District])
7 =A6.new(A5(#):class, A6(#).len():cnt)

  A6: If the records in A2 satisfy more than one condition in A4, enum@r adds them to the corresponding group respectively.
  007jpg

Advantage summary

  • Use SQL if database is available, otherwise use SPL
      It is cumbersome to implement grouping and aggregation by Java programing, the code is long and not reusable. Often the data is not stored in database. With SPL, it’s as convenient as using SQL in Java.

  • The free esProc Community Edition has been released. To use it permanently, users can deploy it on the server and integrate it with Java.

  • Technical documentation and community support
      Officially provided esProc technology documentation has many ready-made examples, and solutions to conventional problems can be found in the documentation. If you’ve got the introductory version, you can not only use the regular functions of SPL, but also go to the Raqsoft Community to consult if you encounter any problems. Raqsoft provides free technical support to the users of the introductory version through the community.