How to Write Simple & Powerful Script Data Sources for BIRT Reports

 

1. Preface: JVM-based SQL functions and stored procedures

  Some databases, such as MySQL before version 8.0, lack analytic (window) functions. Others, such as Vertica, don’t support stored procedures. Users of these databases turn to external scripts in Python, R, or another language to handle complicated computations. But such scripting languages integrate poorly with Java, the mainstream application language. And a lengthy Java program written to stand in for SQL functions or stored procedures usually serves one specific computing goal and is hard to reuse.

  Complicated logic is hard to implement even with analytic functions. Here’s a common computing task: find the top N customers whose sales together account for half of the total, and list them by sales amount in descending order. In Oracle it can be written this way:

with A as
  (select CUSTOM,SALESAMOUNT,row_number() over (order by SALESAMOUNT) RANKING
  from SALES)
  select CUSTOM,SALESAMOUNT
  from (select CUSTOM,SALESAMOUNT,sum(SALESAMOUNT) over (order by RANKING) AccumulativeAmount
  from A)
  where AccumulativeAmount>(select sum(SALESAMOUNT)/2 from SALES)
  order by SALESAMOUNT desc

  The Oracle query ranks the records by sales amount in ascending order, accumulates the amounts in that order, and keeps the rows whose accumulated amount exceeds half of the total. Because the accumulation runs from the smallest sale upward, the rows kept are the largest customers. The first subquery computes row_number() rankings because a window sum ordered directly by SALESAMOUNT would treat equal amounts as peers and add them in a single step, which distorts the accumulated values.

  esProc script:

A B
1 =connect("verticaLink") /Connect to Vertica database
2 =A1.query("select * from sales").sort(SALESAMOUNT:-1) /Get the sales records and sort them by sales amount in descending order
3 =A2.cumulate(SALESAMOUNT) /Calculate a sequence of accumulated values; the function is a replacement of database window function
4 =A3.m(-1)/2 /Calculate half of total sales amount
5 =A3.pselect(~>=A4) /Find the position in the accumulated value sequence where half of total sales amount falls
6 =A2(to(A5)) /Get the record where half of total sales amount falls and records before it
7 >A1.close() /Close database connection
8 return A6 /Return A6’s result

  Instead of nested SQL with window functions, esProc expresses the computation as a short sequence of steps. Because the logic does not rely on any one database’s analytic functions, the same approach applies to all databases (data sources).
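  The steps of the esProc script can also be checked in plain Java. The following is a minimal sketch, not part of the original solution: the class TopHalfCustomers, the Sale holder and the sample figures are hypothetical, and the rows are held in memory rather than queried from Vertica.

```java
import java.util.ArrayList;
import java.util.List;

public class TopHalfCustomers {

    /** Hypothetical holder for one SALES row (CUSTOM, SALESAMOUNT). */
    static final class Sale {
        final String custom;
        final double salesAmount;

        Sale(String custom, double salesAmount) {
            this.custom = custom;
            this.salesAmount = salesAmount;
        }
    }

    /**
     * Returns the largest customers, in descending order of sales, up to and
     * including the one whose running total first reaches half of the grand
     * total (the same cut-off as cells A5 and A6 of the script).
     */
    static List<Sale> topHalf(List<Sale> sales) {
        // A2: sort by sales amount in descending order
        List<Sale> sorted = new ArrayList<>(sales);
        sorted.sort((a, b) -> Double.compare(b.salesAmount, a.salesAmount));

        // A4: half of the total sales amount
        double total = 0;
        for (Sale s : sorted) {
            total += s.salesAmount;
        }
        double half = total / 2;

        // A3, A5, A6: accumulate and stop at the first row reaching the half
        List<Sale> result = new ArrayList<>();
        double running = 0;
        for (Sale s : sorted) {
            running += s.salesAmount;
            result.add(s);
            if (running >= half) {
                break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Sale> sales = List.of(
                new Sale("A", 100), new Sale("B", 400),
                new Sale("C", 300), new Sale("D", 200));
        for (Sale s : topHalf(sales)) {
            System.out.println(s.custom + "\t" + s.salesAmount);
        }
    }
}
```

  With the four sample rows the total is 1000, so the method returns B (400) and C (300): their running sum of 700 is the first to reach 500.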

  esProc is a JVM-based scripting language designed for handling structured data. Like SQL functions and stored procedures, an esProc script can be integrated with a Java application, and it yields portable, versatile, database-independent computing logic. That logic runs as a middle layer, separate from the data logic that runs in the database (data source) layer. The separation makes the overall application more scalable, flexible and maintainable.

2. Application scenario: Report data preparation

2.1 Reporting architecture

[Image: 001.png]

  An esProc script embedded in the reporting layer acts like a local logical database that needs no dedicated server. It serves as a data preparation layer between the reporting tool and the data source, where it performs the complicated computations.

2.2 Integration

  Let’s look at how to integrate esProc as the data preparation layer, using Vertica and BIRT as the example.

I. Integration of basic jars

  esProc JDBC has three basic jars, which are located in [installation directory]\esProc\lib:

  esproc-bin-xxxx.jar    esProc computing engine and JDBC driver
  jdom-1.1.3.jar         Parses configuration files
  icu4j-60.3.jar         Handles internationalization

  Other jars provide specific functionality. To use a database as a data source through esProc JDBC, its driver jar is required. Vertica is the data source here, so the Vertica JDBC driver is needed (version 9.1.0 in this example):

  vertica-jdbc-9.1.0-0.jar   Download it from the Vertica website

  Copy all of these jars to BIRT’s [installation directory]\plugins\org.eclipse.birt.report.data.oda.jdbc_4.6.0.v20160607212.

II. Deploy the configuration file

  The configuration file, raqsoftConfig.xml, contains the script file path, the data source connection settings, and so on.

  It is located in [esProc installation directory]\esProc\config, and needs to be copied to the BIRT designer class path, [installation directory]\plugins\org.eclipse.birt.report.data.oda.jdbc_4.6.0.v20160607212.

  The file’s name must not be changed.

2.3 BIRT development environment

  1. Copy all the required jars to BIRT’s WEB-INF\lib;

  2. Copy raqsoftConfig.xml to BIRT’s WEB-INF\classes.

2.3.1 Example 1: Normal call

  ♦1. Below is the Sales table in the Vertica database, queried via vsql. The table contains data for the years 2013, 2014 and 2015.
[Image: 002.png]

  ♦2. Create an esProc script

  (1) Put the Vertica JDBC driver jar into the esProc designer path
  Download the JDBC driver jar (vertica-jdbc-9.1.0-0.jar, for instance) from the Vertica website, and put it under [esProc installation directory]\common\jdbc.

  (2) Add Vertica data source
  Open esProc designer and click Tool -> Datasource to add Vertica as a JDBC data source.
[Image: 003.png]

  Click OK to save the configuration and then Connect to connect to the data source.
[Image: 004.png]

  The data source is successfully connected once the data source name turns pink.

  (3) Create an algorithm script through File -> New and save it as VerticaExternalProcedures.dfx.

A B
1 =connect("verticaLink") /Connect to Vertica database
2 =A1.query("select * from sales").sort(SALESAMOUNT:-1) /Get the sales records and sort them by sales amount in descending order
3 =A2.cumulate(SALESAMOUNT) /Calculate a sequence of accumulated values; the function is a replacement of database window function
4 =A3.m(-1)/2 /Calculate half of total sales amount
5 =A3.pselect(~>=A4) /Find the position in the accumulated value sequence where half of total sales amount falls
6 =A2(to(A5)) /Get the record where half of total sales amount falls and records before it
7 >A1.close() /Close database connection
8 return A6 /Return A6’s result to BIRT as the report source data set

  ♦3. Deploy the script

  Put the script file under the main script directory configured in raqsoftConfig.xml.
[Image: birt.png]

  ♦4. Configure data source connection: verticaLink, in raqsoftConfig.xml

  <DB name="verticaLink">
    <property name="url" value="jdbc:vertica://192.168.10.10:5433/ForEsprocTestDB"/>
    <property name="driver" value="com.vertica.jdbc.Driver"/>
    <property name="type" value="0"/>
    <property name="user" value="dbadmin"/>
    <property name="password" value="runqian"/>
    <property name="batchSize" value="0"/>
    <property name="autoConnect" value="false"/>
    <property name="useSchema" value="false"/>
    <property name="addTilde" value="false"/>
    <property name="needTransContent" value="false"/>
    <property name="needTransSentence" value="false"/>
    <property name="caseSentence" value="false"/>
  </DB>

  ♦5. Create a new report in BIRT report designer and add an esProc data source named esProcConnection.
[Image: 006.png]

  The driver class is com.esproc.jdbc.InternalDriver (v1.0), which requires esproc-bin-xxxx.jar and the other jars listed in section 2.2. The database URL is jdbc:esproc:local://

  ♦6. Call the esProc data set from BIRT (as Vertica’s external stored procedure)

  Create a new data set, select the esProc data source (esProcConnection), and set the data set type to SQL Stored Procedure Query.
[Image: 007.png]

  Next, enter {call VerticaExternalProcedures()} under Query Text. VerticaExternalProcedures is the name of the esProc script file.
[Image: 008.png]

  Now preview the computing result under Preview Results.
[Image: 009.png]

  That completes the process of using an esProc script as Vertica’s external stored procedure to prepare the data source for a report.
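  Outside the BIRT designer, the same call can presumably be tried from plain Java through JDBC, which is a quick way to confirm that the jars and raqsoftConfig.xml are on the class path. The following is a minimal sketch under that assumption: the driver class, URL and call text come from the steps above, while the class EsProcCallSketch and its helper methods are hypothetical names introduced for illustration.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;

public class EsProcCallSketch {

    /** Builds the call text, e.g. {call VerticaExternalProcedures()}. */
    static String buildCall(String scriptName, int paramCount) {
        StringBuilder sb = new StringBuilder("{call ").append(scriptName).append('(');
        for (int i = 0; i < paramCount; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('?'); // one placeholder per script parameter
        }
        return sb.append(")}").toString();
    }

    /** Runs the script and prints each returned row, tab-separated. */
    static void runScript(Connection con, String scriptName, Object... params)
            throws SQLException {
        try (CallableStatement st = con.prepareCall(buildCall(scriptName, params.length))) {
            for (int i = 0; i < params.length; i++) {
                st.setObject(i + 1, params[i]); // JDBC parameters are 1-based
            }
            st.execute();
            try (ResultSet rs = st.getResultSet()) {
                if (rs == null) {
                    return; // the script returned no result set
                }
                ResultSetMetaData md = rs.getMetaData();
                while (rs.next()) {
                    StringBuilder row = new StringBuilder();
                    for (int c = 1; c <= md.getColumnCount(); c++) {
                        if (c > 1) {
                            row.append('\t');
                        }
                        row.append(rs.getObject(c));
                    }
                    System.out.println(row);
                }
            }
        }
    }

    public static void main(String[] args) {
        try {
            // Driver class and URL as entered in the BIRT data source dialog
            Class.forName("com.esproc.jdbc.InternalDriver");
            try (Connection con = DriverManager.getConnection("jdbc:esproc:local://")) {
                runScript(con, "VerticaExternalProcedures");
            }
        } catch (ClassNotFoundException | SQLException e) {
            // Typically means the esProc jars or raqsoftConfig.xml are missing
            System.err.println("esProc call failed: " + e);
        }
    }
}
```

  Any extra arguments passed to runScript are bound to ? placeholders in order, which is how a script parameter would be supplied.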

  ♦7. Web presentation

  Take a grid report as an example. Below is the report design:
[Image: 010.png]

  Preview after publishing:
[Image: 011.png]

2.3.2 Example 2: Parameter-based call

  Now change the computing task a bit: for a given year, find the top N customers whose sales together account for half of that year’s total, and list them by sales amount in descending order. The task requires filtering by a parameter.

  ♦1. Add a year parameter for filtering.

  Open esProc designer and click Program -> Parameter -> Add to add the parameter qyear (its name may differ from the report parameter’s name).
[Image: 012.png]

  Modified script:

A B
1 =connect("verticaLink") /Connect to Vertica database
2 =A1.query("select * from sales where year(subscriptiondate)=?",qyear).sort(SALESAMOUNT:-1) /qyear is the parameter receiving a typed year to find the corresponding sales records and sort them by sales amount in descending order
3 =A2.cumulate(SALESAMOUNT) /Calculate a sequence of accumulated values; the function is a replacement of database window function
4 =A3.m(-1)/2 /Calculate half of total sales amount
5 =A3.pselect(~>=A4) /Find the position in the accumulated value sequence where half of total sales amount falls
6 =A2(to(A5)) /Get the record where half of total sales amount falls and records before it
7 >A1.close() /Close database connection
8 return A6 /Return A6’s result to BIRT as the report source data set

  A2 performs conditional filtering.

  ♦2. Define a year parameter for the report

  Define an input parameter named qyear for the report.
  Open the report and click Data Explorer -> Report Parameter -> New parameter to add the parameter.
[Image: 013.png]

  The second red box marks the default value of parameter qyear.

  ♦3. Add a data set parameter and link it with the report parameter

  Create the data set VerticaExternalProcedures.
[Image: 014.png]

  The Query Text is slightly different this time: {call VerticaExternalProcedures(?)}. The question mark (?) is a placeholder for the input year parameter. Under Parameters, add the data set parameter qyear and link it with the report parameter qyear.
[Image: 015.png]

  Under Preview Results, the query returns the data for 2013, the default value of qyear.
[Image: 016.png]

  After passing the value “2015” to the parameter:
[Image: 017.png]

  ♦4. Web presentation
[Image: 018.png]

  Query data of the year 2015:
[Image: 019.png]

  After modifying the URL or passing “2013” to qyear:
[Image: 020.png]