SQL-style Operations on Excel Files

 

Excel files are widely used. We can use Apache Poi, an open-source API package, to read data in Excel cells. Powerful as it is, the package has only one single skill of reading data by writing a program. That is not a light job, and for different formats, you need to write different programs. Another inconvenience is that usually you need to load the data into the database for performing SQL-style structured computations. SPL (Structured Process Language) is convenient for handling Excel data by contrast, and can be easily integrated with the Java application.

 

1.  Data reading

SPL has the file function to open an Excel file, and the xlsimport function to read data in the file as a table sequence for further computations. Here are some examples:

1 Read simple rowwise files

In a rowwise Excel file, each row is a record and all rows have same columns. The file is similar to a database table.

undefined 

=file("e:/scores.xlsx").xlsimport@t()

The parameter in file function can be the target file’s absolute path or relative path (which is relative to the mainPath configured in SPL configuration file). The xlsimport function, by default, reads all rows and columns in the file’s first Sheet to generate a table sequence. @t option enables reading the first row as headers, and column values in this row are column names in the table sequence. The absence of @t option means no header row, and columns will be referenced using their ordinals, such as #1 and #2.

 

2Read part of the Excel file

=file("e:/scores.xlsx").xlsimport@t(No,Name,Class,Maths)

The above statement specifies column name parameters to read data from No, Name, Class and Maths columns only.

 

=file("e:/scores.xlsx").xlsimport@t(;,1:100)

The statement reads all columns from row 1 to row 100 in the file’s first Sheet.

 

=file("e:/scores.xlsx").xlsimport@t(;,1:-10)

The statement reads all columns from the first row to the 10th from the last row in the file’s first Sheet. A negative number represents a row ordinal counted backwards.

 

3Read a specified Sheet

=file("e:/scores.xlsx").xlsimport@t(;"School2")

The statement reads all columns in a Sheet named School2.

 

4Read Sheet information

As the above Excel file has more than one Sheets, SPL xlsopen function opens the file and generates an Excel object, which is a table sequence containing information of the Sheets.

=file("e:/scores.xlsx").xlsopen()

Below is data in the Excel object:

undefined 

Each row records information of a Sheet. The column names (stname, nrows and ncols) represents sheet name, number of rows and number of columns respectively.

 

Read Reading and Writing Excel Files in SPL to learn more about data retrieval.

 

2.   Operations

The retrieved data is stored as a table sequence in the memory, on which any operations, such as filtering, aggregate, grouping & summarization, sorting, joins, distinct, and getting computed column, can be performed. Based on the above student scores table, we are trying to perform some operations:


A

Description

1

=file("e:/scores.xlsx").xlsimport@t()

Import data as a table sequence; @t option reads the first row as column headers

2

=A1.select(Sex=="M")

Get records of male students

3

=A1.groups(CLASS;min(English),max(Chinese),sum(Maths))

Calculate the lowest English score, the highest Chinese score, and the total math score in each class

4

=A1.sort(CLASS,-Maths)

Sort records by class number in ascending order and then by math score in descending order

 

Read General SQL-style Operations in SPL to learn how to perform more operations in SPL.

 

3.   Returning result set to Java

The SPL script can be called by a Java program through JDBC using the following code:

Class.forName("com.esproc.jdbc.InternalDriver");

Connection con = DriverManager.getConnection("jdbc:esproc:local://");

Statement st = con.createStatement ();

ResultSet rs = st.executeQuery("=file(\"e:/scores.xlsx\").xlsimport@t().select(Sex==\"M\")");

 Read How to Call an SPL Script in Java to learn more.

 

4.   Using SQL

We can also use SQL directly on an Excel file, for instance:


A

1

$select * from E:/scores.xlsx where Sex="M"

2

$select min(English),max(Chinese),sum(Maths) from E:/scores.xlsx group by CLASS

3

$select * from E:/scores.xlsx order by CLASS, Maths desc

The file must have column headers for the use of SQL.

 

We can also execute the above SQL statement directly through JDBC in Java and return the result.

ResultSet rs = st.executeQuery("select * from E:/scores.xlsx where Sex=\"M\"");

 

Read Examples of SQL File Queries to learn more related information.

 

5.   Big file handling

SPL handles big Excel files that cannot fit into the memory with the cursor. The way is similar to that of handling smaller Excel files. We just need to use @c option in xlsimport function to return a cursor. SPL also supports attaching operations on a cursor.

Here we still use the example in section 2:


A

Description

1

=file("E:/scores.xlsx").xlsimport@tc()

Import data as a cursor; @t option reads the first row as column headers

2

=A1.select(Sex=="M")

Get records of male students

3

=A1.groups(CLASS;min(English),max(Chinese),sum(Maths))

Group records by class and perform aggregates and use groups function to return a table sequence if the result set is not so large

 

Read Reading and Writing Excel Files in SPL to learn more about data retrieval.

Read General SQL-style Operations on Cursors in SPL to learn more about performing operations on large text files.

 

The final cursor result of a SPL script can be returned to a JAVA program for invocation. The method of calling it is the same as that calling one via JDBC interface.

Like handling small files, we can use SQL on large Excel files.