
blackduckie RaqForum 28 No.

How to Handle Huge XLS Files


Excel performs poorly when handling huge xls files. Usually we load an Excel file into a database and process it with the database’s computing ability. Sometimes, however, the data can’t be loaded into a database in full. It would be great if there were an application that could work on massive xls files directly.

Take the employee information file emp.xls as an example (part of the source data is shown below):

EID	NAME	SURNAME	GENDER	STATE	BIRTHDAY	HIREDATE	DEPT	SALARY
1	Rebecca	Moore	F	California	1974-11-20	2005-03-11	R&D	7000
2	Ashley	Wilson	F	New York	1980-07-19	2008-03-16	Finance	11000
3	Rachel	Johnson	F	New Mexico	1970-12-17	2010-12-01	Sales	9000
4	Emily	Smith	F	Texas	1985-03-07	2006-08-15	HR	7000

We also use the state information file states.xls (part of the data is shown below):

STATEID	NAME	POPULATION	ABBR	AREA	CAPITAL	REGIONID
1	Alabama	4779736	AL	52419	Montgomery	6
2	Alaska	710231	AK	663267	Juneau	9
3	Arizona	6392017	AZ	113998	Phoenix	8
4	Arkansas	2915918	AR	52897	Little Rock	7

Task: Join the two tables through emp’s STATE column and states’ NAME column and get records where SALARY is above 5000 and POPULATION is below 5 million.

It’s easy to do this with esProc.
You can download the esProc installation package and a free DSK edition license HERE.

1. Get records of states where POPULATION is below one million:

Write script wherexls.dfx in esProc:

	A
1	$select * from states.xls where POPULATION<1000000

A1 gets records of states where POPULATION is below one million using simple SQL.

[Screenshot of A1’s result after execution; image not available]
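To make the expected output concrete, here is a rough sketch of what the filter computes. It is not esProc: it uses Python's standard `sqlite3` module with an in-memory table as a stand-in for states.xls, loaded with only the four sample rows shown above, so the result covers just that excerpt and not the full file.

```python
import sqlite3

# In-memory stand-in for states.xls (assumption: only the four sample rows
# from the excerpt above are loaded, not the whole file).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE states (STATEID INTEGER, NAME TEXT, POPULATION INTEGER, "
    "ABBR TEXT, AREA INTEGER, CAPITAL TEXT, REGIONID INTEGER)"
)
conn.executemany(
    "INSERT INTO states VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "Alabama", 4779736, "AL", 52419, "Montgomery", 6),
        (2, "Alaska", 710231, "AK", 663267, "Juneau", 9),
        (3, "Arizona", 6392017, "AZ", 113998, "Phoenix", 8),
        (4, "Arkansas", 2915918, "AR", 52897, "Little Rock", 7),
    ],
)

# Same condition as the esProc cell: POPULATION below one million.
small_states = conn.execute(
    "SELECT * FROM states WHERE POPULATION < 1000000"
).fetchall()
conn.close()

print(small_states)
```

On the sample rows, only Alaska satisfies the condition.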

2. Group emp records by gender and count the employees in each group:

Write script groupxls.dfx in esProc:

	A
1	$select GENDER,count(*) as count from emp.xls group by GENDER

A1 groups emp by gender and counts the employees in each group using simple SQL.

[Screenshot of A1’s result after execution; image not available]
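As a rough illustration of the grouping, the sketch below runs the same kind of query with Python's standard `sqlite3` module instead of esProc. The in-memory `emp` table is a stand-in for emp.xls and holds only the four sample rows listed earlier, so the counts reflect that excerpt alone.

```python
import sqlite3

# In-memory stand-in for emp.xls (assumption: only the four sample rows
# from the excerpt are loaded).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp (EID INTEGER, NAME TEXT, SURNAME TEXT, GENDER TEXT, "
    "STATE TEXT, BIRTHDAY TEXT, HIREDATE TEXT, DEPT TEXT, SALARY INTEGER)"
)
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "Rebecca", "Moore", "F", "California", "1974-11-20", "2005-03-11", "R&D", 7000),
        (2, "Ashley", "Wilson", "F", "New York", "1980-07-19", "2008-03-16", "Finance", 11000),
        (3, "Rachel", "Johnson", "F", "New Mexico", "1970-12-17", "2010-12-01", "Sales", 9000),
        (4, "Emily", "Smith", "F", "Texas", "1985-03-07", "2006-08-15", "HR", 7000),
    ],
)

# One output row per distinct GENDER value, with the number of employees in it.
gender_counts = conn.execute(
    "SELECT GENDER, count(*) AS count FROM emp GROUP BY GENDER ORDER BY GENDER"
).fetchall()
conn.close()

print(gender_counts)
```

All four sample employees are female, so the excerpt yields a single group; the full file would presumably produce one row per gender.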

3. Join emp table and states table through emp.STATE and states.NAME and select records where state population is below 5 million and employee salary is above 5000:

Write script joinxls.dfx in esProc:

	A
1	$select * from emp.xls b join states.xls a on a.NAME=b.STATE where a.POPULATION<5000000 and b.SALARY >5000

A1 joins the two tables and filters the joined records using simple SQL.

[Screenshot of A1’s result after execution; image not available]
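The join can be sketched the same way, again with Python's standard `sqlite3` module and not esProc itself. The `emp` rows are the four sample rows from emp.xls. None of those employees' states appear in the states.xls excerpt, so the `states` rows here are hypothetical: the population values are rounded placeholders chosen for illustration, and only the two columns the query needs are included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp (EID INTEGER, NAME TEXT, SURNAME TEXT, GENDER TEXT, "
    "STATE TEXT, BIRTHDAY TEXT, HIREDATE TEXT, DEPT TEXT, SALARY INTEGER)"
)
conn.execute("CREATE TABLE states (NAME TEXT, POPULATION INTEGER)")

# Sample rows copied from the emp.xls excerpt.
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "Rebecca", "Moore", "F", "California", "1974-11-20", "2005-03-11", "R&D", 7000),
        (2, "Ashley", "Wilson", "F", "New York", "1980-07-19", "2008-03-16", "Finance", 11000),
        (3, "Rachel", "Johnson", "F", "New Mexico", "1970-12-17", "2010-12-01", "Sales", 9000),
        (4, "Emily", "Smith", "F", "Texas", "1985-03-07", "2006-08-15", "HR", 7000),
    ],
)

# HYPOTHETICAL rows: rounded placeholder populations, not values from states.xls.
conn.executemany(
    "INSERT INTO states VALUES (?, ?)",
    [
        ("California", 37000000),
        ("New York", 19000000),
        ("New Mexico", 2000000),
        ("Texas", 25000000),
    ],
)

# Both tables have a NAME column, so the sketch selects explicit columns
# instead of * to keep the output unambiguous.
joined = conn.execute(
    "SELECT b.NAME, b.SURNAME, b.STATE, b.SALARY, a.POPULATION "
    "FROM emp b JOIN states a ON a.NAME = b.STATE "
    "WHERE a.POPULATION < 5000000 AND b.SALARY > 5000 "
    "ORDER BY b.EID"
).fetchall()
conn.close()

print(joined)
```

With these placeholder populations, only the New Mexico employee passes both conditions; every sample salary is above 5000, so the population filter does the selecting here.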

We can also convert an xls file into a bin file, esProc’s built-in binary file format. The format uses a simple compression mechanism, so the same data takes up less space and can be read in less time.

Take orders file orders.xls as an example. To convert it to a bin file, we use the following script: