How to Split a Large Excel File into Smaller Files


Problem description

There’s an orders file orders.xlsx that has over a million rows. Below is part of the file:

undefined

Task: split the file into smaller files according to two methods:

1. Split by category, such as storing orders data of each state as a file.

2. Split by row, such as storing every ten thousand rows as a file.

 

Implementation directions:

1.       Start esProc

Download esProc installation package and free DSK edition license HERE. You will be prompted to load the license file when you run esProc for the first time.

2. Write a script:

(1) Split by category

undefined

It’s more convenient to view the code in this way:

A

B

C

D

1

>dir="E:/orders/"

2

=file(dir+"orders.xlsx").xlsimport@tc()

3

for A2,50000

=A3.group(state)

4

for B3

=file(dir+B4(1).state+".xlsx")

5

if C4.exists()

=C4.xlsexport@a(B4)

6

else

=C4.xlsexport@t(B4)

A1 Define the directory that will store the future files.

A2 Use @c option to open orders.xlsx with cursor since it’s probably the file cannot fit into the memory. @t option enables importing the first row as column headers.

A3 Retrieve rows from A2’s cursor, with 50000 rows at a time.

B3 Group the current batch of rows by state.

B4 Handle each group (rows of state) circularly.

C4 Define files for storing the groups and name them after states.

C5D5 If a namesake file already exists, use @a option to append the current batch of rows to it.

C6D6 If there isn’t a namesake file, use @t option to write the current batch of rows to it, with the first row being the column headers.

2Split by row

esProc script:

A

B

1

>dir="E:/orders/"

2

=file(dir+"orders.xlsx").xlsimport@tc()

3

for A2,100000

=file(dir+"orders_"/#A3/".xlsx").xlsexport@t(A3)

A1 Define the directory that will store the future files.

A2 Use @c option to open orders.xlsx with cursor since it’s probably the file cannot fit into the memory. @t option enables importing the first row as column headers.

A3 Retrieve rows from A2’s cursor, with 100000 rows at a time.

B3 Store the current batch of rows into a new file. #A3 represents the loop number in A3.

 

 

3. Press F9 to run the program. Then you can check the expected smaller files in E:/orders directory.

 

 

Reference split.zip