* Remove Duplicate Rows in an Excel File

In an Excel file, rows having equal column values are duplicate rows. Usually we need to delete the extra row(s) and retain one row only. How?

If there are dozens of rows, then it’s OK if you choose to delete them manually. But even Excel VBA will probably be stuck if there are mountains of data.

It’s feasible to write a program to do it. First you read the Excel file as a data set object, compare each row with the next using a nested loop, delete the duplicates and output the data to a new Excel file. The coding work is heavy. Moreover, next time when you handle a similar scenario with different Excel format that compares value in different columns, the existing program becomes useless.

There would be no more tiresome coding and wasted program if you used esProc SPL to remove the duplicate(s). It will be a simple 3-line script.

For example, below is an Excel file:

..

To remove duplicates by Col1, Col2 and Col3, SPL produces a script as follows:


A

1

=file("test.xlsx").xlsimport@t()

2

=A1.group@1(Col1,Col2,Col3)

3

=file("test1.xlsx").xlsexport@t(A2)

 

It’s also convenient to handle various Excel data query scenarios with SPL:

1. Query Excel files of all formats, such as free format spreadsheet, crosstab, main table & subtable.

2. Export and append Excel data to another Excel file to output an Excel file with a sophisticated format.

3. Read and write an Excel file containing a huge volume of data.

4. Combine and split an Excel file.

More information can be found in Parsing and Exporting Excel Data in SPL.

You can handle other post-reading operations, such as sorting, grouping and summarizing, easily with SPL too. Read SPL Grouping to find more examples.

esProc is the file processor that can conveniently handle data loading, database export and mixed computations over various types of files, including TXT, Excel, XML, JSON, CSV and INI. The desktop tool is ready to use, simple to configure and convenient to debug. It allows setting a breakpoint and step-by-step execution during which you can view the result of each step. Based on powerful yet simple syntax that agrees with human way of thinking, esProc is more convenient to use compared with high-level languages. See Data File Processor.

SPL is integration-friendly with a Java program. Read How to Call an SPL Script in Java to learn more.

To begin to work with esProc, see Getting Started with esProc.