Extract repeated cell areas in Excel

In an Excel file, a piece of data is recorded in an area, and we call this area a fragment.

Repeated cell fragment means that there are many data fragments in the file, all of which are exactly the same in format and size and reappear regularly. To extract this kind of data records, it is necessary to find the rule of repeated occurrences of the area and loop each area according to the rule. Inside the area, it is necessary to find the rule of the data to be extracted in the area, calculate the cell name or its row and column numbers accurately, and then use the xlscell function in the SPL to read the cell values.

 

Example

Part of the data in the employee information table employee.xlsx is shown in the following figure, in which each employee information fragment occupies nine rows and seven columns, and it repeats vertically.

..

Now we need to extract the ID, Name, Sex, Position, Birthday, Phone, Address, PostCode, and other information of each employee to form a data set.

 

The SPL script is:


A

B

C

1

=create(ID,Name,Sex,Position,Birthday,Phone,Address,PostCode)    

2

=file("E:/work/employee.xlsx").xlsopen()

3

[C,C,F,C,C,D,C,C]

[1,2,2,3,4,5,7,8]


4

for

=A3.(~/B3(#)).(A2.xlscell(~))


5


if   B4(1)==""

break

6


>A1.record(B4)


7


>B3=B3.(~+9)


A1 Create a table sequence with column names of ID, Name, Sex, Position, Birthday, Phone, Address, PostCode to store the extracted data.

A2 Open the employee.xlsx file as an Excel object.

A3 Define the column number sequence of the cell where the employee information is located.

B3 Define the row number sequence of the cell where the employee information is located.

A4 Use the for loop to read the employee information of each fragment.

B4 First use A3.(~/B3(#)) to calculate the position sequence of current employee information cells, and then read out these cell values to form the employee information sequence. It is [C1,C2,F2,C3,C4,D5,C7,C8] in the first loop, and [C10,C11,F11,C12,C13,D14,C16,C17] in the second loop, ..., The row number is increased by 9 after each loop.

B5 Identify whether the employee ID value is null. If so, exit the loop and end the operation.

B6 Append a piece of employee information to the end of the A1 table sequence.

B7 Add 9 to the row number sequence of the employee information to read the employee information in the next fragment.