Group & Summarize a Big Excel File Every N Rows

Title: Python / Excel 15 Minute Timestamps to 1 Hour and Sum Vlues

Source: https://stackoverflow.com/questions/64167990/python-excel-15-minute-timestamps-to-1-hour-and-sum-vlues

Question

I have a Dataset with 15 Minute Timestamps and Values

undefined

I need it hourly like this (summed values all 4 rows):

undefined

How to do this?

My Dataset has 127000 rows…3 Years, so manually is not an option.

Thanks for your help.

Answer

Instead of grouping rows by same column values, you give a special grouping condition. You want to group data every fixed number of rows (N) and then perform aggregation.

Excel doesn’t have a direct method to do this. It uses the combination of offset function and index function or the like. But those functions are irrelevant with handling fixed number of rows. So the solution is hard to understand.

I noticed that you want to do it in Python. As both the source Excel file and the result Excel file are big, they may not fit into the memory. It would be the best to process the file in a circular way. You read 4N rows each time rather than the whole data at once, group and summarize the rows, append the result to the target result Excel, and then read in the next 4N rows to process. But Python hasn’t a direct external computation mechanism. It’s complex to write code manually.

esProc SPL is also able to process Excel data. It is particularly good at handling various unusual grouping scenarios. It also provides the cursor mechanism to produce simple code. For your question, the SPL script is like this:

A

B

C

1

=file("D:\data.xlsx").xlsimport@c()

2

for A1,4*1000

=A2.group@i(#%4==1).new(~(4)._1,~.sum(_2))

/ Read in and process 4000   rows at a time

3

=file("D:\new.xlsx").xlsexport@a(B2)

/ Append result to the   target Excel file

SPL is good at handling operations on xls or csv files. Find more examples and learn more about SPL in http://www.raqsoft.com/p/script-over-csv-xls.