Excel Post-grouping Calculations

 

Excel supports grouping & aggregation on data, as well as offering pivot tables to perform multidimensional analysis. But Excel methods and operations are too simple to handle complicated tasks or manage them conveniently. Such tasks include getting ranks on each subset, and filtering and sorting by aggregates. In this article, well give examples of post-grouping calculations, analyze it and provide SPL code solution. SPL (Structured Process Language) is the language used by esProc, the specialized data computing engine. Its convenient to use for handling structured computations.

I Intra-grouping sorting

Below is the student scores table. We want to get rankings for each subject. To do this we need to first group records by subject and then perform sorting on each group.

Below is the Excel data:

undefined 

The expected result:

undefined 

A SPL script interacts with Excel through clipboard. We edit an SPL script in esProc designer, paste the to-be-analyzed Excel data to the clipboard, execute the script, and then paste result to Excel.

SPL provides stepwise coding in an easy way:


A

B

1

=clipboard().import@t()

/ Import student scores table with column headers

2

=A1.group(Subject)

/ Group records by Subject

3

=A2.(~.sort(Score:-1))

/ Sort each group by score in ascending order

4

=A3.(~.derive(#:Rank))

/ Add a Rank field for each group of records

5

=A4.conj()

/ Concatenate groups of sorted records

6

=A5.new(Subject,Rank,Score,Name)

/ Create a new table sequence with fields of different orders

7

=clipboard(A6.export@t())

/ Paste the sorting result to clipboard

Then we just paste the result to Excel to get the desired table.

To get records containing only the top 3 scores only for each subject, we just use top function to get the eligible records from each group.

The expected result:

undefined 

SPL uses top function to get records of top 3 scores:


A

B

1

=clipboard().import@t()

/ Import student scores table with column headers

2

=A1.group(Subject)

/ Group records by Subject

3

=A2.(~.top(-3;Score))

/ Get the top 3 scores by Score from each group

4

=A3.(~.derive(#:Rank))

/ Add a Rank field for each group of records

5

=A4.conj()

/ Concatenate groups of sorted records

6

=A5.new(Subject,Rank,Score,Name)

/ Create a new table sequence with fields of different orders

7

=clipboard(A6.export@t())

/ Paste the result to clipboard

Then we just paste the result to Excel to get the desired table.

 

II Sorting by sub-aggregate

Another scenario is to sort groups according to the sub-aggregates, such as ranking students by their total scores based on the student scores table. The expected result is as follows:

undefined 

To do this, SPL uses groups function to group records while calculating total for each group:


A

B

1

=clipboard().import@t()

/ Import student scores table with column headers

2

=A1.groups(Name;sum(Score):Total)

/ Group records by name and calculate each students total score

3

=A2.sort(Total:-1)

/ Sort total scores in descending order

4

=A3.derive(#:Rank)

/ Add a Rank field

5

=clipboard(A4.export@t())

/ Export the sorting result and paste it to clipboard

 

III Intra-group filtering

This time we want to find students whose performances are not satisfactory. Since exams of subjects are different in difficulty levels, its unsuitable to get them by Fail. Instead, we need to find those whose scores are below the average for each subject.

The expected result:

undefined 

So we group records by subject, calculate the average of each subject, and find students whose scores are below the average.

SPL script:


A

B

1

=clipboard().import@t()

/ Import student scores table with column headers

2

=A1.group(Subject)

/ Group records by Subject

3

=A2.((AVG=~.avg(Score),~.select(Score<AVG)))

/ Calculate average of each group, and find students whose scores are below the average

4

=A3.conj()

/ Concatenate records of eligible students

5

=clipboard(A4.export@t())

/ Export the result and paste it to clipboard

 

IV Filtering by sub-aggregate

In another scenario, we want to group records by name, calculate total of each group, and find students whose total scores are below the average total.

The expected result:

undefined 

To perform an operation, filtering here, over the subtotals, we use groups function to group records, calculate the subtotals, get the average total and find the eligible students.

SPL script:


A

B

1

=clipboard().import@t()

/ Import student scores table with column headers

2

=A1.groups(Name;sum(Score):Total)

/ Group records by name and calculate total score for each student

3

=A2.select(Total<A2.avg(Total))

/ Calculate average total score and find students whose scores are below the average

4

=clipboard(A3.export@t())

/ Export the result and paste it to clipboard

 

V Intra-group percentage calculation

The following Excel sheet stores GDP values of some countries in 2019. The task is to group records by continent and calculate the percentage of each countrys GDP in its continent.

Below is the 2019 GDP table (Unit: $billion):

undefined 

The expected result:

undefined 

We can group records by Area, sum the total GDP of each continent, and then calculate the percentage of each countrys GDP in every group.

SPL script:


A

B

1

=clipboard().import@t()

/ Import 2019 GDP table with column headers from clipboard

2

=A1.group(Area)

/ Group records by Area

3

=A2.((SUM=~.sum(GDP),~.derive(string(GDP/SUM,"0.00%"):Percentage)))

/ Sum the total GDP of each group and calculate percentage of each country in it

4

=A3.conj()

/ Concatenate result records of all groups

5

=clipboard(A4.export@t())

/ Export the result and paste it to clipboard

 

VI Calculating percentage of each group

Lets look at how to calculate the percentage of each continents GDP in the global total.

The expected result:

undefined 

Here we also use groups function to group records, calculate GDP of each continent, and then get the percentage of each continents GDP in the global total.

SPL script:


A

B

1

=clipboard().import@t()

/ Import 2019 GDP table with column headers from clipboard

2

=A1.groups(Area;sum(GDP):Total)

/ Group records by Area and calculate total GDP of each continent

3

=A2.derive(string(Total/A2.sum(Total),"0.00%"):Percentage)

/ Calculate percentage of each continents GDP in the global total

4

=A3.run(Total=string(Total,"0.00"))

/ Adjust the display accuracy of Total value

5

=clipboard(A4.export@t())

/ Export the result and paste it to clipboard

 

Read Excel Grouping & Aggregation Examples to learn more analyses and solutions about Excel grouping & aggregation.

Find more agile computing examples in SPL Cookbook