6.10 Enumerated grouping: put each member in the first matching group

 

One type of grouping operation puts records that meet the same condition into one group. For instance, we can group cities as big, medium-sized and small according to population, or divide families into rich, middle-income, moderately well-off and poor according to yearly income. We call this enumerated grouping because it places each eligible member in the group that corresponds to the predefined enumerated condition the member meets.

An enumerated grouping operation evaluates a set of predefined enumerated conditions in order, using each member of the to-be-grouped set as the parameter, and places every member in the group of the first condition that it makes true.
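The first-match rule can be sketched outside SPL in a few lines of Python. This is only an illustration of the semantics described above, not how SPL implements it; the function name `enum_group` and the sample numbers are hypothetical, and the sketch assumes that a member matching no condition is simply left out.

```python
from typing import Callable, Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def enum_group(
    members: Iterable[T],
    conditions: Sequence[Callable[[T], bool]],
) -> List[List[T]]:
    """Place each member in the group of the first condition it satisfies."""
    # One (initially empty) group per condition, in the same order.
    groups: List[List[T]] = [[] for _ in conditions]
    for member in members:
        for index, condition in enumerate(conditions):
            if condition(member):
                groups[index].append(member)
                break  # first match wins; later conditions are not tried
        # Assumption of this sketch: a member matching no condition is dropped.
    return groups


# Hypothetical data: 40 also satisfies n > 10 and n > 0,
# but it is placed only in the first group it matches.
numbers = [5, 12, 25, 40]
conditions = [lambda n: n > 30, lambda n: n > 10, lambda n: n > 0]
print(enum_group(numbers, conditions))  # [[40], [12, 25], [5]]
```

Because evaluation stops at the first true condition, the order of the conditions decides the result: listing `n > 0` first would put every number in that group.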

Here our task is to classify certain Chinese cities by urban population and count the cities in each group. Cities with a population of more than two million are megacities, those with more than one million and up to two million are large cities, those with more than half a million and up to one million are medium-sized cities, and the others are small cities. Below is part of the city data:

ID  CITY       POPULATION  PROVINCE
1   Shanghai   12286274    Shanghai
2   Beijing    9931140     Beijing
3   Chongqing  7421420     Chongqing
4   Guangzhou  7240465     Guangdong
5   Hong Kong  7010000     Hong Kong Special Administrative Region

SPL offers the A.enum() function to perform enumerated grouping. By default, it puts each member in the first matching group only.

SPL script:

    A
1   =T("UrbanPopulation.txt")
2   [?>2000000,?>1000000,?>500000,?<=500000]
3   =A1.enum(A2,POPULATION)
4   [Megacity,Large city,medium-sized city,small city]
5   =A3.new(A4(#):CITY_TYPE,~.count():COUNT)

A1 Import the UrbanPopulation table.
A2 Define the grouping conditions.
A3 Use the A.enum() function to group the UrbanPopulation table by the enumerated conditions, putting each member in the first matching group only.
A4 Define the group names.
A5 Count the cities in each group from the result of the enumerated grouping.
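For readers who want to trace the logic by hand, the script can be mirrored step by step in Python. This is a rough sketch rather than a translation of SPL internals: it uses the five sample rows shown earlier plus three rows marked hypothetical (added only so the smaller groups are not empty), and the variable names are assumptions of the sketch.

```python
# Sample rows from the table above, plus three hypothetical rows (not real data).
cities = [
    {"CITY": "Shanghai", "POPULATION": 12286274},
    {"CITY": "Beijing", "POPULATION": 9931140},
    {"CITY": "Chongqing", "POPULATION": 7421420},
    {"CITY": "Guangzhou", "POPULATION": 7240465},
    {"CITY": "Hong Kong", "POPULATION": 7010000},
    {"CITY": "Hypothetical A", "POPULATION": 1500000},  # hypothetical
    {"CITY": "Hypothetical B", "POPULATION": 800000},   # hypothetical
    {"CITY": "Hypothetical C", "POPULATION": 300000},   # hypothetical
]

# Counterpart of A2: the conditions, in the order they are tried.
conditions = [
    lambda p: p > 2000000,
    lambda p: p > 1000000,
    lambda p: p > 500000,
    lambda p: p <= 500000,
]

# Counterpart of A4: one name per condition.
names = ["Megacity", "Large city", "medium-sized city", "small city"]

# Counterpart of A3: put each city in the first group whose condition holds.
groups = [[] for _ in conditions]
for city in cities:
    for index, condition in enumerate(conditions):
        if condition(city["POPULATION"]):
            groups[index].append(city)
            break  # first match only

# Counterpart of A5: pair each group name with the size of its group.
result = [
    {"CITY_TYPE": name, "COUNT": len(group)}
    for name, group in zip(names, groups)
]
for row in result:
    print(row["CITY_TYPE"], row["COUNT"])
```

Shanghai satisfies the first three conditions but is counted only as a megacity, which is why the later conditions do not need an upper bound.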