Mark-based Grouping & Order-based Computation

Q       uestion

Based on the following Awk and Perl solutions, I need to understand how to add additional fields that don't repeat their output like"name"in these examples. For example, string devtwr1 only occurs once in the sample data. If I need to add additional fields that also occur once, how can I do? Tried extensively but can't work it out...

Here is the source file Ntab.txt. It is a sample file of two hosts. There are numerous hosts in the real-world file. Each host has multiple "displayName"s (addresses) with corresponding figures for each 'displayName'.

 

>cat Ntab.txt

name    devtwr1

displayName     00:67:BB

capacityInKB    104,857,600

consumedCapacityInKB    4,042,752

dpPoolID        20

displayName     00:7B:FD

capacityInKB    52,428,800

consumedCapacityInKB    14,880,768

dpPoolID        10

displayName     00:7C:28

capacityInKB    34,179,712

consumedCapacityInKB    29,804,544

dpPoolID        20

displayName     00:7C:29

capacityInKB    34,179,712

consumedCapacityInKB    5,462,016

dpPoolID        20

name    devtwr2

displayName     00:67:BB

capacityInKB    104,857,600

consumedCapacityInKB    4,042,752

dpPoolID        20

displayName     00:7B:FD

capacityInKB    52,428,800

consumedCapacityInKB    14,880,768

dpPoolID        10

displayName     00:7C:28

capacityInKB    34,179,712

consumedCapacityInKB    29,804,544

dpPoolID        20

displayName     00:7C:29

capacityInKB    34,179,712

consumedCapacityInKB    5,462,016

dpPoolID        20

 

I need to be able to produce the $2 data from rows into columns after each 'name' (host) and in CSV format, headings optional. I can’t use (,:) as separators as the data includes them, so either (tab or ;) would be used. The result would be like this:

 

name;displayName;capacityInKB;consumedCapacityInKB;dpPoolID       

devtwr1;00:67:BB;104,857,600;4,042,752;20

devtwr1;00:7B:FD;52,428,800;14,880,768;10

devtwr1;00:7C:28;34,179,712;29,804,544;20

devtwr1;00:7C:29;34,179,712;5,462,016;20

devtwr2;00:67:BB;104,857,600;4,042,752;20

devtwr1;00:7B:FD;52,428,800;14,880,768;10

devtwr1;00:7C:28;34,179,712;29,804,544;20

devtwr1;00:7C:29;34,179,712;5,462,016;20

 

Here is an Awk solution but I couldn’t work out how to add an additional field, like "name", that doesn’t have data in all fields.

 

$ awk '$1=="name"{name=$2}

    $1 ~/^(displayName|capacityInKB|consumedCapacityInKB)$/{out=out";"$2}

    $1=="dpPoolID"{print name out";"$2; out=""}' tmp2.txt

devtwr1;00:67:BB;104,857,600;4,042,752;20

devtwr1;00:7B:FD;52,428,800;14,880,768;10

devtwr1;00:7C:28;34,179,712;29,804,544;20

devtwr1;00:7C:29;34,179,712;5,462,016;20

devtwr2;00:67:BB;104,857,600;4,042,752;20

devtwr2;00:7B:FD;52,428,800;14,880,768;10

devtwr2;00:7C:28;34,179,712;29,804,544;20

devtwr2;00:7C:29;34,179,712;5,462,016;20

 

And here is a Perl solution but once again I couldn't work out how to add an additional field, such"name", that doesn’t always have data.

 

perl -lane '

BEGIN {

  @fields = qw(name displayName capacityInKB consumedCapacityInKB dpPoolID);

  print join ";", @fields;

}

if (/^(name|displayName)/ && $data{displayName}) {

  print join ";", @data{@fields};

  %data = (name => $data{name} );

}

$data{$F[0]} = $F[1];

END {

  print join ";",@data{@fields};

}' tmp2.txt > Report.csvv

 

Answer

Divide records into 2N groups (N is the number of names)according to whether the first field value is the same “name”, get names in the single records from groups numbered by odds, and concatenate M/4 records based on groups numbered by evens. It’s complicated to code the intuitive, order-based algorithm in Perl. So I choose to handle it in SPL (Structure Process Language), which generates simple code:

A

B

C

1

=file("/Ntab.txt").cursor()

>file("/Result.txt").write("name;displayName;capacityInKB;consumedCapacityInKB;dpPoolID"

2

for  A1;#1=="name"

if  A2.#1=="name"

=A2.#2

3

=A2.(#2).to(2,).group((#-1)\4).new(C2,~(1),~(2),~(3),~(4))

4

>file("/Result.txt").export@a(B3;";")

A1: Import the file as a cursor. Default separator is tab. No limit on the size of the file.

B1: Write column headers into the first row.

A2: Loop over A1’s cursor to write data out by group according to whether the first field value is the same “name”. During a loop number that is an odd, "name devtwr1" and "name devtwr2" are read in; during a loop number that is an even, the other data in each group is read in.

B2-C2: If value of the first field of a loop variable is name (such as devtwr1), store value of the second field in C2 temporarily for later use. Then perform B3-B4.

B3: Get A2’s second field (multiple values), group them by sequence numbers and concatenate them into a record under same name. A2.(#2) means A2’s second field; the # sign in group() function represents a sequence number.

B4: Append B3’s result to a specified file.