Mark-based Grouping & Order-based Computation
【Q uestion】
Based on the following Awk and Perl solutions, I need to understand how to add additional fields that don't repeat their output like"name"in these examples. For example, string devtwr1 only occurs once in the sample data. If I need to add additional fields that also occur once, how can I do? Tried extensively but can't work it out...
Here is the source file Ntab.txt. It is a sample file of two hosts. There are numerous hosts in the real-world file. Each host has multiple "displayName"s (addresses) with corresponding figures for each 'displayName'.
>cat Ntab.txt
name devtwr1
displayName 00:67:BB
capacityInKB 104,857,600
consumedCapacityInKB 4,042,752
dpPoolID 20
displayName 00:7B:FD
capacityInKB 52,428,800
consumedCapacityInKB 14,880,768
dpPoolID 10
displayName 00:7C:28
capacityInKB 34,179,712
consumedCapacityInKB 29,804,544
dpPoolID 20
displayName 00:7C:29
capacityInKB 34,179,712
consumedCapacityInKB 5,462,016
dpPoolID 20
name devtwr2
displayName 00:67:BB
capacityInKB 104,857,600
consumedCapacityInKB 4,042,752
dpPoolID 20
displayName 00:7B:FD
capacityInKB 52,428,800
consumedCapacityInKB 14,880,768
dpPoolID 10
displayName 00:7C:28
capacityInKB 34,179,712
consumedCapacityInKB 29,804,544
dpPoolID 20
displayName 00:7C:29
capacityInKB 34,179,712
consumedCapacityInKB 5,462,016
dpPoolID 20
I need to be able to produce the $2 data from rows into columns after each 'name' (host) and in CSV format, headings optional. I can’t use (,:) as separators as the data includes them, so either (tab or ;) would be used. The result would be like this:
name;displayName;capacityInKB;consumedCapacityInKB;dpPoolID
devtwr1;00:67:BB;104,857,600;4,042,752;20
devtwr1;00:7B:FD;52,428,800;14,880,768;10
devtwr1;00:7C:28;34,179,712;29,804,544;20
devtwr1;00:7C:29;34,179,712;5,462,016;20
devtwr2;00:67:BB;104,857,600;4,042,752;20
devtwr1;00:7B:FD;52,428,800;14,880,768;10
devtwr1;00:7C:28;34,179,712;29,804,544;20
devtwr1;00:7C:29;34,179,712;5,462,016;20
Here is an Awk solution but I couldn’t work out how to add an additional field, like "name", that doesn’t have data in all fields.
$ awk '$1=="name"{name=$2}
$1 ~/^(displayName|capacityInKB|consumedCapacityInKB)$/{out=out";"$2}
$1=="dpPoolID"{print name out";"$2; out=""}' tmp2.txt
devtwr1;00:67:BB;104,857,600;4,042,752;20
devtwr1;00:7B:FD;52,428,800;14,880,768;10
devtwr1;00:7C:28;34,179,712;29,804,544;20
devtwr1;00:7C:29;34,179,712;5,462,016;20
devtwr2;00:67:BB;104,857,600;4,042,752;20
devtwr2;00:7B:FD;52,428,800;14,880,768;10
devtwr2;00:7C:28;34,179,712;29,804,544;20
devtwr2;00:7C:29;34,179,712;5,462,016;20
And here is a Perl solution but once again I couldn't work out how to add an additional field, such"name", that doesn’t always have data.
perl -lane '
BEGIN {
@fields = qw(name displayName capacityInKB consumedCapacityInKB dpPoolID);
print join ";", @fields;
}
if (/^(name|displayName)/ && $data{displayName}) {
print join ";", @data{@fields};
%data = (name => $data{name} );
}
$data{$F[0]} = $F[1];
END {
print join ";",@data{@fields};
}' tmp2.txt > Report.csvv
【Answer】
Divide records into 2N groups (N is the number of names)according to whether the first field value is the same “name”, get names in the single records from groups numbered by odds, and concatenate M/4 records based on groups numbered by evens. It’s complicated to code the intuitive, order-based algorithm in Perl. So I choose to handle it in SPL (Structure Process Language), which generates simple code:
A |
B |
C |
|
1 |
=file("/Ntab.txt").cursor() |
>file("/Result.txt").write("name;displayName;capacityInKB;consumedCapacityInKB;dpPoolID" |
|
2 |
for A1;#1=="name" |
if A2.#1=="name" |
=A2.#2 |
3 |
=A2.(#2).to(2,).group((#-1)\4).new(C2,~(1),~(2),~(3),~(4)) |
||
4 |
>file("/Result.txt").export@a(B3;";") |
A1: Import the file as a cursor. Default separator is tab. No limit on the size of the file.
B1: Write column headers into the first row.
A2: Loop over A1’s cursor to write data out by group according to whether the first field value is the same “name”. During a loop number that is an odd, "name devtwr1" and "name devtwr2" are read in; during a loop number that is an even, the other data in each group is read in.
B2-C2: If value of the first field of a loop variable is name (such as devtwr1), store value of the second field in C2 temporarily for later use. Then perform B3-B4.
B3: Get A2’s second field (multiple values), group them by sequence numbers and concatenate them into a record under same name. A2.(#2) means A2’s second field; the # sign in group() function represents a sequence number.
B4: Append B3’s result to a specified file.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL