blackduckie RaqForum 28 No.
429 View • 5 Years ago

Mark-based Grouping & Order-based Computation

text(125)

【Q uestion】

Based on the following Awk and Perl solutions, I need to understand how to add additional fields that don't repeat their output like"name"in these examples. For example, string devtwr1 only occurs once in the sample data. If I need to add additional fields that also occur once, how can I do? Tried extensively but can't work it out...

Here is the source file Ntab.txt. It is a sample file of two hosts. There are numerous hosts in the real-world file. Each host has multiple "displayName"s (addresses) with corresponding figures for each 'displayName'.

>cat Ntab.txt

name devtwr1

displayName 00:67:BB

capacityInKB 104,857,600

consumedCapacityInKB 4,042,752

dpPoolID 20

displayName 00:7B:FD

capacityInKB 52,428,800

consumedCapacityInKB 14,880,768

dpPoolID 10

displayName 00:7C:28

capacityInKB 34,179,712

consumedCapacityInKB 29,804,544

dpPoolID 20

displayName 00:7C:29

capacityInKB 34,179,712

consumedCapacityInKB 5,462,016

dpPoolID 20

name devtwr2

displayName 00:67:BB

capacityInKB 104,857,600

consumedCapacityInKB 4,042,752

dpPoolID 20

displayName 00:7B:FD

capacityInKB 52,428,800

consumedCapacityInKB 14,880,768

dpPoolID 10

displayName 00:7C:28

capacityInKB 34,179,712

consumedCapacityInKB 29,804,544

dpPoolID 20

displayName 00:7C:29

capacityInKB 34,179,712

consumedCapacityInKB 5,462,016

dpPoolID 20

I need to be able to produce the $2 data from rows into columns after each 'name' (host) and in CSV format, headings optional. I can’t use (,:) as separators as the data includes them, so either (tab or ;) would be used. The result would be like this:

name;displayName;capacityInKB;consumedCapacityInKB;dpPoolID

devtwr1;00:67:BB;104,857,600;4,042,752;20

devtwr1;00:7B:FD;52,428,800;14,880,768;10

devtwr1;00:7C:28;34,179,712;29,804,544;20

devtwr1;00:7C:29;34,179,712;5,462,016;20

devtwr2;00:67:BB;104,857,600;4,042,752;20

devtwr1;00:7B:FD;52,428,800;14,880,768;10

devtwr1;00:7C:28;34,179,712;29,804,544;20

devtwr1;00:7C:29;34,179,712;5,462,016;20

Here is an Awk solution but I couldn’t work out how to add an additional field, like "name", that doesn’t have data in all fields.

$ awk '$1=="name"{name=$2}

$1 ~/^(displayName|capacityInKB|consumedCapacityInKB)$/{out=out";"$2}

$1=="dpPoolID"{print name out";"$2; out=""}' tmp2.txt

devtwr1;00:67:BB;104,857,600;4,042,752;20

devtwr1;00:7B:FD;52,428,800;14,880,768;10

devtwr1;00:7C:28;34,179,712;29,804,544;20

devtwr1;00:7C:29;34,179,712;5,462,016;20

devtwr2;00:67:BB;104,857,600;4,042,752;20

devtwr2;00:7B:FD;52,428,800;14,880,768;10

devtwr2;00:7C:28;34,179,712;29,804,544;20

devtwr2;00:7C:29;34,179,712;5,462,016;20

And here is a Perl solution but once again I couldn't work out how to add an additional field, such"name", that doesn’t always have data.

perl -lane '

BEGIN {

@fields = qw(name displayName capacityInKB consumedCapacityInKB dpPoolID);

print join ";", @fields;

}

if (/^(name|displayName)/ && $data{displayName}) {

print join ";", @data{@fields};

%data = (name => $data{name} );

}

$data{$F[0]} = $F[1];

END {

print join ";",@data{@fields};

}' tmp2.txt > Report.csvv

【Answer】

Divide records into 2N groups (N is the number of names)according to whether the first field value is the same “name”, get names in the single records from groups numbered by odds, and concatenate M/4 records based on groups numbered by evens. It’s complicated to code the intuitive, order-based algorithm in Perl. So I choose to handle it in SPL (Structure Process Language), which generates simple code:

	A	B	C
1	=file("/Ntab.txt").cursor()	>file("/Result.txt").write("name;displayName;capacityInKB;consumedCapacityInKB;dpPoolID"
2	for A1;#1=="name"	if A2.#1=="name"	=A2.#2
3		=A2.(#2).to(2,).group((#-1)\4).new(C2,~(1),~(2),~(3),~(4))
4		>file("/Result.txt").export@a(B3;";")

A1: Import the file as a cursor. Default separator is tab. No limit on the size of the file.

B1: Write column headers into the first row.

A2: Loop over A1’s cursor to write data out by group according to whether the first field value is the same “name”. During a loop number that is an odd, "name devtwr1" and "name devtwr2" are read in; during a loop number that is an even, the other data in each group is read in.

B2-C2: If value of the first field of a loop variable is name (such as devtwr1), store value of the second field in C2 temporarily for later use. Then perform B3-B4.

B3: Get A2’s second field (multiple values), group them by sequence numbers and concatenate them into a record under same name. A2.(#2) means A2’s second field; the # sign in group() function represents a sequence number.

B4: Append B3’s result to a specified file.

SPL Official Website 👉 https://www.scudata.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL

SPL Learning Material 👉 https://c.scudata.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/2bkGwqTj

Youtube 👉 https://www.youtube.com/@esProc_SPL

text(125)

Application

blackduckie • 429 View • 5 Years ago