Row-to-column Transposition of Non-header Text Files

Question
I have a tab delimited file that looks like this:

file.txt gives:

1 299

 

 1 150

 

 1 50

 

 1 57

 

 2 -45

 

 2 62

 

 3 515

 

 3 215

 

 3 -315

 

 3 -35

 

 3 3

 

 3 6789

 

 3 34

 

 5 66

 

 5 1334

 

 5 123

 

I’d like to use Unix commands to get a tab-delimited file that based on values in column#1, each column of the output file will hold all relevant values of column#2 (I’m using separator “|” instead of tab only to illustrate my desired output file):

299 | -45| 515| 66

 

150| 62| 215| 1334

 

50 | | -315 |

 

57 | | -35 |

 

| | 3 |

 

The corresponding Headers (1,2,3,5; based on column#1 values) could be a nice addition to the code (as shown below), but the main request is to split the information of the first file into separated columns. Thanks!

 

1| 2| 3| 5

 

299 | -45| 515| 66

 

150| 62| 215| 1334

 

50 | | -315 |

 

57 | | -35 |

 

| | 3 |

 

Below is a solution:

awk '

 

BEGIN{max=0;}

 

{

 

 d\[$1\]\[length(d\[$1\])+1\] = $2;

 

 if(length(d\[$1\])>max)

 

 max = length(d\[$1\]);

 

}

 

END{

 

 PROCINFO\["sorted\_in"\] = "@ind\_num_asc";

 

 line = "";

 

 flag = 0;

 

 for(j in d){

 

 line = line (flag?"\\t|\\t":"") j;

 

 flag = 1;

 

 }

 

 print line;

 

 for(i=1; i<=max; ++i){

 

 line = "";

 

 flag = 0;

 

 for(j in d){

 

 line = line (flag?"\\t|\\t":"") d\[j\]\[i\];

 

 flag = 1;

 

 }

 

 print line;

 

 }

 

}' file.txt

 

you get

 

1 | 2 | 3 | 5

 

299 | -45 | 515 | 66

 

150 | 62 | 215 | 1334

 

50 | | -315 |

 

57 | | -35 |

 

 | | 3 |

 

Or, you can use python …. for example, in split2Columns.py

import sys

 

records = \[line.split() for line in open(sys.argv\[1\])\]

 

import collections

 

records_dict = collections.defaultdict(list)

 

for key, val in records:

 

 records_dict\[key\].append(val)

 

from itertools import izip_longest

 

print "\\t|\\t".join(records_dict.keys())

 

print "\\n".join(("\\t|\\t".join(map(str,l)) for l in izip\_longest(*records\_dict.values(), fillvalue="")))

 

Answer
Group the text file by column#1 to generate the expected result set, and then enter members of each group into the result set in vertical form.

Here post-grouping operations and row-to-column transposition are involved. Below is the SPL script, which is simple and easy to understand:

A

1

=file("D:/file.txt").import()

2

=A1.group(#1)

3

=create(${A2.(#1).concat(",")}).paste(${A2.len().("A2("/~/").(#2)").concat(",")})