2012-07-25 87 views
1

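Transpose using AWK or Perl

Hello, I would like to use AWK or Perl to produce an output file in the format below. My input file is a space-separated text file. This is similar to my earlier question, but in this case neither the input nor the output is formatted. My column positions may change, so I would appreciate a technique that does not refer to column numbers.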

Input file

id quantity colour shape size colour shape size colour shape size 
1 10 blue square 10 red triangle 12 pink circle 20 
2 12 yellow pentagon 3 orange rectangle 4 purple oval 6 

Desired output

id colour shape size 
1 blue square 10 
1 red triangle 12 
1 pink circle 20 
2 yellow pentagon 3 
2 orange rectangle 4 
2 purple oval 6 

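I am using this code by Dennis Williamson. The only problem is that the output I get has no space between the transposed fields. I need the fields to be space-separated.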

#!/usr/bin/awk -f 
BEGIN { 
col_list = "quantity colour shape" 
# Use a B ("blank") to add spaces in the output before or 
# after a format string (e.g. %6dB), but generally use the numeric argument 

# columns to be repeated on multiple lines may appear anywhere in 
# the input, but they will be output together at the beginning of the line 
repeat_fields["id"] 
# since these are individually set we won't use B 
repeat_fmt["id"] = "%-1s " 
# additional fields to repeat on each line 

ncols = split(col_list, cols) 

for (i = 1; i <= ncols; i++) { 
    col_names[cols[i]] 
    forms[cols[i]] = "%-1s" 
} 
} 


# save the positions of the columns using the header line 
FNR == 1 { 
for (i = 1; i <= NF; i++) { 
    if ($i in repeat_fields) { 
     repeat[++nrepeats] = i 
     repeat_look[i] = i 
     rformats[i] = repeat_fmt[$i] 
    } 
    if ($i in col_names) { 
     col_nums[++n] = i 
     col_look[i] = i 
     formats[i] = forms[$i] 
    } 
} 
# print the header line 
for (i = 1; i <= nrepeats; i++) { 
    f = rformats[repeat[i]] 
    sub("d", "s", f) 
    gsub("B", " ", f) 
    printf f, $repeat[i] 
} 
for (i = 1; i <= ncols; i++) { 
    f = formats[col_nums[i]] 
    sub("d", "s", f) 
    gsub("B", " ", f) 
    printf f, $col_nums[i] 
} 
printf "\n" 
next 
} 

{ 
for (i = 1; i <= NF; i++) { 
    if (i in repeat_look) { 
     f = rformats[i] 
     gsub("B", " ", f) 
     repeat_out = repeat_out sprintf(f, $i) 

    } 
    if (i in col_look) { 
     f = formats[i] 
     gsub("B", " ", f) 
     out = out sprintf(f, $i) 
     coln++ 
    } 
    if (coln == ncols) { 
     print repeat_out out 
     out = "" 
     coln = 0 
    } 
} 
repeat_out = "" 
} 

Output

id quantitycolourshape 
1 10bluesquare 
1 redtrianglepink 
2 circle12yellow 
2 pentagonorangerectangle 
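
The run-together fields appear to follow from the format strings: `"%-1s"` has no trailing blank, and the script's own comments say a `B` in the format is replaced by a space. The sketch below only illustrates that `printf` behaviour in awk; the function name `demo_formats` is hypothetical and the sample values come from the first data row. Assuming the script otherwise behaves as its comments describe, setting `col_list` to `"colour shape size"` and the format to `"%-1sB"` (or `"%-1s "`) looks like the smallest change, but that has not been checked against the real file.

```sh
# Hypothetical demo: the same two values printed with and without
# a trailing blank in the printf format string.
demo_formats() {
    awk 'BEGIN {
        # No trailing blank: the fields run together.
        printf "%-1s", "blue";  printf "%-1s", "square";  printf "\n"
        # Trailing blank: the fields are separated by a space.
        printf "%-1s ", "blue"; printf "%-1s ", "square"; printf "\n"
    }'
}
demo_formats
```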

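I apologize for not including all the information about the actual file earlier. I did that only to keep things simple, but it did not cover all my requirements.

In my actual file, which contains more than 5000 columns, I am looking to transpose the n_cell and n_bsc fields for each NODE, SITE and CHILD.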

NODE SITE CHILD n_cell n_bsc 

Here is a link to the actual file I am working on

+4

The name of the language is "Perl", not "PERL". – ikegami 2012-07-25 22:32:15

+1

But it is "AWK". My answer to this question would be the same as [my answer to your previous question](http://stackoverflow.com/a/11454983/26428). – 2012-07-26 00:12:39

+1

Possible duplicate of [Transpose using AWK](http://stackoverflow.com/questions/11447885/transpose-using-awk) – dgw 2012-07-26 07:40:48

Answers

3
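<>;  # read and discard the input header line
print("id colour shape size\n");

while (<>) {
    my @combined_fields = split;
    my $id = shift(@combined_fields);
    shift(@combined_fields);  # discard the quantity column
    # consume the remaining fields three at a time: colour, shape, size
    while (@combined_fields) {
        my @fields = ($id, splice(@combined_fields, 0, 3));
        print(join(' ', @fields), "\n");
    }
}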
+0

How do I run this? – 2012-07-25 22:42:15

+0

'perl script.pl infile > outfile' or in-place: 'perl -i script.pl file' – ikegami 2012-07-25 23:28:49

+0

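My actual input file has more than 5k columns, so I would like to use the header row to refer to the fixed columns and to the columns to be transposed; relying on column numbers is the problem. – 2012-07-26 07:50:07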

0

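You tell us that your column positions may change, but I am afraid that is not enough information.

So, in the absence of anything more specific, I have written this. It uses the header line to work out the number and size of the data sets, where the id column is, and in which column the first set starts.

It works correctly on your sample data, but I can only guess whether it will work on your live file.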

use strict; 
use warnings; 

my @headers = split ' ', <>; 

my %headers; 
$headers{$_}++ for @headers; 

# parentheses are required: == binds more tightly than //
die "Expected exactly one 'id' column" unless ($headers{id} // 0) == 1;
my $id_index = 0; 
$id_index++ while $headers[$id_index] ne 'id'; 

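# take the repeated labels in header order (keys %headers has no defined order)
my %seen;
my @labels = grep { $headers{$_} > 1 and not $seen{$_}++ } @headers;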
my $set_size = @labels; 
my $num_sets = $headers{$labels[0]}; 

my $start_index = 0; 
$start_index++ while $headers[$start_index] ne $labels[0]; 

my @reformat; 

while (<>) { 
    my @fields = split; 
    next unless @fields; 
    my $id = $fields[$id_index]; 
    for (my $i = $start_index; $i < @fields; $i+=$set_size) { 
        push @reformat, [ $id, @fields[$i..$i + $set_size - 1] ];
    } 
} 

unshift @labels, 'id'; 
print "@labels\n"; 
print "@$_\n" for @reformat; 

Output

id colour shape size 
1 blue square 10 
1 red triangle 12 
1 pink circle 20 
2 yellow pentagon 3 
2 orange rectangle 4 
2 purple oval 6
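
The same header-driven idea can be sketched in awk, the other language the question asks about. This is an illustration only, not part of either answer above. It assumes that every named column exists in the header and that each repeated column occurs the same number of times. The function name `transpose_by_header` and the variables `fixed` and `group` are hypothetical.

```sh
# Hypothetical helper: transpose repeated column groups, locating every
# column by its header name rather than by its position.
#   $1 = space-separated names of the fixed columns (repeated on each line)
#   $2 = space-separated names of the columns forming one repeated group
# Input is read from standard input.
transpose_by_header() {
    awk -v fixed="$1" -v group="$2" '
    BEGIN {
        nfixed = split(fixed, fname, " ")
        ngroup = split(group, gname, " ")
    }
    FNR == 1 {
        # Record where each fixed column is, and where every occurrence
        # of each group column is, in header order.
        for (i = 1; i <= NF; i++) {
            for (j = 1; j <= nfixed; j++)
                if ($i == fname[j]) fpos[j] = i
            for (j = 1; j <= ngroup; j++)
                if ($i == gname[j]) { count[j]++; gpos[j, count[j]] = i }
        }
        nsets = count[1]
        # Print the output header.
        out = ""
        for (j = 1; j <= nfixed; j++) out = out fname[j] " "
        for (j = 1; j <= ngroup; j++) out = out gname[j] (j < ngroup ? " " : "")
        print out
        next
    }
    NF {
        # One output line per occurrence of the group on this input line.
        for (s = 1; s <= nsets; s++) {
            out = ""
            for (j = 1; j <= nfixed; j++) out = out $(fpos[j]) " "
            for (j = 1; j <= ngroup; j++) out = out $(gpos[j, s]) (j < ngroup ? " " : "")
            print out
        }
    }'
}

# Usage with the sample data from the question:
transpose_by_header 'id' 'colour shape size' <<'EOF'
id quantity colour shape size colour shape size colour shape size
1 10 blue square 10 red triangle 12 pink circle 20
2 12 yellow pentagon 3 orange rectangle 4 purple oval 6
EOF
```

For the real file described in the question, the call might look like `transpose_by_header 'NODE SITE CHILD' 'n_cell n_bsc' < infile > outfile`, but that is a guess because the actual layout of that file is not shown here.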