2016-08-16 25 views
-1

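AWK: sum the field counts of multiple records and merge them if the sum equals a variable

I have a data alignment problem. In the example below, some lines belong at the end of the previous line, and other lines are missing a field that needs to be filled with NULL.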

[email protected]@[email protected]@81 1/[email protected]/[email protected]/[email protected]/[email protected]@7.40 5w BITE SLOW 
[email protected]@[email protected]@41/[email protected]/[email protected]@[email protected]/[email protected] 4w BITE SLOW 
[email protected]@[email protected]@31/[email protected]/[email protected]/[email protected]/[email protected]/[email protected]* led 1/16p, BITE SLOW 
[email protected]@[email protected]@[email protected]/[email protected]@[email protected] 3/4 ins, BITE SLOW 
[email protected]@[email protected]@61/[email protected]/[email protected]/[email protected]/[email protected]@26.25 cut BITE SLOW 
[email protected]@[email protected]@[email protected]@81/[email protected]/[email protected]@13.10 5w BITE SLOW 
[email protected]@[email protected]@[email protected]/[email protected]@[email protected] 1/[email protected] 4w BITE FAST 
[email protected]@[email protected]@51/[email protected]@[email protected]/[email protected]@15.90 3w BITE FAST 
[email protected]@[email protected]@11/[email protected]/[email protected]/[email protected]@[email protected] in BITE FAST 
[email protected]@[email protected]@[email protected]@[email protected]@10 2 3/4 
19.50 6w upper, no response 
[email protected]@[email protected]@[email protected]@[email protected]@[email protected] off slow, no impact 

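What I have done so far is create a variable that holds the most common field count across all records. I want to use that variable in awk to find lines that are missing data or are incorrectly formatted.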

FLDCNT=$(sed '/^ *$/d' file | sed 's/^ *//' | awk -F@ '{print NF}' | sort | uniq -c | sort -n | awk 'END{print $NF}')

Then I want to use something along the lines of:

awk -F@ -v x="$FLDCNT" '{if(NF != x && [some code to check record and next record's combined field count = $FLDCNT]) [add the next row to the end of the current rows fields] print }' file 

I can find the lines/records I need to work on, but I am unsure how to perform the steps in the "[]" parts of the code above.

In the end, the output should be:

[email protected]@[email protected]@81 1/[email protected]/[email protected]/[email protected]/[email protected]@7.40 5w BITE SLOW 
[email protected]@[email protected]@41/[email protected]/[email protected]@[email protected]/[email protected] 4w BITE SLOW 
[email protected]@[email protected]@31/[email protected]/[email protected]/[email protected]/[email protected]/[email protected]* led 1/16p, BITE SLOW 
[email protected]@[email protected]@[email protected]/[email protected]@[email protected] 3/4 ins, BITE [email protected] 
[email protected]@[email protected]@61/[email protected]/[email protected]/[email protected]/[email protected]@26.25 cut BITE SLOW 
[email protected]@[email protected]@[email protected]@81/[email protected]/[email protected]@13.10 5w BITE SLOW 
[email protected]@[email protected]@[email protected]/[email protected]@[email protected] 1/[email protected] 4w BITE FAST 
[email protected]@[email protected]@51/[email protected]@[email protected]/[email protected]@15.90 3w BITE FAST 
[email protected]@[email protected]@11/[email protected]/[email protected]/[email protected]@[email protected] in BITE FAST 
[email protected]@[email protected]@[email protected]@[email protected]@10 2 3/[email protected] 6w upper, no response 
[email protected]@[email protected]@[email protected]@[email protected]@[email protected] off slow, no impact 

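I know from other examples that there are simpler solutions, such as an if statement against a known field layout. However, I am dealing with thousands of files, and the field and record counts differ in all of them. In short, I am trying to:

- find the most common column count across all lines;
- find the lines that do not match that common count (let's call these oddballs);
- check whether appending the next line to an oddball gives it the common column count, and if so, join the two lines.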

+0

Thanks. Yes, the missing information on those lines should become NULL. I will update my question to reflect that.

+0

Then together they have 12 fields. If that is not the common field count declared in the variable, then 'NULL' should be appended to the end of the line.

Answers

1

I noticed your question is also tagged perl, so here is a readable solution in Perl:

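#!/usr/bin/perl
use warnings;
use strict;

my %count;      # field count => number of lines with that many fields
my $max = 0;    # largest field count seen in the file

# First pass: tally the field count of every line.
open my $FH, '<', shift or die $!;
while (<$FH>) {
    my $c = split /@/;
    $count{$c}++;
    $max = $c if $c > $max;
}

# The merge below targets $max, so warn when it is not the most common count.
warn "The max count ($max) different from the most common\n"
    if grep $_ > $count{$max}, values %count;

# Second pass: join short lines until they reach $max fields.
seek $FH, 0, 0;
my $leftover = 0;    # fields accumulated so far in an unfinished line
while (<$FH>) {
    my $c = $leftover + split /@/;
    if ($leftover) {
        print '@';
        if ($c > $max) {
            # Joining would overshoot: close the pending line with NULL.
            $c -= $leftover;
            print "NULL\n";
        }
    }

    if ($c != $max) {
        $leftover = $c;    # running total; += would double-count earlier parts
        chomp;
    } else {
        $leftover = 0;
    }
    print;
}
# Close a short line still pending at end of file.
print '@', "NULL\n" if $leftover;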
0

Here is one possible approach with awk:

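awk -F@ '
    # A complete line: first flush any held short line, padded with NULL.
    NF >= 10 {
        if (p != "") {
            print p "@NULL"
            p = ""
        }
        print
    }
    # A short line: hold it, or join it to the short line already held.
    NF < 10 {
        if (p == "") {
            p = $0
        } else {
            print p "@" $0
            p = ""
        }
    }
    # Flush a short line still pending at end of input.
    END { if (p != "") print p "@NULL" }' file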

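This script prints every line that has 10 or more fields as-is. A line with fewer than 10 fields is held back: if the next line is also short, the two are joined with @; if the next line is complete, the held line (the 9-field case) is printed with @NULL appended.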
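Since the question involves thousands of files with different widths, the hard-coded 10 could be replaced by a count computed from each file. The following is only a sketch under a few assumptions: `merge_rows` is a made-up helper name, the file is read twice (so it must be a regular file, not a pipe), ties between equally common field counts go to whichever count reaches the top tally first, and an oddball that the next line does not complete gets exactly one @NULL appended, as described in the comments on the question.

```shell
# merge_rows FILE  (hypothetical helper name, not from the original answers)
merge_rows() {
    awk -F@ '
        # Pass 1 (first read of the file): tally field counts, skipping blank lines.
        NR == FNR {
            if ($0 !~ /^ *$/ && ++seen[NF] > best) { best = seen[NF]; want = NF }
            next
        }
        /^ *$/ { next }                      # pass 2: ignore blank lines
        held != "" {
            if (heldn + NF == want) {        # the two lines complete each other
                print held "@" $0
                held = ""
                next
            }
            print held "@NULL"               # assumption: pad with a single NULL
            held = ""
        }
        NF == want { print; next }           # line already has the common count
        { held = $0; heldn = NF }            # oddball: wait for the next line
        END { if (held != "") print held "@NULL" }
    ' "$1" "$1"
}
```

It would be called as `merge_rows file > file.fixed`, once per file. Passing the file name twice is what lets `NR == FNR` separate the counting pass from the merging pass.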
