2014-06-20 42 views
1

How to check with awk whether a row from one text file exists in another

This is a follow-up to another question: Unexpected result comparing values of rows and columns in two text files

I wrote a script that compares two text files row by row and column by column. Here is the structure of the files:

FILE1.TXT

Name Col1 Col2 Col3 
----------------------- 
row1 1  4  7   
row2 2  5  8   
row3 3  6  9 

FILE2.TXT

Name Col1 Col2 Col3 
-----------------------   
row1 1  4  7 
row2 2  5  999 

Here is the code I have so far:

dos2unix ravi # 2>/dev/null 
dos2unix ravi2 # 2>/dev/null 

awk '  
    FNR <= 2 {next}  # skip the two header lines (column names and dashes)
    FNR == NR {   
     for (i = 2; i <= NF; i++) { 
      a[i,$1] = $i;    
     }  
     b[$1];    
     next;      
    } 

    ($1 in b) {     # check if row in file2 existed in file1 
     for (i = 2; i <= NF; i++) { 
      if (a[i,$1] == $i) 
       printf("%s->col%d: %s vs %s: Are Equal\n", $1, i-1, a[i,$1], $i); 
      else 
       printf("%s->col%d: %s vs %s: Not Equal\n", $1, i-1, a[i,$1], $i); 
     } 
    } 

    !($1 in b) {     # check if row in file2 does not exist in file1 
     for (i = 2; i <= NF; i++) 
      printf("%s->col%d: %s vs %s: Are Not Equal\n", $1, i-1, "blank", $i); 
    } 

    # pattern needed to check if row in file1 does not exist in file2 

    ' $PWD/file1.txt $PWD/file2.txt 

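Does anyone have a hint or suggestion for a pattern in the awk script that checks whether a row in file1 does not exist in file2? See the sample output below for what I mean: basically, I want to print the values of row3 from file1, which does not exist in file2. Thanks! Let me know if further explanation is needed.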

Desired output:

row1->Col1: 1 vs 1: Equal 
row1->Col2: 4 vs 4: Equal 
row1->Col3: 7 vs 7: Equal 
row2->Col1: 2 vs 2: Equal 
row2->Col2: 5 vs 5: Equal 
row2->Col3: 8 vs 999: Not Equal 
row3->Col1: 3 vs (blank) : Not Equal 
row3->Col2: 6 vs (blank) : Not Equal 
row3->Col3: 9 vs (blank) : Not Equal 

Actual output:

row1->Col1: 1 vs 1: Equal 
row1->Col2: 4 vs 4: Equal 
row1->Col3: 7 vs 7: Equal 
row2->Col1: 2 vs 2: Equal 
row2->Col2: 5 vs 5: Equal 
row2->Col3: 8 vs 999: Not Equal 
+0

You should use a small Python script for this, but that's just my two cents. –

Answers

4

Building on your code:

$ cat script.awk 
FNR <= 2 { next }  # skip the two header lines (column names and dashes)
FNR == NR { 
    for (i = 2; i <= NF; i++) { a[i,$1] = $i } 
    b[$1]; 
    next; 
} 
($1 in b) {     # check if row in file2 existed in file1 
    for (i = 2; i <= NF; i++) { 
     if (a[i,$1] == $i) 
      printf("%s->col%d: %s vs %s: Are Equal\n", $1, i-1, a[i,$1], $i); 
     else 
      printf("%s->col%d: %s vs %s: Not Equal\n", $1, i-1, a[i,$1], $i); 
    } 
    delete b[$1]; # delete entries which are processed 
} 

END { 
    for (left in b) { # look which didn't match 
     for (i = 2; i <= NF; i++) 
      printf("%s->col%d: %s vs (blank): Not Equal\n", left, i-1, a[i,left]) 
    } 
} 

Run it like this:

$ awk -f script.awk file1 file2 
row1->col1: 1 vs 1: Are Equal 
row1->col2: 4 vs 4: Are Equal 
row1->col3: 7 vs 7: Are Equal 
row2->col1: 2 vs 2: Are Equal 
row2->col2: 5 vs 5: Are Equal 
row2->col3: 8 vs 999: Not Equal 
row3->col1: 3 vs (blank): Not Equal 
row3->col2: 6 vs (blank): Not Equal 
row3->col3: 9 vs (blank): Not Equal 
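
One fragile spot in the script above is that the END block reuses `NF`, which at that point holds the field count of the last record read. If file2 happens to end with a blank line, that count is 0 and nothing is printed for the unmatched rows. A minimal sketch that avoids this, assuming it is acceptable to remember each row's field count while reading file1, might look like the following. The array name `nf`, the variable `verdict`, and the wrapper function `compare_rows` are introduced here for illustration and are not part of the original answer.

```shell
# compare_rows FILE1 FILE2 (hypothetical wrapper name)
compare_rows() {
    awk '
        FNR <= 2 || NF == 0 { next }      # skip the two header lines and blank lines
        FNR == NR {                       # first file: remember every cell
            for (i = 2; i <= NF; i++) a[i,$1] = $i
            nf[$1] = NF                   # field count of this row, used in END
            next
        }
        ($1 in nf) {                      # row exists in both files
            for (i = 2; i <= NF; i++) {
                verdict = (a[i,$1] == $i) ? "Are Equal" : "Not Equal"
                printf("%s->col%d: %s vs %s: %s\n", $1, i-1, a[i,$1], $i, verdict)
            }
            delete nf[$1]                 # mark the row as handled
        }
        END {                             # leftover rows exist only in the first file
            for (row in nf)
                for (i = 2; i <= nf[row]; i++)
                    printf("%s->col%d: %s vs (blank): Not Equal\n", row, i-1, a[i,row])
        }
    ' "$1" "$2"
}
```

It would be called as `compare_rows file1.txt file2.txt`. Note that `for (row in nf)` visits the leftover rows in an unspecified order, so with several unmatched rows the output may need to be piped through `sort`.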
+1

+1 for spending so much time on this long piece of code – anubhava

+1

@jaypal I appreciate the time you spent coding this (so +1), but it still doesn't print "row3" for any column. – Alias

+2

@Nosscire Make sure the files don't contain any control characters. I just tested this and it works fine with the given data. –
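
One common kind of control character in files that came from Windows is a trailing carriage return, which becomes part of the last field and makes `9\r` compare unequal to `9`. Whether that is what happened here is not known; the sketch below only shows one way to check for it and strip it with awk alone, as an alternative to `dos2unix`. The helper names `count_cr_lines` and `strip_cr` are hypothetical.

```shell
# count_cr_lines FILE: print how many lines end in a carriage return.
count_cr_lines() {
    awk '/\r$/ { n++ } END { print n + 0 }' "$1"
}

# strip_cr FILE: print FILE with a trailing carriage return removed from each line.
strip_cr() {
    awk '{ sub(/\r$/, ""); print }' "$1"
}
```

For example, `count_cr_lines file1.txt` printing anything other than 0 suggests the file has DOS line endings. The same `sub(/\r$/, "")` call could also be placed in a pattern-less rule at the top of the comparison script, since modifying `$0` makes awk split the fields again.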

1

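If you know that each row name (the first column) appears at most once in each file, you can add `delete b[$1]` at the end of the `($1 in b)` block, move the `!($1 in b)` block above it (otherwise a row that was just deleted from `b` would also match that block), and then add an END block that loops over whatever is left in `b` and prints those rows.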

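END { 
    for (r in b) {                # rows from file1 that never matched a row in file2 
     for (i = 2; i <= NF; i++) {  # NF still holds the field count of the last record read 
      printf("%s->col%d: %s vs %s: Are Not Equal\n", r, i-1, a[i,r], "blank"); 
     } 
    } 
} 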
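
Put together, the steps described in this answer might look like the sketch below. It starts from the question's script and assumes, as stated above, that each row name appears at most once per file. Storing the field count as the value of `b[$1]` and the wrapper name `compare_both_ways` are choices made here for illustration, not something the answer prescribes.

```shell
# compare_both_ways FILE1 FILE2 (hypothetical wrapper name)
compare_both_ways() {
    awk '
        FNR <= 2 { next }                 # skip the two header lines
        FNR == NR {                       # first file: store cells and row names
            for (i = 2; i <= NF; i++) a[i,$1] = $i
            b[$1] = NF                    # assumption: keep the field count as the value
            next
        }
        !($1 in b) {                      # row only in the second file; placed before the delete
            for (i = 2; i <= NF; i++)
                printf("%s->col%d: %s vs %s: Are Not Equal\n", $1, i-1, "blank", $i)
        }
        ($1 in b) {                       # row in both files
            for (i = 2; i <= NF; i++) {
                if (a[i,$1] == $i)
                    printf("%s->col%d: %s vs %s: Are Equal\n", $1, i-1, a[i,$1], $i)
                else
                    printf("%s->col%d: %s vs %s: Not Equal\n", $1, i-1, a[i,$1], $i)
            }
            delete b[$1]                  # handled, so END will not report it
        }
        END {                             # rows only in the first file
            for (r in b)
                for (i = 2; i <= b[r]; i++)
                    printf("%s->col%d: %s vs %s: Are Not Equal\n", r, i-1, a[i,r], "blank")
        }
    ' "$1" "$2"
}
```

With this ordering, a row that exists only in file2 is reported once by the `!($1 in b)` rule, and a row that exists only in file1 is reported once by the END block. If a row name could repeat in file2, the `delete` would make the second occurrence look like a row missing from file1, which is why the uniqueness assumption matters.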