Find a specific column and replace the following column with a specific value using gawk

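I am trying to find every place where my data has a repeated line and delete the repeat. In addition, I am looking for lines where the second column is 90, and I want to replace the second column of the following line with a specific number that I designate.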

My data looks like this:

#  Type Response  Acc  RT  Offset  
    1  70 0 0 0.0000 57850 
    2  31 0 0 0.0000 59371 
    3  41 0 0 0.0000 60909 
    4  70 0 0 0.0000 61478 
    5  31 0 0 0.0000 62999 
    6  41 0 0 0.0000 64537 
    7  41 0 0 0.0000 64537 
    8  70 0 0 0.0000 65106 
    9  11 0 0 0.0000 66627 
    10  21 0 0 0.0000 68165 
    11  90 0 0 0.0000 68700 
    12  31 0 0 0.0000 70221

I would like my data to look like this:

#  Type Response  Acc  RT  Offset 
    1  70 0 0 0.0000 57850 
    2  31 0 0 0.0000 59371 
    3  41 0 0 0.0000 60909 
    4  70 0 0 0.0000 61478 
    5  31 0 0 0.0000 62999 
    6  41 0 0 0.0000 64537 
    8  70 0 0 0.0000 65106 
    9  11 0 0 0.0000 66627 
    10  21 0 0 0.0000 68165 
    11  90 0 0 0.0000 68700 
    12  5 0 0 0.0000 70221

My code:

BEGIN { 
priorline = ""; 
ERROROFFSET = 50; 
ERRORVALUE[10] = 1; 
ERRORVALUE[11] = 2; 
ERRORVALUE[12] = 3; 
ERRORVALUE[30] = 4; 
ERRORVALUE[31] = 5; 
ERRORVALUE[32] = 6; 

ORS = "\n"; 
} 

NR == 1 { 
print; 
getline; 
priorline = $0; 
} 

NF == 6 { 

brandnewline = $0 
mytype = $2 
$0 = priorline 
priorField2 = $2; 

if (mytype !~ priorField2) { 
print; 
priorline = brandnewline; 
} 

if (priorField2 == "90") { 
    mytype = ERRORVALUE[mytype]; 
    } 
} 

END {print brandnewline} 


##Here the parameters of the brandnewline is set to the current line and then the 
##priorline is set to the line on which we just worked on and the brandnewline is 
##set to be the next new line we are working on. (i.e line 1 = brandnewline, now 
##we set priorline = brandnewline, thus priorline is line 1 and brandnewline takes 
##on line 2) Next, the same parameters were set with column 2, mytype being the 
##current column 2 value and priorField2 being the same value as mytype moves to 
##the next column 2 value. Finally, we wrote an if statement where, if the value 
##in column 2 of the current line !~ (does not equal) value of column two of the 
##previous line, then the current line will be print otherwise it will just be 
##skipped over. The second if statement recognizes the lines in which the value 
##90 appeared and replaces the value in column 2 with a previously defined 
##ERRORVALUE set for each specific type (type 10=1, 11=2,12=3, 30=4, 31=5, 32=6).

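I have been able to delete the repeated lines successfully. However, I cannot get the next part of my code to work, which is to replace the value in the actual column with the values I designated in BEGIN as ERRORVALUE (10 = 1, 11 = 2, 12 = 3, 30 = 4, 31 = 5, 32 = 6). Essentially, I want to replace the value in that line with my ERRORVALUE.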

If anyone can help me, I would be very grateful.

Source

2012-03-14 user1269741

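One challenge is that you can't simply compare a line with the previous line, because the ID number in the first column will differ.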

awk ' 
    BEGIN { 
    ERRORVALUE[10] = 1 
    # ... etc 
    } 

    # print the header 
    NR == 1 {print; next} 

    NR == 2 || $0 !~ prev_regex { 
    prev_regex = sprintf("^\\s+\\w+\\s+%s\\s+%s\\s+%s\\s+%s\\s+%s",$2,$3,$4,$5,$6) 
    if (was90) $2 = ERRORVALUE[$2] 
    print 
    was90 = ($2 == 90) 
    } 
'

For lines where column 2 is changed, this breaks the line formatting:

#  Type Response  Acc  RT  Offset 
    1  70 0 0 0.0000 57850 
    2  31 0 0 0.0000 59371 
    3  41 0 0 0.0000 60909 
    4  70 0 0 0.0000 61478 
    5  31 0 0 0.0000 62999 
    6  41 0 0 0.0000 64537 
    8  70 0 0 0.0000 65106 
    9  11 0 0 0.0000 66627 
    10  21 0 0 0.0000 68165 
    11  90 0 0 0.0000 68700 
12 5 0 0 0.0000 70221

If that is a problem, you can pipe the output of gawk into column -t or, if you know the line format is fixed, use printf() in the awk program.
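
As a rough sketch of the printf() route: the function below drops a line whose columns 2 to 6 repeat the previous line, replaces $2 on the line after a 90, and reprints every data line at fixed widths. The function name fix_lines, the variable names, and the widths in the format string are assumptions made for illustration, based only on the sample data shown in the question.

```shell
# Sketch only: fix_lines and the printf widths are hypothetical choices,
# guessed from the sample data, not a known specification of the file format.
fix_lines() {
    awk '
        BEGIN {
            ERRORVALUE[10] = 1; ERRORVALUE[11] = 2; ERRORVALUE[12] = 3
            ERRORVALUE[30] = 4; ERRORVALUE[31] = 5; ERRORVALUE[32] = 6
        }
        NR == 1 { print; next }            # header passes through untouched
        {
            # Compare everything except the ID in column 1.
            key = $2 SUBSEP $3 SUBSEP $4 SUBSEP $5 SUBSEP $6
            if (key == prev_key) next      # same data as the previous line
            prev_key = key

            is90 = ($2 == 90)              # remember before $2 may change
            if (was90 && ($2 in ERRORVALUE)) $2 = ERRORVALUE[$2]
            was90 = is90

            # Reprint at fixed widths so a changed $2 keeps the layout.
            printf "%5s %3s %s %s %s %s\n", $1, $2, $3, $4, $5, $6
        }
    ' "$@"
}
```

Called as fix_lines file (or with the data on standard input), this leaves a type that has no ERRORVALUE entry unchanged rather than blanking it, which is a design choice of the sketch.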

Source

2012-03-14 19:47:36

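First: thank you very much for your answer, it has already been very helpful. Thank you also for such a quick reply. Second: one concern I have is whether, after I see the 90 in $2, I can make the substitution in the $2 two lines before it. In this example there is a 90 in $2 of line 11; is it possible to change $2 in line 9 to the format described in BEGIN and, if so, how would I go about doing this? – user1269741 2012-03-14 20:40:49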

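I would probably take 2 passes through the file: awk 'remove the duplicate lines' | tac | awk 'replace $2 if the value 2 lines previous was 90' | tac. tac is a handy tool that prints a file from the last line to the first. Otherwise the awk script gets a bit messy, because it now has to remember the previous two lines, take care that the line 2 previous was not deleted, and so on. – 2012-03-14 20:53:06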
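
The pipeline in the comment above can be sketched roughly as follows. The function name two_pass and the variables seen, prev1, and prev2 are hypothetical, and the sketch assumes GNU tac is available and that the header line starts with #.

```shell
# Sketch only: two_pass, seen, prev1 and prev2 are illustrative names.
# Pass 1 keeps the header and drops lines whose columns 2-6 were seen before.
# Pass 2 runs on the reversed file, so "two lines after a 90" in that stream
# is "two lines before the 90" in the original order.
two_pass() {
    awk 'NR == 1 || !seen[$2, $3, $4, $5, $6]++' "$@" |
    tac |
    awk '
        BEGIN {
            ERRORVALUE[10] = 1; ERRORVALUE[11] = 2; ERRORVALUE[12] = 3
            ERRORVALUE[30] = 4; ERRORVALUE[31] = 5; ERRORVALUE[32] = 6
        }
        /^#/ { print; next }               # header now arrives last
        {
            orig = $2
            if (prev2 == 90 && (orig in ERRORVALUE)) $2 = ERRORVALUE[orig]
            prev2 = prev1                  # shift the two-line history,
            prev1 = orig                   # always using the original types
            print
        }
    ' |
    tac
}
```

Assigning to $2 makes awk rebuild the line with single spaces, so the changed line loses its original spacing; column -t or a printf can restore the layout if that matters.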

This might work for you:

v=99999 
sed ':a;$!N;s/^\(\s*\S*\s*\)\(.*\)\s*\n.*\2/\1\2/;ta;s/^\(\s*\S*\s*\) 90 /\1'"$(printf "%5d" $v)"' /;P;D' file 
#  Type Response  Acc  RT  Offset  
    1  70 0 0 0.0000 57850 
    2  31 0 0 0.0000 59371 
    3  41 0 0 0.0000 60909 
    4  70 0 0 0.0000 61478 
    5  31 0 0 0.0000 62999 
    6  41 0 0 0.0000 64537 
    8  70 0 0 0.0000 65106 
    9  11 0 0 0.0000 66627 
    10  21 0 0 0.0000 68165 
    11 99999 0 0 0.0000 68700 
    12  31 0 0 0.0000 70221

Source

2012-03-14 21:23:56 potong

This might work for you:

awk 'BEGIN { 
     ERROROFFSET = 50; 
     ERRORVALUE[10] = 1; 
     ERRORVALUE[11] = 2; 
     ERRORVALUE[12] = 3; 
     ERRORVALUE[30] = 4; 
     ERRORVALUE[31] = 5; 
     ERRORVALUE[32] = 6; 
    } 
    NR == 1 { print ; next } 
    { if (a[$2 $6]) { next } else { a[$2 $6]++ } 
     if ($2 == 90) { print ; n++ ; next } 
     if (n>0) { $2 = ERRORVALUE[$2] ; n=0 } 
     printf("% 4i% 8i% 3i% 5i% 9.4f% 6i\n", $1, $2, $3, $4, $5, $6) 
    }' INPUTFILE

See it in action here at ideone.com.

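IMO the BEGIN block is obvious. After that, the following happens:

The NR == 1 rule prints the first line and moves on to the next line; this rule applies only to the first line.
Next it checks whether a line with the same second and sixth columns has already been seen. If so, it moves on to the next line; otherwise it marks the line as seen in an array, using the concatenated column values as the index. Note that this can fail when one value is long and the other short (for example, 2 and 0020 concatenate to 20020, which is the same as 20 and 020), so you might want to add a column separator to the index, as in a[$2 "-" $6], and you can include more columns to make the check more reliable.
If the line has 90 in the second column, it prints the line, sets the flag for the next line, and moves on to the next line of the input file.
On the next line, it looks up the second column in ERRORVALUE and replaces its content with the result.
Then it prints the formatted line.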
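
The index collision described above can be checked with a small sketch. The function name count_keys and the two sample rows are made up for illustration; the comma form a[$2, $6] joins the values with awk's built-in SUBSEP, which plays the same role as the "-" separator suggested above.

```shell
# Sketch only: count_keys and the sample rows are hypothetical.
# It counts distinct keys with plain concatenation versus a separated index.
count_keys() {
    awk '
        { plain[$2 $6]++; safe[$2, $6]++ }   # safe uses SUBSEP between values
        END {
            for (k in plain) np++
            for (k in safe) ns++
            print "plain keys: " np ", separated keys: " ns
        }
    ' "$@"
}

# Two different rows whose $2 and $6 concatenate to the same string "20020".
printf '%s\n' '1 2 0 0 0.0000 0020' '2 20 0 0 0.0000 020' | count_keys
```

With plain concatenation the two rows share one key, so the second row would be treated as a duplicate; with the separated index they stay distinct.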

Source

2012-03-14 22:08:33

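I agree with Glenn that two passes over the file is nicer. You can remove the duplicate lines, even when they are not consecutive, using a hash like this: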

awk '!a[$2,$3,$4,$5,$6]++' file.txt

You should then edit your values as required. If you want to change the value 90 in the second column to 5000, try something like this:

awk 'NR == 1 { print; next } { sub(/^90$/, "5000", $2); printf("%4i% 8i% 3i% 5i% 9.4f% 6i\n", $1, $2, $3, $4, $5, $6) }' file.txt

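As you can see, I've stolen Zsolt's printf statement for the formatting (thanks, Zsolt!), but you can edit it if necessary. You can also pipe the output of the first statement into the second for a nice one-liner: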

cat file.txt | awk '!a[$2,$3,$4,$5,$6]++' | awk 'NR == 1 { print; next } { sub(/^90$/, "5000", $2); printf("%4i% 8i% 3i% 5i% 9.4f% 6i\n", $1, $2, $3, $4, $5, $6) }'

Source

2012-03-14 23:33:15 Steve

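Most of the options above work, but here is the way I would do it, short and sweet. After reviewing the other posts, I believe this would be the most efficient. It also covers the extra request the OP added in the comments: the line after the 90 gets the value from two lines before it. It all happens in one pass.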

BEGIN { 
    PC2=PC6=1337 
    replacement=5 
} 
{ 
    if($6 == PC6) next 
    if(PC2 == 90) $2 = replacement 
    replacement = PC2 
    PC2 = $2 
    PC6 = $6 
    printf "%4s%8s%3s%5s%9s%6s\n",$1, $2, $3, $4, $5, $6 
}

Example input

1  70 0 0 0.0000 57850 
    2  31 0 0 0.0000 59371 
    3  41 0 0 0.0000 60909 
    4  70 0 0 0.0000 61478 
    5  31 0 0 0.0000 62999 
    6  41 0 0 0.0000 64537 
    7  41 0 0 0.0000 64537 
    8  70 0 0 0.0000 65106 
    9  11 0 0 0.0000 66627 
    10  21 0 0 0.0000 68165 
    11  90 0 0 0.0000 68700 
    12  31 0 0 0.0000 70221

Example output

1  70 0 0 0.000000 57850 
    2  31 0 0 0.000000 59371 
    3  41 0 0 0.000000 60909 
    4  70 0 0 0.000000 61478 
    5  31 0 0 0.000000 62999 
    6  41 0 0 0.000000 64537 
    8  70 0 0 0.000000 65106 
    9  11 0 0 0.000000 66627 
    10  21 0 0 0.000000 68165 
    11  90 0 0 0.000000 68700 
    12  21 0 0 0.000000 70221

Source

2012-03-23 12:44:16
