Compare every second line and print the lines, but remove duplicates

id-of-item 

description of item 

id-of-item 

description of item 

id-of-item 

description of item 

id-of-item 

description of item 

id-of-item 

description of item

(There is only a single blank line between each line; the large gaps here are just extra spacing.)

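I need to compare the item descriptions and, if they match, delete that description but keep the ID (I need to build a table that references the IDs as groups).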

I don't know how to do this. I've tried a few awk commands with NR%2 and uniq etc., but obviously they all only match one and not the other =/

Source

2011-12-02 Kieran Wilson

Can you include the actual input, rather than a description of it, along with the expected output? –

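I don't quite understand "there is only one line between each line". Are blank lines record separators, so a description can span several lines? Or are the blank lines meaningless, with odd lines holding IDs and even lines holding single-line descriptions? –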

This may be close. The awk rule of thumb is: whatever you want to remove duplicates of, use it as an array index, like so:

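awk '
BEGIN { title = "" }              # empty title: the next non-blank line is an ID
NF == 0 { print; next }           # pass blank separator lines through unchanged
title == "" {                     # ID line: remember it and always print it
    title = $0
    print; next
}
{                                 # description line
    if (value[$0] == "") print    # print it only the first time it is seen
    value[$0] = $0                # remember this description
    title = ""                    # the next non-blank line is an ID again
}' input.txt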

Feel the power of associative arrays.
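
The same "array index as a seen-set" idea is often written as the compact awk idiom !seen[$0]++. A minimal sketch, assuming the question's layout (ID, blank line, description, blank line, ...); the function name first_descriptions is hypothetical, and unlike the answer above it drops the blank separator lines from the output:

```shell
# Hypothetical helper: print every ID, but print each description only the
# first time it appears. Blank separator lines are skipped, not reproduced.
# Reads the named files, or standard input if none are given.
first_descriptions() {
    awk 'NF == 0    { next }          # skip blank separator lines
         ++n % 2    { print; next }   # 1st, 3rd, 5th ... non-blank line: an ID
         !seen[$0]++                  # description: print only its first occurrence
    ' "$@"
}

# Usage:
# printf 'id-1\n\nitem A\n\nid-2\n\nitem B\n\nid-3\n\nitem A\n' | first_descriptions
# prints id-1, item A, id-2, item B, id-3 on separate lines
```

Here !seen[$0]++ is true only while the counter for that description is still 0, so the default action (print) fires once per distinct description.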

2011-12-02 08:19:33 MeaCulpa

That actually looks perfect, thank you so much =] I'm very impressed by the "; next;" part; I didn't know you could do that =] –

This might help you (?):

# cat input.txt 
id-of-item0 
id-of-item0 description of item0 
id-of-item1 
id-of-item1 description of item1 
id-of-item0 
id-of-item0 description of item0 
id-of-item3 
id-of-item3 description of item3 
id-of-item4 
id-of-item4 description of item4 
# sed 'N;s/\n/!!!/' input.txt | sort -u | sed 's/!!!/\n/' 
id-of-item0 
id-of-item0 description of item0 
id-of-item1 
id-of-item1 description of item1 
id-of-item3 
id-of-item3 description of item3 
id-of-item4 
id-of-item4 description of item4

If you want to remove the descriptions:

# sed 'N;s/\n/!!!/' input.txt | sort -u | sed 's/!!!.*//' 
id-of-item0 
id-of-item1 
id-of-item3 
id-of-item4

Explanation:

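Read input.txt two lines at a time, replacing the newline (\n) between them with a delimiter (here !!!). Sort and remove duplicates. Then replace the delimiter !!! with a newline \n again, or delete the description entirely.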
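
The s/!!!/\n/ step relies on GNU sed accepting \n in the replacement text. A sketch of the same join, sort, split idea using paste and tr instead, assuming the input has no blank lines and no tab characters; the function name dedupe_pairs is hypothetical. Note that sort -u only removes pairs in which both the ID line and the description line are identical, which is the case in this answer's example input:

```shell
# Hypothetical helper: join each line with the one after it, drop exact
# duplicate pairs, then split the pairs back onto two lines.
# Assumes no blank lines and no tab characters in the input.
dedupe_pairs() {
    paste -d '\t' - - |   # lines 1+2, 3+4, ... become one tab-separated line
        sort -u |         # sort and keep one copy of each identical pair
        tr '\t' '\n'      # turn the tab back into a newline
}

# Usage: dedupe_pairs < input.txt
# To keep only the IDs, replace the tr step with: cut -f1
```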

Edit:

This might work for you (?):

# needs GNU sed (\t and \n escapes, including inside [...])
sed '/^$/d' input_file |                                    # remove empty lines
sed -n 'h;n;G;s/\n/\t/p' |                                  # join as "description<TAB>id"
sort |                                                      # sort by description
sed ':a;N;s/^\(\([^\t]*\)\t[^\n]*\)\n\2\t/\1\t/;ta;P;D' |   # merge lines that share a description
sed 's/\t/\n/g'                                             # translate tabs to newlines

2011-12-02 09:07:08 potong

If I'm wrong, I'm wrong! But a clever comment might point me in the right direction. – potong

I agree, we shouldn't downvote without leaving a comment. –

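I'm going to make two simplifying assumptions:

Each description is only one line long.
You can find a character that appears in neither the descriptions nor the IDs. I'll use tab as that character.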

Neither assumption is very strong, so it shouldn't be hard to adapt the following if needed.
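
Both assumptions can be checked before running the pipeline. A minimal sketch; the function name check_input is hypothetical, and an even number of non-blank lines is only a necessary condition for single-line descriptions, not proof of it:

```shell
# Hypothetical pre-flight check for the two assumptions above.
# Returns non-zero (with a message on stderr) if either check fails.
check_input() {
    tab=$(printf '\t')
    # Assumption 2: tab must not appear anywhere in the IDs or descriptions.
    if grep -q "$tab" "$1"; then
        echo "check_input: $1 contains tab characters" >&2
        return 1
    fi
    # Assumption 1 (partial check): non-blank lines must come in ID/description pairs.
    count=$(grep -c -v '^[[:space:]]*$' "$1")
    if [ $((count % 2)) -ne 0 ]; then
        echo "check_input: $1 has an odd number of non-blank lines" >&2
        return 1
    fi
}

# Usage: check_input input.txt && echo "assumptions look OK"
```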

With these assumptions, I'll generate sample data with printf "1\n\nitem 1\n\n2\n\nitem 2\n\n3\n\nitem 2\n\n4\n\nitem 1\n". It looks like this:

1 

item 1 

2 

item 2 

3 

item 2 

4 

item 1

To process this data, I would:

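Get rid of the blank lines
Join consecutive lines so that each ID and its description sit on one line, separated by a tab
Sort the new lines by the description field
Format the sorted lines as a table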

Here's a pipeline that does it (it reads the data on standard input):

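grep -v '^[[:space:]]*$' |                              # drop blank lines
    awk 'NR%2 { printf("%s\t", $0) } !(NR%2)' |         # join each ID and description with a tab
    sort -t "$(printf '\t')" -k2,2 |                    # sort on the description field only
    awk -F"\t" 'desc != $2 { printf("-----\n%s\n", $2); desc = $2 } { print $1 }'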

Pipe the sample data through it and you get:

----- 
item 1 
1 
4 
----- 
item 2 
2 
3
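
Since the question asks for a table that groups IDs by description, the grouping can also be done in a single awk pass without sorting, keeping descriptions in the order they first appear. A sketch under the same two assumptions; ids_table is a hypothetical name, and the one-row-per-description layout (description, tab, comma-separated IDs) is just one possible table format:

```shell
# Hypothetical helper: one output row per distinct description, with all of
# its IDs joined by commas. Reads blank-line-separated ID/description pairs
# from the named files, or from standard input if none are given.
ids_table() {
    grep -v '^[[:space:]]*$' "$@" |                        # drop blank lines
        awk 'NR % 2 { id = $0; next }                      # odd line: remember the ID
             ($0 in ids) { ids[$0] = ids[$0] "," id; next } # known description: append ID
             { ids[$0] = id; order[++k] = $0 }             # new description: start a group
             END {
                 for (i = 1; i <= k; i++)                  # print groups in first-seen order
                     printf "%s\t%s\n", order[i], ids[order[i]]
             }'
}

# Usage with the sample data above:
# printf "1\n\nitem 1\n\n2\n\nitem 2\n\n3\n\nitem 2\n\n4\n\nitem 1\n" | ids_table
# prints "item 1<TAB>1,4" and "item 2<TAB>2,3"
```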

2011-12-02 09:07:37

Will this work?

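awk 'NF' file |             # drop blank lines
sed '{N;s/\n/:/g}' |        # join each ID and its description with ":"
awk -F":" -v OFS="\n\n" -v ORS="\n\n" '{b[$2]++} {if (b[$2]>1) print $1; else print $1,$2}'   # print a description only on its first appearance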

Input file:

[jaypal:~/Temp] cat file 
id-of-item31 

description of item4 <--- Duplicate description 

id-of-item22 

description of item4 <--- Duplicate description 

id-of-item34 

description of item1 <--- Duplicate description 

id-of-item21 

description of item3 

id-of-item11 

description of item1 <--- Duplicate description

Running it:

[jaypal:~/Temp] awk 'NF' file | sed '{N;s/\n/:/g}' | 
awk -F":" -v OFS="\n\n" -v ORS="\n\n" '{b[$2]++} {if (b[$2]>1) print $1; else print $1,$2}' 

id-of-item31 

description of item4 

id-of-item22 

id-of-item34 

description of item1 

id-of-item21 

description of item3 

id-of-item11
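
To see which descriptions the b[$2]>1 test will suppress, it can help to list the descriptions that occur more than once. A sketch built from the NR%2 and uniq tools mentioned in the question; dup_descriptions is a hypothetical name, and it assumes each description is a single line:

```shell
# Hypothetical helper: print each description that occurs more than once,
# prefixed with its number of occurrences.
dup_descriptions() {
    awk 'NF' "$@" |           # drop blank lines
        awk 'NR % 2 == 0' |   # keep the 2nd, 4th, ... line: the descriptions
        sort |                # uniq only collapses adjacent duplicates
        uniq -c |             # count each distinct description
        awk '$1 > 1'          # keep only descriptions seen more than once
}

# Usage: dup_descriptions file
```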

2011-12-02 10:21:40

What is 'file1'? –

Oh, my bad! I had copied the main file to a temporary file for testing. I'll update the answer above. –
