Compare two or more files and print only those lines of the first file that contain words not present in the second file

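I have the following problem, which I am trying to solve in bash/sed/awk (useful one-liner scripts).

Compare two or more files and print only the lines of the first file whose words (patterns) do not appear in any line of the second file, preserving their order of appearance and ignoring case. (Gosh, that sounds so complicated and silly... I don't know how to put it any other way.)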

I have two different files (file1, file2) containing lists of information like this:

File 1

Agents In The Court/No Love For The Empire 
Mercenary Armor 
Solo Han WB 
Obi-Wan's Journal 
Obi-Wan's Lightsaber 
No Questions Asked 
Do, or do Not 
Strike Blocked

File 2

Agents In The Court/No Love For The Empire BB -> (LiGHT SIDE -- Special Cards)  
Mercenary Armor BB -> (LiGHT SIDE -- Device) 
Obi-Wan's Journal BB -> (LiGHT SIDE -- Device) 
No Questions Asked BB -> (LiGHT SIDE -- Special Cards) 
Do, Or Do Not BB -> (LiGHT SIDE -- Defensive Shield) 
Strike Planning BB -> (LiGHT SIDE -- Effect) 
Alter (Obi-Wan) WB -> (LiGHT SIDE -- Used Interrupt) 
Solo Han BB -> (LiGHT SIDE -- Human and Human-Like Characters) 
Combined Attack BB -> (LiGHT SIDE -- Lost Interrupt)

The result should look like this:

Solo Han WB 
Obi-Wan's Lightsaber 
Strike Blocked

I would appreciate any help (a complete solution, hints, links to similar questions, etc.).

Source

2013-11-04 Virtual_Lotos

As long as file2 isn't too large, either of these will work in bash:

while read x; do if [[ -z "$(grep -Fi "$x" file2)" ]]; then echo "$x"; fi; done < file1 

cat file1 | while read x; do if [[ -z "$(grep -Fi "$x" file2)" ]]; then echo "$x"; fi; done

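In overview: read file1 line by line, grep for each line in file2, and print the line only if no match is found.

In more detail: while read x; do ...; done < file1 reads file1 one line at a time into the variable x. "$(grep -Fi "$x" file2)" searches file2 for lines containing the contents of $x and evaluates to the empty string when no match is found. The -F flag tells grep to search for a fixed string, so the contents of $x are not treated as a regular expression. The -i flag makes the search case-insensitive. The -z test evaluates to true if its string argument is empty (i.e., grep found no match).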
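The same idea can also be sketched as a small function that relies on grep's exit status instead of capturing its output. This is only a hedged variant, not the answer's original code: the name missing_lines and the sample files are made up for illustration, read is given IFS= and -r so that leading whitespace and backslashes in a line survive, and -q makes grep stop at the first match.

```shell
# Hypothetical helper: print each line of $1 that is not a substring of any
# line in $2, ignoring case. File order is preserved because file1 is read
# sequentially.
missing_lines() {
    while IFS= read -r x || [ -n "$x" ]; do
        # -q: exit status only; -F: fixed string; -i: ignore case; --: end of options
        grep -qFi -- "$x" "$2" || printf '%s\n' "$x"
    done < "$1"
}

# Small sample taken from the question's data (assumed to have no trailing blanks)
tmp=$(mktemp -d)
printf '%s\n' 'Mercenary Armor' 'Solo Han WB' 'Do, or do Not' 'Strike Blocked' > "$tmp/file1"
printf '%s\n' 'Mercenary Armor BB -> (LiGHT SIDE -- Device)' \
              'Do, Or Do Not BB -> (LiGHT SIDE -- Defensive Shield)' \
              'Solo Han BB -> (LiGHT SIDE -- Human and Human-Like Characters)' > "$tmp/file2"

result=$(missing_lines "$tmp/file1" "$tmp/file2")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Note that an empty line in file1 is never printed, because the empty string matches every line of file2.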

Source

2013-11-05 03:00:37 traybold

I don't think a one-liner exists; you will probably need some temporary files. Idea:

# just some boilerplate for handling temp files
t=$(mktemp -d -t sort.XXXXXX)
trap 'rm -rf "$t"' EXIT

# add two columns: file-id + line number, separated by single spaces
nl -ba -nln -w1 -s' ' < file1 | sed -e 's/^/1 /' > "$t/file1"
nl -ba -nln -w1 -s' ' < file2 | sed -e 's/^/2 /' > "$t/file2"

# sort both by 3rd field onward (real data), get unique lines,
# filter these down to file1, sort by line number and give out data
sort -f -k3 "$t/file1" "$t/file2" | uniq -i -u -f 2 | grep '^1 ' | sort -k2,2n | cut -d ' ' -f 3-

(Untested; the field separators may need some fixing.)

The script above needs more tools than sed + gawk, but it should work on a recent GNU system.

Source

2013-11-04 23:26:33 ensc

Something simple that should meet your needs is built on diff:

diff file1 file2

This prints every line that differs between the two files.

Source

2013-11-05 21:12:32 Jonnyboy

You could also try:

awk -f print.awk file1

where print.awk is

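BEGIN {
    # load file2 into memory, upper-cased for case-insensitive matching;
    # the "> 0" check stops the loop on end of file or a read error
    while ((getline < "file2") > 0)
        line[i++] = toupper($0)
}

{
    # look for the current file1 line as a substring of any file2 line
    for (j = 0; j < i; j++) {
        if (index(line[j], toupper($0))) {
            f = 1; break
        }
    }
    if (!f) print    # no file2 line contains it
    f = 0
}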
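Since the question asks for one-liners, the same substring test can probably be folded into a single awk invocation by passing both files on the command line. The following is a rough sketch rather than the answerer's script: the wrapper name not_in_second and the sample files are invented for illustration, and the NR == FNR idiom assumes file2 is not empty (otherwise file1 would be mistaken for the first operand).

```shell
# Hypothetical wrapper: $1 is file1 (lines to test), $2 is file2 (lines to search).
not_in_second() {
    awk '
        NR == FNR { hay[n++] = toupper($0); next }   # first operand: remember file2
        {
            needle = toupper($0); found = 0
            for (j = 0; j < n; j++)
                if (index(hay[j], needle)) { found = 1; break }
            if (!found) print                        # no file2 line contains it
        }
    ' "$2" "$1"
}

# Small sample based on the question's data
tmp=$(mktemp -d)
printf '%s\n' "Obi-Wan's Journal" "Obi-Wan's Lightsaber" "Do, or do Not" > "$tmp/file1"
printf '%s\n' "Obi-Wan's Journal BB -> (LiGHT SIDE -- Device)" \
              "Do, Or Do Not BB -> (LiGHT SIDE -- Defensive Shield)" > "$tmp/file2"

result=$(not_in_second "$tmp/file1" "$tmp/file2")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Because index() does plain substring matching, characters such as the apostrophe or slash in the card names need no escaping.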

Source

2013-11-05 23:41:27
