How do I compute the difference between the contents of two files?

I have two files, large_input and subset_input, with the contents shown below.

large_input

1 
34 
65 
7643 
hello 
we

subset_input

65 
we 
hello 
34

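The sort command does not seem very helpful in this case; otherwise, running sort | uniq on both files and then diff would be very useful.

The question: in a scenario like this, where the data cannot be sorted (because of its contents), what is the best way to find

large_input - subset_input, which would be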

1 
7643

2012-11-12 daydreamer

Why doesn't 'sort | uniq' work? I did exactly what you described and got this as the diff: '0a1 > 1 2a4 > 7643'. Maybe you want to try 'sort -g'. – jman

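Why wouldn't 'sort' help? It sorts lexicographically, but that doesn't matter; if you just want the set difference, the exact order shouldn't matter as long as it is consistent. –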

'sort -g' did the trick, thanks @skjaidev. – daydreamer

diff <(sort file1) <(sort file2) | sed '/^[0-9][0-9]*[acd][0-9]*/d;s/^[<>] //'

works for me.

Output:

1 
7643

Some shells do not support <(sort fileX), so you may need to sort the files in place beforehand, e.g. sort -o file1 file1; sort -o file2 file2; ....
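
For shells without process substitution, the pre-sorting step might look like the sketch below. It assumes the sample data from the question and works on scratch copies (the mktemp directory and the file1/file2 names are only for illustration), so the real files are not reordered.

```shell
# Scratch copies of the sample data, so the originals stay untouched.
tmpdir=$(mktemp -d)
printf '%s\n' 1 34 65 7643 hello we > "$tmpdir/file1"
printf '%s\n' 65 we hello 34 > "$tmpdir/file2"

# Sort each file in place; "sort -o FILE FILE" is safe because sort
# reads all of its input before it writes the output file.
sort -o "$tmpdir/file1" "$tmpdir/file1"
sort -o "$tmpdir/file2" "$tmpdir/file2"

# Same diff | sed pipeline, now reading the pre-sorted files directly.
result=$(diff "$tmpdir/file1" "$tmpdir/file2" |
  sed '/^[0-9][0-9]*[acd][0-9]*/d;s/^[<>] //')
printf '%s\n' "$result"

rm -rf "$tmpdir"
```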

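The sed expressions strip the diff markers from the output. To see what they are doing, first remove the sed entirely, then add the parts back one at a time (they are separated by semicolons).

I hope this helps.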
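
Adding the parts back one at a time could be done roughly like this. It is only a sketch on the sample data from the question; the variable names raw, step1 and step2 are made up for the illustration.

```shell
# Pre-sorted scratch copies of the sample data.
tmpdir=$(mktemp -d)
printf '%s\n' 1 34 65 7643 hello we | sort > "$tmpdir/file1"
printf '%s\n' 65 we hello 34 | sort > "$tmpdir/file2"

# Stage 0: raw diff output (hunk headers plus lines prefixed with "< " or "> ").
# diff exits with status 1 when the files differ, hence the "|| true".
raw=$(diff "$tmpdir/file1" "$tmpdir/file2" || true)

# Stage 1: first expression only; deletes hunk headers such as 1d0.
step1=$(printf '%s\n' "$raw" | sed '/^[0-9][0-9]*[acd][0-9]*/d')

# Stage 2: both expressions; also strips the leading "< " or "> ".
step2=$(printf '%s\n' "$raw" | sed '/^[0-9][0-9]*[acd][0-9]*/d;s/^[<>] //')

printf 'raw:\n%s\n\nstep1:\n%s\n\nstep2:\n%s\n' "$raw" "$step1" "$step2"
rm -rf "$tmpdir"
```

One thing this makes visible: the first expression only matches headers without a comma, so a range header such as 1,2d0 (produced when adjacent lines differ) would pass through unfiltered.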

2012-11-12 19:56:26 shellter

This is wonderful, @shellter – daydreamer

You can use sed to generate a sed script that does the job:

sed -e 's#^#/^#' -e 's#$#$/d#' subset_input > sed_script

Then applying this sed script to your large_input is simple:

sed -f sed_script large_input

If you have bash, it can be done without a temporary file:

sed -f <(sed -e 's#^#/^#' -e 's#$#$/d#' subset_input) large_input

This solution only works for a subset_input of 'reasonable' size, though.
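
To make the generated script concrete, here is a sketch on the sample data from the question. Because each line of subset_input becomes a regular expression, lines containing characters such as . or * (or the # and / delimiters) could misbehave. The grep call at the end is an alternative I am swapping in, not part of the answer above: it matches whole lines as fixed strings. Both variants keep large_input in its original order, so nothing needs to be sorted.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' 1 34 65 7643 hello we > "$tmpdir/large_input"
printf '%s\n' 65 we hello 34 > "$tmpdir/subset_input"

# Turn every line X of subset_input into the sed command /^X$/d.
sed -e 's#^#/^#' -e 's#$#$/d#' "$tmpdir/subset_input" > "$tmpdir/sed_script"
cat "$tmpdir/sed_script"

# Apply the generated script: lines matching any pattern are deleted.
sed_result=$(sed -f "$tmpdir/sed_script" "$tmpdir/large_input")

# Alternative (not from the answer above): fixed-string, whole-line match.
# -F fixed strings, -x whole line, -v invert match, -f patterns from a file.
grep_result=$(grep -Fxv -f "$tmpdir/subset_input" "$tmpdir/large_input")

printf '%s\n' "$sed_result"
rm -rf "$tmpdir"
```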

2012-11-12 23:43:27 jfg956

This is exactly what comm was made for:

comm -23 <(sort large_input) <(sort subset_input)
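
For anyone unfamiliar with the flags: -2 suppresses the lines that appear only in the second file and -3 suppresses the lines common to both, which leaves the lines unique to the first file. A sketch on the sample data from the question, using temporary sorted copies (the .sorted names are just for illustration) in case the shell lacks <(...):

```shell
tmpdir=$(mktemp -d)
printf '%s\n' 1 34 65 7643 hello we > "$tmpdir/large_input"
printf '%s\n' 65 we hello 34 > "$tmpdir/subset_input"

# comm expects both inputs sorted in the same collation order.
sort "$tmpdir/large_input" > "$tmpdir/large.sorted"
sort "$tmpdir/subset_input" > "$tmpdir/subset.sorted"

# Keep only column 1: lines present in large_input but not in subset_input.
result=$(comm -23 "$tmpdir/large.sorted" "$tmpdir/subset.sorted")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```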

2012-11-13 00:50:35
