How to ignore lines using difflib.ndiff?

According to the documentation, you can supply a linejunk function to ignore certain lines. However, I can't get it to work. Here is some sample code for discussion:

from re import search 
from difflib import ndiff 
t1 = 'one 1\ntwo 2\nthree 3' 
t2 = 'one 1\ntwo 29\nthree 3' 
diff = ndiff(t1.splitlines(), t2.splitlines(), lambda x: search('2', x))

My intent is to ignore the second line, so that diff would be a generator that shows no differences.

Thanks for any help.

Source

2010-01-08 behindalens

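There is one problem with your example: the first two arguments to ndiff should each be a list of strings; you have a single string, which is treated like a list of characters. See the docs. Use e.g. t1 = 'one 1\ntwo 2\nthree 3'.splitlines()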
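To illustrate the point about argument types, here is a minimal sketch. The variable names text_a, text_b, char_diff and line_diff are made up for this illustration; only ndiff itself comes from difflib.

```python
from difflib import ndiff

text_a = 'ab'
text_b = 'ac'

# Bare strings: ndiff iterates over them, so each character
# is compared as if it were a separate "line".
char_diff = list(ndiff(text_a, text_b))

# Lists of strings: each list element is compared as one line.
line_diff = list(ndiff(text_a.splitlines(), text_b.splitlines()))

print(char_diff)
print(line_diff)
```

In the first call the diff is reported character by character; in the second the two whole lines are reported as one removal and one addition.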

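However, as the sample script below shows, difflib.ndiff does not call the linejunk function for every line. This is longstanding behaviour, verified with Python 2.2 to 2.6 and 3.1.

Example script, run with Python 2.6: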

from difflib import ndiff 
t1 = 'one 1\ntwo 2\nthree 3'.splitlines() 
t2 = 'one 1\ntwo 29\nthree 3'.splitlines() 
def lj(line): 
    rval = '2' in line 
    print("lj: line=%r, rval=%s" % (line, rval)) 
    return rval 
d = list(ndiff(t1, t2 )); print("%d %r\n" % (1, d)) 
d = list(ndiff(t1, t2, lj)); print("%d %r\n" % (2, d)) 
d = list(ndiff(t2, t1, lj)); print("%d %r\n" % (3, d))

Output:

1 ['  one 1', '- two 2', '+ two 29', '?      +\n', '  three 3']

lj: line='one 1', rval=False
lj: line='two 29', rval=True
lj: line='three 3', rval=False
2 ['  one 1', '- two 2', '+ two 29', '?      +\n', '  three 3']

lj: line='one 1', rval=False
lj: line='two 2', rval=True
lj: line='three 3', rval=False
3 ['  one 1', '- two 29', '?      -\n', '+ two 2', '  three 3']

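You may wish to report this as a bug. Note, however, that the docs do not say explicitly what it means for a line to be "junk". What output were you expecting?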

Further puzzlement: using the default linejunk function appears to give the same results as not using a linejunk function at all, in a case that includes different "junk" lines (whitespace except for an initial hash). In other words, these lines added to the script:

t3 = 'one 1\n \ntwo 2\n'.splitlines()
t4 = 'one 1\n\n#\n\ntwo 2\n'.splitlines()
d = list(ndiff(t3, t4      )); print("%d %r\n" % (4, d))
d = list(ndiff(t4, t3      )); print("%d %r\n" % (5, d))
d = list(ndiff(t3, t4, None)); print("%d %r\n" % (6, d))
d = list(ndiff(t4, t3, None)); print("%d %r\n" % (7, d))

produce this output:

4 ['  one 1', '-  ', '+ ', '+ #', '+ ', '  two 2']

5 ['  one 1', '+  ', '- ', '- #', '- ', '  two 2']

6 ['  one 1', '-  ', '+ ', '+ #', '+ ', '  two 2']

7 ['  one 1', '+  ', '- ', '- #', '- ', '  two 2']

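Perhaps if you tell us what you are trying to achieve, we could suggest an alternative approach.

Edit after further information: If your intention is to ignore, in general, all lines containing '2', meaning to pretend that they don't exist for ndiff purposes, all you need to do is turn the pretence into reality: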

t1f = [line for line in t1 if '2' not in line]
t2f = [line for line in t2 if '2' not in line]
diff = ndiff(t1f, t2f)

Source

2010-01-08 23:21:16

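My intent is to ignore the second line, so that diff would be a generator that shows no differences. – behindalens 2010-01-08 23:38:02

I ended up doing this. I was originally using the HtmlDiff function, and I wanted to ignore (not remove) certain lines when creating the HTML output. Because the documentation implies the two are related, I was led to ndiff. At this point I wonder what the linejunk function actually does. – behindalens 2010-01-11 21:38:13

@behindalens: I share your wonderment. I may file a bug report and/or a request for documentation clarification. I may even read the source code :-) ... In the meantime, do you consider that your question has been answered? – 2010-01-11 21:58:23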

I recently ran into the same problem.

Here is what I found:

cf. http://bugs.python.org/issue14332

The main purpose of the *junk parameters is to speed up matching when finding differences, not to mask differences.

c.f. http://hg.python.org/cpython/rev/0a69b1e8b7fe/

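The patch provides a better explanation of "junk" and "ignore" in the difflib docs:

These junk-filtering functions speed up matching to find differences and do not cause any differing lines or characters to be ignored.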
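Since the junk parameters do not hide differences, one possible way to ignore lines without deleting them from the output is to match on masked keys and print the original lines. The following is only a sketch under that assumption: diff_ignoring and its ignore predicate are hypothetical names, not part of difflib, and unlike ndiff it emits no '?' hint lines.

```python
from difflib import SequenceMatcher

def diff_ignoring(a, b, ignore):
    """Yield ndiff-style lines, treating every line for which
    ignore(line) is true as equal to any other ignored line."""
    # All ignored lines map to one shared sentinel, so they always
    # compare equal during matching, whatever their actual text.
    sentinel = object()
    key_a = [sentinel if ignore(line) else line for line in a]
    key_b = [sentinel if ignore(line) else line for line in b]
    matcher = SequenceMatcher(None, key_a, key_b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == 'equal':
            # Show the text from the first sequence for matched lines.
            for line in a[i1:i2]:
                yield '  ' + line
        else:
            # 'replace', 'delete' and 'insert': one of the slices may be empty.
            for line in a[i1:i2]:
                yield '- ' + line
            for line in b[j1:j2]:
                yield '+ ' + line

t1 = 'one 1\ntwo 2\nthree 3'.splitlines()
t2 = 'one 1\ntwo 29\nthree 3'.splitlines()
result = list(diff_ignoring(t1, t2, lambda line: '2' in line))
print(result)
```

With the question's data, every line is reported as unchanged, and the ignored line still appears in the output with its text from the first sequence.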

Source

2014-07-05 16:59:45 caoanan

This should be the accepted answer to this question. – matthewatabet 2016-12-09 20:18:20
