Generating and applying diffs in Python

Is there an "out-of-the-box" way in Python to generate a list of differences between two texts, and then apply that diff to one file to obtain the other one later?

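I want to keep the revision history of a text, but I don't want to save the entire text for every revision when only a single line was edited. I looked at difflib, but I couldn't see how to generate a list of just the edited lines that can still be used to turn one text into the other.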

2010-02-21 noio

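Have you looked at diff-match-patch from Google? Roughly speaking, Google Docs uses this set of algorithms. It contains not only a diff module but also a patch module, so you can generate the newest file from the old file plus the diff.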

A Python version is included.

http://code.google.com/p/google-diff-match-patch/

2010-02-21 22:39:28

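Exactly what I was looking for! I tried googling different combinations of "python", "diff", "patch" and "revision", but hadn't found it. – noio 2010-02-22 13:08:36

Google diff-match-patch seems to store the whole file. It keeps every element in tuples: (0, 'stuff') means 'stuff' appears in both strings. The system is very simple: it stores literally every character so that it can iterate over them and modify the text as needed. – Paragon 2012-05-05 00:54:05

How do I use this API with Python? An example would be appreciated, if possible. – qre0ct 2013-05-15 04:47:32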

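Does it have to be a Python solution?
My first thought would be to use a version control system (Subversion, Git, etc.), or the standard diff/patch utilities on Unix systems, which on Windows-based systems are available as part of Cygwin.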

2010-02-21 21:18:06

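It has to be a pure Python solution, because I want to deploy it on AppEngine. 'diff'/'patch' would be ideal, but in Python. – noio 2010-02-22 11:24:48

Note that this kind of computation is generally slow, so it may be better to dig deeper and do it at a lower level! – Pithikos 2016-10-10 19:05:28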

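As far as I know, most diff algorithms use a simple Longest Common Subsequence match to find the part two texts have in common; whatever is left over is considered the difference. It shouldn't be too hard to write your own dynamic-programming algorithm to do this in Python, and the Wikipedia page on Longest Common Subsequence also gives the algorithm.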
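A minimal sketch of that idea, assuming the texts are compared line by line. The names lcs_table and lcs_diff are hypothetical, and this is the textbook O(n·m) LCS table, not what difflib or GNU diff do internally.

```python
def lcs_table(a, b):
    """dp[i][j] holds the length of the LCS of a[i:] and b[j:]."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    return dp


def lcs_diff(a, b):
    """Return (op, line) pairs: ' ' = common, '-' = only in a, '+' = only in b."""
    dp = lcs_table(a, b)
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:  # part of the common subsequence
            out.append((' ', a[i]))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:  # dropping a[i] keeps the LCS longest
            out.append(('-', a[i]))
            i += 1
        else:  # otherwise b[j] must be an insertion
            out.append(('+', b[j]))
            j += 1
    out.extend(('-', line) for line in a[i:])  # leftovers in a were deleted
    out.extend(('+', line) for line in b[j:])  # leftovers in b were added
    return out


for op, line in lcs_diff(["the", "quick", "fox"], ["the", "slow", "fox"]):
    print(op, line)
```

Like ndiff output, this list still contains the common lines; to store less you would keep only the '-'/'+' runs together with their positions.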

2010-02-21 21:34:57 jai

Does difflib.unified_diff do what you want? There is an example here.
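A small sketch of generating a unified diff between two strings with difflib.unified_diff; the strings and the old.txt/new.txt labels are made up for illustration (the labels only go into the header lines, no files are read).

```python
import difflib

old = "line one\nline two\nline three\n"
new = "line one\nline 2\nline three\n"

# splitlines(keepends=True) keeps the '\n' so the joined diff has proper line breaks
diff = difflib.unified_diff(
    old.splitlines(keepends=True),
    new.splitlines(keepends=True),
    fromfile="old.txt",
    tofile="new.txt",
)
patch_text = "".join(diff)
print(patch_text)
```

The output contains only the changed hunk plus up to three lines of context around it (the default n=3), under a "@@ -1,3 +1,3 @@" header.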

2010-02-21 22:39:35 pwdyson

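Upvoted your answer. The built-in difflib looks powerful but a bit confusing; it's just a matter of the learning curve. See my similar post here: http://stackoverflow.com/questions/4743359/python-difflib-deltas-and-compare-ndiff/4743621#4743621 – NealWalters 2011-01-21 16:38:21

The library has no way to apply the output of 'difflib.unified_diff'. It has 'diff' but no 'patch'. So if you want to stay in Python, 'difflib.unified_diff' is no use. – 2016-01-07 05:23:52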
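For what it's worth, difflib does ship a limited inverse: difflib.restore() can pull either original sequence back out of ndiff()/Differ.compare() output. It does not apply unified_diff output, and an ndiff delta repeats every unchanged line, so it doesn't save space for a revision history. A small sketch with made-up input:

```python
import difflib

old = "alpha\nbeta\ngamma\n".splitlines(keepends=True)
new = "alpha\nBETA\ngamma\n".splitlines(keepends=True)

# ndiff marks every line: '  ' unchanged, '- ' only in old, '+ ' only in new,
# and '? ' lines are intraline hints that restore() skips.
delta = list(difflib.ndiff(old, new))

# restore(delta, 1) yields the first sequence, restore(delta, 2) the second.
recovered_old = list(difflib.restore(delta, 1))
recovered_new = list(difflib.restore(delta, 2))
print("".join(recovered_new))
```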

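Maybe you can use unified_diff to generate a list of the differences between the files. Only the text that changed then needs to be written to a new text file for future reference. Here is code that writes only the differences to a new file. I hope this is what you were asking for!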

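import difflib

# old_file / new_file must be lists of lines; the file names here are placeholders
with open("old.txt") as f_old, open("new.txt") as f_new:
    old_file = f_old.readlines()
    new_file = f_new.readlines()

diff = difflib.unified_diff(old_file, new_file, lineterm='')
lines = list(diff)[2:]  # drop the '---' / '+++' header lines
if lines:
    print(lines[0])  # first '@@ ... @@' hunk header

added = [lineA for lineA in lines if lineA.startswith('+')]
removed = [lineB for lineB in lines if lineB.startswith('-')]

# Write the added lines, then the removed lines, to output.txt
with open("output.txt", "w") as fh1:
    for line in added:
        fh1.write(line)
    for line in removed:
        fh1.write(line)
print('+', added)
print('-', removed)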

Use this in your code to save only the diff output!
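The added/removed lists above don't record where each line was, so on their own they can't rebuild the other version. A minimal sketch of a compact delta that can, assuming line lists as input: it keeps only the non-'equal' entries of SequenceMatcher.get_opcodes() together with the new lines for each. make_delta and apply_delta are hypothetical helper names, and the delta only works forward (old to new).

```python
import difflib


def make_delta(old_lines, new_lines):
    """Keep only the changed regions: (old_start, old_end, replacement_lines)."""
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
    return [
        (i1, i2, new_lines[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != 'equal'
    ]


def apply_delta(old_lines, delta):
    """Rebuild new_lines from old_lines plus a delta produced by make_delta."""
    result = []
    pos = 0  # next unconsumed index in old_lines
    for i1, i2, replacement in delta:
        result.extend(old_lines[pos:i1])  # copy the unchanged stretch
        result.extend(replacement)        # put in the new lines
        pos = i2                          # skip the deleted/replaced old lines
    result.extend(old_lines[pos:])
    return result


old = ["a\n", "b\n", "c\n", "d\n"]
new = ["a\n", "B\n", "c\n", "d\n", "e\n"]
delta = make_delta(old, new)
print(delta)
print(apply_delta(old, delta) == new)
```

The delta is just lists, tuples and strings, so it can be stored with json.dumps; after a JSON round trip the tuples come back as lists, which apply_delta unpacks the same way. To go backwards you would also have to store the removed lines.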

2016-03-09 11:13:15

I've implemented a pure-Python function that applies a diff patch to recover either of the input strings; I hope someone finds it useful. It parses the Unified diff format.

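import re

# Hunk header, e.g. "@@ -3,0 +4 @@": old start[,count] and new start[,count]
_hdr_pat = re.compile(r"^@@ -(\d+),?(\d+)? \+(\d+),?(\d+)? @@$")

def apply_patch(s, patch, revert=False):
    """
    Apply unified diff patch to string s to recover newer string.
    If revert is True, treat s as the newer string, recover older string.
    """
    s = s.splitlines(True)
    p = patch.splitlines(True)
    t = ''
    i = sl = 0
    # Forward: use the old-side ranges and keep '+' lines; revert: new-side ranges, keep '-' lines
    (midx, sign) = (1, '+') if not revert else (3, '-')
    while i < len(p) and p[i].startswith(("---", "+++")):
        i += 1  # skip header lines
    while i < len(p):
        m = _hdr_pat.match(p[i])
        if not m:
            raise Exception("Cannot process diff")
        i += 1
        # 0-based index of the hunk in s; an empty range names the line *before* it
        l = int(m.group(midx)) - 1 + (m.group(midx + 1) == '0')
        t += ''.join(s[sl:l])  # copy the unchanged lines before this hunk
        sl = l
        while i < len(p) and p[i][0] != '@':
            if i + 1 < len(p) and p[i + 1][0] == '\\':
                # next line is "\ No newline at end of file": drop the trailing '\n'
                line = p[i][:-1]
                i += 2
            else:
                line = p[i]
                i += 1
            if len(line) > 0:
                if line[0] == sign or line[0] == ' ':
                    t += line[1:]  # line belongs to the version being recovered
                sl += (line[0] != sign)  # line exists in s, so consume it
    t += ''.join(s[sl:])  # copy whatever follows the last hunk
    return t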

If there are header lines ("--- ...\n", "+++ ...\n"), it skips them. If we have a unified diff string diffstr representing the difference between oldstr and newstr:

# recreate `newstr` from `oldstr`+patch 
newstr = apply_patch(oldstr, diffstr) 
# recreate `oldstr` from `newstr`+patch 
oldstr = apply_patch(newstr, diffstr, True)

Using difflib (part of the standard library), you can generate a unified diff of two strings in Python:

import difflib 
_no_eol = "\\ No newline at end of file"  # backslash escaped to avoid an invalid-escape warning

def make_patch(a,b): 
    """ 
    Get unified string diff between two strings. Trims top two lines. 
    Returns empty string if strings are identical. 
    """ 
    diffs = difflib.unified_diff(a.splitlines(True),b.splitlines(True),n=0) 
    try: _,_ = next(diffs),next(diffs) 
    except StopIteration: pass 
    return ''.join([d if d[-1] == '\n' else d+'\n'+_no_eol+'\n' for d in diffs])

On UNIX: diff -U0 a.txt b.txt
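To see the format that make_patch produces and apply_patch parses, here is the raw difflib output with n=0 (no context lines, as with -U0) for a small made-up pair of strings. Note the second hunk header "@@ -3,0 +4 @@": a zero-length old range names the line *before* the insertion point, which is why apply_patch adds 1 when the count is '0'.

```python
import difflib

old = "one\ntwo\nthree\n"
new = "one\n2\nthree\nfour\n"

# n=0 drops all context lines; empty fromfile/tofile give bare '--- ' / '+++ ' headers
patch = "".join(difflib.unified_diff(old.splitlines(True), new.splitlines(True), n=0))
print(patch)
```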

The code is on GitHub here, together with tests using ASCII and random Unicode characters: https://gist.github.com/noporpoise/16e731849eb1231e86d78f9dfeca3abc

2016-12-05 04:55:09
