2017-09-27 38 views
3

Python - Get the difference between strings

What is the best way to get the difference between two multi-line strings?

import difflib

a = 'testing this is working \n testing this is working 1 \n' 
b = 'testing this is working \n testing this is working 1 \n testing this is working 2' 

diff = difflib.ndiff(a,b) 
print(''.join(diff)) 

This produces:

t e s t i n g  t h i s  i s  w o r k i n g  
    t e s t i n g  t h i s  i s  w o r k i n g  1  
+ + t+ e+ s+ t+ i+ n+ g+ + t+ h+ i+ s+ + i+ s+ + w+ o+ r+ k+ i+ n+ g+ + 2 
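The character-level output above appears because difflib.ndiff compares two sequences element by element, and iterating over a string yields single characters. A minimal sketch, assuming you only want the lines that were added in b: pass lists of lines instead (via splitlines()) and keep the lines that ndiff marks with the '+ ' prefix.

```python
import difflib

a = 'testing this is working \n testing this is working 1 \n'
b = 'testing this is working \n testing this is working 1 \n testing this is working 2'

# ndiff compares sequences element by element; passing lists of lines
# makes each element a whole line instead of a single character.
diff = difflib.ndiff(a.splitlines(), b.splitlines())

# Lines present only in b are prefixed with '+ '; drop that 2-character marker.
added = [line[2:] for line in diff if line.startswith('+ ')]
print(added)  # [' testing this is working 2']
```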

What is the best way to get exactly:

testing this is working 2

Would a regular expression be the solution here?

+3

'b.split(a)'? –

+0

Damn, @Chris_Rands. Never thought of that! Nice hack. –

+0

@Chris_Rands Nice hack, but it is not an efficient way to do it. –

Answers

3
a = 'testing this is working \n testing this is working 1 \n' 
b = 'testing this is working \n testing this is working 1 \n testing this is working 2' 

splitA = set(a.split("\n")) 
splitB = set(b.split("\n")) 

diff = splitB.difference(splitA) 
diff = ", ".join(diff) # ' testing this is working 2, more things if there were...' 

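Basically, turn each string into a set of lines and take the set difference, i.e. everything in B that is not in A. Then join the result into a single string.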

Edit: this is a convoluted way of saying what @ShreyasG said: [x for x if x not in y]...
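The set-based approach and the list comprehension it mentions can be combined. A minimal sketch, where lines_only_in is a hypothetical helper name: build a set from a's lines for fast membership tests, then walk b's lines so the result keeps b's original order (a plain set difference guarantees no particular order).

```python
def lines_only_in(b, a):
    """Return the lines of b that do not appear anywhere in a, in b's order."""
    # Hypothetical helper: the set gives average O(1) membership checks.
    lines_a = set(a.splitlines())
    return [line for line in b.splitlines() if line not in lines_a]


a = 'testing this is working \n testing this is working 1 \n'
b = 'testing this is working \n testing this is working 1 \n testing this is working 2'
print(lines_only_in(b, a))  # [' testing this is working 2']
```

Unlike splitB.difference(splitA), this keeps every copy of a repeated line from b that is missing from a.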

0

Building on @Chris_Rands' comment, you can also use splitlines() (if your strings are multi-line and you want the lines that exist in one but not in the other):

b_s = b.splitlines() 
a_s = a.splitlines() 
[x for x in b_s if x not in a_s] 

The expected output is:

[' testing this is working 2'] 
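The choice between splitlines() and split("\n") matters here because a ends with a newline. A short check on the question's a: split("\n") leaves an empty string after the final newline, while splitlines() does not. That extra empty element can leak into set operations that compare both directions.

```python
a = 'testing this is working \n testing this is working 1 \n'

parts_split = a.split("\n")    # keeps a trailing '' after the final newline
parts_lines = a.splitlines()   # no trailing empty element

print(parts_split)  # ['testing this is working ', ' testing this is working 1 ', '']
print(parts_lines)  # ['testing this is working ', ' testing this is working 1 ']
```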
2

The simplest hack (credit to @Chris) is to use split().

Note: you need to determine which string is longer and call split on that one.

if len(a)>len(b): 
    res=''.join(a.split(b))    #get diff 
else: 
    res=''.join(b.split(a))    #get diff 

print(res.strip())      #remove whitespace on either sides 

# Driver values

IN : a = 'testing this is working \n testing this is working 1 \n' 
IN : b = 'testing this is working \n testing this is working 1 \n testing this is working 2' 

OUT : testing this is working 2 
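One caveat worth checking: this hack only isolates the new text when the shorter string occurs verbatim inside the longer one. A small sketch with made-up strings where a line is inserted in the middle: b does not contain a as one contiguous substring, so split() has nothing to split on and the whole of b comes back.

```python
a = 'line 1\nline 2\n'
b = 'line 1\nline 1.5\nline 2\n'   # a new line inserted in the middle

# a is not a contiguous substring of b, so b.split(a) == [b]
res = ''.join(b.split(a))
print(res == b)  # True: the "diff" is the entire string b
```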

Edit: thanks to @ekhumoro for another hack using replace, which avoids the join entirely.

if len(a)>len(b): 
    res=a.replace(b,'')    #get diff 
else: 
    res=b.replace(a,'')    #get diff 
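For reference, the replace() version with the length check and the final strip() folded into one function might look like this. The name str_diff is hypothetical; like the split() version, it only works when the shorter string appears verbatim inside the longer one.

```python
def str_diff(a, b):
    """Remove the shorter string from the longer one and strip the remainder."""
    # Hypothetical helper; mirrors the if/else above (equal lengths use b.replace(a, '')).
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    return longer.replace(shorter, '').strip()


a = 'testing this is working \n testing this is working 1 \n'
b = 'testing this is working \n testing this is working 1 \n testing this is working 2'
print(str_diff(a, b))  # testing this is working 2
```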
+1

'b.replace(a,'')' is simpler, faster and makes more sense. – ekhumoro

+0

Haha, love this. Another nice hack. I never thought of using 'split' or 'replace' this way. Thanks @ekhumoro!! –

0
import itertools as it 

# fillvalue="" avoids joining None (a TypeError) when a is longer than b
"".join(y for x, y in it.zip_longest(a, b, fillvalue="") if x != y) 
# ' testing this is working 2' 
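zip_longest pairs characters strictly by position, so this one-liner recovers the extra text only when the shorter string is a prefix of the longer one, as in the question. A quick sketch with a hypothetical positional_diff wrapper and made-up inputs: a single inserted character shifts every later position, so everything after it is reported as different.

```python
import itertools as it


def positional_diff(a, b):
    """Characters of b whose position-matched character in a differs."""
    # Hypothetical wrapper around the zip_longest one-liner; fillvalue=''
    # pads the shorter string so no None values reach str.join.
    return "".join(y for x, y in it.zip_longest(a, b, fillvalue="") if x != y)


print(repr(positional_diff("abc", "abcd")))  # 'd'   (a is a prefix of b)
print(repr(positional_diff("abc", "aXbc")))  # 'Xbc' (the insert shifts 'b' and 'c')
```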

Or

import collections as ct 


ca = ct.Counter(a.split("\n")) 
cb = ct.Counter(b.split("\n")) 

diff = cb - ca 
"".join(diff.keys()) 
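The Counter version behaves differently from the set-based answers when lines repeat: Counter subtraction keeps per-line counts, whereas a set only records whether a line is present. A small sketch with made-up input where b has one more copy of a line that already exists in a:

```python
import collections as ct

a = 'x\nx\n'
b = 'x\nx\nx\n'

# Set difference: 'x' is in both sets, so nothing is reported.
set_diff = set(b.splitlines()) - set(a.splitlines())

# Counter difference: b has three 'x' lines and a has two, so one 'x' remains.
counter_diff = ct.Counter(b.splitlines()) - ct.Counter(a.splitlines())

print(set_diff)                       # set()
print(list(counter_diff.elements()))  # ['x']
```

Note that diff.keys() lists each surviving line once, while Counter.elements() repeats a line as many times as its remaining count.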
2

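This is basically @Godron629's answer, but since I can't comment yet, I'm posting a small modification here: change difference to symmetric_difference so that the order of the sets doesn't matter.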

a = 'testing this is working \n testing this is working 1 \n' 
b = 'testing this is working \n testing this is working 1 \n testing this is working 2' 

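# splitlines() avoids the trailing '' that split("\n") produces for a,
# which would otherwise show up in the symmetric difference
splitA = set(a.splitlines()) 
splitB = set(b.splitlines()) 

diff = splitB.symmetric_difference(splitA) 
diff = ", ".join(diff) # ' testing this is working 2' (sets are unordered)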
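To see why operand order matters for difference() but not for symmetric_difference(), here is a short check on the question's strings (the ^ operator is the set shorthand for symmetric_difference). It uses splitlines() so that a's trailing newline does not contribute an empty-string line.

```python
a = 'testing this is working \n testing this is working 1 \n'
b = 'testing this is working \n testing this is working 1 \n testing this is working 2'

lines_a = set(a.splitlines())
lines_b = set(b.splitlines())

print(lines_b - lines_a)                       # {' testing this is working 2'}
print(lines_a - lines_b)                       # set()
print(lines_a ^ lines_b == lines_b ^ lines_a)  # True
```

The trade-off is that the symmetric difference no longer tells you which of the two strings a line came from.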