2017-07-29 36 views
0

How do I remove duplicate lines from a string and then print the number of lines removed? (Counting removed lines)

This is what I have:

import os 


sentence = """Sentence1 
Sentence1 
Sentence2 
Sentence3 
Sentence4 
Sentence4""" 


spaces = sentence.replace(" ", "\n") #Makes one word per line 
lines = os.linesep.join([s for s in spaces.splitlines() if s]) #Removes empty lines 
duplicate = "\n".join(set(lines.split('\n'))) #Removes duplicate lines 


numberlines = len(duplicate.split('\n')) #Counts lines 



print(duplicate) 
print'Lines:', numberlines 

With this, the output is:

Sentence4 
Sentence1 
Sentence2 
Sentence3 
Lines: 4 

How can I get this output instead:

Sentence4 
Sentence1 
Sentence2 
Sentence3 
Lines: 4 
Removed Lines: 2 

Thanks :D

+0

Can you compute the difference in length, 'before' - 'after'? – fredtantini

+0

I'm working on that right now; it might actually solve the problem. I'm new to Python, though. – CandyGum

Answers

1

Let's go through your code line by line:

spaces = sentence.replace(" ", "\n") #Makes one word per line 

So far, so good.

lines = os.linesep.join([s for s in spaces.splitlines() if s]) #Removes empty lines 

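OK, so you remove the empty lines, but it would be better to keep the result as a list rather than gluing it back together into a string, because...: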

duplicate = "\n".join(set(lines.split('\n'))) #Removes duplicate lines 

...here you split it again, and then join the result into a string once more...

numberlines = len(duplicate.split('\n')) #Counts lines 

...only to split it yet again. A better version keeps everything as a list or set until it is time to print:

spaces = sentence.split()         # Makes one word per line
lines = [s for s in spaces if s]  # Removes empty lines
duplicate = set(lines)            # Removes duplicate lines
numberlines = len(duplicate)      # Counts lines
removed_lines = len(lines) - numberlines
print '\n'.join(duplicate)
print 'Lines:', numberlines
print 'Removed:', removed_lines
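
For readers on Python 3, where `print` is a function, the same idea can be sketched as follows. This is only an illustration, assuming the sample `sentence` from the question; the call to `sorted` is an addition made here so the printed order is deterministic, since a set has no guaranteed order.

```python
# Python 3 sketch of the approach above (the answer itself uses Python 2 print statements).
sentence = """Sentence1
Sentence1
Sentence2
Sentence3
Sentence4
Sentence4"""

lines = sentence.split()        # split on any whitespace; never yields empty strings
unique = set(lines)             # drops duplicates, but does not keep the original order
numberlines = len(unique)       # number of lines that remain
removed_lines = len(lines) - numberlines  # "before" minus "after"

print('\n'.join(sorted(unique)))  # sorted only to make the output deterministic
print('Lines:', numberlines)
print('Removed Lines:', removed_lines)
```

Because `split()` with no arguments already discards empty strings, the separate "remove empty lines" step is not needed in this variant.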
+0

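Thank you, sir. I'll study this to make sure I understand all of it. I'm very new to Python and have had almost no lessons apart from one or two online. This code works perfectly. – CandyGum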

1

You can use set:

Removed_lines = len(lines.split("\n")) - len(set(lines.split("\n"))) 
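
As a rough, self-contained illustration of that one-liner, it could be wrapped in a small helper. The function name `count_removed_lines` is hypothetical (it is not from the question), and stripping each line before comparing is an assumption made here so that trailing spaces do not make otherwise identical lines look different.

```python
def count_removed_lines(text):
    """Return how many non-empty lines de-duplication would drop (hypothetical helper)."""
    # Strip each line and skip blank ones, so only real content is compared.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Total lines minus distinct lines is the number of duplicates removed.
    return len(lines) - len(set(lines))


sample = "Sentence1\nSentence1\nSentence2\nSentence3\nSentence4\nSentence4"
print('Removed Lines:', count_removed_lines(sample))
```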
0
import os 



sentence = """Sentence1 
Sentence1 
Sentence2 
Sentence3 
Sentence4 
Sentence4""" 



spaces = sentence.replace(" ", "\n") 
lines = os.linesep.join([s for s in spaces.splitlines() if s]) 
duplicate = "\n".join(set(lines.split('\n'))) 

numberlinesprev = len(sentence.split('\n')) 
num1 = int(numberlinesprev) 

numberlines = len(duplicate.split('\n')) 
num2 = int(numberlines) 

sum = num1 - num2 



print(duplicate) 
print'Lines Removed:', sum 
print'Lines:', numberlines 

Output:

Sentence4 
Sentence1 
Sentence2 
Sentence3 
Lines Removed: 2 
Lines: 4
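
The outputs above come out in whatever order the set happens to produce. If the original order of the lines matters, one possible variant (not something any of the answers use, just a sketch assuming Python 3.7 or later, where dicts keep insertion order) is to de-duplicate with `dict.fromkeys` instead of `set`:

```python
sentence = """Sentence1
Sentence1
Sentence2
Sentence3
Sentence4
Sentence4"""

lines = sentence.split()                      # one entry per non-empty line
unique_in_order = list(dict.fromkeys(lines))  # keeps the first occurrence of each line, in order
removed = len(lines) - len(unique_in_order)   # "before" minus "after"

print('\n'.join(unique_in_order))
print('Lines:', len(unique_in_order))
print('Removed Lines:', removed)
```

The count of removed lines is the same as with `set`; only the printed order differs.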