Python: Replace duplicate lines in a file with blanks, but not the first/last occurrence

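I have a file that contains a repeated line, <this is repeated>, which I want to replace with a blank "". However, the first and the last occurrence of the repeated line should not be replaced. I tried replace() before, but that function replaces every occurrence in the file. Is there a way to write this to get the expected result? PS: this is a large text file.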

The file looks like this:
<this is repeated>
second line
another lines
third line
<this is repeated>
<this is repeated>
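For a large file, here is a minimal sketch that avoids loading the whole file into memory, assuming it is acceptable to read the file twice: the first pass counts the full-line matches, and the second pass copies the file, replacing every occurrence except the first and the last. The function name blank_middle_repeats, its parameters and the file names are hypothetical.

```python
import os
import tempfile


def blank_middle_repeats(src_path, dst_path, target="<this is repeated>", replacement=""):
    """Copy src_path to dst_path, replacing every full-line occurrence of target
    except the first and the last with replacement.

    Hypothetical helper: replacement="" drops the line, while passing a single
    newline leaves an empty line in its place.
    """
    # First pass: count exact matches. Stripping the newline means a final
    # line without a trailing newline still counts.
    with open(src_path) as f:
        total = sum(1 for line in f if line.rstrip("\n") == target)

    # Second pass: stream the file line by line and write the result.
    seen = 0
    with open(src_path) as f, open(dst_path, "w") as out:
        for line in f:
            if line.rstrip("\n") == target:
                seen += 1
                if 1 < seen < total:  # neither the first nor the last occurrence
                    out.write(replacement)
                    continue
            out.write(line)


if __name__ == "__main__":
    # Usage example with the sample file from the question, in a temp directory.
    sample = ("<this is repeated>\nsecond line\nanother lines\nthird line\n"
              "<this is repeated>\n<this is repeated>\n")
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.txt")
        dst = os.path.join(tmp, "out.txt")
        with open(src, "w") as f:
            f.write(sample)
        blank_middle_repeats(src, dst)
        with open(dst) as f:
            print(f.read(), end="")
```

Reading the file twice trades a little extra I/O for constant memory use: the second pass cannot know which occurrence is the last one without the count from the first pass.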


2016-03-24 wanderergirl

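Read the lines and compare them... if the current line differs from the previous line, write it to the output file; otherwise skip it and write a blank line instead... that's all there is to it... –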
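The comment's approach can be sketched as a generator over lines (the name blank_consecutive_duplicates is hypothetical). Because it compares each line only with the one right before it, it only catches back-to-back repeats, and in a run like the last two lines of the sample it blanks the later line, so on its own it does not keep the last occurrence.

```python
def blank_consecutive_duplicates(lines):
    """Yield each line unchanged, or an empty line if it equals the line just before it."""
    previous = None
    for line in lines:
        yield "\n" if line == previous else line
        previous = line  # compare against the original line, not the output


sample = [
    "<this is repeated>\n",
    "second line\n",
    "another lines\n",
    "third line\n",
    "<this is repeated>\n",
    "<this is repeated>\n",
]
print("".join(blank_consecutive_duplicates(sample)), end="")
```

Passing an open file object instead of a list streams the file line by line, for example dst.writelines(blank_consecutive_duplicates(src)) with src and dst being open input and output files.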

Do you want to replace it with an empty line, or delete the whole line? –

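Note: I realized after posting that if the last occurrence is the final line of the file with no trailing \n, this technique will not match it, so the occurrence before it will be treated as the last one and kept as well.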
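One possible workaround for this caveat (an assumption, not part of the answer) is to make sure the text ends with a newline before counting. The helper name ensure_trailing_newline is hypothetical.

```python
rep_line = "<this is repeated>\n"


def ensure_trailing_newline(text):
    """Append a newline if the text is non-empty and does not already end with one."""
    if text and not text.endswith("\n"):
        return text + "\n"
    return text


rest = "second line\n<this is repeated>\n<this is repeated>"
print(rest.count(rep_line))                           # 1: the final occurrence is missed
print(ensure_trailing_newline(rest).count(rep_line))  # 2
```

If the output must match the input byte for byte, strip the added newline again after the replacement.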

First, you need to iterate over the file until you find the first occurrence:

file = open("my_file.txt")  # hypothetical file name standing in for <OPEN FILE>
rep_line = "<this is repeated>\n"

beginning = ""  # record all data until the first occurrence is found
while True:  # broken when rep_line is found in the file (or the end of the file is reached)
    line = file.readline()
    if not line:
        raise EOFError("reached end of file before finding first occurrence")
    beginning += line
    if line == rep_line:
        break

rest = file.read()  # readline() and read() can be mixed, so read the rest in one go
file.close()

Then beginning contains everything up to and including the first occurrence, and rest contains everything after it.

So all you need to do with rest is count how many times the line occurs and replace all but the last one:

reps = rest.count(rep_line)

new_text = beginning + rest.replace(rep_line, "", reps - 1)  # reps - 1: don't replace the last one
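Applied to the sample from the question, the count-and-replace step looks like this. Here beginning and rest are hand-written stand-ins for what the loop above produces, so no file is needed. str.replace with a third argument replaces only the first that many occurrences, which is what leaves the last one in place; when reps is 0 there is nothing to replace anyway.

```python
rep_line = "<this is repeated>\n"

# What the readline() loop would leave for the question's sample file.
beginning = "<this is repeated>\n"
rest = "second line\nanother lines\nthird line\n<this is repeated>\n<this is repeated>\n"

reps = rest.count(rep_line)  # 2 occurrences after the first one
new_text = beginning + rest.replace(rep_line, "", reps - 1)  # replace only the first reps - 1
print(new_text, end="")
```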

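But this straightforward approach will also pick up lines that merely end with the text (for example "hello <this is repeated>"). This can be fixed by also checking that there is a \n right before the line: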

reps = rest.count("\n" + rep_line)

new_text = beginning + rest.replace("\n" + rep_line, "\n", reps - 1)  # replace with a single newline
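One caveat with the "\n" + rep_line version: str.count and str.replace scan for non-overlapping matches, so when two repeated lines are adjacent, the newline between them is consumed by the first match and the second is not found. On the question's own sample, rest.count("\n" + rep_line) is 1, so nothing gets replaced. Below is a sketch of an alternative using the re module (a swapped-in technique, not the answer's code). It anchors the match at the start of a line with ^ and re.MULTILINE, which checks for the preceding newline without consuming it. Like the answer, it works on the whole text in memory; the function name keep_first_and_last is hypothetical.

```python
import re


def keep_first_and_last(text, target="<this is repeated>"):
    """Remove every full-line occurrence of target except the first and the last."""
    # ^ with re.MULTILINE matches at the start of the text or right after a
    # newline without consuming it, so back-to-back repeats are all found.
    # The line must end with a newline or at the end of the text (\Z), so
    # "hello <this is repeated>" and "<this is repeated> etc." do not match.
    pattern = re.compile("^" + re.escape(target) + r"(?:\n|\Z)", re.MULTILINE)
    total = len(pattern.findall(text))
    seen = 0

    def drop_middle(match):
        nonlocal seen
        seen += 1
        # Keep the first and the last occurrence, drop the ones in between.
        return match.group(0) if seen in (1, total) else ""

    return pattern.sub(drop_middle, text)


sample = ("<this is repeated>\nsecond line\nanother lines\nthird line\n"
          "<this is repeated>\n<this is repeated>\n")
print(keep_first_and_last(sample), end="")
```

Returning a newline instead of "" from drop_middle would leave an empty line where each middle occurrence was.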


2016-03-24 16:29:51

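If you want to replace it with an empty line instead of deleting it completely, just replace it with '\n', or with '\n\n' in the last example, to leave an empty line. –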

I got the error "Mixing iteration and read methods would lose data" on the line 'rest = file.read()'. Do you know why? – wanderergirl

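Well, I assume that's because you aren't just doing 'open("my_file.txt")', or you have 'from library import *' with a library that has its own 'open' function. Regardless, I edited the answer to use 'file.readline()' instead, since that is probably better anyway. –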
