2013-08-20 30 views
2

Delete some words and replace other words in a txt file

I have a txt file (myText.txt) containing many lines of text.

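I would like to know:

  • how to create a list of words that need to be deleted (I want to choose the words myself)
  • how to create a list of words that need to be replaced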

For example, if myText.txt is:

The ancient Romans influenced countries and civilizations in the following centuries. 
Their language, Latin, became the basis for many other European languages. They stayed in Roma for 3 month. 
  • I would like to delete "the", "and" and "in", and replace "ancient" with "old"
  • I would like to replace "month" and "centuries" with "years"

Answers

3

You can always use a regular expression:

import re

st='''\
The ancient Romans influenced countries and civilizations in the following centuries.
Their language, Latin, became the basis for many other European languages. They stayed in Roma for 3 month.'''

deletions=('and','in','the')
repl={"ancient": "old", "month":"years", "centuries":"years"}

# One alternation that matches any deletion word as a whole word
tgt='|'.join(r'\b{}\b'.format(e) for e in deletions)
st=re.sub(tgt,'',st)
# Replace each remaining target word with its substitute
for word in repl:
    tgt=r'\b{}\b'.format(word)
    st=re.sub(tgt,repl[word],st)

print(st)
+0

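Hi, this works very well. Sometimes my text contains "+" and "-" symbols. However, it seems Python does not accept deletions=('and','in','the','+','-'). Is there a special way to enter these characters? Thanks – S12000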

+0

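Some characters, such as '+' and '-', have a special meaning in regular expressions. My suggestion is to spend some time on a regex tutorial site learning about them. [Regex101](http://www.regex101.com) is a good choice. – dawg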
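To make the point about special characters concrete, here is a minimal sketch of one way to handle "+" and "-" in the deletion list. It is not part of the original answer: the helper name `build_deletion_pattern` and the sample text are made up for illustration. It relies on `re.escape` to neutralise regex metacharacters, and it only wraps a token in `\b` when the token consists of word characters, since `\b` does not behave usefully around symbols.

```python
import re

def build_deletion_pattern(tokens):
    """Return a compiled regex that matches any of the literal tokens."""
    parts = []
    for tok in tokens:
        escaped = re.escape(tok)  # '+' becomes '\+', so it is matched literally
        if re.fullmatch(r'\w+', tok):
            # Plain words: require word boundaries so "in" does not hit "inside"
            escaped = r'\b' + escaped + r'\b'
        parts.append(escaped)
    return re.compile('|'.join(parts))

pattern = build_deletion_pattern(['and', 'in', 'the', '+', '-'])
text = "the cat + the dog - and more in here"
cleaned = pattern.sub('', text)
print(cleaned.split())
```

Note that this removes every "-", including hyphens inside words, which may or may not be what you want.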

2

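This should do it. You use a list to store the strings you want to delete, then loop over the list and remove each element from the contents string. Then you use a dictionary to store the words you currently have together with the words you want to replace them with. You loop over those as well, replacing each current word with its replacement.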

def replace():
    contents = ""
    deleteWords = ["the ", "and ", "in "]
    replaceWords = {"ancient": "old", "month":"years", "centuries":"years"}

    # Read the whole file into one string
    with open("myText.txt") as f:
        contents = f.read()
    for word in deleteWords:
        contents = contents.replace(word,"")

    # .items() works on Python 2 and 3 (the original .iteritems() is Python 2 only)
    for key, value in replaceWords.items():
        contents = contents.replace(key, value)
    return contents
+0

Thanks for your help. I just got the error message "AttributeError: 'dict' object has no attribute 'iteritems'". I am on the latest version of Python. Is this normal? Thanks. – S12000

+0

If you are using Python 3, use replaceWords.items() instead –

+0

Thanks, that works like a charm – S12000
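One caveat with plain `str.replace` and entries such as "in " or "the ": they are matched as substrings, so they can also hit the end of a longer word. The sketch below is an illustration only (the sample sentence is made up, not taken from myText.txt). It compares substring removal with a whole-word regex.

```python
import re

text = "Their language, Latin became the basis. Then they bathe in Roma."

# Substring removal: "in " also matches the tail of "Latin ",
# and "the " matches the tail of "bathe ".
plain = text
for word in ["the ", "and ", "in "]:
    plain = plain.replace(word, "")

# Whole-word removal: \b only matches at a word boundary; the optional
# trailing space avoids leaving a double space behind.
whole = re.sub(r"\b(?:the|and|in)\b ?", "", text)

print(plain)
print(whole)
```

If your word list only contains words that never appear inside other words, the simpler `str.replace` version is fine.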

2

Use a list for the deletions and a dictionary for the replacements. It should look something like this:

def processTextFile(filename_in, filename_out, delWords, repWords):
    # Open both files once; "w" overwrites any output from a previous run
    with open(filename_in, "r") as sourcefile, open(filename_out, "w") as outfile:
        for line in sourcefile:
            # Delete the unwanted words
            for item in delWords:
                line = line.replace(item, "")
            # Swap each word for its replacement
            for key, value in repWords.items():
                line = line.replace(key, value)
            outfile.write(line)


if __name__ == "__main__":
    delWords = ["the ", "and ", "in "]
    repWords = {"ancient": "old", "month": "years", "centuries": "years"}

    processTextFile("myText.txt", "myOutText.txt", delWords, repWords)

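Just a note: this was written for Python 3.3.2, which is why I used items(). If you are using Python 2.x, use iteritems() instead, as I think it is more efficient, especially for large text files.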
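As a possible variation on the answers above, the deletions and replacements can be merged into one dictionary and applied in a single regex pass, so a word produced by one replacement is never picked up again by a later one. This is only a sketch under assumptions: the function name `apply_rules` and the sample sentence are hypothetical, and matching is case-sensitive, as in the answers above.

```python
import re

def apply_rules(text, deletions, replacements):
    """Delete and replace whole words in a single pass over the text."""
    # Deleting a word is the same as replacing it with an empty string
    mapping = {word: "" for word in deletions}
    mapping.update(replacements)
    # One alternation of all escaped words, anchored at word boundaries
    alternation = "|".join(re.escape(word) for word in mapping)
    pattern = re.compile(r"\b(?:" + alternation + r")\b")
    # Look up each matched word in the mapping to find its substitute
    return pattern.sub(lambda m: mapping[m.group(0)], text)

sample = "They stayed in Roma for 3 month and the ancient Romans left."
result = apply_rules(sample, ["the", "and", "in"],
                     {"ancient": "old", "month": "years", "centuries": "years"})
print(result)
```

Deleted words leave their surrounding spaces behind, so you may want to collapse runs of spaces afterwards.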

+0

Thanks for this code. Wow, there are many ways to achieve my goal :) – S12000