2013-12-08 72 views
0

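Python read/write

My program must read a text file that has many lines. It then copies the same text to an output file, except that all useless words (such as "the", "a", and "an") are removed. What is the problem?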

+0

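'a.txt' will contain the original lines plus the appended ones, because you never clear the file. Not sure whether that matters. Also, could you tell us what the **symptoms** of the problem are, i.e. what happens instead of what you want to happen? –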

+1

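You have a list of all the lines in the file. You are iterating over that list and checking whether each line is in stopList, which contains the three words 'the', 'a', 'an'. Does anything look wrong here? – aste123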
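
To illustrate aste123's point: the question's code is not shown here, so this is only a minimal sketch that assumes it tested whole lines against the stop list. The names `lines`, `kept_lines`, and `remove_stop_words` are hypothetical and not taken from the question.

```python
stop_list = ['the', 'a', 'an']
lines = ['the cat sat on a mat\n', 'an apple\n']

# Assumed shape of the original check: a whole line is compared with the
# stop words. No line is exactly 'the', 'a' or 'an', so nothing is dropped.
kept_lines = [line for line in lines if line not in stop_list]

def remove_stop_words(line, stop_words):
    """Drop whole words found in stop_words (case-insensitive).

    Note: split() collapses runs of whitespace, so spacing is normalised.
    """
    kept = [word for word in line.split() if word.lower() not in stop_words]
    return ' '.join(kept) + '\n'

# Checking word by word instead of line by line removes the stop words.
cleaned = [remove_stop_words(line, stop_list) for line in lines]
```

The membership test has to be applied to each word of a line, not to the line as a whole.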

Answers

0

Here you go, just use str.replace:

with open("a.txt", "r") as fin, open("b.txt", "w") as fout:
    stopList = ['the', 'a', 'an']
    for line in fin:
        for useless in stopList:
            # Remove the stop word together with the space that follows it.
            line = line.replace(useless + ' ', '')
        fout.write(line)

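If you don't want to hold the whole file in memory, you need to write the result somewhere else. But if you don't mind, you can overwrite the original file: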

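stopList = ['the', 'a', 'an']
r = []
# Read everything first: opening a.txt with "w" truncates it immediately,
# so it must not be opened for writing while it is still being read.
with open("a.txt", "r") as fin:
    for line in fin:
        for useless in stopList:
            line = line.replace(useless + ' ', '')
        r.append(line)
with open("a.txt", "w") as fout:
    fout.writelines(r)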
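
If the result should end up in a.txt without keeping every line in memory, one possible approach is to write to a temporary file in the same directory and then swap it in with `os.replace`. This is a sketch under that assumption, not part of the original answer; the helper name `rewrite_in_place` is hypothetical.

```python
import os
import tempfile

def rewrite_in_place(path, stop_words):
    """Filter `path` line by line through a temporary file, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, 'r') as fin, tempfile.NamedTemporaryFile(
            'w', dir=directory, delete=False) as fout:
        for line in fin:
            for useless in stop_words:
                line = line.replace(useless + ' ', '')
            fout.write(line)
    # Both files are closed here; replace the original with the filtered copy.
    os.replace(fout.name, path)

# Small self-contained check in a scratch directory.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, 'a.txt')
    with open(sample, 'w') as f:
        f.write('the a, the b, the c\n')
    rewrite_in_place(sample, ['the', 'a', 'an'])
    with open(sample) as f:
        result = f.read()
```

The original file is only replaced after the filtered copy has been written completely.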

Demo:

>>> line = 'the a, the b, the c'
>>> stopList = ['the', 'a', 'an']
>>> for useless in stopList:
...     line = line.replace(useless + ' ', '')
...
>>> line
'a, b, c'
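
One caveat with this approach: `str.replace` works on substrings, not on whole words, and it only matches a stop word that is followed by a space. The sketch below uses made-up sample strings to show both effects; a word-boundary pattern, as in the regular-expression answer, avoids the first one.

```python
stop_list = ['the', 'a', 'an']

def strip_with_replace(line):
    """Apply the str.replace approach from the answer to one line."""
    for useless in stop_list:
        line = line.replace(useless + ' ', '')
    return line

# 'a ' also matches the end of 'banana ', so part of a real word is lost.
inside_word = strip_with_replace('a banana split')

# A stop word at the very end has no trailing space and is left alone.
at_end = strip_with_replace('over the')
```
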
+0

@alKid It copies each element three times – Chingy

+0

What do you mean by "each element three times"? – aIKid

+0

@alKid For example, it writes "ABC" three times, like "ABC ABC ABC" – Chingy

0

Use a regular expression:

import re 

with open('a.txt') as f, open('b.txt','w') as out: 
    stopList = ['the', 'a', 'an'] 
    pattern = '|'.join(r'\b{}\s+'.format(re.escape(word)) for word in stopList) 
    pattern = re.compile(pattern, flags=re.I) 
    out.writelines(pattern.sub('', line) for line in f) 

# import shutil 
# shutil.move('b.txt', 'a.txt')
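
To see what this pattern does without touching any files, it can be applied to a few in-memory strings. The sample lines below are invented for illustration. Note that `\s+` also matches the newline, so a stop word at the end of a line takes the line break with it.

```python
import re

stop_list = ['the', 'a', 'an']
# Same pattern as in the answer: each stop word at a word boundary,
# followed by one or more whitespace characters, matched case-insensitively.
pattern = re.compile(
    '|'.join(r'\b{}\s+'.format(re.escape(word)) for word in stop_list),
    flags=re.I)

samples = ['The cat ate a banana split\n', 'over the\n']
cleaned = [pattern.sub('', s) for s in samples]

# First line: 'The ' and 'a ' are removed, 'ate' and 'banana' are untouched.
# Second line: 'the\n' is removed as a unit, so the newline disappears too.
```

If line breaks must be preserved, the whitespace part of the pattern would need to exclude `\n`; that is a possible adjustment, not something the answer itself does.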