My input file is SAMPLE.TXT, which contains the following:
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
Lorem Ipsum has been the industry's standard dummy text ever since the 1500s,
when an unknown printer took a galley of type and scrambled it to make a type
specimen book. It has survived not only five centuries, but also the leap
into electronic typesetting, remaining essentially unchanged.
It was popularised in the 1960s with the release of Letraset sheets con
My stopWords.txt contains:
Lorem
simply
book
printing
My code is:
import re

# Read the stop words, one per line.
fp1 = open('stopwords.txt', 'r')
lisOfStopWords = fp1.readlines()
fp1.close()

def passstopwords(lisOfStopWords):
    # Join the stop words into one alternation pattern, e.g. "Lorem|simply".
    stopwords = "|".join([x.strip() for x in lisOfStopWords])
    print("Stopwords:" + stopwords)
    fp = open('SAMPLE.TXT', 'r')
    stopWordPattern = r"%(stopwords)s" % {'stopwords': stopwords}
    for line in fp.readlines():
        print("ORIGINAL:" + line.strip())
        # Delete every occurrence of any stop word from the line.
        line = re.sub(stopWordPattern, r'', line)
        print("REPLACED:" + line)
    fp.close()
    return

passstopwords(lisOfStopWords)
The output is:
Stopwords:Lorem|simply|book|printing
ORIGINAL:Lorem Ipsum is simply dummy text of the printing and typesetting industry.
REPLACED: Ipsum is dummy text of the and typesetting industry.
ORIGINAL:Lorem Ipsum has been the industry's standard dummy text ever since the 1500s,
REPLACED: Ipsum has been the industry's standard dummy text ever since the 1500s,
ORIGINAL:when an unknown printer took a galley of type and scrambled it to make a type
REPLACED:when an unknown printer took a galley of type and scrambled it to make a type
ORIGINAL:specimen book. It has survived not only five centuries, but also the leap
REPLACED:specimen . It has survived not only five centuries, but also the leap
ORIGINAL:into electronic typesetting, remaining essentially unchanged.
REPLACED:into electronic typesetting, remaining essentially unchanged.
ORIGINAL:It was popularised in the 1960s with the release of Letraset sheets con
REPLACED:It was popularised in the 1960s with the release of Letraset sheets con
As you can see, every occurrence of Lorem, simply, book, or printing gets replaced.
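Note that the pattern in the script above is a plain alternation, so it would also match a stop word embedded in a longer word (for example, book inside bookkeeping). Below is a minimal sketch of a whole-word variant, assuming that whole-word matching is what is wanted. The names build_stop_word_pattern and remove_stop_words are hypothetical and not part of the original script, and collapsing the leftover spaces is an extra step the original does not perform.

```python
import re

def build_stop_word_pattern(stop_words):
    """Compile a regex that matches any stop word as a whole word (hypothetical helper)."""
    # re.escape guards against stop words that contain regex metacharacters.
    alternation = "|".join(
        re.escape(word.strip()) for word in stop_words if word.strip()
    )
    # \b anchors each alternative to word boundaries.
    return re.compile(r"\b(?:%s)\b" % alternation)

def remove_stop_words(line, pattern):
    """Delete stop words, then collapse the runs of spaces they leave behind."""
    without_stop_words = pattern.sub("", line)
    return re.sub(r" {2,}", " ", without_stop_words).strip()

# The stop words as readlines() would return them, trailing newlines included.
pattern = build_stop_word_pattern(["Lorem\n", "simply\n", "book\n", "printing\n"])
print(remove_stop_words(
    "Lorem Ipsum is simply dummy text of the printing and typesetting industry.",
    pattern,
))
```

Punctuation that followed a removed word is left untouched, so a line such as "specimen book. It" still ends up with a space before the period.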
Please fix the indentation of the function. – Kasramvd
Yes, please. The script as given won't run because it isn't indented correctly. Also, what input are you using? –
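Please use better names; that will make your code easier to understand. 'getstopswords' sounds like a function I would call to get the stop words. If it is a pattern, use 'stop_word_pattern'. –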
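For illustration, here is one possible way the script could look with the naming advice applied. This is only a sketch: load_stop_words and strip_stop_words are hypothetical names chosen for this example, the file names in the usage comment are taken from the question, and the with statement and re.escape call are additions the original script does not use.

```python
import re

def load_stop_words(path):
    """Read one stop word per line, skipping blank lines."""
    with open(path, "r") as stop_word_file:
        return [line.strip() for line in stop_word_file if line.strip()]

def strip_stop_words(lines, stop_words):
    """Return each line with every stop word occurrence removed."""
    # Same alternation as the original script, with each word escaped.
    stop_word_pattern = re.compile(
        "|".join(re.escape(word) for word in stop_words)
    )
    return [stop_word_pattern.sub("", line) for line in lines]

# Example call with the file names from the question:
# with open("SAMPLE.TXT") as sample_file:
#     cleaned_lines = strip_stop_words(sample_file, load_stop_words("stopwords.txt"))
```

Separating the file reading from the substitution keeps each function's name honest about what it does, and lets the substitution be tried on a plain list of strings without any files.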