Search another document for all sentences that contain a string

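I have a file containing 200 words, each on its own line. I want to search for all of these words in another file and print every sentence that contains one of them. Right now only the matches for the first word show up; after that, it stops.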

corpus = open('C:\\Users\\Lucas\\Desktop\\HAIT\\Scriptie\\Tweet-corpus\\Corpus.txt', 'r', encoding='utf8') 

with open('C:\\Users\\Lucas\\Desktop\\HAIT\\Scriptie\\Tweet-corpus\\MostCommon3.txt', 'r', encoding='utf8') as list:
    for line in list:
        for a in corpus:
            if line in a:
                print(a)

Source

2013-10-04 Lucas1988

FYI, you should avoid reusing the names of built-ins such as `list` as variable names in your code. – Moshe
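
A minimal sketch of why the comment above matters: once `list` is rebound to some other object, the built-in is no longer reachable under that name in the same scope. The function names, the sample data, and the name `word_lines` are made up for this illustration.

```python
def shadowing_breaks_builtin():
    """Return True if calling list() fails after the name has been rebound."""
    # Hypothetical data; this assignment shadows the built-in inside the function.
    list = ["apple\n", "banana\n"]
    try:
        list("abc")  # calls a list instance, not the built-in type
    except TypeError:
        return True
    return False


def with_descriptive_name():
    """A different variable name leaves the built-in list() usable."""
    word_lines = ["apple\n", "banana\n"]  # hypothetical name and data
    return list(word.strip() for word in word_lines)
```

In the question's code the shadowing happens to be harmless because `list()` is never called afterwards, but a name such as `word_file` avoids the trap altogether.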

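`corpus` is a file object. On the first pass through the outer loop, the inner `for a in corpus:` loop reads the entire file. On every later iteration of the outer loop, `corpus` is still at end of file, so the inner loop never enters its body. That is why only the first `line` has any chance of matching. You could, for example, read `corpus` into a list of its lines (`.readlines()`) and then iterate over that list. –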
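
The behaviour described in the comment above can be reproduced with a small sketch. It assumes an in-memory `io.StringIO` as a stand-in for the real corpus file, since both are consumed line by line in the same way. The function names and sample data are hypothetical.

```python
import io


def matches_with_exhausted_iterator(words, corpus_text):
    """Mimic the question's code: iterate the file object inside the word loop."""
    corpus = io.StringIO(corpus_text)  # stand-in for the open corpus file
    found = []
    for word in words:
        # The first word consumes the whole stream; for later words the
        # stream is already at end of file, so this loop body never runs.
        for sentence in corpus:
            if word in sentence:
                found.append(sentence.rstrip("\n"))
    return found


def matches_with_line_list(words, corpus_text):
    """Suggested fix: read the lines into a list once, then iterate the list."""
    corpus_lines = io.StringIO(corpus_text).readlines()
    found = []
    for word in words:
        for sentence in corpus_lines:  # a list can be iterated repeatedly
            if word in sentence:
                found.append(sentence.rstrip("\n"))
    return found


WORDS = ["cat", "dog"]                   # hypothetical word list
CORPUS = "the cat sat\nthe dog ran\n"    # hypothetical corpus
```

With this sample data the first function returns only the `cat` sentence, while the second returns both sentences. Calling `corpus.seek(0)` before each inner loop would also work, at the cost of rereading the file for every word.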

# Prepare the list of words, skipping blank lines
# (an empty string would match every sentence)
with open('wordfile', 'r', encoding='utf8') as word_file:
    words = [word.strip() for word in word_file if word.strip()]

# Now examine each sentence and print it if it contains any of the words
with open('sentencefile', 'r', encoding='utf8') as sentences:
    for sentence in sentences:
        if any(word in sentence for word in words):
            # The sentence already ends with a newline, so do not add another
            print(sentence, end='')

Source

2013-10-04 16:12:21 Moshe
