2015-10-27 55 views

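I defined this function to find keywords in a text file, but now I want to get tuples containing the words that come before and after each keyword in the file, and I don't know how to do that. How do I get the words before and after a specific word from a text file?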

def findProperWords(paragraphs, excludedWords):
    key = []  # list of the keywords found so far
    count = 0
    for paragraph in paragraphs:  # visit every paragraph in the text file
        count += 1  # count each paragraph
        # split each paragraph into a list of words
        words = list(paragraph.split(' '))
        for keys in words:
            if len(keys) > 0:
                # check for words that start with a capital letter
                if keys[0] == keys[0].upper():
                    if keys.lower() not in excludedWords:
                        key.append(keys)  # add to the list of keywords
                        # find the position of the keyword in the paragraph
                        index = paragraph.find(keys)

Just a small tip: there is a built-in method for checking for an uppercase letter: 'if keys[0].isupper():'. And 'if len(keys) > 0:' can be written simply as 'if keys:' – VPfB
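To make the comment's tip concrete, here is a small sketch comparing the question's check with the suggested one. The two helper names are made up for illustration. One thing worth noticing is that the two checks are not strictly equivalent: comparing a character with its upper() form also accepts digits and punctuation, while isupper() does not.

```python
def starts_upper_original(token):
    # The question's check: true for any first character that upper()
    # leaves unchanged, which includes digits and punctuation.
    return len(token) > 0 and token[0] == token[0].upper()


def starts_upper_suggested(token):
    # The comment's suggestion: truthiness replaces len(...) > 0, and
    # str.isupper() is true only for cased, uppercase characters.
    return bool(token) and token[0].isupper()


for token in ["Paris", "paris", "", "42nd", "(Note)"]:
    print(repr(token), starts_upper_original(token), starts_upper_suggested(token))
```

For ordinary words both versions agree; they differ only on tokens such as '42nd' or '(Note)', so which one is right depends on whether those should count as keywords.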

Answers

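Try this, but note that it only finds the neighbouring words within the same paragraph. If you want it to pick up neighbours from the previous or next paragraph as well, consider building one big list of words (if memory limits allow), or carrying state across iterations: remember the last word of each paragraph for later use and fill in the previous paragraph's missing "word after" when you move on to a new paragraph.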

def findProperWords(paragraphs, excludedWords):
    key = []  # list of (word_before, keyword, word_after) tuples
    count = 0
    for paragraph in paragraphs:  # visit every paragraph in the text file
        count += 1  # count each paragraph
        # split each paragraph into a list of words
        words = list(paragraph.split(' '))
        for idx, keys in enumerate(words):
            if len(keys) > 0:
                # check for words that start with a capital letter
                if keys[0] == keys[0].upper():
                    if keys.lower() not in excludedWords:
                        # find the position of the keyword in the paragraph
                        index = paragraph.find(keys)
                        # None marks a missing neighbour at either end
                        word_before = words[idx - 1] if idx > 0 else None
                        # the last valid index is len(words) - 1
                        word_after = words[idx + 1] if idx < len(words) - 1 else None
                        key.append((word_before, keys, word_after))
    return key
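For the cross-paragraph case mentioned above, a minimal sketch of the "one big list of words" idea might look like this. The function name and the sample data are made up for illustration, and it assumes the whole text fits in memory. It also uses isupper() for the capital-letter check, which is slightly stricter than the comparison in the code above.

```python
def find_words_with_neighbours(paragraphs, excluded_words):
    # Flatten every paragraph into a single word list so that neighbours
    # can cross paragraph boundaries. split() with no argument also drops
    # the empty strings that split(' ') produces for repeated spaces.
    words = [word for paragraph in paragraphs for word in paragraph.split()]
    results = []
    for idx, word in enumerate(words):
        if word[0].isupper() and word.lower() not in excluded_words:
            # None marks a missing neighbour at the very start or end.
            before = words[idx - 1] if idx > 0 else None
            after = words[idx + 1] if idx < len(words) - 1 else None
            results.append((before, word, after))
    return results


paragraphs = ["the city of Paris is", "lovely in Spring"]
print(find_words_with_neighbours(paragraphs, {"the"}))
# [('of', 'Paris', 'is'), ('in', 'Spring', None)]
```

Because the list is flat, a keyword at the end of one paragraph gets the first word of the next paragraph as its "word after". If the file is too large for that, the carry-the-last-word approach is the alternative.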