if ... in - no match when there should be one

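I have a database of words and a dataset with lines of text. Every time a line of the text file contains a word that also occurs in the words file, I want to perform an action. My code looks like this: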

import re 
f = open(r"words.txt") 
print len(flist) 
d = open(r"text.txt", "r") 
dlist = d.readlines() 

for line in flist:
    lowline = line.lower()
    for word in dlist:
        lowword = word.lower()
        if lowword in lowline:
            *trick*

However, this code finds no matches, although there are plenty of words that are exactly the same. Any ideas about this?

Source

2013-06-26 user2525375

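Have you mixed up your files and variables? The 'word' variable appears to be read from the 'text.txt' file, while 'line' comes from 'words.txt', which suggests you need to swap them. – andersschuller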

readlines returns each line with the newline character still at the end of the string. You won't find 'cat\n' in 'my cat is black\n'. – jterrace
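A quick way to see the effect described in this comment, as a minimal sketch written for Python 3. The two sample strings are made up for illustration and stand in for lines as readlines would return them:

```python
# Lines as readlines() would return them: each keeps its trailing newline.
word = "cat\n"
line = "my cat is black\n"

# The newline is part of the substring being searched for. In the line,
# "cat" is followed by a space, so "cat\n" never occurs and the test fails.
found_raw = word in line

# Stripping the word first removes the newline, and the test succeeds.
found_stripped = word.strip() in line

print(found_raw, found_stripped)
```

Note that `in` on strings is a substring test, so a stripped "cat" would also be found inside "category"; comparing whole words avoids that.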

How is 'flist' made? – cmd

First store the words from the database in a set, applying str.strip and str.lower to each of them. str.strip removes leading and trailing whitespace characters such as '\n'.

Sets provide O(1) average-case lookup, and a set intersection will be more efficient than your current O(n^2) approach.
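To make the set operations concrete, here is a small sketch in Python 3. The names `database` and `line` and their contents are hypothetical sample data, not taken from the question's files:

```python
# Hypothetical sample data standing in for the word database and one text line.
database = {"cat", "dog", "bird"}
line = "My Cat is black\n"

# Normalise the line the same way as the database entries, then split it
# into individual words.
words = set(line.strip().lower().split())

# A membership test on a set is an average-case O(1) hash lookup.
has_cat = "cat" in database

# The intersection keeps only the words present in both sets.
common = words & database

print(has_cat, sorted(common))
```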

Then iterate over each line of the other file, applying str.strip and str.lower first, and look its words up in the set.

with open(r"words.txt") as f1, open(r"text.txt", "r") as f2:
    # set of words from the database, stripped and lower-cased
    dlist = set(line.strip().lower() for line in f2)
    for line in f1:
        line = line.strip().lower()     # strip removes the trailing '\n'
        words = set(line.split())       # split the line into a set of words
        common_words = words & dlist    # set intersection finds the common words
        for word in common_words:
            pass  # *trick*: perform your action here

Swap f1 and f2 as appropriate; I was unsure which file is the word database and which one is the text dataset.
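To sidestep the question of which file is which, the same idea can be sketched as a function that takes the two sources as explicit arguments. This is only an illustration in Python 3: the function name `find_matches`, its parameters, and the sample lists are assumptions, and any iterable of lines (including an open file) could be passed in.

```python
def find_matches(word_lines, text_lines):
    """Yield (line_number, matched_words) for each text line that
    contains at least one word from the database."""
    # Build the database set once, skipping blank lines.
    database = {w.strip().lower() for w in word_lines if w.strip()}
    for number, line in enumerate(text_lines, start=1):
        # Normalise the line, split it into words, and intersect.
        common = set(line.strip().lower().split()) & database
        if common:
            yield number, sorted(common)


# Hypothetical sample data in place of words.txt and text.txt.
word_lines = ["Cat\n", "dog\n", "\n"]
text_lines = ["My cat is black\n", "Nothing here\n", "A DOG and a cat\n"]

matches = list(find_matches(word_lines, text_lines))
print(matches)
```

One caveat: str.split only separates on whitespace, so a word followed by punctuation (for example "cat.") would not match unless the punctuation is removed first.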

Source

2013-06-26 18:50:12
