Reading the next word in a file in Python

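I am looking for certain words in a file in Python. After finding each word, I need to read the next two words from the file. I have looked at several solutions, but I could not find one that reads just the following words.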

# offsetFile - file pointer
# searchTerms - list of words

for line in offsetFile:
    for word in searchTerms:
        if word in line:
            # here get the next two terms after the word
            pass

Thanks for your time.

Update: Only the first occurrence is needed. In fact, in this case each word can occur only once.

File:

accept 42 2820 access 183 3145 accid 1 4589 algebra 153 16272 algem 4 17439 algol 202 6530

Words: ['access', 'algebra']

When searching the file and encountering 'access' and 'algebra', I need the values 183 3145 and 153 16272, respectively.

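Source

2012-04-22 Quazi Farhan

You should post an example of what the file looks like. – Akavall 2012-04-22 01:33:34

Regarding your last comment, do you mean the two words that follow the word you found in the line? Can you provide some sample input/output? – Levon 2012-04-22 01:35:02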

A simple way to handle this is to read the file with a generator that yields one word at a time.

def words(fileobj):
    for line in fileobj:
        for word in line.split():
            yield word

Then find the word you are interested in and read the next two words:

searchterms = {"access", "algebra"}  # a set makes the membership test fast

with open("offsetfile.txt") as wordfile:
    wordgen = words(wordfile)
    for word in wordgen:
        if word in searchterms:
            break
    else:
        word = None  # makes sure word is None if no search term was found

    foundwords = [word, next(wordgen, None), next(wordgen, None)]

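Now foundwords[0] is the word you found, foundwords[1] is the word after it, and foundwords[2] is the second word after it. If there are not enough words left, one or more elements of the list will be None.

It gets a little more complicated if you want to force the match to stay within a single line, but usually you can get away with treating the file as just a sequence of words.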
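If matches do need to stay within one line, one possible sketch (not part of the original answer) is to split each line into tokens and slice the two tokens after the match. The helper name `find_with_following` and the use of `io.StringIO` in place of a real file are assumptions made for illustration.

```python
import io

def find_with_following(fileobj, searchterms, count=2):
    """Return [word, next1, next2] for the first match, staying within one line.

    Hypothetical helper: missing followers are padded with None.
    """
    for line in fileobj:
        tokens = line.split()
        for i, token in enumerate(tokens):
            if token in searchterms:
                following = tokens[i + 1:i + 1 + count]
                # Pad with None when the line ends before `count` words are found.
                following += [None] * (count - len(following))
                return [token] + following
    # No search term appeared anywhere in the file.
    return [None] * (count + 1)

# Part of the sample data from the question, wrapped in StringIO instead of a file.
sample = io.StringIO("accept 42 2820 access 183 3145 accid 1 4589\n")
result = find_with_following(sample, {"access"})
print(result)
```

Unlike the generator version, this never pulls words from the following line, so a match at the end of a line yields None placeholders.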

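Source

2012-04-22 01:37:27 kindall

I think this is correct, but the asker should state whether they want only the first occurrence or multiple occurrences. – 2012-04-22 01:38:59

Yes, you would need an additional loop to keep going if you want to find multiple occurrences. That is easy to add. – kindall 2012-04-22 01:40:08

Thanks for the code. I made a minor change and it works perfectly: line = line.split(" ") – 2012-04-22 02:37:48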
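The extra loop mentioned in the comments above might look like the following sketch, which keeps scanning after the first hit and records the first occurrence of each search term. The helper name `find_all` is hypothetical, and `io.StringIO` stands in for the real file.

```python
import io

def words(fileobj):
    # Same generator as in the answer: yield one whitespace-separated word at a time.
    for line in fileobj:
        for word in line.split():
            yield word

def find_all(fileobj, searchterms):
    """Hypothetical helper: map each found search term to its next two words."""
    remaining = set(searchterms)
    found = {}
    wordgen = words(fileobj)
    for word in wordgen:
        if word in remaining:
            # Consume the two following words; they are not tested as search terms.
            found[word] = (next(wordgen, None), next(wordgen, None))
            remaining.discard(word)
            if not remaining:
                break  # every term has been found, stop reading early
    return found

# Sample data from the question.
sample = io.StringIO(
    "accept 42 2820 access 183 3145 accid 1 4589 "
    "algebra 153 16272 algem 4 17439 algol 202 6530\n"
)
found = find_all(sample, ["access", "algebra"])
print(found)
```

Terms that never appear are simply absent from the returned dictionary, so callers can use `found.get(term)` to detect a miss.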

If you only need to retrieve the first two words, just do:

 
offsetFile.readline().split()[:2]

Source

2012-04-22 01:40:04 Stan

"the next two words after [the search term]" – 2012-04-22 01:42:09

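word = '3'   # the word to search for
delim = ','  # the delimiter that separates words in the file

two_words = None  # stays None if the word is never found
with open('test_file.txt') as f:
    for line in f:
        s_line = line.strip().split(delim)
        # Match whole tokens; a substring test on the raw line could make index() raise ValueError.
        if word in s_line:
            i = s_line.index(word)
            # Slicing avoids an IndexError when fewer than two words follow.
            two_words = tuple(s_line[i + 1:i + 3])
            break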

Source

2012-04-22 01:47:42 Akavall

def searchTerm(offsetFile, searchTerms):
    # Remove any found words from this copy; once it is empty we can exit.
    searchThese = searchTerms[:]
    for line in offsetFile:
        words_in_line = line.split()
        # Use this list comprehension if two numbers always follow each word.
        # Otherwise iterate over words_in_line directly.
        for word in [w for i, w in enumerate(words_in_line) if i % 3 == 0]:
            # No more words to search for.
            if not searchThese:
                return
            # Search for the remaining words.
            if word in searchThese:
                searchThese.remove(word)
                i = words_in_line.index(word)
                print(words_in_line[i:i+3])

For 'access' and 'algebra' I get this result:

['access', '183', '3145']
['algebra', '153', '16272']
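Since the sample file appears to consist entirely of word/number/number triples, another option is to build a lookup table once and query it per search term. This is only a sketch under that assumption, and it also assumes the file is small enough to read into memory; the helper name `load_offsets` is hypothetical.

```python
import io

def load_offsets(fileobj):
    """Hypothetical helper: map each word to the two values that follow it.

    Assumes the file is strictly a sequence of (word, number, number) triples.
    """
    tokens = fileobj.read().split()
    table = {}
    # Step through the tokens three at a time; an incomplete trailing triple is ignored.
    for i in range(0, len(tokens) - 2, 3):
        # setdefault keeps the first occurrence if a word repeats.
        table.setdefault(tokens[i], (tokens[i + 1], tokens[i + 2]))
    return table

# Sample data from the question.
sample = io.StringIO(
    "accept 42 2820 access 183 3145 accid 1 4589 "
    "algebra 153 16272 algem 4 17439 algol 202 6530\n"
)
table = load_offsets(sample)
for term in ["access", "algebra"]:
    print(term, table.get(term))
```

The values are kept as strings here; convert them with `int()` if numeric offsets are needed.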

Source

2012-04-22 11:49:19
