7

I have a text file named test.txt. I want to read it and return a list of all the words in the file, with the newlines removed.

Here is my current code:

def read_words(words_file): 
    open_file = open(words_file, 'r') 
    words_list = [] 
    contents = open_file.readlines() 
    for i in range(len(contents)): 
        words_list.append(contents[i].strip('\n')) 
    open_file.close() 
    return words_list 

Running that code produces this list:

['hello there how is everything ', 'thank you all', 'again', 'thanks a lot'] 

I want the list to look like this:

['hello','there','how','is','everything','thank','you','all','again','thanks','a','lot'] 

http://docs.python.org/2/library/stdtypes.html#str.split – kreativitea
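For reference, str.split() with no arguments splits on any run of whitespace (spaces, tabs, newlines), so no separate strip('\n') is needed; a quick illustration (the sample string is just for demonstration):

line = 'hello there how is everything \n' 
print line.split()   # ['hello', 'there', 'how', 'is', 'everything'] 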

Answers

13

Replace the words_list.append(...) in your for loop with the following:

words_list.extend(contents[i].split()) 

This splits each line into words on whitespace, then adds each element of the resulting list to words_list.
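The difference from append is that extend adds the elements of the split list one by one instead of nesting the whole list as a single element; a tiny illustration:

words_list = [] 
words_list.append('thank you all'.split())   # [['thank', 'you', 'all']]  -- nested list 

words_list = [] 
words_list.extend('thank you all'.split())   # ['thank', 'you', 'all']    -- flat list 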

Or, as an alternative, rewrite the whole function as a list comprehension:

def read_words(words_file): 
    return [word for line in open(words_file, 'r') for word in line.split()] 
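A quick usage sketch, assuming test.txt contains the four lines shown in the question:

print read_words('test.txt') 
# ['hello', 'there', 'how', 'is', 'everything', 'thank', 'you', 'all', 
#  'again', 'thanks', 'a', 'lot'] 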

Thanks F.J, this is useful –

5

Here is how I would write it:

def read_words(words_file): 
    with open(words_file, 'r') as f: 
        ret = [] 
        for line in f: 
            ret += line.split() 
        return ret 

print read_words('test.txt') 

The function can be shortened somewhat by using itertools, but personally I find the result less readable:

import itertools 

def read_words(words_file): 
    with open(words_file, 'r') as f: 
        return list(itertools.chain.from_iterable(line.split() for line in f)) 

print read_words('test.txt') 

The nice thing about the second version is that it can be made fully generator-based, which avoids holding all of the file's words in memory at once.
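A minimal sketch of such a generator-based variant (the name read_words_iter is just for illustration); it yields words one at a time instead of building the full list:

def read_words_iter(words_file): 
    # Yield each word lazily so the complete word list never sits in memory. 
    with open(words_file, 'r') as f: 
        for line in f: 
            for word in line.split(): 
                yield word 

for word in read_words_iter('test.txt'): 
    print word 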

17

Depending on the size of the file, it would seem to be as simple as this:

with open(file) as f: 
    words = f.read().split() 
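Note that read() loads the entire file into memory before splitting, which is fine for small files. A minimal sketch of wrapping this up as a function (the name read_all_words and the filename test.txt are just for illustration):

def read_all_words(filename): 
    # Read the whole file as one string, then split on any whitespace. 
    with open(filename) as f: 
        return f.read().split() 

print read_all_words('test.txt') 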

+1 because it is concise and to the point. –

3

There are several ways you could do this. Here are a few:

If you don't care about duplicate words:

import itertools 

def getWords(filepath): 
    with open(filepath) as f: 
        # chain.from_iterable flattens the per-line word lists into one list 
        return list(itertools.chain.from_iterable(line.split() for line in f)) 

If you want to return a list of words in which each word appears only once:

Note: this does not preserve the order of the words.

def getWords(filepath): 
    with open(filepath) as f: 
        return {word for line in f for word in line.split()}          # Python 2.7+ 
        # return set(word for line in f for word in line.split())     # Python 2.6 

If you want a set --and-- you want to preserve the order of the words:

import itertools 

def getWords(filepath): 
    with open(filepath) as f: 
        words = [] 
        pos = {} 
        position = itertools.count() 
        for line in f: 
            for word in line.split(): 
                if word not in pos: 
                    # record the order in which each new word is first seen 
                    pos[word] = next(position) 
                    words.append(word) 
    return sorted(words, key=pos.__getitem__) 

If you want a word-frequency dictionary:

import collections 
import itertools 

def getWords(filepath): 
    with open(filepath) as f: 
        return collections.Counter(itertools.chain.from_iterable(line.split() for line in f)) 
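A brief usage sketch of the word-frequency version, assuming test.txt holds the text from the question (where every word appears exactly once):

counts = getWords('test.txt') 
print counts['thanks']       # 1 
print sum(counts.values())   # 12, the total number of words in the file 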

Hope these help.