7

I have a text file named test.txt. I want to read it and return a list of all the words in the file, with the newlines removed.

Here is my current code:

def read_words(words_file): 
    open_file = open(words_file, 'r') 
    words_list = [] 
    contents = open_file.readlines() 
    for i in range(len(contents)): 
        words_list.append(contents[i].strip('\n')) 
    open_file.close() 
    return words_list 

Running that code produces this list:

['hello there how is everything ', 'thank you all', 'again', 'thanks a lot'] 

I want the list to look like this:

['hello','there','how','is','everything','thank','you','all','again','thanks','a','lot'] 

http://docs.python.org/2/library/stdtypes.html#str.split – kreativitea
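For reference, str.split() with no arguments splits on any run of whitespace (spaces, tabs, newlines), so no separate strip('\n') is needed; a quick illustration (the sample string is just for demonstration):

line = 'hello there how is everything \n' 
print line.split()   # ['hello', 'there', 'how', 'is', 'everything'] 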

Answers

13

Replace the words_list.append(...) in your for loop with the following:

words_list.extend(contents[i].split()) 

This splits each line into words on whitespace, then adds each element of the resulting list to words_list.
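The difference from append is that extend adds the elements of the split list one by one instead of nesting the whole list as a single element; a tiny illustration:

words_list = [] 
words_list.append('thank you all'.split())   # [['thank', 'you', 'all']]  -- nested list 

words_list = [] 
words_list.extend('thank you all'.split())   # ['thank', 'you', 'all']    -- flat list 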

Or, as an alternative, rewrite the whole function as a list comprehension:

def read_words(words_file): 
    return [word for line in open(words_file, 'r') for word in line.split()] 
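A quick usage sketch, assuming test.txt contains the four lines shown in the question:

print read_words('test.txt') 
# ['hello', 'there', 'how', 'is', 'everything', 'thank', 'you', 'all', 
#  'again', 'thanks', 'a', 'lot'] 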

Thanks F.J, this is useful –

5

Here is how I would write it:

def read_words(words_file): 
    with open(words_file, 'r') as f: 
        ret = [] 
        for line in f: 
            ret += line.split() 
        return ret 

print read_words('test.txt') 

The function can be shortened somewhat by using itertools, but personally I find the result less readable:

import itertools 

def read_words(words_file): 
    with open(words_file, 'r') as f: 
        return list(itertools.chain.from_iterable(line.split() for line in f)) 

print read_words('test.txt') 

The nice thing about the second version is that it can be made fully generator-based, which avoids holding all of the file's words in memory at once.
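A minimal sketch of such a generator-based variant (the name read_words_iter is just for illustration); it yields words one at a time instead of building the full list:

def read_words_iter(words_file): 
    # Yield each word lazily so the complete word list never sits in memory. 
    with open(words_file, 'r') as f: 
        for line in f: 
            for word in line.split(): 
                yield word 

for word in read_words_iter('test.txt'): 
    print word 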

17

Depending on the size of the file, it would seem to be as simple as this:

with open(file) as f: 
    words = f.read().split() 
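Note that read() loads the entire file into memory before splitting, which is fine for small files. A minimal sketch of wrapping this up as a function (the name read_all_words and the filename test.txt are just for illustration):

def read_all_words(filename): 
    # Read the whole file as one string, then split on any whitespace. 
    with open(filename) as f: 
        return f.read().split() 

print read_all_words('test.txt') 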

+1 because it is concise and to the point. –

3

There are several ways you could do this. Here are a few:

If you don't care about duplicate words:

import itertools 

def getWords(filepath): 
    with open(filepath) as f: 
        # chain.from_iterable flattens the per-line word lists into one list 
        return list(itertools.chain.from_iterable(line.split() for line in f)) 

If you want to return a list of words in which each word appears only once:

Note: this does not preserve the order of the words.

def getWords(filepath): 
    with open(filepath) as f: 
        return {word for line in f for word in line.split()}          # Python 2.7+ 
        # return set(word for line in f for word in line.split())     # Python 2.6 

If you want a set --and-- you want to preserve the order of the words:

import itertools 

def getWords(filepath): 
    with open(filepath) as f: 
        words = [] 
        pos = {} 
        position = itertools.count() 
        for line in f: 
            for word in line.split(): 
                if word not in pos: 
                    # record the order in which each new word is first seen 
                    pos[word] = next(position) 
                    words.append(word) 
    return sorted(words, key=pos.__getitem__) 

If you want a word-frequency dictionary:

import collections 
import itertools 

def getWords(filepath): 
    with open(filepath) as f: 
        return collections.Counter(itertools.chain.from_iterable(line.split() for line in f)) 
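A brief usage sketch of the word-frequency version, assuming test.txt holds the text from the question (where every word appears exactly once):

counts = getWords('test.txt') 
print counts['thanks']       # 1 
print sum(counts.values())   # 12, the total number of words in the file 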

Hope these help.