Open a file and keep it in memory

I want to implement a method in a Python module that first loads a large file and then applies a filter to its argument, like this:

def filter(word_list):
    filtered_words = []
    # The file is re-read on every call, which is the problem described below.
    special_words = [line.strip() for line in open('special_words.txt', 'r')]
    for w in word_list:
        if w not in special_words:
            filtered_words.append(w)
    return filtered_words

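The problem is that I want to load this file only once for the whole execution, not every time the method is called. In Java I could use a static block for this, but what are my options in Python?

Source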

2013-06-18 Le_Coeur

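'filter' is a very bad name for a function. There is already a built-in with that name.

You want to build the set of words up front so that the file is not read every time the function is called. You can also simplify your filtering function with a list comprehension: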

with open('special_words.txt', 'r') as handle: 
    special_words = {line.strip() for line in handle} 

def filter(word_list): 
    return [word for word in word_list if word not in special_words]

Source

2013-06-18 15:52:09 Blender

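So the first lines are executed only once? One more thing: how does the file get closed in this case?

@Le_Coeur: Yes. The 'with' block uses the file as a context manager, which closes it automatically as soon as you leave the block, so you don't need to call 'handle.close()' explicitly. – Blender

Great, thanks, this is exactly what I was looking for.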
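To see the closing behaviour described in the comment above, here is a minimal sketch. It writes a throwaway temporary file as a stand-in for special_words.txt (an assumption, so that the snippet runs anywhere) and checks the standard `closed` attribute of the file object inside and after the 'with' block.

```python
import os
import tempfile

# Hypothetical stand-in for special_words.txt so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("foo\nbar\n")
    path = tmp.name

with open(path, "r") as handle:
    special_words = {line.strip() for line in handle}
    closed_inside = handle.closed   # the file is still open inside the block

closed_after = handle.closed        # the file was closed when the block ended

os.remove(path)                     # clean up the temporary file
print(closed_inside, closed_after)
```

The name `handle` is still bound after the block, but the underlying file is closed, so no explicit `close()` call is needed.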

You can load the file into a list at the module's global scope; that code runs only once, the first time the module is imported.

Source
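If reading at import time is not wanted, a lazy variant of the same idea is possible. The sketch below is not what this answer proposes: it swaps the module-level load for `functools.lru_cache` from the standard library, so the file is read on the first call and reused afterwards. The names `load_special_words` and `filter_words` and the temporary file are assumptions made for illustration.

```python
import functools
import os
import tempfile


@functools.lru_cache(maxsize=None)
def load_special_words(path):
    """Read the file once per distinct path; later calls reuse the cached set."""
    with open(path, "r") as handle:
        return frozenset(line.strip() for line in handle)


def filter_words(word_list, path):
    """Return the words from word_list that are not in the special-word file."""
    special_words = load_special_words(path)
    return [w for w in word_list if w not in special_words]


# Demo with a throwaway file standing in for special_words.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("foo\nbar\n")
    demo_path = tmp.name

first = filter_words(["foo", "spam", "bar", "eggs"], demo_path)
second = filter_words(["eggs", "foo"], demo_path)
info = load_special_words.cache_info()   # one miss (the read), one hit (the reuse)
os.remove(demo_path)
print(first, second, info)
```

The cache is keyed on the path, so the word list passed to `filter_words` can change freely without triggering another read.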

2013-06-18 15:47:21 geoffspear

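To me this sounds like you want a memoized function, so that when you call it with arguments it has already seen, it returns the known result instead of redoing the work... This particular implementation is based on the one at http://wiki.python.org/moin/PythonDecoratorLibrary#Memoize

Although it may be slight overkill for this problem, memoization is a very useful pattern to know.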

import functools

class memoized(object):
    '''Decorator. Caches a function's return value each time it is called.
    If called later with the same arguments, the cached value is returned
    (not reevaluated).
    '''
    def __init__(self, func):
        self.func = func
        self.cache = {}
    def __call__(self, *args):
        try:
            # args is always a tuple, so test whether its contents hash.
            hash(args)
        except TypeError:
            # uncacheable. a list, for instance.
            # better to not cache than blow up.
            return self.func(*args)
        if args in self.cache:
            return self.cache[args]
        else:
            value = self.func(*args)
            self.cache[args] = value
            return value
    def __repr__(self):
        '''Return the function's docstring.'''
        return self.func.__doc__ or ''
    def __get__(self, obj, objtype):
        '''Support instance methods.'''
        return functools.partial(self.__call__, obj)

@memoized
def get_words(fname):
    # Read the file once per file name; later calls reuse the cached list.
    with open(fname, 'r') as handle:
        return list(handle)

@memoized
def filter(word_list):
    # A list argument is unhashable, so this call itself is not cached;
    # only get_words() is, which is what avoids re-reading the file.
    filtered_words = []
    special_words = [line.strip() for line in get_words("special_words.txt")]
    for w in word_list:
        if w not in special_words:
            filtered_words.append(w)
    return filtered_words

On a side note, one neat trick is

print(set(word_list).difference(special_words))

It should be much faster (assuming you don't care about losing duplicates).
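As a small illustration of that trade-off, the sketch below uses made-up sample words (not from the question) to compare the set difference with a list comprehension that keeps order and duplicates.

```python
# Made-up sample data for illustration only.
word_list = ["spam", "foo", "eggs", "spam", "bar"]
special_words = {"foo", "bar"}

# Set difference: the words in word_list that are not special.
# Duplicates and the original order are lost.
unique_kept = set(word_list).difference(special_words)

# List comprehension: same filter, but order and duplicates survive.
# Membership tests against a set stay fast.
kept_in_order = [w for w in word_list if w not in special_words]

print(unique_kept)
print(kept_in_order)
```

The direction matters: the word list goes on the left of the difference and the special words on the right, since the goal is the words that are not special.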

Source

2013-06-18 15:47:41

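This memoizes the function's result for a particular input argument, but it does not prevent the file from being read again when the function is called with different arguments.

OK, fixed... now I memoize opening the file by fname as well.

You'd have to do 'set(special_words) - set(word_list)', because OP wants the words that are not in the special word set. – Blender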
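The point of this comment can be checked with a small sketch. It uses the standard `functools.lru_cache` instead of the `memoized` class from the answer, and a hypothetical `read_special_words` function that only counts its calls instead of touching a real file.

```python
import functools

read_count = 0


def read_special_words():
    """Stand-in for reading special_words.txt; counts how often it runs."""
    global read_count
    read_count += 1
    return {"foo", "bar"}


@functools.lru_cache(maxsize=None)
def filter_words(word_tuple):
    """Memoized on its argument only; the read happens on every cache miss."""
    special_words = read_special_words()
    return tuple(w for w in word_tuple if w not in special_words)


filter_words(("foo", "spam"))   # first call: performs a read
filter_words(("foo", "spam"))   # same argument: served from the cache
filter_words(("bar", "eggs"))   # new argument: performs a read again
print(read_count)
```

Caching the filter alone saves work only for repeated inputs, which is why the loading step needs its own cache.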
