2011-06-22 112 views

Passing a list of strings to map_async()

I have an interesting problem with map_async that I can't figure out.

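I'm using a process pool from Python's multiprocessing library. I want to pass a list of strings to compare against, and a list of strings to be compared, to a function using map_async().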

Right now I have:

from multiprocessing import Pool, cpu_count
import functools


def readFiletoList(fname, fencode='utf-8'):
    '''Reads a file and returns a list of its lines with surrounding whitespace stripped'''
    linelist = list()
    with open(fname, encoding=fencode) as f:
        for line in f:
            linelist.append(line.strip())
    return linelist


def sortByLength(words):
    '''Takes an ordered iterable and sorts it based on word length'''
    return sorted(words, key=len)


def mpmine(word, dictionary):
    '''Takes a list of strings (word) and the dictionary list to search them for.

    At least dictionary needs to be sorted by word length. If not, wacky results ensue.
    '''
    results = dict()
    for pw in word:
        pwlen = len(pw)
        pwres = list()
        for dword in dictionary:
            # dictionary is sorted by length, so no later entry can fit inside pw
            if len(dword) > pwlen:
                break
            if dword in pw:
                pwres.append(dword)
        if len(pwres) > 0:
            results[pw] = pwres
    return results


def main():
    fdict = '/a/file/on/my/disk'
    passin = '/another/file/on/my/disk'

    num_proc = cpu_count()
    pool = Pool(num_proc)

    dictionary = readFiletoList(fdict)
    dictionary = sortByLength(dictionary)

    words = readFiletoList(passin, 'WINDOWS-1252')
    words = sortByLength(words)

    # This is the call that only ever keeps one process busy
    result = pool.map_async(functools.partial(mpmine, dictionary=dictionary), [words], 1000)
    print(result.get())
    pool.close()
    pool.join()


if __name__ == '__main__':
    main()

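Both dictionary and words are lists of strings. This results in only one process being used instead of the number I set. If I remove the square brackets around the variable words, it seems to iterate over the characters of each string in turn, which makes a mess.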
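A minimal sketch of what each call receives in the two cases. It assumes multiprocessing.dummy.Pool, a thread-backed pool with the same map_async interface, so it runs without any pickling concerns; describe and the sample words are hypothetical. Wrapping words in a list yields one task holding the whole list, while passing words directly yields one call per string, whatever the chunksize:

```python
from multiprocessing.dummy import Pool  # thread-backed Pool with the same map_async API


def describe(item):
    # Hypothetical helper: report the type and length of what one call receives
    return type(item).__name__, len(item)


words = ['password', 'letmein', 'dragon']  # illustrative data

with Pool(4) as pool:
    # [words] is a one-element iterable: a single call receives the whole list
    wrapped = pool.map_async(describe, [words], 1000).get()
    # words itself: one call per string, even though chunksize=2 batches them
    bare = pool.map_async(describe, words, 2).get()

print(wrapped)  # [('list', 3)]
print(bare)     # [('str', 8), ('str', 7), ('str', 6)]

# Looping over a single string, as mpmine's "for pw in word:" would, yields characters
print(list(words[0])[:3])  # ['p', 'a', 's']
```

In the second case each call gets a bare string, so a loop over it walks the characters, which matches the mess described above.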

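What I want to happen is for it to take 1000 strings at a time, pass them to the worker processes, and then collect the results, since this is an embarrassingly parallel task. Edit: added more code to make things clearer.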
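If the goal is literally to hand each worker a batch of strings, one option is to slice words into batches yourself and keep a worker that loops over a list, as the original mpmine does. This is a sketch under assumptions: chunked and mine_batch are hypothetical helpers, the data is a tiny made-up sample, and multiprocessing.dummy.Pool stands in for multiprocessing.Pool so the example runs anywhere. With real processes the worker must be a module-level function and the pool should be created under if __name__ == '__main__':.

```python
import functools
from multiprocessing.dummy import Pool  # swap for multiprocessing.Pool to use processes


def chunked(seq, size):
    # Split seq into consecutive slices of at most `size` items
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def mine_batch(batch, dictionary):
    # Same matching logic as mpmine in the question; batch is a list of strings
    results = {}
    for pw in batch:
        pwres = []
        for dword in dictionary:
            # dictionary is sorted by length, so no later entry can fit inside pw
            if len(dword) > len(pw):
                break
            if dword in pw:
                pwres.append(dword)
        if pwres:
            results[pw] = pwres
    return results


dictionary = sorted(['dragon', 'pass', 'a', 'word', 'dog'], key=len)
words = sorted(['password', 'dragon', 'xyz'], key=len)

with Pool(2) as pool:
    worker = functools.partial(mine_batch, dictionary=dictionary)
    # Each batch (2 strings here, 1000 in the question) becomes one task
    partials = pool.map_async(worker, chunked(words, 2)).get()

# Combine the per-batch dicts into one result
merged = {}
for part in partials:
    merged.update(part)
print(merged)
```

Each batch is one task, so with a batch size of 1000 the work is spread over the pool in units of 1000 strings, and the per-batch dicts are merged at the end.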


You should post more code. As it stands, we'd have to add a lot just to reproduce the problem.

Answer


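OK, I actually figured this out myself. I'm posting the answer here for anyone else who runs into the same problem. The cause was that map_async takes one item from the list (in this case a string) and passes it to the function, while my function expected a list of strings, so it treated each string as a plain list of characters. The chunksize argument (1000 here) only controls how many items are sent to a worker at a time; the function is still called once per item. The corrected code for mpmine, which takes a single string so that words can be passed without the extra square brackets, is: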

def mpmine(word, dictionary):
    '''Takes a single string (word) and the dictionary list to search it for.

    At least dictionary needs to be sorted by word length. If not, wacky results ensue.
    '''
    results = dict()
    pw = word
    pwlen = len(pw)
    pwres = list()
    for dword in dictionary:
        # dictionary is sorted by length, so no later entry can fit inside pw
        if len(dword) > pwlen:
            break
        if dword in pw:
            pwres.append(dword)
    if len(pwres) > 0:
        results[pw] = pwres
    return results
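A sketch of the full call with this version of mpmine, passing words directly so each call receives one string. It assumes multiprocessing.dummy.Pool in place of a process pool (swap in multiprocessing.Pool under an if __name__ == '__main__': guard for real processes) and small hypothetical word lists. Because each call returns its own, possibly empty, dict, the per-word results are merged at the end:

```python
import functools
from multiprocessing.dummy import Pool  # use multiprocessing.Pool for real processes


def mpmine(word, dictionary):
    # Corrected version from the answer: word is a single string
    results = dict()
    pw = word
    pwlen = len(pw)
    pwres = list()
    for dword in dictionary:
        if len(dword) > pwlen:
            break
        if dword in pw:
            pwres.append(dword)
    if len(pwres) > 0:
        results[pw] = pwres
    return results


dictionary = sorted(['sun', 'shine', 'on', 'moo', 'hi'], key=len)
words = sorted(['sunshine', 'shiny', 'moon', 'xyz'], key=len)

with Pool(4) as pool:
    # No brackets around words: one call per string, shipped in chunks of up to 1000
    async_result = pool.map_async(functools.partial(mpmine, dictionary=dictionary), words, 1000)
    per_word = async_result.get()

# per_word holds one dict per input string; merge them into a single mapping
merged = {}
for d in per_word:
    merged.update(d)
print(merged)
```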

I hope this helps anyone else facing a similar problem.