2015-08-08 61 views

Multiprocessing with BeautifulSoup bs4.element.Tag

I'm trying to use multiprocessing together with BeautifulSoup, but I'm running into a maximum recursion depth exceeded error:

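import multiprocessing
from bs4 import BeautifulSoup


def process_card(card):
    result = card.find("p")
    # Do some more parsing with BeautifulSoup

    return result


articles = []
pool = multiprocessing.Pool(processes=4)
# BeautifulSoup expects markup here, so `url` should hold the fetched HTML
soup = BeautifulSoup(url, 'html.parser')
cards = soup.findAll("li")
for card in cards:
    # Each bs4 Tag is pickled to reach the worker; this is where the error appears
    result = pool.apply_async(process_card, [card])
    article = result.get()
    if article is not None:
        print(article)
        articles.append(article)
pool.close()
pool.join()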

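As far as I can tell, card is of type <class bs4.element.Tag>, and the problem may be related to pickling this object. It isn't clear to me how to modify my code to fix this.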
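To illustrate the pickling hypothesis without depending on bs4, here is a minimal sketch in which a deeply nested list stands in for the parse tree. The helper names `make_nested` and `can_pickle` are made up for this illustration, and it is an assumption that a Tag fails for the same reason (its links to parents, children and siblings pull a deep object graph into the pickle).

```python
import pickle


def make_nested(depth):
    """Build a list nested `depth` levels deep, iteratively (no recursion needed)."""
    node = []
    for _ in range(depth):
        node = [node]
    return node


def can_pickle(obj):
    """Return True if pickle can serialize obj, False if it overflows the recursion limit."""
    try:
        pickle.dumps(obj)
        return True
    except RecursionError:
        return False


# A plain string is flat, so pickling it involves no deep recursion.
print(can_pickle("<li><p>text</p></li>"))

# pickle descends one level per nesting level, so a very deep structure
# exhausts the recursion limit before serialization finishes.
print(can_pickle(make_nested(100000)))
```

Since multiprocessing pickles every argument it sends to a worker, anything that pickle cannot handle in the parent process never reaches process_card.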


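Possible duplicate of [Maximum recursion error Python](http://stackoverflow.com/questions/19529708/maximum-recursion-error-python). The answer there is relevant. Also, if for some reason you can't do what the link suggests, another option is to use a better serializer, such as 'dill' (my code), which is used in 'multiprocess' (a fork of 'multiprocessing' that uses better serialization). I'm not sure whether it works for 'bs4' objects.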

Answer


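It was pointed out in a comment that one can simply cast card to Unicode. However, this caused the process_card function to fail with slice indices must be integers or None or have an __index__ method. It turns out that this error comes from the fact that card is no longer a bs4 object, so the bs4 methods are no longer available on it. Instead, card is simply a Unicode string, and the error is a plain string-method error. So one needs to turn card back into a soup first and then continue from there. This works!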
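The error message quoted above is the one Python's built-in string find raises when its second argument is not an integer. The sketch below reproduces it on Python 3's str. The sample markup and the attribute filter are assumptions, since the original post does not show the call inside the elided parsing code that actually failed.

```python
card = "<li><p class='title'>Hello</p></li>"  # a plain string, not a bs4 Tag

# With a single argument, str.find returns the index of the substring (or -1).
index = card.find("p")

# A bs4-style attribute filter lands in str.find's `start` slot,
# which must be an integer or None, so Python raises TypeError.
try:
    card.find("p", {"class": "title"})
    message = None
except TypeError as exc:
    message = str(exc)

print(index)
print(message)
```

Re-parsing the string inside the worker, as in the answer's code that follows, restores the bs4 version of find.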

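import multiprocessing
from bs4 import BeautifulSoup


def process_card(unicode_card):
    # Rebuild a soup from the string so the bs4 methods are available again
    card = BeautifulSoup(unicode_card, 'html.parser')
    result = card.find("p")
    # Do some more parsing with BeautifulSoup

    return result


articles = []
pool = multiprocessing.Pool(processes=4)
# BeautifulSoup expects markup here, so `url` should hold the fetched HTML
soup = BeautifulSoup(url, 'html.parser')
cards = soup.findAll("li")
for card in cards:
    # Send a plain string to the worker: unicode() on Python 2, str() on Python 3
    result = pool.apply_async(process_card, [unicode(card)])
    article = result.get()
    if article is not None:
        print(article)
        articles.append(article)
pool.close()
pool.join()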
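One further point about the loop above: calling result.get() right after apply_async waits for each task before the next one is submitted, so the cards are effectively processed one at a time. A minimal sketch of submitting everything first and collecting afterwards could look like the following. To stay dependency-free it uses the standard library's HTMLParser as a stand-in for BeautifulSoup; the FirstParagraph class, the parse_cards helper and the sample markup are hypothetical names introduced for this illustration, not part of the original answer. The worker returns plain text, because return values are pickled on the way back to the parent as well.

```python
import multiprocessing
from html.parser import HTMLParser


class FirstParagraph(HTMLParser):
    """Collects the text of the first <p> element (stand-in for the bs4 parsing)."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.done = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p" and not self.done:
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p" and self.in_p:
            self.in_p = False
            self.done = True

    def handle_data(self, data):
        if self.in_p:
            self.chunks.append(data)


def process_card(card_html):
    """Parse one card from its string form; return the first <p> text or None."""
    parser = FirstParagraph()
    parser.feed(card_html)
    parser.close()
    return "".join(parser.chunks) if parser.done else None


def parse_cards(card_htmls, processes=4):
    """Dispatch every card to the pool, then gather the non-empty results."""
    with multiprocessing.Pool(processes=processes) as pool:
        # Submit all tasks first so the workers can run concurrently.
        pending = [pool.apply_async(process_card, [html]) for html in card_htmls]
        # Only then block on the results, in submission order.
        results = [task.get() for task in pending]
    return [r for r in results if r is not None]


if __name__ == "__main__":
    sample = ["<li><p>first</p></li>", "<li>no paragraph</li>", "<li><p>second</p></li>"]
    print(parse_cards(sample))
```

In the real code, the body of process_card would be the BeautifulSoup re-parse from the answer, and the list of strings would come from converting each card in soup.findAll("li") to a string. The `if __name__ == "__main__":` guard matters on platforms where multiprocessing starts workers by re-importing the main module.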