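Multiprocessing with Python and pymongo

I am trying to do the following: get some information from a page, then insert it into MongoDB. There is a list of pages, and I want to process them in parallel because the pages can take a while to load. Once the webdriver returns, I want to insert the result into the database. The problem I'm facing is that only about 1/4 of the results I expect end up in the database, so I suspect that the way I manage the results and do the insert isn't working. I'm hoping someone can tell me where I'm going wrong. Here is a code sample: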
from multiprocessing.dummy import Pool
from multiprocessing import cpu_count
from selenium import webdriver
import timeit
from pymongo import MongoClient

def mp_worker(urls):
    driver = webdriver.Chrome(chromedriver,
                              chrome_options=options)
    url = "http://website" + urls
    driver.get(url)
    return what_you_want
    driver.quit()  # do I do this here, close or quit?

def mp_handler():
    urls = ["14360705", "4584061", "13788961", "6877217", "13194596",
            "13400479", "9868014", "8524704", "16394198", "16315464"]
    client = MongoClient()
    db = client.test
    collection = db['test-collection']
    p = Pool(cpu_count() * 2)
    for result in p.imap(mp_worker, urls):
        db.restaurants.update(result, {"upsert": "True"})

if __name__ == '__main__':
    start = timeit.default_timer()
    mp_handler()
    stop = timeit.default_timer()
    print(stop - start)
Have you checked that every page you scrape actually returns data? It may be that only 1/4 of them return anything. – elena
Yes. If I loop over them sequentially with a plain for loop, I get the complete results. – FancyDolphin
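Two things in the posted code look worth checking, though neither is confirmed as the cause of the missing rows. First, `driver.quit()` sits after the `return`, so it never runs and every worker leaves a browser open; a `try`/`finally` avoids that (`quit()` ends the whole browser session, while `close()` only closes the current window). Second, `upsert` is placed inside the update document as a string instead of being passed as a keyword argument, and the write goes to `db.restaurants` rather than `collection`. The sketch below shows one way the pieces could fit together using `update_one`. The `page_id` and `title` fields, the `make_driver` factory, and the `scrape` and `store_results` names are assumptions made for illustration, not part of the original code.

```python
from multiprocessing.dummy import Pool  # thread pool with the multiprocessing.Pool API


def scrape(page_id, make_driver):
    """Load one page and return a document for it, always shutting the browser down."""
    driver = make_driver()  # hypothetical factory that builds a webdriver
    try:
        driver.get("http://website" + page_id)
        # Hypothetical payload; the real fields would come from the loaded page.
        return {"page_id": page_id, "title": driver.title}
    finally:
        # Runs even though the try block returns, so no browser is left open.
        driver.quit()


def store_results(collection, page_ids, make_driver, pool_size=4):
    """Scrape page_ids in a thread pool and upsert one document per page."""
    stored = 0
    with Pool(pool_size) as pool:
        # Threads share memory, so a lambda works here (nothing is pickled).
        for doc in pool.imap(lambda pid: scrape(pid, make_driver), page_ids):
            # Match on a stable key, write the fields with $set, and pass
            # upsert as a keyword argument, not inside the update document.
            collection.update_one(
                {"page_id": doc["page_id"]},
                {"$set": doc},
                upsert=True,
            )
            stored += 1
    return stored
```

With real objects, `collection` would be something like `MongoClient().test['test-collection']`, and `make_driver` a function that builds the Chrome driver the same way the question does. All writes happen in the main thread as results arrive, so the workers never touch the database.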