2014-09-04 96 views
2

Mechanize + async browser calls

I am looking for a way to make a large number of asynchronous web requests without waiting for the responses.

Here is my current code:

import mechanize 
from mechanize._opener import urlopen 
from mechanize._form import ParseResponse 
from multiprocessing import Pool 

brow = mechanize.Browser() 
brow.open('https://website.com') 

#Login 
brow.select_form(nr = 0) 

brow.form['username'] = 'user' 
brow.form['password'] = 'password' 
brow.submit() 

while True: 
    # async open the browser until some state is fulfilled 
    brow.open('https://website.com/needthiswebsite') 

The problem with the code above is that if I try to open two browsers, bro2 has to wait for bro1 to finish before it can start (the call is blocking):

bro1.open('https://website.com/needthiswebsite') 
bro2.open('https://website.com/needthiswebsite') 

Attempted solution:

# PSEUDO-CODE 

# Global flag: becomes True once one of the calls returns what I want 
state = False 

def openWebsite(browser): 
    result = browser.open('https://website.com/needthiswebsite') 
    # something() and WHATIWANT are placeholders 
    if result.something() == WHATIWANT: 
        return True 
    return False 

def updateState(result): 
    # callback: receives the return value of openWebsite 
    global state 
    if result: 
        state = True 

pool = Pool(processes = 1) 
while not state: 
    # async open the browser until some state is fulfilled 
    # I spam this function until I get a positive answer from one of the calls 
    pool.apply_async(openWebsite, [brow1], callback = updateState) 

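I would like to solve my problem along the lines of the answer to the Stack Overflow question "Asynchronous method call in Python?".

The problem is that I get an error when I try to use pool.apply_async(brow.open())

Error message: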

PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

I have tried many things to fix the PicklingError, but nothing seems to work.

  • Is it possible to do this with mechanize?
  • Should I switch to another library, such as urllib2 or something similar?

Any help would be greatly appreciated :)

Answers

1

A mechanize.Browser object is not picklable, so it cannot be passed to pool.apply_async (or to any other method that has to send the object to a child process):

>>> b = mechanize.Browser() 
>>> import pickle 
>>> pickle.dumps(b) 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "/usr/lib/python2.7/pickle.py", line 1374, in dumps 
    Pickler(file, protocol).dump(obj) 
    File "/usr/lib/python2.7/pickle.py", line 224, in dump 
    self.save(obj) 
    File "/usr/lib/python2.7/pickle.py", line 286, in save 
    f(self, obj) # Call unbound method with explicit self 
    File "/usr/lib/python2.7/pickle.py", line 725, in save_inst 
    save(stuff) 
    File "/usr/lib/python2.7/pickle.py", line 286, in save 
    f(self, obj) # Call unbound method with explicit self 
    File "/usr/lib/python2.7/pickle.py", line 649, in save_dict 
    self._batch_setitems(obj.iteritems()) 
    File "/usr/lib/python2.7/pickle.py", line 663, in _batch_setitems 
    save(v) 
    File "/usr/lib/python2.7/pickle.py", line 286, in save 
    f(self, obj) # Call unbound method with explicit self 
    File "/usr/lib/python2.7/pickle.py", line 600, in save_list 
    self._batch_appends(iter(obj)) 
    File "/usr/lib/python2.7/pickle.py", line 615, in _batch_appends 
    save(x) 
    File "/usr/lib/python2.7/pickle.py", line 286, in save 
    f(self, obj) # Call unbound method with explicit self 
    File "/usr/lib/python2.7/pickle.py", line 725, in save_inst 
    save(stuff) 
    File "/usr/lib/python2.7/pickle.py", line 286, in save 
    f(self, obj) # Call unbound method with explicit self 
    File "/usr/lib/python2.7/pickle.py", line 649, in save_dict 
    self._batch_setitems(obj.iteritems()) 
    File "/usr/lib/python2.7/pickle.py", line 663, in _batch_setitems 
    save(v) 
    File "/usr/lib/python2.7/pickle.py", line 306, in save 
    rv = reduce(self.proto) 
    File "/usr/lib/python2.7/copy_reg.py", line 70, in _reduce_ex 
    raise TypeError, "can't pickle %s objects" % base.__name__ 
TypeError: can't pickle instancemethod objects 
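
To find out ahead of time whether an object can be sent to a worker process, one option is simply to try pickling it. The helper below is a minimal sketch of that check; is_picklable is a name made up for this illustration and is not part of pickle or mechanize.

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except Exception:
        # pickle raises PicklingError, TypeError or AttributeError,
        # depending on the object and the Python version.
        return False
    return True


# Plain data such as a URL string can cross the process boundary...
print(is_picklable('https://website.com/needthiswebsite'))
# ...while objects that wrap locks, sockets or similar resources cannot.
print(is_picklable(threading.Lock()))
```

Passing only plain data (URLs in, page contents out) to the pool sidesteps the error regardless of which HTTP library does the work.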

The simplest thing to do is to create the Browser instance inside each child process rather than in the parent:


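Ideally, you would log in once with a Browser object in the parent process and then issue parallel requests across multiple processes, but making the object picklable would probably take a lot of effort (if it is possible at all). Even if you managed to remove the instancemethod object that causes the current error, there may well be more unpicklable objects inside Browser.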
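
A different technique, not proposed in the answer itself, is to use threads instead of processes: threads share the parent's memory, so nothing has to be pickled and an object created once can be handed to every task. The sketch below uses multiprocessing.pool.ThreadPool with a stand-in class, FakeBrowser, invented for this example. Whether a single mechanize.Browser can safely be shared between threads is an open assumption here; one Browser per thread may be needed in practice.

```python
import threading
from multiprocessing.pool import ThreadPool


class FakeBrowser(object):
    """Stand-in for a logged-in browser (hypothetical, for illustration)."""

    def __init__(self):
        # An unpicklable attribute, standing in for whatever makes the
        # real Browser unpicklable.
        self._lock = threading.Lock()

    def open(self, url):
        return 'fetched ' + url


def open_website(browser, url):
    return browser.open(url)


browser = FakeBrowser()  # created (and, in real code, logged in) once
pool = ThreadPool(4)
urls = ['https://website.com/needthiswebsite'] * 3

# apply_async returns immediately; the calls run in the pool's threads.
async_results = [pool.apply_async(open_website, (browser, url)) for url in urls]
results = [r.get(timeout=10) for r in async_results]
pool.close()
pool.join()
print(results)
```

Threads suit this workload because the time is spent waiting on the network rather than on computation.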


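Thanks. This more or less tells me that mechanize may not be the solution to my problem: I need to log in before I can start sending these async requests to the website, so logging in for every request won't work. – geb12 2014-09-04 16:19:25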


@geb12 You could try batching your requests, for example by passing 1000 or 2000 links to each function call at once – 2017-09-27 07:56:08
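
The batching suggested in the comment above could be sketched as follows. The helper names chunked and fetch_batch, the example links and the batch size are all assumptions made for illustration; fetch_batch only returns a placeholder where real code would log in once and then call browser.open(url) for each link.

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_batch(urls):
    # Hypothetical worker: log in once here, then loop over the batch.
    return ['fetched ' + url for url in urls]  # placeholder for browser.open(url)


# Made-up links; the real list would come from the application.
links = ['https://website.com/page/%d' % i for i in range(2500)]
batches = list(chunked(links, 1000))
print([len(batch) for batch in batches])  # [1000, 1000, 500]
```

Each batch is a plain list of strings, so it can be handed to pool.map or pool.apply_async without pickling problems, and the login cost is paid once per batch instead of once per link.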