2017-04-25 28 views
4

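Python + splinter + http: Error - httplib.ResponseNotReady

Using splinter and Python, I have two threads running. Each one visits the same main URL but a different path: thread one hits mainurl.com/threadone and thread two hits mainurl.com/threadtwo. The run fails with the following traceback: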

Traceback (most recent call last): 
    File "multi_thread_practice.py", line 299, in <module> 
    main() 
    File "multi_thread_practice.py", line 290, in main 
    first_method(r) 
    File "multi_thread_practice.py", line 195, in parser 
    second_method(title, name) 
    File "multi_thread_practice.py", line 208, in confirm_product 
    third_method(current_url) 
    File "multi_thread_practice.py", line 214, in buy_product 
    browser.visit(url) 
    File "/Users/joshua/anaconda/lib/python2.7/site-packages/splinter/driver/webdriver/__init__.py", line 184, in visit 
    self.driver.get(url) 
    File "/Users/joshua/anaconda/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 261, in get 
    self.execute(Command.GET, {'url': url}) 
    File "/Users/joshua/anaconda/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 247, in execute 
    response = self.command_executor.execute(driver_command, params) 
    File "/Users/joshua/anaconda/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 464, in execute 
    return self._request(command_info[0], url, body=data) 
    File "/Users/joshua/anaconda/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 488, in _request 
    resp = self._conn.getresponse() 
    File "/Users/joshua/anaconda/lib/python2.7/httplib.py", line 1108, in getresponse 
    raise ResponseNotReady() 
httplib.ResponseNotReady 

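What does this error mean, and how should I go about resolving it? For reference, this is how the browser is created:

from splinter import Browser 
browser = Browser('chrome') 

The traceback above is the error I ran into.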

Thank you in advance. I will be sure to upvote and accept the answer.

CODE ADDED

import time 
from splinter import Browser 
import threading 

browser = Browser('chrome') 

start_time = time.time() 

urlOne = 'http://www.practiceurl.com/one' 
urlTwo = 'http://www.practiceurl.com/two' 
baseUrl = 'http://practiceurl.com' 

browser.visit(baseUrl) 

def secondThread(url): 
    print 'STARTING 2ND REQUEST: ' + str(time.time() - start_time) 
    browser.visit(url) 
    print 'END 2ND REQUEST: ' + str(time.time() - start_time) 


def mainThread(url): 
    print 'STARTING 1ST REQUEST: ' + str(time.time() - start_time) 
    browser.visit(url) 
    print 'END 1ST REQUEST: ' + str(time.time() - start_time) 


def main(): 
    threadObj = threading.Thread(target=secondThread, args=[urlTwo]) 
    threadObj.daemon = True 

    threadObj.start() 

    mainThread(urlOne) 

main() 
+0

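httplib.ResponseNotReady is usually caused by reusing a response. I can't tell whether that is what you are doing, since there is no code, but I think that is the problem. –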

+0

It would help to provide an [MCVE](https://stackoverflow.com/help/mcve) – Adonis

+0

@GenericSnake Apologies. I have just added the code to the original post. Please take a look. –

Answers

2

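As far as I know, what you are trying to do is not possible with a single browser. Splinter drives an actual browser, so sending it many commands at the same time causes problems. It behaves like a human interacting with the browser (automated, of course). You can open many browser windows, but you cannot send a request from another thread before the response to the previous request has been received. Doing so results in a CannotSendRequest error. So, if you need to use threads, I recommend opening two browsers and using one thread per browser to send the requests. Otherwise, it cannot be done.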
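Both exception names come from Python's HTTP client, which the traceback in the question shows Selenium's remote connection calling (httplib in Python 2, http.client in Python 3). The following is a minimal sketch, using Python 3's http.client directly rather than splinter, of how a single connection object refuses out-of-order calls. The host name is a placeholder, the helper `error_name` is made up for illustration, and nothing is sent over the network:

```python
import http.client

# Placeholder host: the constructor does not open a socket, and nothing
# below sends any data, so this runs without network access.
conn = http.client.HTTPConnection("example.invalid")


def error_name(call):
    """Run call() and return the name of the HTTP exception it raises, if any."""
    try:
        call()
    except http.client.HTTPException as exc:
        return type(exc).__name__
    return None


# Asking for a response before any request has been sent.
idle_error = error_name(conn.getresponse)

# Starting a request moves the connection out of its idle state...
conn.putrequest("GET", "/threadone")

# ...so a second request started now (as another thread might do) is refused,
second_request_error = error_name(lambda: conn.putrequest("GET", "/threadtwo"))

# ...and reading a response before the request is complete is refused too.
early_read_error = error_name(conn.getresponse)

print(idle_error, second_request_error, early_read_error)
```

Two threads sharing one browser share one such connection to the driver, so whichever call arrives out of turn fails in one of these ways.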

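This thread is about Selenium, but the information is transferable: Selenium multiple tabs at once. Again, it says that what (I assume) you want to do is not possible. The author of the accepted (green-ticked) answer gives the same advice I do.

I hope that doesn't set you back too far, and that it helps.

EDIT: Just to demonstrate: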

import time 
from splinter import Browser 
import threading 

browser = Browser('firefox') 
browser2 = Browser('firefox') 

start_time = time.time() 

urlOne = 'http://www.practiceurl.com/one' 
urlTwo = 'http://www.practiceurl.com/two' 
baseUrl = 'http://practiceurl.com' 

browser.visit(baseUrl) 


def secondThread(url): 
    print 'STARTING 2ND REQUEST: ' + str(time.time() - start_time) 
    browser2.visit(url) 
    print 'END 2ND REQUEST: ' + str(time.time() - start_time) 


def mainThread(url): 
    print 'STARTING 1ST REQUEST: ' + str(time.time() - start_time) 
    browser.visit(url) 
    print 'END 1ST REQUEST: ' + str(time.time() - start_time) 


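def main(): 
    threadObj = threading.Thread(target=secondThread, args=[urlTwo]) 
    threadObj.daemon = True 

    threadObj.start() 

    mainThread(urlOne) 
    threadObj.join()  # wait for the second thread before the script exits 
    browser.quit()    # close both browsers once both visits are done 
    browser2.quit() 

main() 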

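Note that I used Firefox because I don't have chromedriver installed.

It might be a good idea to add a wait after the browsers open and before the timer starts, to make sure they are fully ready.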
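One way to express that wait is sketched below, under stated assumptions: `FakeBrowser` is a hypothetical stand-in so the example runs without splinter, and `visit_all` and `make_browser` are names invented here. Each thread opens its own browser, then waits on a `threading.Barrier` until every browser is open, and only then starts its timer:

```python
import threading
import time


class FakeBrowser:
    """Hypothetical stand-in for splinter's Browser so the sketch runs anywhere."""

    def visit(self, url):
        time.sleep(0.05)  # pretend the page takes a moment to load

    def quit(self):
        pass


def visit_all(urls, make_browser):
    """Visit each URL in its own thread, each with its own browser instance."""
    ready = threading.Barrier(len(urls))  # released once every browser is open
    elapsed = {}

    def worker(url):
        try:
            browser = make_browser()      # one browser per thread, never shared
        except Exception:
            ready.abort()                 # do not leave the other threads waiting
            raise
        try:
            ready.wait()                  # wait until all browsers are ready
            start = time.time()           # start timing only after that point
            browser.visit(url)
            elapsed[url] = time.time() - start
        finally:
            browser.quit()

    threads = [threading.Thread(target=worker, args=(url,)) for url in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                          # do not exit before the visits finish
    return elapsed


timings = visit_all(
    ['http://www.practiceurl.com/one', 'http://www.practiceurl.com/two'],
    FakeBrowser,
)
print(sorted(timings))
```

With splinter installed, passing `lambda: Browser('firefox')` as `make_browser` should drive real browsers in the same way.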

+0

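I appreciate your input! I actually tried that, but the two windows don't seem to be connected, meaning that what happens in one window has no bearing on the other open window. So I am thinking of having the second thread open a new tab in the same window. Is that possible? –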

+0

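You should be able to open a new tab in one browser window and then open a different URL in it. But keep in mind that you can't do that at the same moment you are opening the other URL. You have to wait for one tab to receive the response to its request before sending a request through the second tab. That makes threading somewhat pointless. I think @asettouf knows more about this topic than I do, so he may be able to expand on his example, which could help you. –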

+0

I appreciate the insight regardless. How can I open a new tab with `splinter`, though? –

1

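@GenericSnake is right on this one. To add a little to it, I would strongly advise you to refactor your code to use the multiprocessing library, mainly because the threading implementation is subject to the GIL: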

In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.

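One actual advantage of using multiprocessing is that you can refactor your code to avoid the duplicated methods secondThread and mainThread, for example like this (one last thing: don't forget to clean up the resources you use, e.g. call browser.quit() to close the browser once you are done):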

import time 
from splinter import Browser 
from multiprocessing import Process 
import os 

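# Append the driver and browser directories to PATH; os.pathsep keeps the entries separate 
os.environ['PATH'] = os.pathsep.join( 
    [os.environ['PATH'], "path/to/geckodriver", "path/to/firefox/binary"]) 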

start_time = time.time() 

urlOne = 'http://pythoncarsecurity.com/Support/FAQ.aspx' 
urlTwo = 'http://pythoncarsecurity.com/Products/' 



def url_visitor(url): 
    print("url called: " + url) 
    browser = Browser('firefox')  # each process opens its own browser 
    try: 
        print('STARTING REQUEST TO: ' + url + " at "+ str(time.time() - start_time)) 
        browser.visit(url) 
        print('END REQUEST TO: ' + url + " at "+ str(time.time() - start_time)) 
    finally: 
        browser.quit()  # always close this process's browser when done 

def main(): 
    p1 = Process(target=url_visitor, args=[urlTwo]) 
    p2 = Process(target=url_visitor, args=[urlOne]) 
    p1.start() 
    p2.start() 
    p1.join() #join processes to the main process to see the output 
    p2.join() 

if __name__=="__main__": 
    main() 

That gives us the following output (the timings will depend on your system, though):

url called: http://pythoncarsecurity.com/Support/FAQ.aspx 
url called: http://pythoncarsecurity.com/Products/ 
STARTING REQUEST TO: http://pythoncarsecurity.com/Support/FAQ.aspx at 10.763000011444092 
STARTING REQUEST TO: http://pythoncarsecurity.com/Products/ at 11.764999866485596 
END REQUEST TO: http://pythoncarsecurity.com/Support/FAQ.aspx at 16.20199990272522 
END REQUEST TO: http://pythoncarsecurity.com/Products/ at 16.625999927520752 

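EDIT: The problem with multithreading and Selenium is that a browser instance is not thread-safe. The only way I found to get around this is to acquire a lock in url_visitor; in that case, however, you lose the advantage of multithreading. That is why I believe using multiple browsers is more beneficial (although I guess you have some very specific requirements). See the code below: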

import time 
from splinter import Browser 
import threading 
from threading import Lock 
import os 

# Append the chromedriver directory to PATH; os.pathsep keeps the entries separate 
os.environ['PATH'] = os.environ['PATH'] + os.pathsep + "/path/to/chromedriver" 

start_time = time.time() 

urlOne = 'http://pythoncarsecurity.com/Support/FAQ.aspx' 
urlTwo = 'http://pythoncarsecurity.com/Products/' 
browser = Browser('chrome') 
lock = threading.Lock()  # serialises every use of the shared browser 


def init(): 
    browser.visit("https://www.google.fr") 
    # open a second, empty tab in the same window 
    browser.driver.execute_script("window.open('about:blank', '_blank');") 


def url_visitor(url, tab): 
    with lock: 
        # always select this thread's tab, whichever thread ran before 
        driver = browser.driver 
        driver.switch_to.window(driver.window_handles[tab]) 
        print("url called: " + url) 
        print('STARTING REQUEST TO: ' + url + " at "+ str(time.time() - start_time)) 
        browser.visit(url) 
        print('END REQUEST TO: ' + url + " at "+ str(time.time() - start_time)) 


def main(): 
    p1 = threading.Thread(target=url_visitor, args=[urlTwo, 0]) 
    p2 = threading.Thread(target=url_visitor, args=[urlOne, 1]) 
    p1.start() 
    p2.start() 
    p1.join()  # wait for both visits before closing the shared browser 
    p2.join() 
    browser.quit() 

if __name__=="__main__": 
    init() #create a browser with two tabs 
    main() 
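To see why the lock removes the benefit of threading, here is a small sketch that uses a hypothetical stand-in for the shared browser (the `FakeBrowser` class is invented for illustration and splinter is not involved). Each fake visit sleeps briefly and records when it starts and ends. Because of the lock, the second visit cannot start until the first one has ended, so the total time is roughly the sum of the two visits:

```python
import threading
import time


class FakeBrowser:
    """Hypothetical stand-in for one shared browser; records visit boundaries."""

    def __init__(self):
        self.events = []

    def visit(self, url):
        self.events.append(('start', url))
        time.sleep(0.05)  # pretend the page takes a moment to load
        self.events.append(('end', url))


browser = FakeBrowser()
lock = threading.Lock()


def url_visitor(url):
    with lock:  # only one thread may use the shared browser at a time
        browser.visit(url)


urls = ['http://pythoncarsecurity.com/Products/',
        'http://pythoncarsecurity.com/Support/FAQ.aspx']
threads = [threading.Thread(target=url_visitor, args=(url,)) for url in urls]

begin = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
total = time.time() - begin

# The visits never interleave: each 'start' is followed by its own 'end'.
kinds = [kind for kind, _ in browser.events]
print(kinds, total)
```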
+0

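Thanks for the suggestion! I want both threads/processes to operate on the same window so that the actions they perform are related. With two windows, they are unrelated. Is it possible for one process/thread to open a new tab in the same window as the main process/thread? Thank you in advance! –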

+0

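@JoKo When you have two processes, they don't share the same memory, so as far as I know you are stuck with multithreading if you want to use a single window. I'll come back later with an example. – Adonis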

+0

Got it. Looking forward to it. Thank you! –
