2014-10-08 106 views
2

Why does Selenium wait for a long time before executing this code?

I am trying to do infinite scrolling on this page, and here is my code:

from selenium import webdriver 
import time 

profile = webdriver.FirefoxProfile() 
profile.set_preference("general.useragent.override","Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:28.0) Gecko/20100101 Firefox/28.0") 
driver = webdriver.Firefox(profile) 

driver.get("http://www.quora.com/Programming-Languages/followers") 
for n in range(0,5): # For testing I have capped this at 5, will handle this properly once things start to work. 
    driver.execute_script("window.scrollTo(0,1000000);") 
    time.sleep(2) 

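When I run it, it waits many seconds (sometimes more than 1 minute) before doing any scrolling, and then waits the same amount of time again before the next scroll. The code seems to work fine on other pages. Any ideas on how to fix this?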

When I try Chrome instead of Firefox, having added driver = webdriver.Chrome('/home/asdf/apps/chromedrive/chromedriver') to the .py file, I get the following error:

Traceback (most recent call last): 
    File "ok.py", line 8, in <module> 
    driver = webdriver.Chrome('/home/asdf/apps/chromedrive/chromedriver') 
    File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/chrome/webdriver.py", line 65, in __init__ 
    keep_alive=True) 
    File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 73, in __init__ 
    self.start_session(desired_capabilities, browser_profile) 
    File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 121, in start_session 
    'desiredCapabilities': desired_capabilities, 
    File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 171, in execute 
    response = self.command_executor.execute(driver_command, params) 
    File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 349, in execute 
    return self._request(command_info[0], url, body=data) 
    File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/remote_connection.py", line 379, in _request 
    self._conn.request(method, parsed_url.path, body, headers) 
    File "/usr/lib/python2.7/httplib.py", line 973, in request 
    self._send_request(method, url, body, headers) 
    File "/usr/lib/python2.7/httplib.py", line 1007, in _send_request 
    self.endheaders(body) 
    File "/usr/lib/python2.7/httplib.py", line 969, in endheaders 
    self._send_output(message_body) 
    File "/usr/lib/python2.7/httplib.py", line 829, in _send_output 
    self.send(msg) 
    File "/usr/lib/python2.7/httplib.py", line 791, in send 
    self.connect() 
    File "/usr/lib/python2.7/httplib.py", line 772, in connect 
    self.timeout, self.source_address) 
    File "/usr/lib/python2.7/socket.py", line 553, in create_connection 
    for res in getaddrinfo(host, port, 0, SOCK_STREAM): 
socket.gaierror: [Errno -2] Name or service not known 
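
The traceback ends inside getaddrinfo, so a host name failed to resolve before any request reached chromedriver. One possible cause, which is an assumption and not something the traceback confirms, is that localhost does not resolve on this machine (for example, a missing "127.0.0.1 localhost" entry in /etc/hosts). A minimal sketch for checking this, where can_resolve is a hypothetical helper name:

```python
import socket


def can_resolve(host, port=80):
    """Return True if `host` resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM)) > 0
    except socket.gaierror:
        return False


# Assumption: the driver reaches chromedriver through the name "localhost".
print("localhost resolves: %s" % can_resolve("localhost"))
```

If this reports that localhost does not resolve, the name resolution setup is worth checking before looking at Selenium itself.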

Answers

1

Switching to Chrome() solved the problem for me:

import time 
from selenium import webdriver 

followers_per_page = 18 

driver = webdriver.Chrome() 
driver.get("http://www.quora.com/Programming-Languages/followers") 

# get the followers count 
element = driver.find_element_by_class_name('count') 
followers_count = int(element.text.replace('k', '000').replace('.', '')) 
print(followers_count)

# scroll down the page iteratively with a delay
for _ in range(followers_count // followers_per_page + 1):
    # scrollTo takes (x, y); a large y jumps to the bottom of the page
    driver.execute_script("window.scrollTo(0, 1000000);")
    time.sleep(2)

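FYI, I am using a slightly different approach: I parse the followers count and work out the number of scrolls from it, taking into account the fact that the page loads 18 followers at a time.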
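
The scroll count comes from simple arithmetic: the followers count divided by the followers per page, plus one. A minimal sketch of that calculation follows. parse_count and scrolls_needed are hypothetical helper names, and the assumption that the counter text looks like "1234" or "12.3k" is mine, not something confirmed by the page. Unlike chained replace calls on the raw text, it treats the decimal point as part of the number, so "12.3k" becomes 12300 and not 123000.

```python
def parse_count(text):
    """Convert a counter label such as '1234' or '12.3k' to an integer."""
    text = text.strip().lower().replace(',', '')
    if text.endswith('k'):
        # '12.3k' -> 12.3 * 1000, rounded to the nearest integer
        return int(round(float(text[:-1]) * 1000))
    return int(text)


def scrolls_needed(followers_count, followers_per_page=18):
    """Number of scrolls: one per full page of followers, plus one."""
    return followers_count // followers_per_page + 1


print(scrolls_needed(parse_count('12.3k')))
```

The result of scrolls_needed can be passed straight to range() in the scrolling loop.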

I have actually worked on a similar Quora-related question before, see:


Well, this was not the first thing that came to my mind. Here is the story.

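The problem is that there are pending requests to the http://tch840195.tch.quora.com/up/chan5-8886/updates URL that take minutes to complete. This is why Selenium thinks the page has not finished loading. And it gets worse: this is a periodic request that recurs every X seconds. Think of it as long polling.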

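I have tried many things to overcome the problem using the Firefox webdriver:

  • setting the webdriver.load.strategy preference to unstable
  • setting the network.http.response.timeout, network.http.connection-timeout, network.http.keep-alive.timeout and network.http.request.max-start-delay preferences
  • setting the page load timeout: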

    driver.set_page_load_timeout(3) 
    
  • setting the script timeout:

    driver.set_script_timeout(3) 
    
  • calling window.stop(); in the hope that it would stop the active requests:

    driver.execute_script('window.stop();') 
    
  • updating to the most recent Firefox and selenium package versions
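
The preference-based attempts in the list above could be set up roughly as follows. This is only a sketch: the timeout values are placeholders chosen for illustration (the answer does not say which values were tried), and apply_preferences is a hypothetical helper. It relies only on FirefoxProfile.set_preference, which the question's code already uses.

```python
# Preference names come from the list above; the numeric values are
# illustrative assumptions, not the values used in the answer.
FIREFOX_PREFS = {
    "webdriver.load.strategy": "unstable",
    "network.http.response.timeout": 10,
    "network.http.connection-timeout": 10,
    "network.http.keep-alive.timeout": 10,
    "network.http.request.max-start-delay": 1,
}


def apply_preferences(profile, prefs):
    """Call set_preference on the profile for every name/value pair."""
    for name, value in sorted(prefs.items()):
        profile.set_preference(name, value)
    return profile


# Usage with Selenium (not run here, since it needs a browser):
#   from selenium import webdriver
#   profile = apply_preferences(webdriver.FirefoxProfile(), FIREFOX_PREFS)
#   driver = webdriver.Firefox(profile)
```

Keeping the preferences in one dictionary makes it easy to try different values one at a time.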

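Another option that might work is to somehow block requests to that "slow url", either by using a proxy server and pointing Firefox to it or, if possible, by letting Firefox know to blacklist the URL (probably through an extension).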
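
Whichever mechanism does the blocking (proxy or extension), it needs a rule for recognising the slow requests. A minimal sketch of such a rule follows. is_blocked is a hypothetical helper, the ".tch.quora.com" suffix is inferred from the single URL quoted in this answer, and wiring the rule into an actual proxy is left out.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2

# Assumption: all slow long-polling requests go to hosts under this suffix.
BLOCKED_HOST_SUFFIXES = (".tch.quora.com",)


def is_blocked(url, suffixes=BLOCKED_HOST_SUFFIXES):
    """Return True if the URL's host ends with one of the blocked suffixes."""
    host = urlparse(url).hostname or ""
    return host.endswith(tuple(suffixes))


print(is_blocked("http://tch840195.tch.quora.com/up/chan5-8886/updates"))
print(is_blocked("http://www.quora.com/Programming-Languages/followers"))
```

Matching on the host suffix instead of the full URL is a design choice here, on the guess that the numeric parts of the host and path vary between sessions.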

See also the related question, which contains multiple workarounds:

See also:

+0

Thanks for looking at the question. I tried to use Chrome: 'driver = webdriver.Chrome('/home/asdf/apps/chromedrive/chromedriver')' but I ran into some errors. Updated in the question. – aste123 2014-10-08 14:29:28

+0

@aste123 OK, it looks like the path is configured correctly. Do you have the Chrome browser installed? – alecxe 2014-10-08 14:40:22

+0

Yes, Chrome is installed. '' gives '/usr/bin/google-chrome' – aste123 2014-10-08 14:42:50
