2016-01-21

There are several clickable elements on the page, and I want to scrape some of the pages behind them, but I get this error and the spider closes after the first click: Scrapy and Selenium StaleElementReferenceException

StaleElementReferenceException: Message: Element not found in the cache - perhaps the page has changed since it was looked up 

Right now I am just trying to open the pages to capture the new URLs. Here is my code:

from scrapy import signals 
from scrapy.http import TextResponse 
from scrapy.spider import Spider 
from scrapy.selector import Selector 
from scrapy.xlib.pydispatch import dispatcher 

from MySpider.items import MyItem 

from selenium import webdriver 
from selenium.webdriver.common.by import By 
from selenium.webdriver.support.ui import WebDriverWait 

import time 

class MySpider(Spider):
    name = "myspider"
    # allowed_domains should contain domain names, not URLs
    allowed_domains = ["example.com"]
    base_url = 'http://example.com'
    start_urls = ["http://example.com/Page.aspx"]

    def __init__(self):
        self.driver = webdriver.Firefox()
        dispatcher.connect(self.spider_closed, signals.spider_closed)

    def spider_closed(self, spider):
        self.driver.close()

    def parse(self, response):
        self.driver.get(response.url)
        item = MyItem()

        links = self.driver.find_elements_by_xpath("//input[@class='GetData']")

        for button in links:
            button.click()
            time.sleep(5)

            source = self.driver.page_source
            sel = Selector(text=source)  # create a Selector object

            item['url'] = self.driver.current_url

            print '\n\nURL\n', item['url'], '\n'
            yield item

Answer


That's because the link elements belong to the first page. Once you open a new page, the link elements located on the first page become stale.

You can try one of the following two solutions:

1. Store the URLs from the link elements, then use driver.get(url) to open each link.

def parse(self, response):
    self.driver.get(response.url)
    item = MyItem()

    links = self.driver.find_elements_by_xpath("//input[@class='GetData']")
    # collect the URLs first, while the elements are still attached to the page
    link_urls = [link.get_attribute("href") for link in links]

    for link_url in link_urls:
        self.driver.get(link_url)
        time.sleep(5)

        source = self.driver.page_source
        sel = Selector(text=source)  # create a Selector object

        item['url'] = self.driver.current_url

        print '\n\nURL\n', item['url'], '\n'
        yield item

2. After clicking a link and getting the URL, call driver.back() to return to the first page, then re-find the link elements.

def parse(self, response):
    self.driver.get(response.url)
    item = MyItem()

    links = self.driver.find_elements_by_xpath("//input[@class='GetData']")

    for i in range(len(links)):
        links[i].click()
        time.sleep(5)

        source = self.driver.page_source
        sel = Selector(text=source)  # create a Selector object

        item['url'] = self.driver.current_url

        print '\n\nURL\n', item['url'], '\n'
        yield item

        self.driver.back()
        # re-find the elements: the old references are stale after navigating back
        links = self.driver.find_elements_by_xpath("//input[@class='GetData']")

Thank you very much – Goran
