2016-02-29 61 views
0

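Response not defined in Scrapy parse function

I want to write a Scrapy spider combined with Selenium to access some JavaScript content on the page I am scraping. I managed to open the page with Selenium and wait for the content to appear. Now I want to build a Scrapy TextResponse from the fully loaded page. My code looks like this (I removed the URLs and selector strings; they are not important):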

import scrapy 
from scrapy import signals 
from scrapy.http import TextResponse 
from scrapy.xlib.pydispatch import dispatcher 

from selenium import webdriver 
from selenium.webdriver.common.by import By 
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC 

class EexSpider(scrapy.Spider): 
    name = "eex" 
    allowed_domain = ["..."] 
    start_urls = ["..."] 

    def __init__(self): 
     self.driver = webdriver.Chrome() 
     dispatcher.connect(self.spider_closed, signals.spider_closed) 

    def spider_closed(self, spider): 
     self.driver.close() 

    def parse(self, response): 
     self.driver.get(response.url) 
     wait = WebDriverWait(self.driver, 10) 
     element = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '...'))) 

     # this is where things go wrong 
     print response.url # prints the correct url 
     text_response = TextResponse(url=response.url, body=self.driver.page_source, encoding='utf-8') 
     # NameError: name 'response' is not defined 

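When I run the spider I get the error NameError: name 'response' is not defined on the line where I call the TextResponse constructor. Oddly, I can print response.url successfully on the line just before it.

Does anyone know why this happens?

P.S. Let me know if you would like to see the stack trace; I just did not want to make the question any longer.

Disclaimer: I am a total Python newbie ;-)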

Answers

1

Check that the line containing TextResponse is indented correctly.

For example, if I have the following code:

import scrapy 
from scrapy import signals 
from scrapy.http import TextResponse 
from scrapy.xlib.pydispatch import dispatcher 

from selenium import webdriver 
from selenium.webdriver.common.by import By 
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC 

class EexSpider(scrapy.Spider): 
    name = "eex" 
    allowed_domain = ["google.com"] 
    start_urls = ["http://google.com"] 

    def __init__(self): 
     self.driver = webdriver.Chrome() 
     dispatcher.connect(self.spider_closed, signals.spider_closed) 

    def spider_closed(self, spider): 
     self.driver.close() 

    def parse(self, response): 
     self.driver.get(response.url) 

    text_response = TextResponse(url=response.url, body=self.driver.page_source, encoding='utf-8') 

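I get exactly the same error, NameError: name 'response' is not defined. The TextResponse line is indented at class level instead of inside parse, so Python runs it while the class body is being defined, where no response variable exists.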
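The effect can be reproduced without Scrapy or Selenium. The following is only a minimal sketch: the class `Demo`, the constant `SOURCE`, and the helper `run_class_body` are hypothetical names made up for illustration. It shows that a statement dedented to class level is executed when the class is defined, not when `parse` is called.

```python
import textwrap

# Hypothetical stand-in for the spider. The last statement is dedented to
# class level, so it runs while the class body executes, not inside parse().
SOURCE = textwrap.dedent('''
    class Demo(object):
        def parse(self, response):
            url = response.url

        text_response = response.url  # class level: no 'response' here
''')


def run_class_body(source):
    """Execute the source and return the NameError message, or None."""
    try:
        exec(compile(source, "<demo>", "exec"), {})
    except NameError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(run_class_body(SOURCE))
```

Indenting the last statement one more level, so that it sits inside `parse`, makes the error go away because `response` is then the method's parameter.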

I was mixing tabs and spaces... that's what you get for copying code from StackOverflow. Thanks! – muenchdo
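Mixed tabs and spaces are hard to spot by eye, because Python 2 treats a tab as advancing to the next multiple of eight columns while many editors display it as four. The sketch below is one possible way to check a file; `SAMPLE`, `indent_of`, `find_mixed_indentation`, and `indent_width` are hypothetical names, not part of Scrapy or the standard library. The sample is an assumed layout and not the asker's actual file.

```python
# Assumed example: methods indented with tabs, last line with 8 spaces.
SAMPLE = (
    "class Demo(object):\n"
    "\tdef parse(self, response):\n"
    "\t\tprint(response.url)\n"
    "        text_response = response.url\n"
)


def indent_of(line):
    """Return the leading run of spaces and tabs on a line."""
    return line[: len(line) - len(line.lstrip(" \t"))]


def find_mixed_indentation(source):
    """Return 1-based numbers of tab-indented lines, but only if the
    source also contains space-indented lines (the styles are mixed)."""
    tab_lines, space_lines = [], []
    for number, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        indent = indent_of(line)
        if "\t" in indent:
            tab_lines.append(number)
        elif indent:
            space_lines.append(number)
    return tab_lines if space_lines else []


def indent_width(line, tabsize):
    """Column where the code starts if tabs advance to multiples of tabsize."""
    return len(indent_of(line).expandtabs(tabsize))


if __name__ == "__main__":
    print(find_mixed_indentation(SAMPLE))
    for line in SAMPLE.splitlines():
        # Compare how an editor (tab = 4) and Python 2 (tab = 8) see each line.
        print(indent_width(line, 4), indent_width(line, 8), repr(line))
```

In this sample the last line looks aligned with the `print` call in an editor that shows tabs as four columns, but with eight-column tabs it lines up with `def`, which puts it at class level. Running Python 2 with the `-tt` option turns inconsistent tab usage into an error, and Python 3 rejects it with a TabError.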