2014-06-06

python scrapy: scraping dynamic information

I am trying to scrape information from http://www.qchp.org.qa/en/Pages/searchpractitioners.aspx. I want to do the following:

- select "Dentist" from the drop-down list at the top of the page
- click Search
- note that the information at the bottom of the page changes dynamically, using javascript
- click the hyperlink on a practitioner's name, which shows a pop-up
- save all the information for each practitioner to a json/csv file
- also get the information from the other result pages linked at the bottom of the page; clicking those links changes the information shown in a div

I am very new to scrapy, and only looked into selenium because I read somewhere that you need selenium for dynamic information.

So I am using selenium inside the scrapy app. I am not sure whether that is right, or what the best way to do this is. So far I have the code below, and I get this error:

File "sch_spider.py", line 21, in DmozSpider 
    all_options = element.find_elements_by_tag_name("option") 
NameError: name 'element' is not defined 
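
The traceback is an ordinary NameError, unrelated to selenium or scrapy: Python raises it whenever a name is referenced before any assignment binds it. A minimal reproduction:

```python
# Minimal reproduction of the error above: "select" is bound, but the next
# statement refers to "element", which no assignment ever created.
select = object()   # this name exists...
try:
    element         # ...but this one was never defined
except NameError as exc:
    message = str(exc)  # the message names the missing identifier

print(message)
```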

sch_spider.py

from scrapy.spider import Spider 
from scrapy.selector import Selector 
from selenium import webdriver 
from selenium.webdriver.common.keys import Keys 
from scrapytutorial.items import SchItem 
from selenium.webdriver.support.ui import Select 

class DmozSpider(Spider): 
    name = "sch" 

    driver = webdriver.Firefox() 
    driver.get("http://www.qchp.org.qa/en/Pages/searchpractitioners.aspx") 
    select = Select(driver.find_element_by_name('ctl00$m$g_28bc0e11_4b8f_421f_84b7_d671de504bc3$ctl00$drp_practitionerType')) 
    all_options = element.find_elements_by_tag_name("option") 

    for option in all_options: 
        if option.get_attribute("value") == "4":  # Dentist 
            option.click() 
            break 

    driver.find_element_by_name("ctl00$m$g_28bc0e11_4b8f_421f_84b7_d671de504bc3$ctl00$Searchbtn").click() 


    def parse(self, response): 
        all_docs = element.find_elements_by_tag_name("td") 
        for name in all_docs: 
            name.click() 
            alert = driver.switch_to_alert() 
            sel = Selector(response) 
            ma = sel.xpath('//table') 
            items = [] 
            for site in ma: 
                item = SchItem() 
                item['name'] = site.xpath("//span[@id='PractitionerDetails1_lbl_Name']/text()").extract() 
                item['profession'] = site.xpath("//span[@id='PractitionerDetails1_lbl_Profession']/text()").extract() 
                item['scope_of_practise'] = site.xpath("//span[@id='PractitionerDetails1_lbl_sop']/text()").extract() 
                item['instituition'] = site.xpath("//span[@id='PractitionerDetails1_lbl_institution']/text()").extract() 
                item['license'] = site.xpath("//span[@id='PractitionerDetails1_lbl_LicenceNo']/text()").extract() 
                item['license_expiry_date'] = site.xpath("//span[@id='PractitionerDetails1_lbl_LicenceExpiry']/text()").extract() 
                item['qualification'] = site.xpath("//span[@id='PractitionerDetails1_lbl_Qualification']/text()").extract() 
                items.append(item) 
            return items 

items.py

from scrapy.item import Item, Field 

class SchItem(Item): 

    name = Field() 
    profession = Field() 
    scope_of_practise = Field() 
    instituition = Field() 
    license = Field() 
    license_expiry_date = Field() 
    qualification = Field() 
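
For the json/csv part of the question: once an item's fields have been collected, each record can be written out with the stdlib. A minimal sketch, using hardcoded sample data rather than real scraped output:

```python
# Minimal sketch: write one practitioner record to JSON and to CSV using
# only the stdlib. The record below is hypothetical sample data.
import csv
import json

record = {
    "name": "Sample Name",
    "profession": "Dentist",
    "license": "12345",
}

with open("practitioner.json", "w", encoding="utf-8") as f:
    json.dump(record, f, ensure_ascii=False, indent=2)

with open("practitioners.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=list(record))
    writer.writeheader()
    writer.writerow(record)
```

Note that Scrapy can also export items without custom code via its built-in feed exports, e.g. `scrapy crawl sch -o items.json`.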

I am not looking for a code review. I have an error and am looking for a solution. –


You should send a POST request to the server. [This answer here](http://stackoverflow.com/questions/10218581/using-scrapy-to-scrap-asp-net-website-with-javascript-buttons-and-ajax-requests) should be a good start. – agstudy
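
To illustrate the comment's suggestion, this is the general shape of such a POST built with the stdlib urllib. A sketch only: the field name is taken from the spider above, but the values are placeholders, and a real ASP.NET postback also needs the `__VIEWSTATE`/`__EVENTVALIDATION` values copied from the page (or from the browser's network tab).

```python
# Sketch of the POST the search button would trigger. The drop-down field
# name comes from the spider above; the other values are placeholders.
from urllib.parse import urlencode
from urllib.request import Request

form = {
    "ctl00$m$g_28bc0e11_4b8f_421f_84b7_d671de504bc3$ctl00$drp_practitionerType": "4",  # "Dentist"
    "__VIEWSTATE": "copied-from-the-page",  # placeholder
}
req = Request(
    "http://www.qchp.org.qa/en/Pages/searchpractitioners.aspx",
    data=urlencode(form).encode("utf-8"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)
# urllib switches the request method to POST automatically once data= is set.
```

With Scrapy itself, `FormRequest` (or `FormRequest.from_response`, which picks up the hidden ASP.NET fields for you) plays the same role.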

Answer


The name `element` is never defined: you passed the element straight into `Select` on the previous line. Keep a reference to the underlying element, e.g.:

    element = driver.find_element_by_name('ctl00$m$g_28bc0e11_4b8f_421f_84b7_d671de504bc3$ctl00$drp_practitionerType') 
    select = Select(element) 
    all_options = element.find_elements_by_tag_name("option") 

Or, rather, just use `select.options`?