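Python Scrapy: scraping dynamic information

I am trying to scrape information from http://www.qchp.org.qa/en/Pages/searchpractitioners.aspx. I want to do the following:
- Select "Dentist" from the drop-down list at the top of the page
- Click Search
- Note that the information at the bottom of the page changes dynamically using JavaScript
- Click the hyperlink on a practitioner's name, and a pop-up window is displayed
- I want to save all the information for each practitioner in a JSON/CSV file
- I would also like the information on the other pages, which are linked at the bottom of the page and change the information in the same div.

I am very new to Scrapy and have only just looked at Selenium, because I read somewhere that you need Selenium for dynamic information.
So I am using Selenium inside the Scrapy application. I am not sure whether this is correct, and I don't know what the best way to do this is. So far I have the following code. I get this error:

sch_spider.py", line 21, in DmozSpider
    all_options = element.find_elements_by_tag_name("option")
NameError: name 'element' is not defined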
sch_spider.py
from scrapy.spider import Spider
from scrapy.selector import Selector
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from scrapytutorial.items import SchItem
from selenium.webdriver.support.ui import Select

class DmozSpider(Spider):
    name = "sch"

    driver = webdriver.Firefox()
    driver.get("http://www.qchp.org.qa/en/Pages/searchpractitioners.aspx")
    select = Select(driver.find_element_by_name('ctl00$m$g_28bc0e11_4b8f_421f_84b7_d671de504bc3$ctl00$drp_practitionerType'))
    all_options = element.find_elements_by_tag_name("option")
    for option in all_options:
        if option.get_attribute("value") == "4":  # Dentist
            option.click()
            break

    driver.find_element_by_name("ctl00$m$g_28bc0e11_4b8f_421f_84b7_d671de504bc3$ctl00$Searchbtn").click()

    def parse(self, response):
        all_docs = element.find_elements_by_tag_name("td")
        for name in all_docs:
            name.click()
            alert = driver.switch_to_alert()
        sel = Selector(response)
        ma = sel.xpath('//table')
        items = []
        for site in ma:
            item = SchItem()
            item['name'] = site.xpath("//span[@id='PractitionerDetails1_lbl_Name']/text()").extract()
            item['profession'] = site.xpath("//span[@id='PractitionerDetails1_lbl_Profession']/text()").extract()
            item['scope_of_practise'] = site.xpath("//span[@id='PractitionerDetails1_lbl_sop']/text()").extract()
            item['instituition'] = site.xpath("//span[@id='PractitionerDetails1_lbl_institution']/text()").extract()
            item['license'] = site.xpath("//span[@id='PractitionerDetails1_lbl_LicenceNo']/text()").extract()
            item['license_expiry_date'] = site.xpath("//span[@id='PractitionerDetails1_lbl_LicenceExpiry']/text()").extract()
            item['qualification'] = site.xpath("//span[@id='PractitionerDetails1_lbl_Qualification']/text()").extract()
            items.append(item)
        return items
items.py
from scrapy.item import Item, Field

class SchItem(Item):
    name = Field()
    profession = Field()
    scope_of_practise = Field()
    instituition = Field()
    license = Field()
    license_expiry_date = Field()
    qualification = Field()
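
On the goal of saving each practitioner to JSON/CSV: Scrapy's feed exports can usually handle this from the command line (for example `scrapy crawl sch -o items.json`), so extra code may not be needed. If it has to be done by hand, a minimal sketch using only the standard library could look like the following. The field names are taken from `SchItem`; the record values and the helper names `to_json` and `to_csv` are made up for illustration.

```python
import csv
import io
import json

# Field names copied from SchItem (spelling kept as in the question).
FIELDS = [
    "name", "profession", "scope_of_practise", "instituition",
    "license", "license_expiry_date", "qualification",
]

# Placeholder record, not real data from the site.
records = [
    {
        "name": "Example Practitioner",
        "profession": "Dentist",
        "scope_of_practise": "General Dentistry",
        "instituition": "Example Clinic",
        "license": "D0000",
        "license_expiry_date": "01/01/2030",
        "qualification": "BDS",
    }
]


def to_json(rows):
    """Serialize a list of dicts to a JSON string."""
    return json.dumps(rows, indent=2)


def to_csv(rows):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


json_text = to_json(records)
csv_text = to_csv(records)
print(json_text)
print(csv_text)
```

Writing to a real file would only require replacing the `StringIO` buffer with `open(path, "w", newline="")`.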
I am not looking for a code review. I have an error and I am looking for a solution. –
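
Regarding the error itself: the traceback points at `element.find_elements_by_tag_name("option")`, and the spider never assigns anything to the name `element`; the dropdown was stored in `select`. Because those statements sit directly in the class body, they run as soon as the class is defined, which is why the error is reported "in DmozSpider". Here is a minimal reproduction without Selenium. `FakeDropdown`, `BrokenSpider` and `FixedSpider` are hypothetical stand-ins, not part of Scrapy or Selenium.

```python
class FakeDropdown:
    """Hypothetical stand-in for the object the spider stores in `select`."""

    def __init__(self, values):
        self.values = values
        self.selected = None

    def select_by_value(self, value):
        # Pick an option by its value attribute, or fail loudly.
        if value not in self.values:
            raise ValueError("no option with value %r" % value)
        self.selected = value


def define_broken_spider():
    # Class bodies execute at definition time, so the bad name fails here.
    class BrokenSpider:
        select = FakeDropdown(["1", "4"])
        all_options = element.values  # `element` was never assigned

    return BrokenSpider


def define_fixed_spider():
    class FixedSpider:
        select = FakeDropdown(["1", "4"])
        select.select_by_value("4")  # use the name that actually exists

    return FixedSpider


try:
    define_broken_spider()
    error_message = None
except NameError as exc:
    error_message = str(exc)

fixed = define_fixed_spider()
print(error_message)
print(fixed.select.selected)
```

With real Selenium, the `Select` wrapper has a `select_by_value` method, so `select.select_by_value("4")` would probably replace the whole option loop. The same undefined `element` (and an undefined `driver`) also appears inside `parse`, so moving the browser code into a method and keeping the driver on `self` is likely needed as well.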
You should send a POST request to the server. [This answer here](http://stackoverflow.com/questions/10218581/using-scrapy-to-scrap-asp-net-website-with-javascript-buttons-and-ajax-requests) should be a good start. – agstudy
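
To illustrate the POST approach: ASP.NET WebForms pages generally expect the hidden fields from the form (such as `__VIEWSTATE` and `__EVENTVALIDATION`) to be sent back together with the control values. The sketch below builds such a request body with only the standard library. The dropdown and button names come from the spider in the question; the sample HTML, the `"Search"` button value and the helper names are assumptions, and the real page may need further fields.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlencode

# Control-name prefix taken from the spider in the question.
PREFIX = "ctl00$m$g_28bc0e11_4b8f_421f_84b7_d671de504bc3$ctl00$"

# Made-up page fragment; a real response carries much longer values.
SAMPLE_HTML = """
<form method="post" action="searchpractitioners.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <input type="text" name="ignored" value="not hidden" />
</form>
"""


class HiddenInputCollector(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "input" and attributes.get("type") == "hidden":
            name = attributes.get("name")
            if name:
                self.fields[name] = attributes.get("value") or ""


def build_search_body(html, practitioner_type):
    """Return a URL-encoded POST body for the search form."""
    collector = HiddenInputCollector()
    collector.feed(html)
    form = dict(collector.fields)
    form[PREFIX + "drp_practitionerType"] = practitioner_type  # "4" = Dentist
    form[PREFIX + "Searchbtn"] = "Search"  # assumed button value
    return urlencode(form)


body = build_search_body(SAMPLE_HTML, "4")
decoded = parse_qs(body)
print(body)
print(decoded)
```

Inside a Scrapy spider, `FormRequest.from_response(response, formdata={...})` can prefill the form's existing fields from the downloaded page, which would likely make the manual parsing unnecessary.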