Scrapy: storing items across multiple FormRequest pages? meta? (python)

So I have my scraper working with a FormRequest, and I can even see the terminal print the scraped data from this single-page version:
class MySpider(BaseSpider):
    name = "swim"
    start_urls = ["example.website"]
    download_delay = 30.0

    def parse(self, response):
        return [FormRequest.from_response(
            response, formname="TTForm",
            formdata={"Ctype": "A", "Req_Team": "", "AgeGrp": "0-6",
                      "lowage": "", "highage": "", "sex": "W",
                      "StrkDist": "10025", "How_Many": "50",
                      "foolOldPerl": ""},
            callback=self.swimparse1, dont_click=True)]

    def swimparse1(self, response):
        open_in_browser(response)
        hxs = Selector(response)
        rows = hxs.xpath(".//tr")
        items = []
        for row in rows[4:54]:
            item = swimItem()
            item["names"] = row.xpath(".//td[2]/text()").extract()
            item["age"] = row.xpath(".//td[3]/text()").extract()
            item["free"] = row.xpath(".//td[4]/text()").extract()
            item["team"] = row.xpath(".//td[6]/text()").extract()
            items.append(item)
        return items
However, when I add a second FormRequest callback, it only scrapes the second set of items. It also only prints the second page's scrape, as if it skips the first page's scrape entirely:
class MySpider(BaseSpider):
    name = "swim"
    start_urls = ["example.website"]
    download_delay = 30.0

    def parse(self, response):
        return [FormRequest.from_response(
            response, formname="TTForm",
            formdata={"Ctype": "A", "Req_Team": "", "AgeGrp": "0-6",
                      "lowage": "", "highage": "", "sex": "W",
                      "StrkDist": "10025", "How_Many": "50",
                      "foolOldPerl": ""},
            callback=self.swimparse1, dont_click=True)]

    def swimparse1(self, response):
        open_in_browser(response)
        hxs = Selector(response)
        rows = hxs.xpath(".//tr")
        items = []
        for row in rows[4:54]:
            item = swimItem()
            item["names"] = row.xpath(".//td[2]/text()").extract()
            item["age"] = row.xpath(".//td[3]/text()").extract()
            item["free"] = row.xpath(".//td[4]/text()").extract()
            item["team"] = row.xpath(".//td[6]/text()").extract()
            items.append(item)
        # only the request below is returned; the items above are dropped
        return [FormRequest.from_response(
            response, formname="TTForm",
            formdata={"Ctype": "A", "Req_Team": "", "AgeGrp": "0-6",
                      "lowage": "", "highage": "", "sex": "W",
                      "StrkDist": "40025", "How_Many": "50",
                      "foolOldPerl": ""},
            callback=self.swimparse2, dont_click=True)]

    def swimparse2(self, response):
        open_in_browser(response)
        hxs = Selector(response)
        rows = hxs.xpath(".//tr")
        items = []
        for row in rows[4:54]:
            item = swimItem()
            item["names"] = row.xpath(".//td[2]/text()").extract()
            item["age"] = row.xpath(".//td[3]/text()").extract()
            item["fly"] = row.xpath(".//td[4]/text()").extract()
            item["team"] = row.xpath(".//td[6]/text()").extract()
            items.append(item)
        return items
Guesses: A) How can I carry or return the items from the first scrape into the second one, so that I end up with all the item data together, as if it had been scraped from a single page? B) Or, if the first scrape is being skipped entirely, how do I stop the skipping and pass those items on to the next one?
Thanks!
PS, extra info: I have already tried:
item = response.request.meta = ["item]
item = response.request.meta = []
item = response.request.meta = ["names":item, "age":item, "free":item, "team":item]
All of these raise KeyErrors or other exceptions.
I've also tried modifying the FormRequest to include meta={"names": item, "age": item, "free": item, "team": item}. That doesn't raise an error, but it doesn't scrape or store anything either.
EDIT: I tried using yield, like this:
class MySpider(BaseSpider):
    name = "swim"
    start_urls = ["www.website.com"]
    download_delay = 30.0

    def parse(self, response):
        open_in_browser(response)
        hxs = Selector(response)
        rows = hxs.xpath(".//tr")
        items = []
        for row in rows[4:54]:
            item = swimItem()
            item["names"] = row.xpath(".//td[2]/text()").extract()
            item["age"] = row.xpath(".//td[3]/text()").extract()
            item["free"] = row.xpath(".//td[4]/text()").extract()
            item["team"] = row.xpath(".//td[6]/text()").extract()
            items.append(item)
        yield [FormRequest.from_response(
            response, formname="TTForm",
            formdata={"Ctype": "A", "Req_Team": "", "AgeGrp": "0-6",
                      "lowage": "", "highage": "", "sex": "W",
                      "StrkDist": "10025", "How_Many": "50",
                      "foolOldPerl": ""},
            callback=self.parse, dont_click=True)]
        for row in rows[4:54]:
            item = swimItem()
            item["names"] = row.xpath(".//td[2]/text()").extract()
            item["age"] = row.xpath(".//td[3]/text()").extract()
            item["fly"] = row.xpath(".//td[4]/text()").extract()
            item["team"] = row.xpath(".//td[6]/text()").extract()
            items.append(item)
        yield [FormRequest.from_response(
            response, formname="TTForm",
            formdata={"Ctype": "A", "Req_Team": "", "AgeGrp": "0-6",
                      "lowage": "", "highage": "", "sex": "W",
                      "StrkDist": "40025", "How_Many": "50",
                      "foolOldPerl": ""},
            callback=self.parse, dont_click=True)]
Still nothing scrapes. I know the xpaths are correct, because when I try to scrape just one form (with return instead of yield) it works perfectly. The Scrapy documentation I've read just isn't very helpful :(
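The "got 'list'" error reported in the comments below matches the code above: each yield statement hands Scrapy a one-element list rather than a FormRequest, and a list is not one of the types a callback may produce. A plain-Python sketch of the difference (no Scrapy needed; the string values stand in for request objects):

```python
def bad_callback():
    # "yield [x]" produces a single value that is itself a list,
    # which is what triggers "got 'list'" in Scrapy
    yield ["request-1"]
    yield ["request-2"]

def good_callback():
    # yield each object by itself; the callback is then a generator
    # of individual requests/items, which Scrapy iterates over
    yield "request-1"
    yield "request-2"

print([type(v).__name__ for v in bad_callback()])   # ['list', 'list']
print([type(v).__name__ for v in good_callback()])  # ['str', 'str']
```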
Thanks. Unfortunately, yield doesn't seem to be helping. For debugging purposes ("yield as many items and requests as you want from the function; Scrapy will do the rest", from the Scrapy docs) I included an open_in_browser command in every part of the code. The code stops (no browser opens) and I get this: ERROR: Spider must return Request, BaseItem or None, got 'list'. Using yield in place of any combination of the two returns, the browser opens (the code executes?) but no scraping happens. –
InfinteScroll
Never mind the comment above: I did just replace the returns with yields. This time I also pared the yields down further so they execute on every pass of the "for". Still nothing scrapes, and the error gets printed roughly as many times as the "for" runs: ERROR: Spider must return Request, BaseItem or None, got 'list' –
InfinteScroll
I don't know how else to show that this is the way to go :) I only ever write yields in my spiders and they always work. Make sure to print the item before yielding it, to be sure you are not yielding None. And make sure you changed **all** of the returns to yields –