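I am new to Scrapy and Python. I have spent several hours trying to debug this and searching for helpful answers, but I am still stuck. I am trying to extract data from www.pro-football-reference.com. Here is what I have so far; the callback function is never called by Scrapy: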
import scrapy
from nfl_predictor.items import NflPredictorItem


class NflSpider(scrapy.Spider):
    name = "nfl2"
    allowed_domains = ["http://www.pro-football-reference.com/"]
    start_url = [
        "http://www.pro-football-reference.com/boxscores/201509100nwe.htm"
    ]

    def parse(self, response):
        print "parse"
        for href in response.xpath('//*[@id="page_content"]/div[1]/table/tr/td/a/@href'):
            url = response.urljoin(href.extract())
            yield scrapy.Request(url, callback=self.parse_game_content)

    def parse_game_content(self, response):
        print "parse_game_content"
        items = []
        for sel in response.xpath('//table[@id = "team_stats"]/tr'):
            item = NflPredictorItem()
            item['away_stats'] = sel.xpath('td[@align = "center"][1]/text()').extract()
            item['home_stats'] = sel.xpath('td[@align = "center"][2]/text()').extract()
            items.append(item)
        return items
I am debugging with the parse command, run like this:
scrapy parse --spider=nfl2 "http://www.pro-football-reference.com/boxscores/201509100nwe.htm"
and I get the following output:
>>> STATUS DEPTH LEVEL 1 <<<
# Scraped Items ------------------------------------------------------------
[]
# Requests -----------------------------------------------------------------
[<GET http://www.pro-football-reference.com/years/2015/games.htm>,
<GET http://www.nfl.com/scores/2015/REG1>,
<GET http://www.pro-football-reference.com/boxscores/201509130buf.htm>,
<GET http://www.pro-football-reference.com/boxscores/201509130chi.htm>,
<GET http://www.pro-football-reference.com/boxscores/201509130crd.htm>,
<GET http://www.pro-football-reference.com/boxscores/201509130dal.htm>,
<GET http://www.pro-football-reference.com/boxscores/201509130den.htm>,
<GET http://www.pro-football-reference.com/boxscores/201509130htx.htm>,
<GET http://www.pro-football-reference.com/boxscores/201509130jax.htm>,
<GET http://www.pro-football-reference.com/boxscores/201509130nyj.htm>,
<GET http://www.pro-football-reference.com/boxscores/201509130rai.htm>,
<GET http://www.pro-football-reference.com/boxscores/201509130ram.htm>,
<GET http://www.pro-football-reference.com/boxscores/201509130sdg.htm>,
<GET http://www.pro-football-reference.com/boxscores/201509130tam.htm>,
<GET http://www.pro-football-reference.com/boxscores/201509130was.htm>,
<GET http://www.pro-football-reference.com/boxscores/201509140atl.htm>,
<GET http://www.pro-football-reference.com/boxscores/201509140sfo.htm>]
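Why does it log requests for the links I want, yet never enter the parse_game_content function to actually scrape the data? I also tested parse_game_content as the parse function to make sure it scrapes the right data, and it worked correctly in that case.
Thanks for your help!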
Are you sure you have imported all the libraries?
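A few things in the spider above may be worth checking. By default `scrapy parse` follows requests only one level deep, so the requests yielded by `parse` are listed but their callbacks are not run; passing `--depth 2` (or `-d 2`) should let it reach `parse_game_content`. Separately, `allowed_domains` is meant to hold bare domain names rather than URLs, and the attribute Scrapy reads for starting URLs is `start_urls`, not `start_url`. The sketch below illustrates why a URL-style entry in `allowed_domains` can never match a request's host. It is a simplified stand-in written for Python 3, not Scrapy's actual offsite filter, and `host_allowed` is a hypothetical helper introduced only for illustration.

```python
from urllib.parse import urlparse


def host_allowed(url, allowed_domains):
    """Simplified offsite check (an assumption, not Scrapy's real code):
    the URL's host must equal an allowed domain or be a subdomain of it."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


game_url = "http://www.pro-football-reference.com/boxscores/201509130buf.htm"

# Entry written as a full URL, as in the spider above: it never equals a bare host.
as_url = host_allowed(game_url, ["http://www.pro-football-reference.com/"])

# Entry written as a bare domain name.
as_domain = host_allowed(game_url, ["pro-football-reference.com"])

print(as_url, as_domain)
```

If this reasoning applies to the spider, changing the entry to `"pro-football-reference.com"` and renaming `start_url` to `start_urls` would be the corresponding edits when running it with `scrapy crawl nfl2`.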