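Scraping data from a table with Scrapy

I'm trying Scrapy for the first time. After a bit of research I have the basics down, and now I'm trying to extract the data from a table, but it doesn't work. The source code is below.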
items.py
from scrapy.item import Item, Field

class Digi(Item):
    sl = Field()
    player_name = Field()
    dismissal_info = Field()
    bowler_name = Field()
    runs_scored = Field()
    balls_faced = Field()
    minutes_played = Field()
    fours = Field()
    sixes = Field()
    strike_rate = Field()
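The ten fields of `Digi` appear to follow the batting-scorecard columns from left to right. Assuming that order (an assumption: the real table may contain extra or differently ordered cells), one row's cell texts can be mapped onto the field names with `zip`. This is a minimal plain-dict sketch that uses no Scrapy API; `FIELD_ORDER`, `row_to_dict` and the sample row are hypothetical.

```python
# Hypothetical helper: ASSUMES the scorecard columns appear in the same
# order as the fields declared on the Digi item.
FIELD_ORDER = [
    "sl", "player_name", "dismissal_info", "bowler_name", "runs_scored",
    "balls_faced", "minutes_played", "fours", "sixes", "strike_rate",
]


def row_to_dict(cells):
    """Map one row's cell texts onto the field names, stripping whitespace.

    Cells beyond the tenth are ignored; missing cells become None.
    """
    cleaned = [c.strip() if c is not None else None for c in cells]
    cleaned += [None] * (len(FIELD_ORDER) - len(cleaned))
    return dict(zip(FIELD_ORDER, cleaned))


# Made-up sample row, for illustration only.
row = ["1", "Player A", "c Keeper b Bowler X", "Bowler X", "45", "30", "52", "6", "1", "150.00"]
print(row_to_dict(row)["runs_scored"])  # 45
```

Scrapy items accept dict-style initialisation, so the result could be passed as `Digi(row_to_dict(cells))`.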
digicric.py
from scrapy.spider import Spider  # Scrapy 0.24 path; Scrapy 1.0 moved it to scrapy.spiders
from scrapy.selector import Selector
from crawler01.items import Digi

class DmozSpider(Spider):
    name = "digicric"
    allowed_domains = ["digicricket.marssil.com"]
    start_urls = ["http://digicricket.marssil.com/match/MatchData.aspx?op=2&match=1250"]

    def parse(self, response):
        sel = Selector(response)
        # One selector per <tr> of the third table inside the data div
        sites = sel.xpath('//*[@id="ctl00_ContentPlaceHolder1_divData"]/table[3]/tr')
        items = []
        for site in sites:
            item = Digi()
            # Query relative to the current row (site), not the whole page (sel).
            # 'td/text()' still returns the text of every cell in the row;
            # pick a single column with an index such as 'td[1]/text()'.
            item['sl'] = site.xpath('td/text()').extract()
            item['player_name'] = site.xpath('td/a/text()').extract()
            item['dismissal_info'] = site.xpath('td/text()').extract()
            item['bowler_name'] = site.xpath('td/text()').extract()
            item['runs_scored'] = site.xpath('td/text()').extract()
            item['balls_faced'] = site.xpath('td/text()').extract()
            item['minutes_played'] = site.xpath('td/text()').extract()
            item['fours'] = site.xpath('td/text()').extract()
            item['sixes'] = site.xpath('td/text()').extract()
            item['strike_rate'] = site.xpath('td/text()').extract()
            items.append(item)
        return items
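Inside the loop, the field queries need to run on `site` (the current `<tr>`) rather than on `sel` (the whole page). The sketch below illustrates that difference using the standard library's `xml.etree.ElementTree` instead of Scrapy's `Selector`, only so that it runs without Scrapy installed. `HTML` is a made-up, well-formed snippet, not the real page, and ElementTree supports only a subset of XPath (for example, it has no `text()`).

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the scorecard markup; the real page
# (ids, nesting, number of columns) is not reproduced here.
HTML = """
<div id="divData">
  <table>
    <tr><td>1</td><td><a>Player A</a></td><td>b Bowler X</td></tr>
    <tr><td>2</td><td><a>Player B</a></td><td>not out</td></tr>
  </table>
</div>
"""

root = ET.fromstring(HTML.strip())
rows = root.findall("./table/tr")

# Querying from the top instead of from the row (like sel.xpath in the loop):
# a relative 'td' path finds nothing, while './/td' finds every cell at once.
from_top_relative = root.findall("td")
from_top_anywhere = root.findall(".//td")

# Querying from the current row (like site.xpath): only that row's cells,
# with [N] selecting one column (1-based, as in XPath).
records = []
for row in rows:
    records.append({
        "sl": row.findtext("td[1]"),
        "player_name": row.findtext("td[2]/a"),
        "dismissal_info": row.findtext("td[3]"),
    })

print(len(from_top_relative), len(from_top_anywhere))  # 0 6
print(records[1]["player_name"])  # Player B
```

In Scrapy terms, the per-column query inside the loop would look like `site.xpath('td[1]/text()').extract()`.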
It shows an error. Here is a screenshot of the error: [error screenshot](http://i.imgur.com/HPh5lia.png). Here is the code: [link](http://i.imgur.com/InxV60O.png) [link](http://i.imgur.com/XtKyOkr.png) – 2015-04-06 06:17:34
@TanzibHossainNirjhor Strange, it works for me. Which Scrapy version are you using? – alecxe 2015-04-06 09:26:51
[Scrapy 0.24.5] [Python 2.7.9] [PIP 6.0.8] [Windows 8.1] – 2015-04-06 16:41:14