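I am working with Scrapy on project XYZ and need help understanding my program's output. I am stuck on extracting text from this source: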
<a href="/gifts" class="title" data-tracking-id="mdd-heading">gifts</a>
I want to extract the href as the content.
I tried this:
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from XYZ.items import XYZ


class MySpider(BaseSpider):
    name = "main"
    allowed_domains = ["XYZ"]
    start_urls = ["XYZ"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        titles = hxs.select("//a[@data-tracking-id='mdd-heading']")
        items = []
        for title in titles:
            item = XYZ()
            item["title"] = title.select("text()").extract()
            item["link"] = title.select("@href").extract()
            items.append(item)
            print "www.xyz.com" + str(item["link"])
        return items
and the output is
www.xyz.com[u'/gifts']
I expected
www.xyz.com/gifts
What am I doing wrong to get this output?
`item['link']` is clearly a list; use its first element instead. – jonrsharpe 2014-09-10 09:21:51
Thanks @jonrsharpe – 2014-09-10 09:23:24
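To illustrate the comment's suggestion, here is a minimal sketch in plain Python 3 (not the Python 2 of the question). It does not use Scrapy: the `links` list stands in for what `.extract()` returned, and the `first_link` helper and the `http://` scheme are hypothetical names and values chosen for illustration.

```python
from urllib.parse import urljoin

# Stand-in for the value of item["link"] in the question:
# extract() gave back a list of matches, not a single string.
links = ["/gifts"]


def first_link(base_url, extracted):
    """Join base_url with the first extracted href; return None if nothing matched."""
    if not extracted:
        return None
    return urljoin(base_url, extracted[0])


# str(links) would give "['/gifts']", which is why the brackets appeared in the output.
print(first_link("http://www.xyz.com", links))  # prints http://www.xyz.com/gifts
```

Indexing with `[0]` raises `IndexError` on an empty list, so the sketch checks for an empty result first. In the spider itself, the equivalent change would presumably be to index the result of `extract()` before concatenating.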