Scrapy: XPath not returning the full URL from @href

When I run an XPath query with Scrapy, I don't get the full URL back.

Here is the URL I am scraping, opened in the Scrapy shell:

scrapy shell "http://www.ybracing.com/omp-ia01854-omp-first-evo-race-suit.html"

I run the following XPath from the shell:

sel.xpath("//*[@id='Thumbnail-Image-Container']/li[1]/a//@href")

and the selector comes back with only half of the href:

[<Selector xpath="//*[@id='Thumbnail-Image-Container']/li[1]/a//@href" data=u'http://images.esellerpro.com/2489/I/160/'>]

Here is the HTML snippet as I see it in the browser:

 <li><a data-medimg="http://images.esellerpro.com/2489/I/160/260/1/medIA01854-GALLERY.jpg" href="http://images.esellerpro.com/2489/I/160/260/1/lrgIA01854-GALLERY.jpg" class="cloud-zoom-gallery Selected" title="OMP FIRST EVO RACE SUIT" rel="useZoom: 'MainIMGLink', smallImage: 'http://images.esellerpro.com/2489/I/160/260/1/lrgIA01854-GALLERY.jpg'"><img src="http://images.esellerpro.com/2489/I/160/260/1/smIA01854-GALLERY.jpg" alt="OMP FIRST EVO RACE SUIT Thumbnail 1"></a></li>

And here is what it looks like when fetched with wget:

<li><a data-medimg="http://images.esellerpro.com/2489/I/513/0/medIA01838_GALLERY.JPG" href="http://images.esellerpro.com/2489/I/513/0/lrgIA01838_GALLERY.JPG" class="cloud-zoom-gallery Selected" title="OMP DYNAMO RACE SUIT" rel="useZoom: 'MainIMGLink', smallImage: 'http://images.esellerpro.com/2489/I/513/0/lrgIA01838_GALLERY.JPG'"><img src="http://images.esellerpro.com/2489/I/513/0/smIA01838_GALLERY.JPG" alt="OMP DYNAMO RACE SUIT Thumbnail 1" /></a></li>

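I have tried changing my XPath to pull the same attribute in different ways, but I still get the same result.

What is causing this, and what can I do to fix it? I would like to understand the problem rather than have someone just correct my XPath for me.

Some thoughts on the page itself: I disabled JavaScript to see whether JS was generating the other half of the URL, but it is not. I also downloaded the page with wget and confirmed that the URL is complete in the original HTML.

I have not tested any other builds, but I am using Scrapy 1.2.1 with Python 2.7 on CentOS 7.

Googling only turned up people who cannot scrape their data because it is generated on the fly by JavaScript, but my data is present in the HTML.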

Source

2016-10-25 r_al_sim

By using

sel.xpath("//*[@id='Thumbnail-Image-Container']/li[1]/a//@href")

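you get a list of Selector instances, and the data field in their printed representation shows only the first few characters of the content (because it may be very long). The full value is still there; only the display is abbreviated.

To retrieve the content as a string (rather than as a Selector instance), you need to use a method such as .extract() or .extract_first():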

>>> print(sel.xpath("//*[@id='Thumbnail-Image-Container']/li[1]/a//@href").extract_first()) 
http://images.esellerpro.com/2489/I/160/260/1/lrgIA01854-GALLERY.jpg
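Since the question asks for the why and not just a fix, here is a small sketch of the idea in plain Python. ToySelector is a hypothetical stand-in, not Scrapy's real Selector class, and the 40-character cutoff is an assumption inferred from the output shown in the question (the data value printed there is exactly 40 characters long). It only illustrates that an abbreviated repr does not mean the underlying string was cut.

```python
# Toy model only: ToySelector is NOT Scrapy's Selector. It mimics a repr
# that abbreviates its data while the stored value stays complete.

XPATH = "//*[@id='Thumbnail-Image-Container']/li[1]/a//@href"
FULL_URL = ("http://images.esellerpro.com/2489/I/160/260/1/"
            "lrgIA01854-GALLERY.jpg")


class ToySelector(object):
    """Hypothetical stand-in holding an XPath expression and its match."""

    PREVIEW_LENGTH = 40  # assumed cutoff, inferred from the question's output

    def __init__(self, expr, value):
        self.expr = expr
        self.value = value

    def __repr__(self):
        # Only a preview of the value is shown, for readability.
        preview = self.value[:self.PREVIEW_LENGTH]
        return "<Selector xpath=%r data=%r>" % (self.expr, preview)

    def extract(self):
        # The complete string is returned, untouched.
        return self.value


toy = ToySelector(XPATH, FULL_URL)

print(repr(toy))      # the data part stops at .../2489/I/160/
print(toy.extract())  # the whole URL, ending in lrgIA01854-GALLERY.jpg
```

In the real shell the same distinction applies: what gets echoed is the Selector's representation, and .extract() or .extract_first() hands back the actual string.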

Source

2016-10-26 05:26:21 starrify

Thanks, that explains it perfectly.
