How to extract nested text in Scrapy?

I am trying to use Scrapy to extract the brand description paragraph from this page: http://us.asos.com/hope-and-ivy/hope-ivy-dotty-mesh-midi-dress-with-ruffle-detail/prd/8663409?clr=black&cid=2623&pgesize=36&pge=0&totalstyles=627&gridsize=3&gridrow=1&gridcolumn=1

The HTML element looks like this:

<div class="brand-description"> 
    <h4>Brand</h4> 
    <span>"Prom queens and wedding guests, claim the best-dressed title in " 
    <a href="/Women/A-To-Z-Of-Brands/Hope-And-Ivy/Cat/pgecategory.aspx?cid=21368"> 
     <strong>"Hope and Ivy's"</strong> 
    </a> 
    "occasion-ready collection. Shop its notice-me styles for hand-painted florals, Bardot necklines and figure-flattering pencil dresses." 
    </span> 
</div>

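My expected result is:

"Prom queens and wedding guests, claim the best-dressed title in Hope and Ivy's occasion-ready collection. Shop its notice-me styles for hand-painted florals, Bardot necklines and figure-flattering pencil dresses."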

This is the approach I tried:

response.css("div.brand-description span::text").extract()

However, the list of text I get back is missing what is inside the <strong> tag, which is "Hope and Ivy's":

['Prom queens and wedding guests, claim the best-dressed title in ', ' occasion-ready collection. Shop its notice-me styles for hand-painted florals, Bardot necklines and figure-flattering pencil dresses.']

My question is: can I get the plain text while ignoring the "href" (link) tag?


2017-08-29 lliu05

Try this: //div[@class="brand-description"]/span

You may still need to do some post-processing, but this is probably the best you can do:

response.xpath('normalize-space(//div[@class="brand-description"]/span)').extract_first()

which will give you

u'"Prom queens and wedding guests, claim the best-dressed title in " "Hope and Ivy\'s" "occasion-ready collection. Shop its notice-me styles for hand-painted florals, Bardot necklines and figure-flattering pencil dresses."'


2017-08-29 05:49:22
