Scrapy: how to get a correct selection

Bold normal Italist

From the link above, I need to select and get the full text: Bold normal Italist.

The HTML is:

<a href=""><strong>Bold</strong> normal <i>Italist</i></a>

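However, a/text() yields only " normal ". Does anyone know a fix? I am testing a Bing crawl, and the bold text appears in different positions depending on the query.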

Source

2017-06-02 GRS

You need to understand the [**difference between text nodes and string values in XPath**](https://stackoverflow.com/a/41077106/290085) – kjhughes
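To make the comment above concrete, here is a minimal sketch of the distinction it points at. It uses the standard library's `xml.etree.ElementTree` instead of Scrapy (an assumption made only to keep the example dependency-free), and the variable names are illustrative. The direct text nodes of `<a>` correspond to what `a/text()` selects, while the string value is the concatenation of every descendant text node.

```python
import xml.etree.ElementTree as ET

# The anchor from the question, parsed with the standard library.
a = ET.fromstring('<a href=""><strong>Bold</strong> normal <i>Italist</i></a>')

# Text nodes that are direct children of <a>: the text before the first child
# (a.text, absent here) plus the "tail" text that follows each child element.
direct_text_nodes = [t for t in [a.text] + [child.tail for child in a] if t]

# String value of <a>: every descendant text node, joined in document order.
string_value = "".join(a.itertext())

print(direct_text_nodes)  # [' normal ']
print(string_value)       # Bold normal Italist
```

The text inside `<strong>` and `<i>` belongs to those child elements, which is why it is missing from the direct text nodes of `<a>`.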

You can use a//text() instead of a/text() to get all text nodes, including those inside child elements.

# -*- coding: utf-8 -*-
from scrapy.selector import Selector

doc = """
<a href=""><strong>Bold</strong> normal <i>Italist</i></a>
"""

sel = Selector(text=doc, type="html")

# a/text() selects only the text nodes that are direct children of <a>.
result = sel.xpath('//a/text()').extract()
print(result)
# >>> [' normal ']

# a//text() selects every descendant text node; join them into one string.
result = ''.join(sel.xpath('//a//text()').extract())
print(result)
# >>> Bold normal Italist

Source

2017-06-02 16:05:50

You can try using

a/string()

or

normalize-space(a)

which returns Bold normal Italist

Source

2017-06-02 16:06:01 Andersson

Scrapy only supports XPath 1.0, so 'a/string()' won't work. –

I wasn't sure, so I added 2 options... – Andersson
