How do I get the link text from this XPath?

Using the Python library Scrapy, I ran the following:

scrapy shell "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/"

From there, I want to get the link and the text of each item returned:

response.xpath('//div[@class="title-and-desc"]/a')

However, only the links are returned, not the text. This is what comes back:

response.xpath('//div[@class="title-and-desc"]/a') 
[<Selector xpath='//div[@class="title-and-desc"]/a' data=u'<a target="_blank" href="http://www.brpr'>, <Selector xpath='//div[@class="title-and-desc"]/a' data=u'<a target="_blank" href="http://www.dive'>, <Selector xpath='//div[@class="title-and-desc"]/a' data=u'<a target="_blank" href="http://rhodesmi'>,

I can loop over the results above, where i is the loop variable for each iteration:

i.xpath("text()").extract_first(), 
i.xpath("@href").extract_first()

But only the @href value is returned, because text() does not retrieve anything. What needs to change so that I also get the corresponding link text?

For reference, the full Scrapy example comes from here: Scrapy Tutorial Example.

Source

2016-11-28 4thSpace

That's because the text you are looking for is inside a child div node:

<div class="title-and-desc"> 
    <a target="_blank" href="http://www.network-theory.co.uk/python/intro/"> 
    <div class="site-title">An Introduction to Python </div> 
    </a> 
</div>

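You can get all the text under a node (including the text of its children) by prefixing text() with //, i.e. .//text() instead of text() (the leading dot keeps the expression relative to the current node), or just use the explicit XPath a/div/text().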

Try:

links = response.xpath('//div[@class="title-and-desc"]/a') 
for l in links: 
    # url: 
    print(l.xpath('@href').extract_first()) 
    # text with explicit xpath: 
    print(l.xpath('div/text()').extract_first()) 
    # or with all text elements with relative //text: 
    print(''.join(l.xpath('.//text()').extract()).strip())

Source

2016-11-28 02:29:50 Granitosaurus

Getting only the text means not getting the URL, so this doesn't solve the problem. I did try i.xpath("//text()").extract_first(), but that didn't work. – 4thSpace

@4thSpace It does work, see my edited example. – Granitosaurus

Another useful option is to apply XPath's string() or normalize-space() to the link: print(l.xpath('normalize-space(.)').extract_first(), l.xpath('@href').extract_first()) –
