Capturing the text between tags in Python using XPath


I want to capture the word WORD and the sentence "This is what I want." from markup in the following format:

<div id="message1"> 
<div class="message2"> 
<strong>WORD</strong> This is what I want.<br/> 
</div>    
</div>

This is what I tried:

import requests
from lxml import html

session = requests.Session()  # reuse one HTTP session for the request
cont = session.get('http://mywebsite.com').content
tree = html.fromstring(cont)
word = tree.xpath('//div[@class="message2"]/strong')
sentence = tree.xpath('//div[@class="message2"]/br')
print(word)
print(sentence)

Nothing gets printed for me!


2015-05-04 MLSC

I find XPath Helper great for solving problems like this one:

word = tree.xpath('//div[@class="message2"]/strong/text()')[0] 
sentence = tree.xpath('//div[@class="message2"]/strong/following-sibling::text()[1]')[0]


2015-05-04 12:11:12 Leon

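I don't know why he phrased the question as "This is what I want", but didn't he say that what he wants is the word inside the strong tag? If that is what he is looking for, then your answer is correct. – PythonIsGreat

The answer was actually wrong... either way, I have fixed it now :) – Leon

It is still wrong :) Although indexes start at 1 in XPath, they still start at '0' in Python. – hek2mgl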
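To make the indexing point in the comment above concrete, here is a minimal sketch that uses the markup from the question as an inline string (the SAMPLE name is only for illustration). The [1] inside the expression is XPath's 1-based position, while the list that xpath() returns is indexed from 0 like any Python list.

```python
from lxml import html

# Sample markup copied from the question; SAMPLE is an illustrative name.
SAMPLE = """
<div id="message1">
<div class="message2">
<strong>WORD</strong> This is what I want.<br/>
</div>
</div>
"""

tree = html.fromstring(SAMPLE)

# Inside the XPath expression, [1] picks the first following text node (1-based).
matches = tree.xpath(
    '//div[@class="message2"]/strong/following-sibling::text()[1]'
)

# xpath() returns an ordinary Python list, so its first item is at index 0.
sentence = matches[0].strip()
print(sentence)
```

Using matches[1] here would raise an IndexError, because the [1] predicate already narrowed the result to a single text node.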

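I don't know the specifics of lxml, but if this is the text you are looking for, note that calling text() on the div will not return text that lives in a subtree, such as the text inside the strong tag.

So in general XPath terms, this matches only the text you want: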

//*[@class="message2"]/text()
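A quick way to check this with lxml, assuming the markup from the question is available as a string (SAMPLE and cleaned are illustrative names): the expression returns only the text nodes that are direct children of the div, so WORD is left out. The result also contains whitespace-only nodes from the line breaks in the markup, which the sketch filters out.

```python
from lxml import html

# Sample markup copied from the question; SAMPLE is an illustrative name.
SAMPLE = """
<div id="message1">
<div class="message2">
<strong>WORD</strong> This is what I want.<br/>
</div>
</div>
"""

tree = html.fromstring(SAMPLE)

# Text nodes that are direct children of the matched element.
# The text inside <strong> belongs to a child element and is not returned.
direct_text = tree.xpath('//*[@class="message2"]/text()')

# Drop whitespace-only nodes (the newlines around the tags) and trim the rest.
cleaned = [t.strip() for t in direct_text if t.strip()]
print(cleaned)
```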


2015-05-04 12:14:38 PythonIsGreat

This is what you want :)

from lxml import html 

text = """ 
<div id="message1"> 
<div class="message2"> 
<strong>WORD</strong> This is what I want.<br/> 
</div>    
</div> 
""" 

tree = html.fromstring(text)
print(tree.xpath("//div[@class='message2']/strong/following-sibling::text()")[0])


2015-05-04 12:20:51 hek2mgl

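Very nice. But when I fetch the real URL source and parse it, I get this error: 'IndexError: list index out of range' – MLSC

What do you mean by the *real* source? If you get this error, it means the HTML contains a 'message2' element that has no text at this position. – hek2mgl

I mean the 'real URL web page', not text='''...''' – MLSC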
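The IndexError discussed in these comments comes from indexing an empty result list with [0]. One way to guard against it, sketched here under assumptions, is to check the matches before indexing. The helper first_text and the inline page below are hypothetical and stand in for the real downloaded HTML, which may not contain the same structure as the sample.

```python
from lxml import html


def first_text(tree, expression):
    """Return the first non-empty text match for an XPath expression, or None."""
    for match in tree.xpath(expression):
        text = match.strip()
        if text:
            return text
    return None


# Hypothetical page that has no div with class "message2".
page = html.fromstring("<div id='message1'><p>Nothing here</p></div>")

sentence = first_text(
    page, "//div[@class='message2']/strong/following-sibling::text()"
)
if sentence is None:
    print("No matching text found; inspect the HTML that was actually returned.")
else:
    print(sentence)
```

When the helper returns None, printing the downloaded content and searching it for message2 is a simple way to see whether the element is present at all.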
