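Parsing HTML with lxml and XPath
I want to use lxml from Python, because from my reading and Googling it is recommended over other parsing packages. I have the DOM structure below, and I managed to write the correct XPath, which I then verified in XPath Checker. The XPath works fine in XPath Checker, but when I run it in Python with lxml I don't get the result; in fact, I get objects instead of the actual text.
Here is my DOM structure: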
<div class="pdsc-l">
<table width="100%" cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr>
<tr>
<tr>
<tr>
<tr>
<tr>
<td width="35%" valign="top">
<font size="2" face="Arial, Helvetica, sans-serif">Brand</font>
</td>
<td width="65%" valign="top">
<font size="2" face="Arial, Helvetica, sans-serif">HTC</font>
</td>
</tr>
<tr>
<td width="35%" valign="top">
<td width="65%" valign="top">
The following XPath, which I wrote, gives me what I want:
//td//font[text()='Brand']/following::td[1]
But with lxml I am not getting that result.
This is my code:
import urllib2
from lxml import etree

# request is a urllib2.Request for the page URL
rawPage = urllib2.urlopen(request)
read = rawPage.read()
#print read
tree = etree.HTML(read)
for tr in tree.xpath("//tr"):
    print tr.xpath("//td//font[text()='Brand']/following::td[1]")
Here is the output:
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
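I tried the following change but still did not get the result. My code now includes the URL, which I hope will help in getting a better answer: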
import urllib2
from lxml import etree
from lxml.html import fromstring, tostring
url = 'http://www.ebay.com/ctg/111176858'
request = urllib2.Request(url)
rawPage = urllib2.urlopen(request)
read = rawPage.read()
#print read
tree = etree.HTML(read)
for tr in tree.xpath("//tr"):
    t = tr.xpath("//td//font[text()='Brand']/following::td[1]")[0]
    print tostring(t)
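For what it's worth, `xpath()` in lxml returns a list of `Element` objects when the expression selects elements, which would explain the `<Element td at 0x...>` lines. A minimal sketch of two ways to get the text instead, assuming a simplified stand-in for the markup (`SAMPLE_HTML` is made up for illustration and is not the real eBay page) and written in Python 3 syntax rather than the Python 2 used in the question:

```python
from lxml import etree

# Simplified stand-in for the page markup (an assumption, not the real page).
SAMPLE_HTML = """
<div class="pdsc-l">
  <table width="100%" cellspacing="0" cellpadding="0" border="0">
    <tr>
      <td width="35%" valign="top"><font size="2">Brand</font></td>
      <td width="65%" valign="top"><font size="2">HTC</font></td>
    </tr>
  </table>
</div>
"""

tree = etree.HTML(SAMPLE_HTML)

# Selecting elements yields a list of Element objects, not strings.
cells = tree.xpath("//td//font[text()='Brand']/following::td[1]")

# Option 1: join every text node beneath the matched <td>.
brand = "".join(cells[0].itertext()).strip()

# Option 2: let XPath compute the string value of the first match.
brand_via_xpath = tree.xpath(
    "normalize-space(//td//font[text()='Brand']/following::td[1])"
)

print(brand)
print(brand_via_xpath)
```

Option 2 avoids indexing into the list, so it returns an empty string rather than raising `IndexError` when nothing matches.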
Maybe post the output you are receiving, so that we can understand more about what is happening?
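One more thing that may be relevant: inside the `for tr in ...` loop, an expression that starts with `//` is evaluated against the whole document rather than against `tr`, which would explain why every iteration prints the same element. A rough sketch of a row-relative version follows, again in Python 3 and with made-up sample markup. The helper name `parse_specs` and the "Model" row are hypothetical and only there for illustration:

```python
from lxml import etree

# Made-up sample markup; the "Model" row is hypothetical illustration data.
SAMPLE_HTML = """
<table>
  <tr>
    <td><font size="2">Brand</font></td>
    <td><font size="2">HTC</font></td>
  </tr>
  <tr>
    <td><font size="2">Model</font></td>
    <td><font size="2">Example Model</font></td>
  </tr>
</table>
"""


def parse_specs(html):
    """Return a dict mapping each label cell to its value cell (hypothetical helper)."""
    tree = etree.HTML(html)
    specs = {}
    for tr in tree.xpath("//tr"):
        # "./td" is relative to this row; "//td" would search the whole document.
        cells = tr.xpath("./td")
        if len(cells) != 2:
            continue  # skip rows that are not a label/value pair
        label = "".join(cells[0].itertext()).strip()
        value = "".join(cells[1].itertext()).strip()
        if label:
            specs[label] = value
    return specs


specs = parse_specs(SAMPLE_HTML)
print(specs.get("Brand"))
```

Collecting all rows into a dict means the page is walked once, and a missing label simply returns `None` from `specs.get(...)`.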