2014-11-08

Getting text from HTML with lxml

I'm trying to use XPath in lxml to get the list of celebrities from this website, but I'm running into trouble.

Here is the HTML:

<div class="lists"> 
      <dl> <dt>A</dt> <dd><a href="/people/adam_levine/" id="20608779">Adam Levine</a> </dd> 

I want to get the text "Adam Levine".

My Python code:

celebs = tree.xpath('//dd[a]/following-sibling::node()') 

But my result is <Element dd at 0x1084ad4c8> ...

If anyone can help, that would be great. Thanks.
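A likely reason for that output, sketched on hypothetical markup (the second entry, "Example Person", is invented for illustration and is not from the real page): following-sibling::node() moves sideways from each matching <dd> to the nodes that come after it. On a list page those are the other <dd> elements, not the text inside the link, whereas text() selects the link's text node directly.

```python
from lxml import etree

# Hypothetical two-entry list modeled on the question's markup; "Example Person" is invented
s = ('<div class="lists"><dl><dt>A</dt>'
     '<dd><a href="/people/adam_levine/">Adam Levine</a></dd>'
     '<dd><a href="/people/example_person/">Example Person</a></dd>'
     '</dl></div>')
tree = etree.fromstring(s)

# The question's expression: the nodes *after* each <dd> that contains an <a>
siblings = tree.xpath('//dd[a]/following-sibling::node()')
print(siblings)  # a list holding the second <dd> element, not any text

# Selecting the text node inside each link instead
names = tree.xpath('//dd/a/text()')
print(names)
```

The first <dd>, the one holding "Adam Levine", never appears in `siblings`, because an element is not its own following sibling.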


Try adding print(celebs.text) after celebs = tree.xpath(). – knittledan 2014-11-09 19:43:30
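One caveat on that suggestion: tree.xpath() returns a list, and a list has no .text attribute, so the .text lookup has to happen on each element in the list. A minimal sketch, reusing the question's snippet with its closing tags added; the variable names are illustrative only:

```python
from lxml import etree

s = ('<div class="lists"><dl><dt>A</dt>'
     '<dd><a href="/people/adam_levine/" id="20608779">Adam Levine</a></dd>'
     '</dl></div>')
tree = etree.fromstring(s)

# xpath() returns a list of <a> elements; read .text from each one, not from the list
links = tree.xpath('//dd/a')
names = [a.text for a in links]
print(names)
```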

Answer


Select the text with XPath's text() instead of following-sibling::node(), like this:

from lxml import etree

# Your HTML snippet is incomplete, so I have deliberately added the closing </dl> and </div> tags
s = '''<div class="lists">
      <dl> <dt>A</dt> <dd><a href="/people/adam_levine/" id="20608779">Adam Levine</a> </dd></dl></div>'''

tree = etree.fromstring(s)

# text() selects the text node inside each <a> under a <dd>, so the result is a list of strings
print(tree.xpath('.//dd/a/text()'))
# ['Adam Levine']
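If you would rather not patch the closing tags by hand, lxml's HTML parser may be a better fit for scraped pages: etree.fromstring expects well-formed XML and rejects unclosed tags, while lxml.html.fromstring repairs them. A minimal sketch on the snippet exactly as posted in the question; the same text() query then works unchanged:

```python
import lxml.html

# The snippet exactly as posted in the question, without the closing </dl> and </div> tags
snippet = '''<div class="lists">
      <dl> <dt>A</dt> <dd><a href="/people/adam_levine/" id="20608779">Adam Levine</a> </dd>'''

# The HTML parser closes the open <dl> and <div> itself
root = lxml.html.fromstring(snippet)

# Relative query, so it works whichever element fromstring() returns as the root
names = root.xpath('.//dd/a/text()')
print(names)
```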