2012-12-06 30 views
<p> 
    Glassware veteran 
    <strong>Corning </strong> 
    (
    <span class="ticker"> 
     NYSE: 
     <a class="qsAdd qs-source-isssitthv0000001" href="http://caps.fool.com/Ticker/GLW.aspx?source=isssitthv0000001" data-id="203758">GLW</a> 
    </span> 
    <a class="addToWatchListIcon qsAdd qs-source-iwlsitbut0000010" href="http://my.fool.com/watchlist/add?ticker=&source=iwlsitbut0000010" title="Add to My Watchlist"> </a> 
    ) has fallen on hard times lately. Is it time to give up on the stock, or will Corning have a banana and a comeback? 
</p> 

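I want to get "Glassware veteran" and "has fallen on hard times lately. Is it time to give up on the stock, or will Corning have a banana and a comeback?" How can I parse this text from the HTML using lxml?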

Using this code:

tnode = root.xpath("/p")[0]  # xpath() returns a list, so take the first match
content = tnode.text

I only get "Glassware veteran". Why?

Answer


Something like this might get you what you want:

>>> tnode = root.xpath('/p')[0]
>>> content = tnode.xpath('text()')
>>> print(''.join(content))

Glassware veteran 

(


) has fallen on hard times lately. Is it time to give up on the stock, or will Corning have a banana and a comeback? 
>>> 
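
The reason `.text` returns only the first chunk is the ElementTree data model that lxml follows: `.text` holds the text before the first child element, and any text after a child's closing tag is stored on that child as `.tail`. Here is a minimal sketch of that behaviour. It uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml (the `.text` and `.tail` attributes behave the same way there), and `FRAGMENT` is a simplified, hypothetical version of the markup in the question, not the original page.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for the <p> fragment in the question
# (an assumption for illustration, not the original markup).
FRAGMENT = (
    "<p>Glassware veteran <strong>Corning</strong> "
    "(<span>NYSE: <a>GLW</a></span>) has fallen on hard times lately.</p>"
)

p = ET.fromstring(FRAGMENT)

# .text holds only the text that precedes the first child element.
first_chunk = p.text

# Text that follows a child's closing tag is stored on that child as .tail.
tails = [child.tail for child in p]

print(repr(first_chunk))
print(tails)
```

So the "missing" text is not lost; it is attached to the `<strong>` and `<span>` children as their tails, which is why the `text()` XPath step on `<p>` finds it while `.text` does not.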

If you want all of the text nodes, including those inside child elements, just use //text() instead of text():

>>> print(' '.join(x.strip() for x in root.xpath('//text()')))
Glassware veteran Corning (NYSE: GLW ) has fallen on hard times lately. Is it time to give up on the stock, or will Corning have a banana and a comeback? 
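
If the goal is one clean string in document order, another option is to walk every text node and then collapse the whitespace in a single pass. The sketch below is only an illustration: `flatten_text` is a hypothetical helper name, the fragment is a shortened copy of the markup from the question, and it again uses the standard library's `xml.etree.ElementTree`, whose `itertext()` method is also available on lxml elements.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed copy of the fragment from the question
# (attributes trimmed; an assumption for illustration).
FRAGMENT = """<p>
    Glassware veteran
    <strong>Corning </strong>
    (
    <span class="ticker">
     NYSE:
     <a href="#">GLW</a>
    </span>
    ) has fallen on hard times lately.
</p>"""


def flatten_text(element):
    """Return all text under element, in document order, with whitespace runs collapsed."""
    # itertext() yields every .text and .tail below the element in document order.
    raw = "".join(element.itertext())
    # split() with no arguments drops all runs of whitespace, including newlines.
    return " ".join(raw.split())


p = ET.fromstring(FRAGMENT)
print(flatten_text(p))
```

Because the text is gathered in document order, "Corning" and "NYSE: GLW" stay in their original positions instead of being appended at the end. The spaces around the parentheses come from the line breaks in the source markup.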

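Thank you very much. But now I have a new problem: I want to get "Glassware veteran Corning (NYSE: GLW) has fallen on hard times lately. Is it time to give up on the stock, or will Corning have a banana and a comeback?" Using the code: tnode = root.xpath('/p | /p/strong | /p/a | /p/span') content = tnode.xpath('text()') print ''.join(content) the result is: "Glassware veteran ( ) has fallen on hard times lately. Is it time to give up on the stock, or will Corning have a banana and a comeback? Corning NYSE: GLW". Do you have any ideas? Thanks. – yinyao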


I've updated my answer. – larsks