2016-05-20

How do I get all the text from this tag?

I want to get all the text from this HTML tag, which I have stored in the variable tag:

<td rowspan="2" style="text-align: center;"><a href="/wiki/Glenn_Miller" title="Glenn Miller">Glenn Miller</a> &amp; His Orchestra</td> 

The result should be "Glenn Miller & His Orchestra".

But printing tag.find(text=True) returns only this: "Glenn Miller"

How do I get the rest of the text inside the td element?

Answer


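tag.find(text=True) returns only the first matching text node, which here is the text inside the a element. Use .get_text() instead, which concatenates every text node under the tag: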

>>> from bs4 import BeautifulSoup 
>>> data = '<td rowspan="2" style="text-align: center;"><a href="/wiki/Glenn_Miller" title="Glenn Miller">Glenn Miller</a> &amp; His Orchestra</td>' 
>>> soup = BeautifulSoup(data, "html.parser") 
>>> tag = soup.td 
>>> tag.get_text() 
'Glenn Miller & His Orchestra'
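
To see why find() stops short, it may help to look at the individual text nodes. The following is only a sketch of that comparison, reusing the markup from the question; it assumes a bs4 version recent enough to accept string= (the newer spelling of text=), and the variable names first, pieces, joined, full and tidy are just illustrative.

```python
from bs4 import BeautifulSoup

data = ('<td rowspan="2" style="text-align: center;">'
        '<a href="/wiki/Glenn_Miller" title="Glenn Miller">Glenn Miller</a>'
        ' &amp; His Orchestra</td>')
soup = BeautifulSoup(data, "html.parser")
tag = soup.td

# find() returns only the first text node, which lives inside the <a> element.
first = tag.find(string=True)

# find_all() returns every text node under the tag, in document order.
pieces = [str(node) for node in tag.find_all(string=True)]

# Joining the nodes by hand gives the same string as get_text().
joined = "".join(pieces)
full = tag.get_text()

# Optional: strip each node and join with a single space.
tidy = tag.get_text(" ", strip=True)

print(repr(str(first)))
print(pieces)
print(repr(full))
print(repr(tidy))
```

For this particular tag, joined, full and tidy all come out as the same string. The separator and strip arguments tend to matter more when the markup contains newlines or indentation between elements.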