2013-06-20

How do I return only the text from an HTML snippet?

I have an HTML snippet that looks like this:

<pre>zdfsfsf<br/>adfadfadf 
adfadfasdfadfad adfadf adf 
Mill Valley, CA 94941 
122-2323-24124 
Email: adfadfadf<br/><i>sfsfsfsf</i></pre> 
<br/> 

I want to strip all the tags and keep only the text.

The result should look like this:

zdfsfsf adfadfadf 
adfadfasdfadfad adfadf adf 
Mill Valley, CA 94941 
122-2323-24124 
Email: adfadfadf sfsfsfsf 

I'm looking for something like this:

cells = row.find_all('td')
for c in cells:
    c.STRIP_HTML_TAGS()  # <-- WHAT IS THIS FUNCTION?

Answer


You are looking for get_text():

>>> from bs4 import BeautifulSoup 
>>> soup = BeautifulSoup("""<pre>zdfsfsf<br/>adfadfadf 
... adfadfasdfadfad adfadf adf 
... Mill Valley, CA 94941 
... 122-2323-24124 
... Email: adfadfadf<br/><i>sfsfsfsf</i></pre> 
... <br/>""", "html.parser")
>>> print(soup.get_text()) 
zdfsfsfadfadfadf 
adfadfasdfadfad adfadf adf 
Mill Valley, CA 94941 
122-2323-24124 
Email: adfadfadfsfsfsfsf 
>>>
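Note that a plain get_text() call joins adjacent text nodes with nothing in between, so "zdfsfsf" and "adfadfadf" run together, which differs from the output the question asks for. A minimal sketch, assuming bs4 with the built-in "html.parser": passing separator=" " puts a space between text nodes. The table row in the second half is a hypothetical example, since the asker's real markup isn't shown.

```python
from bs4 import BeautifulSoup

# The snippet from the question (trailing spaces dropped).
html = """<pre>zdfsfsf<br/>adfadfadf
adfadfasdfadfad adfadf adf
Mill Valley, CA 94941
122-2323-24124
Email: adfadfadf<br/><i>sfsfsfsf</i></pre>
<br/>"""

soup = BeautifulSoup(html, "html.parser")

# separator=" " is inserted between consecutive text nodes, so the text on
# either side of a <br/> or <i> boundary no longer runs together.
# .strip() removes the trailing whitespace left after </pre>.
text = soup.get_text(separator=" ").strip()
print(text)

# The same call works per cell in the loop from the question.
# Hypothetical row: the asker's actual table markup is not shown.
row = BeautifulSoup("<tr><td>a<br/>b</td><td><i>c</i></td></tr>", "html.parser")
cells = row.find_all("td")
# strip=True also trims whitespace around each text node and drops
# whitespace-only nodes before joining.
texts = [c.get_text(separator=" ", strip=True) for c in cells]
print(texts)
```

With separator=" ", the first print matches the desired output from the question, and the list comprehension is a drop-in replacement for the STRIP_HTML_TAGS() placeholder.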