2012-11-05 17 views

Hi, I have been using the code below to get the content of a tag from an HTML file. How can I find and extract text from an HTML file using Python?

def everything_between(text, begin, end):
    # Return the text between the first `begin` marker and the next `end` marker.
    # Use the `text` argument rather than the global `content`.
    idx1 = text.find(begin)
    idx2 = text.find(end, idx1)
    return text[idx1 + len(begin):idx2].strip()

content = open('page.html').read()
title = everything_between(content, '<ul class="members">', '</ul>')
interesting = everything_between(content, 'INTERESTING:', 'bodystuff')
print(title)

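However, the <ul class="member"> tag contains several <a href> tags, and I only want the one that is <a href="/history/member/">.

The script should return the value between <a href="/history/member/"> and </a>, that is, the content of that <a href> tag.

How can I do this?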


Use an XML parser (http://stackoverflow.com/a/1732454/647772) – 2012-11-05 12:26:17
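
To illustrate what the comment suggests, here is a minimal sketch that uses the standard library's html.parser instead of string searching. The class name LinkTextParser and the sample markup are hypothetical stand-ins for the real page.html, and the sketch matches every <a> with the given href anywhere in the document rather than only inside the <ul>.

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Collect the text inside every <a> tag whose href equals `target_href`."""

    def __init__(self, target_href):
        super().__init__()
        self.target_href = target_href
        self.inside_target = False  # True while between the matching <a ...> and </a>
        self.texts = []             # one entry per matching link

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "a" and dict(attrs).get("href") == self.target_href:
            self.inside_target = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self.inside_target = False

    def handle_data(self, data):
        if self.inside_target:
            self.texts[-1] += data


# Hypothetical sample markup standing in for page.html.
sample_html = """
<ul class="members">
  <li><a href="/profile/1/">Alice</a></li>
  <li><a href="/history/member/">Member history</a></li>
</ul>
"""

parser = LinkTextParser("/history/member/")
parser.feed(sample_html)
link_texts = [text.strip() for text in parser.texts]
print(link_texts)
```

Unlike everything_between, this does not depend on the exact spelling or attribute order of the surrounding markup, only on the href value.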

Answers


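Use BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/

The examples that follow come from its documentation and assume that soup is a BeautifulSoup object built from the documentation's sample page ("The Dormouse's story"):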
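
The snippets in this answer use a soup object without showing how it is created. As a sketch, assuming the bs4 package (Beautiful Soup 4) is installed, it could be built from an abridged copy of the documentation's sample page like this; the html_doc string here is shortened for illustration.

```python
from bs4 import BeautifulSoup

# Abridged version of the sample document used in the Beautiful Soup docs.
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
</body></html>
"""

# "html.parser" is Python's built-in parser, so no extra parser library is needed.
soup = BeautifulSoup(html_doc, "html.parser")
```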

soup.title 
# <title>The Dormouse's story</title> 

soup.title.name 
# u'title' 

soup.title.string 
# u'The Dormouse's story' 

soup.title.parent.name 
# u'head' 

soup.p 
# <p class="title"><b>The Dormouse's story</b></p> 

soup.p['class'] 
# [u'title']  (Beautiful Soup 4 returns a list, because class can hold several values)

soup.a 
# <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a> 

soup.find_all('a') 
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, 
# <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, 
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>] 

soup.find(id="link3") 
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a> 
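
Applied to the question, the same calls can pick out the single link of interest. This is only a sketch under assumptions: the sample markup is a hypothetical stand-in for page.html, the list is assumed to carry class="members" as in the question's code, and the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for page.html.
sample_html = """
<ul class="members">
  <li><a href="/profile/1/">Alice</a></li>
  <li><a href="/history/member/">Member history</a></li>
</ul>
"""

page_soup = BeautifulSoup(sample_html, "html.parser")

# Narrow the search to the <ul class="members"> block first.
# class_ is used because "class" is a reserved word in Python.
members = page_soup.find("ul", class_="members")

# Within that block, find the <a> whose href matches exactly.
link = members.find("a", href="/history/member/") if members else None

# get_text(strip=True) returns the text between <a ...> and </a>, trimmed.
link_text = link.get_text(strip=True) if link else None
print(link_text)
```

For the real file, the sample string would be replaced by the contents of page.html, for example open('page.html').read(). The None checks keep the script from raising an AttributeError when the list or the link is missing.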