Trouble extracting text from inside scraped HTML tags with Beautiful Soup

This method returns entries similar to this one:

<li class="title"><h4><a href="/addons/wow/world-quest-tracker">World Quest Tracker</a></h4></li>

The code for my list tries to extract the text between the href tags, which in this case is:

World Quest Tracker

How can I do this?


2017-10-14 Lost Boy

Try this:

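from bs4 import BeautifulSoup

html = '''
<li class="title"><h4><a href="/addons/wow/world-quest-tracker">World Quest Tracker</a></h4></li>
'''
soup = BeautifulSoup(html, "lxml")  # requires the lxml package; "html.parser" also works
# select(".title") returns every element with class="title";
# .text concatenates all text nested inside it, including the <a> text.
for item in soup.select(".title"):
    print(item.text)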

Result:

World Quest Tracker
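Since the question mentions a list of such entries, here is a minimal sketch of collecting the link text from several items at once. The second `<li>` entry below is hypothetical, added only to show more than one match, and the built-in "html.parser" is used so lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: two entries shaped like the one in the question.
html = """
<ul>
  <li class="title"><h4><a href="/addons/wow/world-quest-tracker">World Quest Tracker</a></h4></li>
  <li class="title"><h4><a href="/addons/wow/example-addon">Example Addon</a></h4></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# "li.title a" matches every <a> nested anywhere inside an <li class="title">.
# get_text(strip=True) trims surrounding whitespace from each link's text.
titles = [a.get_text(strip=True) for a in soup.select("li.title a")]
print(titles)  # ['World Quest Tracker', 'Example Addon']
```

Selecting the `<a>` directly, rather than the whole `<li>`, keeps any other text inside the list item out of the result.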


2017-10-14 06:36:38 SIM

from bs4 import BeautifulSoup

html_doc = '<li class="title"><h4><a href="/addons/wow/world-quest-tracker">World Quest Tracker</a></h4></li>'
soup = BeautifulSoup(html_doc, 'html.parser')
print(soup.find('a').text)  # text of the first <a> element

This prints:

World Quest Tracker
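One caveat when applying this to many scraped entries: `find()` returns `None` when nothing matches, so `soup.find('a').text` raises `AttributeError` on an entry without a link. A minimal sketch of a guard follows; the helper name `link_text` is hypothetical.

```python
from bs4 import BeautifulSoup


def link_text(fragment):
    """Hypothetical helper: text of the first <a> in fragment, or None if there is none."""
    soup = BeautifulSoup(fragment, "html.parser")
    anchor = soup.find("a")
    # find() returns None when no <a> exists; check before reading its text.
    return anchor.get_text() if anchor is not None else None


print(link_text('<li class="title"><h4><a href="/addons/wow/world-quest-tracker">World Quest Tracker</a></h4></li>'))
print(link_text('<li class="title"><h4>No link here</h4></li>'))
```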


2017-10-14 06:12:38

"I am trying to extract the text between the href tags"

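If you actually want the value of the href attribute, rather than the text content enclosed by the <a></a> anchor (your wording is a bit unclear), use get('href'):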

from bs4 import BeautifulSoup 

html = '<li class="title"><h4><a href="/addons/wow/world-quest-tracker">World Quest Tracker</a></h4></li>' 
soup = BeautifulSoup(html, 'lxml') 
soup.find('a').get('href') 

'/addons/wow/world-quest-tracker'
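If you need both pieces, the same `Tag` object gives you the anchor text and the attribute. This sketch also contrasts `get('href')` with dictionary-style `['href']`: the first returns `None` for a missing attribute, while the second raises `KeyError`. It uses "html.parser" so lxml is not needed.

```python
from bs4 import BeautifulSoup

html = '<li class="title"><h4><a href="/addons/wow/world-quest-tracker">World Quest Tracker</a></h4></li>'
soup = BeautifulSoup(html, "html.parser")
a = soup.find("a")

text = a.get_text()     # text between <a> and </a>
href = a.get("href")    # attribute value; None if the attribute is missing
same_href = a["href"]   # dictionary-style access; KeyError if the attribute is missing

print(text, href)  # World Quest Tracker /addons/wow/world-quest-tracker
```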


2017-10-14 06:39:20
