BeautifulSoup web scraping: getting just the text

I am trying to extract the text from this tag:

<a href="/reviews/28th-and-b-st-skatepark/"> 

    28th & B St Skatepark  #This is what I'm trying to grab, just the text. 

</a>

My code:

import urllib2 
from bs4 import BeautifulSoup 

url1 = "http://www.thrashermagazine.com/skateparks/search-results_m94/?cat=61&jr_state=CA&order=alpha&query=all" 
content1 = urllib2.urlopen(url1).read() 
soup = BeautifulSoup(content1) 
print soup.findAll('a')

This is what I get back:

</a>, <a href="http://www.thrashermagazine.com/"><img alt="Thrasher Magazine Logo" src="/templates/HomePage/images/templatesImages/Header_logo.jpg" style="border:0px;"/></a>, <a href="javascript:void();" onclick="secondFunction();">Log in</a>, <a href="/Register/">Register</a>, <a href="http://www.thrashermagazine.com/"><span>Home</span></a>, <a href="http://shop.thrashermagazine.com"><span>Store</span></a>, <a href="/component/option,com_hwdvideoshare/Itemid,93/"><span>Thrasher Skateboard Magazine | Videos</span></a>, <a href="/tags/features/"><span>Features</span></a>, <a href="/component/option,com_jevents/Itemid,100/task,week.listevents/"><span>Thrasher Skateboard Magazine | Events</span></a>,

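As far as I can tell, this is exactly what I asked my script to do, but I would like to know whether there is a way to get only the text I pointed out above, rather than everything associated with the tag.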

Source

2013-12-10 Matt

You can always refer to the [documentation](https://beautiful-soup-4.readthedocs.org) – justhalf

Use the .text attribute. For example:

import urllib2 
from bs4 import BeautifulSoup

url1 = "http://www.thrashermagazine.com/skateparks/search-results_m94/?cat=61&jr_state=CA&order=alpha&query=all" 
content1 = urllib2.urlopen(url1).read() 
soup = BeautifulSoup(content1) 
print [e.text for e in soup.findAll('a')]
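
The list above contains the text of every link on the page, including navigation links and empty strings for image-only links. If only the skatepark names are wanted, one possible approach is to keep just the links whose href starts with /reviews/ and to strip the surrounding whitespace. The following is a minimal sketch in Python 3 with bs4, not the original poster's code. The inline HTML is an assumed stand-in for the downloaded page, and link_texts and its href_prefix parameter are hypothetical names introduced for illustration.

```python
from bs4 import BeautifulSoup

# Assumed sample markup, modelled on the snippets quoted in the question.
# In a real run this string would be the downloaded page content.
html = """
<a href="http://www.thrashermagazine.com/"><img alt="Thrasher Magazine Logo" src="/logo.jpg"/></a>
<a href="/Register/">Register</a>
<a href="/reviews/28th-and-b-st-skatepark/">
    28th &amp; B St Skatepark
</a>
"""

soup = BeautifulSoup(html, "html.parser")


def link_texts(soup, href_prefix="/reviews/"):
    """Return the stripped text of every <a> whose href starts with href_prefix."""
    texts = []
    # href=True skips anchors that have no href attribute at all.
    for a in soup.find_all("a", href=True):
        if a["href"].startswith(href_prefix):
            # get_text(strip=True) removes the newlines and indentation
            # around the link text.
            text = a.get_text(strip=True)
            if text:  # drop links with no text, e.g. image-only links
                texts.append(text)
    return texts


park_names = link_texts(soup)
print(park_names)
```

The /reviews/ prefix is taken from the single link shown in the question; other pages on the site may use a different URL pattern, so it is worth checking the actual markup before relying on it.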

Source

2013-12-10 06:20:54
