2011-04-26 30 views
2

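Troubleshooting AttributeError: 'ResultSet' object has no attribute 'findAll'

I'm trying to parse the names of all the talks on the http://www.ted.com/talks page. Using BeautifulSoup, here is what I have: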

import urllib2
from BeautifulSoup import BeautifulSoup

page = urllib2.urlopen("http://www.ted.com/talks")
soup = BeautifulSoup(page)

link = soup.findAll(lambda tag: tag.name == 'a' and tag.findParent('dt', 'thumbnail'))

for anchor in link.findAll('a', title = True):
    print anchor['title']

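The initial link shows a nice array of the eight video blocks. I then try to go through it and pull the titles out of the <a> tags using the code above, which gives me the following error: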

for anchor in link.findAll('a', title=True): 
AttributeError: 'ResultSet' object has no attribute 'findAll' 

What am I doing wrong?

Answers

3

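link is a collection of Tag objects (findAll returns a ResultSet, which is a list and has no findAll method of its own), so you need to iterate over it. For example: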

for anchor in link: 
    print anchor['title'] 
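
The same pitfall can be reproduced without BeautifulSoup. The following is only a sketch in Python 3 that uses the standard library's xml.etree.ElementTree as a stand-in: its findall() also returns a list, so calling findall() on the result fails in the same way, and the fix is again to loop over the collection. The SAMPLE markup and its titles are made up for illustration and are not the real TED page.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed sample markup (an assumption, not the real TED page).
SAMPLE = """
<dl>
  <dt class="thumbnail"><a title="Talk one" href="/talks/one.html">x</a></dt>
  <dt class="thumbnail"><a title="Talk two" href="/talks/two.html">y</a></dt>
  <dt class="other"><a href="/about.html">no title</a></dt>
</dl>
""".strip()

root = ET.fromstring(SAMPLE)

# findall() returns a plain list of elements, not a single element.
anchors = root.findall(".//dt[@class='thumbnail']/a")

# Calling findall() on the list itself fails, much like the ResultSet error.
try:
    anchors.findall("a")
    error = None
except AttributeError as exc:
    error = str(exc)

# The fix: iterate over the collection and read the attribute from each item.
titles = [a.get("title") for a in anchors if a.get("title") is not None]

print(error)
print(titles)
```

The point carried over to BeautifulSoup is the same: search methods belong to a single tag (or the soup), while the value they return is a list of tags that has to be looped over.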
+0

This gives the following error: print a['title'] NameError: name 'a' is not defined – EGP 2011-04-26 22:41:31

+0

@Adam Sorry, that was a typo. Fixed now. – interjay 2011-04-26 22:42:53

+0

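Works like a charm. Thanks! Do you know where I can learn more about the syntax around 'anchor'? For example, say I wanted the <img> tag instead of the <a> tag. Would I just change it to findAll('img')? Just curious to learn more. – EGP 2011-04-26 23:55:40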

0

By way of comparison, here is the pyparsing approach:

from contextlib import closing 
import urllib2 
from pyparsing import makeHTMLTags, withAttribute 

# pull HTML from web page 
with closing(urllib2.urlopen("http://www.ted.com/talks")) as page: 
    html = page.read() 

# define opening and closing tags 
dt,dtEnd = makeHTMLTags("dt") 
a,aEnd = makeHTMLTags("a") 

# restrict <dt> tag matches to those with class='thumbnail' 
dt.setParseAction(withAttribute(**{'class':'thumbnail'})) 

# define pattern of <dt> tag followed immediately by <a> tag 
patt = dt + a("A") 

# scan input html for matches of this pattern, and access 
# attributes of the <A> tag 
for match,s,e in patt.scanString(html): 
    print match.A.title 
    print match.A.href 
    print 

which gives:

Bruce Schneier: The security mirage 
/talks/bruce_schneier.html 

Harvey Fineberg: Are we ready for neo-evolution? 
/talks/harvey_fineberg_are_we_ready_for_neo_evolution.html 

Ric Elias: 3 things I learned while my plane crashed 
/talks/ric_elias.html 

Anil Ananthaswamy: What it takes to do extreme astrophysics 
/talks/anil_ananthaswamy.html 

John Hunter on the World Peace Game 
/talks/john_hunter_on_the_world_peace_game.html 

Kathryn Schulz: On being wrong 
/talks/kathryn_schulz_on_being_wrong.html 

Sam Richards: A radical experiment in empathy 
/talks/sam_richards_a_radical_experiment_in_empathy.html 

Susan Lim: Transplant cells, not organs 
/talks/susan_lim.html 

Marcin Jakubowski: Open-sourced blueprints for civilization 
/talks/marcin_jakubowski.html 

Roger Ebert: Remaking my voice 
/talks/roger_ebert_remaking_my_voice.html 
+0

Thanks for the additional perspective :) – EGP 2011-04-26 23:54:00
