BeautifulSoup error (CGI escape)

I am getting the following error:

Traceback (most recent call last):
  File "stack.py", line 31, in ?
    print >> out, "%s" % escape(p)
  File "/usr/lib/python2.4/cgi.py", line 1039, in escape
    s = s.replace("&", "&amp;") # Must be done first!
TypeError: 'NoneType' object is not callable

for the following code:

import urllib2 
from cgi import escape # Important! 
from BeautifulSoup import BeautifulSoup 

def is_talk_anchor(tag): 
return tag.name == "a" and tag.findParent("dt", "thumbnail") 

def talk_description(tag): 
return tag.name == "p" and tag.findParent("h3") 

links = [] 
desc = [] 

for pagenum in xrange(1, 5): 
soup = BeautifulSoup(urllib2.urlopen("http://www.ted.com/talks?page=%d" % pagenum)) 
links.extend(soup.findAll(is_talk_anchor)) 
page = BeautifulSoup(urllib2.urlopen("http://www.ted.com/talks/arvind_gupta_turning_trash_into_toys_for_learning.html")) 
desc.extend(soup.findAll(talk_description)) 

out = open("test.html", "w") 

print >>out, """<html><head><title>TED Talks Index</title></head> 
<body> 
<table> 
<tr><th>#</th><th>Name</th><th>URL</th><th>Description</th></tr>""" 

for x, a in enumerate(links): 
    print >> out, "<tr><td>%d</td><td>%s</td><td>http://www.ted.com%s</td>" % (x + 1, escape(a["title"]), escape(a["href"])) 

for y, p in enumerate(page): 
    print >> out, "<td>%s</td>" % escape(p) 

print >>out, "</tr></table>"

I think the problem is with % escape(p). I am trying to get the contents of that <p> out. Should I not be using escape?

I am also having a separate problem with this line:

page = BeautifulSoup(urllib2.urlopen("%s") % a["href"])

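This is what I want to do, but it runs into an error as well, and I am wondering whether there is another way to do it. I am trying to collect the links I found with the earlier code and run each of them through BeautifulSoup again.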

Source

2011-05-03 EGP

Your indentation is messed up, isn't it? – 2011-05-03 04:59:41

You have to investigate (using pdb) why one of your links comes back as a None instance.

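In particular: the traceback speaks for itself. escape() is being called with None. So you have to investigate which argument is None... it is one of the items in "links". So why is one of your items None?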
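If stepping through pdb is inconvenient, a small check like the one below can help narrow down what is being passed to escape(). This is only a sketch under some assumptions: it uses Python 3's html.escape (the question imports escape from cgi on Python 2.4), and FakeTag and try_escape are hypothetical names made up for illustration. It shows that passing None itself fails with an AttributeError, while an object whose replace attribute resolves to None fails with the same TypeError as in the traceback. As far as I know, BeautifulSoup 3 tag objects treat unknown attribute names as a child-tag lookup, which gives None when there is no such child, so it is worth checking the type of the value as well as whether it is None.

```python
from html import escape  # Python 3 location; the question uses cgi.escape on Python 2.4


class FakeTag:
    """Hypothetical stand-in for an object whose unknown attributes resolve to None."""

    def __getattr__(self, name):
        # Only called for attributes that are not found normally, e.g. .replace
        return None


def try_escape(value):
    """Call escape() and report the exception type and message instead of raising."""
    try:
        return escape(value)
    except (AttributeError, TypeError) as exc:
        return "%s: %s" % (type(exc).__name__, exc)


for candidate in ["a & b", None, FakeTag()]:
    print(type(candidate).__name__, "->", try_escape(candidate))
```

Printing type(value) for each item just before the failing escape() call in the original script would give the same kind of information.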

Most likely because one of your calls to

def is_talk_anchor(tag): 
    return tag.name == "a" and tag.findParent("dt", "thumbnail")

returns None, because tag.findParent("dt", "thumbnail") returns None (due to your given HTML input).
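The predicate can hand back None rather than False because of how the and operator works in Python: it returns the last operand it evaluated, not a bool. Here is a minimal sketch with plain values standing in for the tag name and the findParent result. The names matches and matches_strict are hypothetical, and no BeautifulSoup is involved.

```python
def matches(name, parent):
    """Same shape as is_talk_anchor: a comparison and-ed with a lookup result."""
    return name == "a" and parent


def matches_strict(name, parent):
    """Same test, but the second operand is a comparison, so the result is a bool."""
    return name == "a" and parent is not None


print(repr(matches("a", None)))         # right-hand operand is returned as-is
print(repr(matches("p", None)))         # comparison fails first, so False
print(repr(matches_strict("a", None)))  # always a real bool
```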

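So you have to check or filter the items in "links" for None (or adjust your parser code above) so that you pick up only the existing links you actually need.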
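A filter of that kind could look roughly like this. It is a sketch, not a drop-in fix: escape_present is a hypothetical helper name, the input is a plain list of strings rather than BeautifulSoup results, and it uses Python 3's html.escape instead of the cgi.escape from the question.

```python
from html import escape  # Python 3 location; the question uses cgi.escape on Python 2.4


def escape_present(values):
    """Escape every value that is not None and silently skip the None entries."""
    return [escape(value) for value in values if value is not None]


titles = ["a<b", None, "x & y"]
print(escape_present(titles))
```

Whether skipping is the right behaviour depends on the table being built; writing an empty cell for a missing value would keep the rows aligned.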

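And please read your tracebacks carefully and think about what the problem might be. Tracebacks are very helpful and give you valuable information about your problem.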
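On the second line from the question, urllib2.urlopen("%s") % a["href"]: the call binds more tightly than the % operator, so urlopen is called with the literal string "%s" before any formatting happens. The URL has to be built first and then passed in. A minimal sketch of the URL-building step, assuming the hrefs are site-relative paths as in the question's table output (BASE_URL and talk_url are hypothetical names, and it uses Python 3's urllib.parse rather than Python 2's urllib2):

```python
from urllib.parse import urljoin

BASE_URL = "http://www.ted.com"  # host taken from the question's code


def talk_url(href):
    """Build the absolute URL for a talk from a site-relative href."""
    return urljoin(BASE_URL, href)


print(talk_url("/talks/arvind_gupta_turning_trash_into_toys_for_learning.html"))
```

The resulting string is what would then go to urlopen, once per link.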

Source

2011-05-03 04:49:00
