得到以下錯誤:BeautifulSoup錯誤(CGI逃亡)
Traceback (most recent call last):
File "stack.py", line 31, in ?
print >> out, "%s" % escape(p) File
"/usr/lib/python2.4/cgi.py", line
1039, in escape
s = s.replace("&", "&") # Must be done first! TypeError: 'NoneType'
object is not callable
對於下面的代碼:
import urllib2
from cgi import escape # Important!
from BeautifulSoup import BeautifulSoup
def is_talk_anchor(tag):
return tag.name == "a" and tag.findParent("dt", "thumbnail")
def talk_description(tag):
return tag.name == "p" and tag.findParent("h3")
links = []
desc = []
for pagenum in xrange(1, 5):
soup = BeautifulSoup(urllib2.urlopen("http://www.ted.com/talks?page=%d" % pagenum))
links.extend(soup.findAll(is_talk_anchor))
page = BeautifulSoup(urllib2.urlopen("http://www.ted.com/talks/arvind_gupta_turning_trash_into_toys_for_learning.html"))
desc.extend(soup.findAll(talk_description))
out = open("test.html", "w")
print >>out, """<html><head><title>TED Talks Index</title></head>
<body>
<table>
<tr><th>#</th><th>Name</th><th>URL</th><th>Description</th></tr>"""
for x, a in enumerate(links):
print >> out, "<tr><td>%d</td><td>%s</td><td>http://www.ted.com%s</td>" % (x + 1, escape(a["title"]), escape(a["href"]))
for y, p in enumerate(page):
print >> out, "<td>%s</td>" % escape(p)
print >>out, "</tr></table>"
我認爲這個問題是% escape(p)
。我試圖把<p>
的內容拿出來。我不應該使用逃脫?
還分別具有與行的問題:
page = BeautifulSoup(urllib2.urlopen("%s") % a["href"])
這就是我想做的事,但同樣運行到錯誤並想知道是否有這樣做的另一種方式。試圖收集我從前面的代碼中找到的鏈接,並再次通過BeautifulSoup運行它。
你的縮進搞砸了吧? – 2011-05-03 04:59:41