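Problem scraping data with BeautifulSoup
I have written the following trial code to retrieve the titles of legislative acts from the European Parliament.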
import urllib2
from BeautifulSoup import BeautifulSoup
search_url = "http://www.europarl.europa.eu/sides/getDoc.do?type=REPORT&mode=XML&reference=A7-2010-%.4d&language=EN"
for number in xrange(1,10):
    url = search_url % number
    page = urllib2.urlopen(url).read()
    soup = BeautifulSoup(page)
    title = soup.findAll("title")
    print title
However, whenever I run it, I get the following error:
Traceback (most recent call last):
  File "<stdin>", line 20, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 70: ordinal not in range(128)
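I have narrowed it down to BeautifulSoup not being able to read the fourth document in the loop. Can anyone explain to me what I am doing wrong?
With kind regards,
Thomas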
Dear Unutbu, thank you for the tip, I got it all working. Strange... – 2010-07-02 08:39:49
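The error in the traceback can be reproduced without any network access, which may help narrow down where it comes from. The sketch below is only an illustration under stated assumptions: the title string is made up (it just contains the en dash U+2013 named in the traceback), and the helper names encodes_to_ascii and to_utf8_bytes are hypothetical. It shows that such a string cannot be encoded with the ASCII codec but encodes cleanly as UTF-8. Whether the failure in the script above comes from the print statement or from elsewhere is not confirmed here.

```python
# -*- coding: utf-8 -*-
# Assumption: a made-up title containing an en dash (U+2013),
# the character reported in the traceback. Not a real document title.
title = u"REPORT \u2013 sample title"


def encodes_to_ascii(text):
    """Return True if text can be encoded with the ASCII codec."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        # Raised for any character outside range(128), such as U+2013.
        return False


def to_utf8_bytes(text):
    """Encode text explicitly as UTF-8 instead of relying on a default codec."""
    return text.encode("utf-8")


ascii_ok = encodes_to_ascii(title)   # False: the en dash is not ASCII
utf8_bytes = to_utf8_bytes(title)    # bytes that can be written to any byte stream
```

If the implicit ASCII encoding at output time is indeed the cause, encoding each title explicitly (for example with .encode("utf-8")) before printing it would be one way to avoid the error.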