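Python: Issue with character encoding

I am writing a program to scrape a Wikipedia table with Python. Everything works fine, except that some characters do not seem to be encoded properly by Python.

Here is the code: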
import csv
import requests
from BeautifulSoup import BeautifulSoup
import sys

reload(sys)
sys.setdefaultencoding("utf-8")

url = 'https://en.wikipedia.org/wiki/List_of_airports_by_IATA_code:_A'
response = requests.get(url)
html = response.content
soup = BeautifulSoup(html)
table = soup.find('table', attrs={'class': 'wikitable sortable'})

list_of_rows = []
for row in table.findAll('tr'):
    list_of_cells = []
    for cell in row.findAll('td'):
        text = cell.text.replace(' ', '')
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)

outfile = open("./scrapedata.csv", "wb")
writer = csv.writer(outfile)
print list_of_rows
writer.writerows(list_of_rows)
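
For example, Merzbrück is encoded as Merzbrück.

This problem seems to be more or less related to accented characters (é, è, ç, à, etc.). Is there a way to avoid this? Thanks in advance for your help.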
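
For reference, here is a minimal Python 3 sketch of one common cause of this kind of output (the code in the question is Python 2). It assumes the scraped text itself is fine and that the garbling appears only when the UTF-8 bytes in the CSV are later read as Latin-1, for example by a spreadsheet program. That assumption may not hold in this case. The helper names `misdecode` and `save_csv` and the sample row are made up for illustration. The `utf-8-sig` codec writes a byte-order mark, which some programs use to detect UTF-8.

```python
import csv
import os
import tempfile


def misdecode(text):
    """Show what UTF-8 bytes look like when wrongly decoded as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def save_csv(path, rows, encoding="utf-8-sig"):
    """Write rows of text cells to a CSV file with an explicit encoding."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding=encoding, newline="") as outfile:
        csv.writer(outfile).writerows(rows)


# Illustrative sample row, not real scraped output.
sample_rows = [["AAH", "Merzbrück"]]

# The two bytes of "ü" in UTF-8 become two separate Latin-1 characters.
garbled = misdecode("Merzbrück")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scrapedata.csv")
    save_csv(path, sample_rows)

    # Inspect the raw bytes: the file starts with the UTF-8 byte-order mark.
    with open(path, "rb") as f:
        raw_bytes = f.read()

    # Reading back with the same encoding restores the original text.
    with open(path, encoding="utf-8-sig", newline="") as f:
        read_back = list(csv.reader(f))
```

If the file already contains correct UTF-8, the fix may be on the reading side instead: telling the viewer or importer to open the file as UTF-8 would then be enough.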