Python: problem with character encoding

I'm writing a program to scrape a Wikipedia table with Python. Everything works fine, except that some characters don't seem to be encoded correctly by Python.

Here is the code:

import csv 
import requests 
from BeautifulSoup import BeautifulSoup 
import sys 

reload(sys) 
sys.setdefaultencoding("utf-8") 

url = 'https://en.wikipedia.org/wiki/List_of_airports_by_IATA_code:_A' 
response = requests.get(url) 
html = response.content 

soup = BeautifulSoup(html) 
table = soup.find('table', attrs={'class': 'wikitable sortable'}) 

list_of_rows = [] 
for row in table.findAll('tr'): 
    list_of_cells = [] 
    for cell in row.findAll('td'): 
        text = cell.text.replace('&nbsp;', '') 
        list_of_cells.append(text) 
    list_of_rows.append(list_of_cells) 

outfile = open("./scrapedata.csv", "wb") 
writer = csv.writer(outfile) 
print list_of_rows 
writer.writerows(list_of_rows)

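For example, Merzbrück is encoded as MerzbrÃ¼ck. The problem seems to be more or less related to accented characters (é, è, ç, à, etc.). Is there a way to avoid this? Thanks in advance for your help.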

Source

2016-02-24 Tauseef Hussain

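This is definitely an encoding problem. The question is where it occurs. My advice is to work through each step and look at the raw data, to see whether you can pin down exactly where the encoding goes wrong.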

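So, for example, print response.content to see whether the symbols in the requests object are what you expect. If they are, move on and check soup.prettify() to see whether the BeautifulSoup object is fine, then list_of_rows, and so on.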
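As a rough illustration of what to look for at each step, here is a minimal Python 3 sketch that reproduces the symptom from the question without any network access. The helper name describe is hypothetical, and the sketch only assumes that the garbled text comes from UTF-8 bytes being read as Latin-1, which is what the MerzbrÃ¼ck output suggests.

```python
# Python 3 sketch: reproduce the symptom and make the raw bytes visible.
name = "Merzbrück"

# Encoding to UTF-8 turns "ü" into the two bytes 0xC3 0xBC.
utf8_bytes = name.encode("utf-8")

# Reading those same bytes as Latin-1 yields one character per byte,
# which is the garbled form shown in the question.
mojibake = utf8_bytes.decode("latin-1")


def describe(label, value):
    """Hypothetical helper: show the type and repr of a value at one stage."""
    return "{}: {} {!r}".format(label, type(value).__name__, value)


print(describe("utf-8 bytes", utf8_bytes))
print(describe("misread as latin-1", mojibake))
print(describe("decoded correctly", utf8_bytes.decode("utf-8")))
```

Printing the repr at each stage (response.content, the soup, list_of_rows) in the same way shows the first place where the two-character sequence appears instead of the single accented character.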

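All that said, my suspicion is that the problem lies in writing to the csv. The csv documentation has an example of how to write Unicode to a csv file. This answer may also help you solve the problem.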
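For comparison, a minimal sketch of writing accented text with the csv module under Python 3, where the file is opened in text mode with an explicit encoding. The function name write_rows, the sample rows, and the output file name are made up for illustration and are not taken from the question.

```python
import csv
import os
import tempfile


def write_rows(path, rows, encoding="utf-8"):
    """Write rows to a CSV file using an explicit text encoding (Python 3)."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, "w", newline="", encoding=encoding) as outfile:
        csv.writer(outfile).writerows(rows)


# Made-up sample rows containing accented characters.
rows = [["code", "name"], ["X01", "Merzbrück"], ["X02", "Café"]]

# Hypothetical output location in the system temp directory.
path = os.path.join(tempfile.gettempdir(), "scrapedata_example.csv")
write_rows(path, rows)

# Read the file back with the same encoding to confirm a clean round trip.
with open(path, newline="", encoding="utf-8") as infile:
    read_back = list(csv.reader(infile))

print(read_back)
```

If the file round-trips correctly like this but still looks wrong in another program, that program is probably reading it with a different encoding. Passing encoding="utf-8-sig" to write_rows adds a byte order mark, which some spreadsheet applications use to detect UTF-8. Whether that applies here is an assumption, since the question doesn't say how the file was viewed.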

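For what it's worth, I was able to write the correct symbols to csv using the pandas library (I'm using Python 3, so your experience or syntax may differ a bit, since it looks like you're using Python 2):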

import pandas as pd 

df = pd.DataFrame(list_of_rows) 
df.to_csv('scrapedata.csv', encoding='utf-8')

Source

2016-02-25 20:30:39 dagrha
