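UnicodeDecodeError: decoding problem with non-English characters when writing to xls(x)
I'm trying to figure out where to go from here. I have exhausted my searches, so I'm looking for possible next steps or even a better alternative.
Synopsis: I'm using Python to scrape results from a website and then write that data to an xls(x) doc. I chose xls(x) over csv because my csv was not preserving non-English characters when saved.
I have run this code successfully on English pages, but as soon as I hit non-English characters, the following error is raised on write().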
Note that I have also tried string.decode('utf-8'), but that throws an "'ascii' codec can't encode characters" error.
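For context, here is a small sketch of where that second error can come from. In Python 2, calling .decode() on a value that is already a unicode object makes the interpreter first encode it with the default ASCII codec, and that step fails on accented characters. Python 3 has no such implicit step, so the snippet below (written for Python 3, with an illustrative sample string and a hypothetical helper name) performs the ASCII encode explicitly to show the same kind of message.

```python
# Python 3 sketch; `text` is an illustrative sample, not scraped data.
text = "ni\u00f1o"  # the word "niño" as text


def ascii_encode_error(value):
    """Return the message raised when encoding `value` as ASCII, or None."""
    try:
        value.encode("ascii")
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


message = ascii_encode_error(text)
print(message)
```

If the values coming out of the selector are already unicode, this would explain why adding .decode('utf-8') produces an encode error rather than fixing anything.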
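Question: What do I need to do to get these written to xls(x) correctly? I have been able to do this without a problem, but, as I mentioned, saving it changes the formatting. Do I need to encode the data differently so that the write() function passes it through correctly?
For the code below, I have imported scrapy, codecs, xlsxwriter (Workbook), and a few others.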
# set xpaths:
item_1 = 'xpath'
item_2 = 'xpath'
item_3 = 'xpath'
item_4 = 'xpath'
pagination_lookup = {}
results = []

def write_to_excel(list_of_dicts,filename):
    filename = filename + '.xlsx'
    ordered_list = list(set().union(*(d.keys() for d in list_of_dicts))) # OR set up as actual list of keys (e.g. ['Listing Title','Item Price', etc.])
    wb=Workbook(filename)
    ws=wb.add_worksheet("Sheet 1") #or leave it blank, default name is "Sheet 1"
    first_row=0
    for header in ordered_list:
        col=ordered_list.index(header) # to keep order
        ws.write(first_row,col,header) # to write first row/header
    row=1
    for each_dict in list_of_dicts:
        for _key,_value in each_dict.items():
            col=ordered_list.index(_key)
            ws.write(row,col,_value)
        row+=1 #enter the next row
    wb.close()

name = 'Scraper'
# AREA FOR CODE TO GATHER AND SCRAPE URLS (taken out for brevity)
driver.get(clean_url)
time.sleep(2)
selectable_page = Selector(text=driver.page_source)
ResultsDict = {}
ResultsDict['item_1'] = selectable_page.xpath(item_1).extract_first().encode('utf-8')
ResultsDict['item_2'] = selectable_page.xpath(item_2).extract_first().encode('utf-8')
ResultsDict['item_3'] = selectable_page.xpath(item_3).extract_first().encode('utf-8')
ResultsDict['item_4'] = selectable_page.xpath(item_4).extract_first().encode('utf-8')
results.append(ResultsDict)
print ResultsDict
write_to_excel(results,'Scraped_results')
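The code fails with the error below, which is triggered by any value containing a non-English character (e.g. accented forms of N, O, A, etc.).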
Traceback (most recent call last):
  File "/Users/name/scraper1/scraper1/spiders/scraped_results.py", line 128, in write_to_excel(results,'Scraped_results')
  [...]
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 369, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 39: ordinal not in range(128)
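The byte 0xc3 in the traceback is the lead byte of many accented Latin letters in UTF-8, which is consistent with UTF-8 encoded byte strings being decoded as ASCII somewhere on the way to the file. Below is a minimal sketch (written for Python 3) that reproduces that message and shows one way to normalise values to text before they are handed to write(). The helper `to_text` is hypothetical and not part of xlsxwriter, and whether this resolves the problem in the original Python 2 setup is an assumption.

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return `value` as text, decoding bytes if needed."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Sample value encoded the same way as the .encode('utf-8') calls in the scraper.
encoded = "ni\u00f1o".encode("utf-8")  # b'ni\xc3\xb1o'

try:
    encoded.decode("ascii")  # what an ASCII-only decode step attempts
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = str(exc)

print(decode_error)
print(to_text(encoded) == "ni\u00f1o")
```

In the scraper, this would suggest storing the values returned by extract_first() as text (or passing each value through a helper like this) instead of encoding them to bytes before they reach write_to_excel.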
You left out the most important part of the traceback! Which line of *your* code produced the error? –
@MarkRansom Updated! – Winklevoss333