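My CSV was originally created by Excel. Anticipating encoding anomalies, I opened the file in Sublime Text and re-saved it with UTF-8 with BOM encoding. I am trying to get the sc.textFile data correctly encoded (Python 2.7).
Imported into the notebook: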
filepath = "file:///Volumes/PASSPORT/Inserts/IMAGETRAC/csv/universe_wcsv.csv"
uverse = sc.textFile(filepath)
header = uverse.first()
data = uverse.filter(lambda x: x != header)
Formatting my fields:
fields = header.replace(" ", "_").replace("/", "_").split(",")
Structuring the data:
import csv
from StringIO import StringIO
from collections import namedtuple
Products = namedtuple("Products", fields, verbose=True)
def parse(row):
    reader = csv.reader(StringIO(row))
    row = reader.next()
    return Products(*row)
products = data.map(parse)
If I then call products.first(), I get the first record. However, if I want to see, say, the count by brand and run:
products.map(lambda x: x.brand).countByValue()
I still get a UnicodeEncodeError, reported through a related Py4JJavaError:
File "<ipython-input-18-4cc0cb8c6fe7>", line 3, in parse
UnicodeEncodeError: 'ascii' codec can't encode character u'\xab' in
position 125: ordinal not in range(128)
How can I fix this code?
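One direction that may be worth trying, offered as a sketch rather than a confirmed fix: Python 2's csv module does not support Unicode input, while sc.textFile yields unicode strings by default (use_unicode=True). That would explain why the error appears on the line that advances the reader. The helper below, parse_csv_line, is a hypothetical name and not part of the original code. It encodes each line to UTF-8 bytes before handing it to csv.reader and decodes the cells afterwards. The Python 3 branch is only there so the snippet also runs on newer interpreters.

```python
import csv
import sys

PY2 = sys.version_info[0] == 2


def parse_csv_line(line):
    """Parse a single CSV line of text into a list of text cells.

    parse_csv_line is an illustrative helper, not part of the original post.
    """
    if PY2:
        # Python 2's csv module only handles byte strings, so encode the
        # unicode line to UTF-8 first and decode every cell afterwards.
        encoded = line.encode("utf-8")
        cells = next(csv.reader([encoded]))
        return [cell.decode("utf-8") for cell in cells]
    # Python 3's csv module works on text directly.
    return next(csv.reader([line]))


# Assumed usage with the names from the question:
#   products = data.map(lambda row: Products(*parse_csv_line(row)))
#   products.map(lambda x: x.brand).countByValue()
```

csv.reader accepts any iterable of lines, so passing a one-element list avoids the StringIO wrapper altogether.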