2014-06-27 74 views
2

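Python Pandas: read a CSV file with a specific line terminator

I'm trying to create a DataFrame from the sample CSV below, but I get a tokenizing error: "C error: EOF inside string starting at line 0". I don't have much practice dealing with bad lines, but I would really like to learn the best way to handle this kind of thing. I tried many different options in read_csv, such as error_bad_lines=False, but that didn't work either.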

CParserError: Error tokenizing data. C error: EOF inside string starting at line 0 

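I'm guessing that the ," at the end of each line is causing the problem, and that the best approach is to loop through the lines and process each one. So I came up with the generator below, hoping that I'm close. I'd also really like to learn how to use a generator and yield for this.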

Sample data:

"USNC3255","27","US","NC","LANDS END","72305006","KNJM","KNCA","KNKT","T72305006","","","NCC031","NCZ095","","545","28594","America/New_York","34.65266","-77.07661","7","RDU","893727"," 
"USNC3256","27","US","NC","LANDSDOWN","72314058","KEHO","KAKH","KIPJ","T72314058","","","NCC045","NCZ068","sc007","517","28150","America/New_York","35.29374","-81.46537","797","CLT","317845"," 

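I've written the generator below, which strips the last two characters from each line, but I'm not sure how to produce a DataFrame from it: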

import pandas as pd

def big_table_generator(filename):
    with open(filename, 'rt') as f:
        for line in f:
            # Cut off the trailing characters of each line
            yield line[:-3]

gen = big_table_generator('../data/test_sun_file.csv')
pd.DataFrame(gen)
+0

Can you explain how the data points in the sample data are formatted, and what you expect the DataFrame to look like? – dustyrockpyle

+0

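I'm not sure what you mean by how the data points are formatted. The lines are just rows in the file, with comma-separated values and quote characters. I'm only trying to produce a DataFrame whose columns are filled with those values, as any read_csv call would do. – horatio1701d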

Answers

0

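Here is a solution I came up with. I would really like to avoid building a list with append and use a generator instead, but I'm not comfortable enough working with generators yet.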

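import csv

import numpy as np
import pandas as pd

def parse_file(filename):
    rows = []
    # On Python 3 the csv module expects a text-mode file opened with newline=''
    with open(filename, 'r', newline='') as f:
        # QUOTE_NONE treats '"' as an ordinary character
        reader = csv.reader(f, quoting=csv.QUOTE_NONE)
        for row in reader:
            # Drop the dangling last field and strip the literal quotes
            rows.append([s.strip('"') for s in row[:-1]])
    df = pd.DataFrame(rows)
    # Turn empty strings into missing values
    df = df.replace('', np.nan).astype(object)
    return df

df = parse_file(filename)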
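A generator-based variant of the same idea might look like the sketch below. The function name iter_rows and the two shortened rows (cut down from the sample data so the example is self-contained) are illustrative only and not part of the original post. Note that pandas still collects every row in memory to build the frame, so the generator mainly tidies the code rather than saving memory.

```python
import csv
import io

import pandas as pd


def iter_rows(lines):
    """Yield one cleaned list of fields per input line (hypothetical helper)."""
    # QUOTE_NONE makes the reader treat '"' as an ordinary character,
    # so the dangling quote at the end of each line cannot open a string.
    for row in csv.reader(lines, quoting=csv.QUOTE_NONE):
        if row:  # skip blank lines
            # Drop the dangling last field, then strip the literal quotes.
            yield [field.strip('"') for field in row[:-1]]


# Two shortened rows in the same shape as the sample data (illustrative).
sample = io.StringIO(
    '"USNC3255","27","US","","34.65266","\n'
    '"USNC3256","27","US","sc007","35.29374","\n'
)

# The DataFrame constructor consumes the generator row by row.
df = pd.DataFrame(iter_rows(sample))
```

For a real file, the same generator can be fed an open file handle instead of the StringIO object, for example open(filename, newline='').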
3

I had a similar error. I fixed it by passing the option quoting=csv.QUOTE_NONE to read_csv.

For example:

df = pd.read_csv(csvfile, header = None, delimiter="\t", quoting=csv.QUOTE_NONE, encoding='utf-8') 

There is some information on why in the second comment here: https://github.com/pydata/pandas/issues/5500
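Applied to the comma-delimited sample from the question, the same option could be used roughly as follows. This is only a sketch: the two shortened rows and the variable names raw and clean are assumptions made for illustration, and the default comma delimiter is used instead of the tab delimiter shown above.

```python
import csv
import io

import pandas as pd

# Two shortened rows in the same shape as the question's sample (illustrative).
sample = io.StringIO(
    '"USNC3255","27","US","","34.65266","\n'
    '"USNC3256","27","US","sc007","35.29374","\n'
)

# With QUOTE_NONE the parser splits on commas only and keeps the quote
# characters as part of each value, so the dangling '"' is just another field.
raw = pd.read_csv(
    sample,
    header=None,
    quoting=csv.QUOTE_NONE,
    dtype=str,
    keep_default_na=False,
)

# Drop the last column (the dangling quote) and strip the literal quotes.
clean = raw.iloc[:, :-1].apply(lambda col: col.str.strip('"'))
```

Because the quotes are no longer interpreted by the parser, any value that itself contains a comma would be split into two fields, so this only suits files like the sample where that does not happen.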