2016-11-26 78 views

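How do I call pandas read_csv() without parsing the date string?

I am working with some data that I downloaded from the web in CSV format. The raw data looks like this.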

Test Data 
"Date","T1","T2","T3","T4","T5","T6","T7","T8" 
"105/11/01","123,855","1,150,909","9.30","9.36","9.27","9.28","-0.06","60", 
"105/11/02","114,385","1,062,118","9.26","9.42","9.23","9.31","+0.03","78", 
"105/11/03","71,350","659,848","9.30","9.30","9.20","9.28","-0.03","42", 

I read it with the following code:

import pandas as pd 
df = pd.read_csv("test.csv", skiprows=[0], usecols=[0,3,4,5]) 

I have also tried:

import pandas as pd 
df = pd.read_csv("test.csv", skiprows=[0], usecols=[0,3,4,5], keep_date_col=True) 

I always get the following result:

  Date T3 T4 T5 
105/11/01 9.30 9.36 9.27 NaN 
105/11/02 9.26 9.42 9.23 NaN 
105/11/03 9.30 9.30 9.20 NaN 

This is what I want to get:

 Date T3 T4 T5 
105/11/01 9.30 9.36 9.27 
105/11/02 9.26 9.42 9.23 
105/11/03 9.30 9.30 9.20 

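As you can see, pandas does not treat the date string as part of the data. It uses it as the index instead, which shifts the columns one position to the left and leaves the last column as NaN.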
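To make the shift easier to see, here is a minimal sketch that reads the whole sample without usecols. It assumes the file content is exactly the test data above; SAMPLE is a hypothetical in-memory copy standing in for test.csv.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of test.csv (same content as the test data above).
SAMPLE = '''Test Data
"Date","T1","T2","T3","T4","T5","T6","T7","T8"
"105/11/01","123,855","1,150,909","9.30","9.36","9.27","9.28","-0.06","60",
"105/11/02","114,385","1,062,118","9.26","9.42","9.23","9.31","+0.03","78",
"105/11/03","71,350","659,848","9.30","9.30","9.20","9.28","-0.03","42",
'''

# The header has 8 names but every data row has 9 fields (trailing comma),
# so pandas takes the first field of each row as the index.
shifted = pd.read_csv(io.StringIO(SAMPLE), skiprows=[0])

print(shifted.index.tolist())   # the date strings end up in the index
print(shifted["Date"].tolist()) # the "Date" column now holds the T1 values
print(shifted["T8"].isna().all())  # the last column is entirely NaN
```

Every header name ends up one field to the right of where it belongs, which is why the last column has nothing left to show.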

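I have read the pandas documentation on read_csv() and found that it can parse dates with the parse_dates and keep_date_col parameters, but is there any way to not parse the dates?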


I think your problem is really that the data rows have a trailing delimiter but the header does not. See http://stackoverflow.com/questions/13719946/python-pandas-trailing-delimiter-confuses-read-csv

Answer


This seems to work well:

import pandas as pd 
df = pd.read_csv("test.csv", skiprows=[0], usecols=[0,3,4,5], index_col=False) 

df 
#  Date  T3  T4  T5 
#0 105/11/01 9.30 9.36 9.27 
#1 105/11/02 9.26 9.42 9.23 
#2 105/11/03 9.30 9.30 9.20 

And this is from the help documentation:

index_col : int or sequence or False, default None 
    Column to use as the row labels of the DataFrame. If a sequence is given, a 
    MultiIndex is used. If you have a malformed file with delimiters at the end 
    of each line, you might consider index_col=False to force pandas to _not_ 
    use the first column as the index (row names)
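
For anyone who wants to try this without creating a file, the following is a self-contained sketch of the same call. SAMPLE is a hypothetical in-memory copy of the test data from the question, and the checks at the end are only my assumption about what "working" should mean here.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of test.csv from the question.
SAMPLE = '''Test Data
"Date","T1","T2","T3","T4","T5","T6","T7","T8"
"105/11/01","123,855","1,150,909","9.30","9.36","9.27","9.28","-0.06","60",
"105/11/02","114,385","1,062,118","9.26","9.42","9.23","9.31","+0.03","78",
"105/11/03","71,350","659,848","9.30","9.30","9.20","9.28","-0.03","42",
'''

# index_col=False tells pandas not to use the first column as the index,
# so the header names line up with the data again.
df = pd.read_csv(
    io.StringIO(SAMPLE),
    skiprows=[0],            # skip the "Test Data" title line
    usecols=[0, 3, 4, 5],    # Date, T3, T4, T5
    index_col=False,
)

print(df)
print(df["Date"].tolist())   # dates stay plain strings; nothing is parsed
```

The dates are left as strings because parse_dates is not passed, so no extra option is needed to keep them unparsed.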