Skip rows with bad dates while using pd.read_csv

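I am using pd.read_csv to read data from CSV files supplied by an external data source. Somewhere in the CSV that is being sent there is a misformatted date, which results in the following error: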

ValueError: Error parsing datetime string "2015-08-2" at position 8

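This crashes the whole application. Of course, I can handle this case with try/except, but then I lose all the other data in that particular CSV. I need pandas to keep and parse that other data.

I have no way of predicting when or where in this data (which changes daily) a date will be badly formatted. Is there any way to make pd.read_csv skip only the rows with bad dates, but still parse all the other rows in the CSV?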

Source

2015-12-24 LateCoder

Check the 'skiprows' parameter of ['read_csv'](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html). You can pass it a list of the row numbers that have bad dates, but you need to know those row numbers. –
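
A minimal sketch of the 'skiprows' suggestion in this comment, assuming the offending line number is already known. The sample data in RAW is made up for illustration and is not from the question:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data. File lines are 0-indexed and the header is line 0,
# so the row holding the bad date "2015-08-2" is line 2.
RAW = "dates,data\n2015-08-01,1.0\n2015-08-2,2.0\n2015-08-03,3.0\n"

# Skip the known bad line; the remaining dates parse without error.
df = pd.read_csv(StringIO(RAW), parse_dates=["dates"], skiprows=[2])
print(df)
```

This only helps when the bad line numbers can be determined up front, which is the limitation the comment points out.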

What are the potential formats? –

somewhere in the csv that's being sent, there is a misformatted date

np.datetime64 needs an ISO 8601 formatted string to work properly. The good news is that you can wrap np.datetime64 in your own function and use that as the date_parser:

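import numpy as np
import pandas as pd

def parse_date(v):
    try:
        return np.datetime64(v)
    except ValueError:
        # apply whatever remedies you deem appropriate
        pass
    # fall back to the raw value when it cannot be parsed
    return v

pd.read_csv(
    ...
    date_parser=parse_date
)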
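
To make the "remedy" concrete, here is a sketch of one possible choice: turn unparseable values into NaT and drop those rows afterwards. It keeps np.datetime64 as the parser, but applies the wrapper to the column after reading, because date_parser is deprecated in recent pandas releases. The name parse_date_or_nat and the sample data are assumptions made for illustration:

```python
from io import StringIO

import numpy as np
import pandas as pd

# Hypothetical sample data containing the misformatted date from the question.
RAW = "dates,data\n2015-08-01,1.0\n2015-08-2,2.0\n2015-08-03,3.0\n"

def parse_date_or_nat(value):
    """Parse with np.datetime64; return NaT instead of raising on bad input."""
    try:
        return pd.Timestamp(np.datetime64(value))
    except ValueError:
        return pd.NaT

# Read the dates as plain strings first, then convert them one by one.
df = pd.read_csv(StringIO(RAW))
df["dates"] = pd.to_datetime(df["dates"].map(parse_date_or_nat))

# Keep only the rows whose date could be parsed.
clean = df.dropna(subset=["dates"])
print(clean)
```

Rows with a bad date are dropped, while every other row in the file is kept and parsed.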

I need pandas to keep and parse that other data.

I often find that a more flexible date parser such as dateutil works better than np.datetime64, and it may even work without the extra function:

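from io import BytesIO

import dateutil.parser  # a bare "import dateutil" does not load the parser submodule
import pandas as pd

pd.read_csv(
    BytesIO(raw_data),
    parse_dates=['dates'],
    date_parser=dateutil.parser.parse,
)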
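
As a quick illustration of why dateutil can help here, the sketch below feeds it the exact string from the question's error message. The string "Q%Bte0tvk5" is the garbage value used elsewhere in this thread; dateutil is lenient, but it still rejects input that is not a date at all:

```python
import datetime

import dateutil.parser

# The value that np.datetime64 rejected: a single-digit day.
parsed = dateutil.parser.parse("2015-08-2")
print(parsed)

# Input that is not a date still raises, so a try/except wrapper may be needed.
try:
    dateutil.parser.parse("Q%Bte0tvk5")
    garbage_rejected = False
except ValueError:
    garbage_rejected = True
print(garbage_rejected)
```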

Source

2015-12-24 22:50:45 miraculixx

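Downvote all you like, but please add a comment so that I can improve the answer. Thanks. – miraculixx

I don't want to change the core functionality of the code. I need to use 'np.datetime64' for project-specific reasons, so it has to stay. – LateCoder

Sorry, I didn't mean to downvote. Apart from the suggestion to get rid of 'np.datetime64', this is a good solution. Thanks! – LateCoder

Here is another way to do this, using the convert_objects() method: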

# make good and bad date csv files 
# read in good dates file using parse_dates - no problem 
df = pd.read_csv('dategood.csv', parse_dates=['dates'], date_parser=np.datetime64) 

df.dtypes 

dates datetime64[ns] 
data   float64 
dtype: object 

# try same code on bad dates file - throws exceptions 
df = pd.read_csv('datebad.csv', parse_dates=['dates'], date_parser=np.datetime64) 

ValueError: Error parsing datetime string "Q%Bte0tvk5" at position 0 

# read the file first without converting dates 
# then use convert objects to force conversion 
df = pd.read_csv('datebad.csv') 
df['cdate'] = df.dates.convert_objects(convert_dates='coerce') 

# resulting new date column is a datetime64 same as good data file 
df.dtypes

dates            object
data            float64
cdate    datetime64[ns]
dtype: object

# the bad date has NaT in the cdate column - can clean it later 
df.head() 

        dates      data      cdate
0  2015-12-01  0.914836 2015-12-01
1  2015-12-02  0.866848 2015-12-02
2  2015-12-03  0.103718 2015-12-03
3  2015-12-04  0.514086 2015-12-04
4  Q%Bte0tvk5  0.583617        NaT
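
convert_objects() was deprecated and is no longer available in current pandas releases. A sketch of the same idea with pd.to_datetime and errors='coerce' follows; the inline CSV simply reuses the rows shown above in place of the datebad.csv file:

```python
from io import StringIO

import pandas as pd

# Inline stand-in for datebad.csv, using the rows shown above.
RAW = """dates,data
2015-12-01,0.914836
2015-12-02,0.866848
2015-12-03,0.103718
2015-12-04,0.514086
Q%Bte0tvk5,0.583617
"""

# Read the file without converting dates, then coerce bad values to NaT.
df = pd.read_csv(StringIO(RAW))
df["cdate"] = pd.to_datetime(df["dates"], errors="coerce")

# Drop the rows whose date could not be parsed.
clean = df.dropna(subset=["cdate"])
print(clean)
```

The cdate column is datetime64, and the row with the bad date can be dropped or repaired later.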

Source

2015-12-25 01:45:36
