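Cleaning date and time records in Python pandas
In the data below, the date and the time are in separate columns, and I combine them to get a full datetime so that the resulting column has dtype 'datetime64[ns]'. However, some records have a blank date and time, and in that case the resulting column has dtype 'object', which is essentially a string. How can I handle this so that it works both when all records are present and when some are missing?
Sample data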
CARD,IN Date,IN Time,OUT Date,OUT Time
100001,30-04-2015,14:19:18,01-05-2015,00:10:56
100002,30-04-2015,11:27:52,,
100003,30-04-2015,17:59:47,01-05-2015,04:51:52
100004,30-04-2015,16:15:25,,
100005,30-04-2015,10:25:13,01-05-2015,01:25:13
100006,30-04-2015,16:59:10,,
100007,30-04-2015,13:22:06,,
100008,30-04-2015,09:15:29,,
100009,30-04-2015,17:01:10,01-05-2015,01:51:01
100010,30-04-2015,13:13:30,01-05-2015,01:37:28
100011,30-04-2015,09:37:28,01-05-2015,00:37:28
100012,30-04-2015,18:55:44,01-05-2015,03:22:22
100013,30-04-2015,14:28:16,01-05-2015,01:27:18
100014,30-04-2015,09:02:13,01-05-2015,00:02:13
100015,30-04-2015,09:04:10,01-05-2015,00:04:10
100016,30-04-2015,18:51:56,01-05-2015,09:51:56
100017,30-04-2015,09:12:51,01-05-2015,00:12:51
100018,30-04-2015,10:40:31,01-05-2015,01:40:31
100019,30-04-2015,10:35:56,01-05-2015,01:35:56
100020,30-04-2015,17:50:03,01-05-2015,03:54:54
100021,30-04-2015,17:00:16,01-05-2015,02:45:35
100022,30-04-2015,11:18:41,01-05-2015,01:15:52
Here is my current code:
import numpy as np
import pandas as pd
from datetime import datetime
#CARD,IN Date,IN Time,OUT Date,OUT Time
data = pd.read_csv('DATA.csv', parse_dates=[['IN Date','IN Time'],['OUT Date','OUT Time'],'IN Date','OUT Date'], keep_date_col=True)
data.rename(columns={'IN Date_IN Time':'IN','OUT Date_OUT Time':'OUT'}, inplace=True)
data = data[['CARD','IN Date', 'IN', 'OUT Date', 'OUT']]
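# This line fails when all the records are present, because 'OUT' is then
# already datetime64[ns] and is no longer a column of strings such as 'nan nan'
data.loc[(data.OUT == 'nan nan'), 'OUT'] = np.nan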
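
One possible way to make both cases behave the same is to read the columns as plain strings, join the date and time parts yourself, and let pd.to_datetime with errors='coerce' turn anything unparseable into NaT. The following is only a minimal sketch under some assumptions: the inline csv_text stands in for DATA.csv, the helper combine_datetime is a hypothetical name introduced here, and the format string assumes the day-first layout seen in the sample data.

```python
import io
import pandas as pd

# Inline sample standing in for DATA.csv (assumption, for illustration only).
csv_text = """CARD,IN Date,IN Time,OUT Date,OUT Time
100001,30-04-2015,14:19:18,01-05-2015,00:10:56
100002,30-04-2015,11:27:52,,
100003,30-04-2015,17:59:47,01-05-2015,04:51:52
"""

def combine_datetime(date_col, time_col):
    """Hypothetical helper: join date and time strings and parse them.

    Rows where either part is missing stay missing after str.cat,
    and errors='coerce' turns them into NaT instead of raising.
    """
    combined = date_col.str.cat(time_col, sep=' ')
    return pd.to_datetime(combined, format='%d-%m-%Y %H:%M:%S', errors='coerce')

# Read everything as strings so pandas does not guess the date layout.
data = pd.read_csv(io.StringIO(csv_text), dtype=str)
data['IN'] = combine_datetime(data['IN Date'], data['IN Time'])
data['OUT'] = combine_datetime(data['OUT Date'], data['OUT Time'])
data = data[['CARD', 'IN', 'OUT']]
```

With this approach the 'OUT' column should come out as a datetime column whether or not some rows are blank, so no later comparison against 'nan nan' is needed; blank rows simply hold NaT.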