pandas - automatically detect date columns at run time

I was wondering whether pandas can automatically detect which columns are datetime objects and read those columns in as dates instead of strings.

I have been looking at the API and related Stack Overflow posts, but I can't seem to figure it out.

This is a black-box system that can produce arbitrary CSV schemas, so I do not know the column names ahead of time.

This seems like it would work, but you have to know which columns are date fields:

import pandas as pd 

#creating the test data 
df = pd.DataFrame({'0': ['a', 'b', 'c'], '1': ['2015-12-27','2015-12-28', '2015-12-29'], '2': [11,12,13]}) 
df.to_csv('test.csv', index=False) 

#loading the test data 
df = pd.read_csv('test.csv', parse_dates=True) 
print(df.dtypes)
# prints (object, object, int64) instead of (object, datetime, int64)
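For contrast, here is a minimal sketch of the case that does work: naming the date column explicitly in `parse_dates`. The inline `csv_text` is an assumption that mirrors the contents of `test.csv` above, so the snippet does not touch the file system.

```python
import io

import pandas as pd

# Assumed stand-in for test.csv: same three columns as the test data above.
csv_text = "0,1,2\na,2015-12-27,11\nb,2015-12-28,12\nc,2015-12-29,13\n"

# Without parse_dates, the date strings are read as plain strings.
plain = pd.read_csv(io.StringIO(csv_text))

# Naming the column explicitly makes read_csv parse it as datetimes.
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=['1'])

print(plain.dtypes)
print(parsed.dtypes)
```

This only helps when the column name is known in advance, which is exactly what the black-box setting rules out.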

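I am thinking that if it cannot do this, I could write something that:

Finds the columns with string type.

Grabs a few unique values and tries to parse them.

If that succeeds, tries to parse the whole column.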

Edit: I wrote a simple method, convertDateColumns, that will do this:

import pandas as pd
from dateutil import parser

def convertDateColumns(df):
    # Only object-dtype (string-like) columns are candidates.
    object_cols = df.columns.values[df.dtypes.values == 'object']
    date_cols = [c for c in object_cols if testIfColumnIsDate(df[c], num_tries=3)]

    for col in date_cols:
        try:
            # errors='coerce' (formerly coerce=True) turns unparseable values into NaT.
            df[col] = pd.to_datetime(df[col], errors='coerce')
        except ValueError:
            pass

    return df

def testIfColumnIsDate(series, num_tries=4):
    """Test if a column contains date values.

    This can try a few times for the scenario where a date column may have
    a couple of null or missing values but we still want to parse when
    possible (and convert those null/missing to NaT values).
    """
    if series.dtype != 'object':
        return False

    # Collect a handful of distinct values to probe.
    vals = set()
    for val in series:
        vals.add(val)
        if len(vals) > num_tries:
            break

    for val in vals:
        # Skip non-strings (ints, NaN), which dateutil cannot parse.
        if not isinstance(val, str):
            continue
        try:
            parser.parse(val)
            return True
        except (ValueError, OverflowError):
            pass

    return False

2015-10-18 anthonybell

What do you mean by "doesn't work", specifically? – ako

It does not convert datetime columns unless you explicitly give it a list of the datetime columns. – anthonybell

I posted a code sample that reproduces the problem. As you can see, the column of datetime strings is not converted to a datetime column. – anthonybell

I would use pd.to_datetime and catch the exceptions on columns where it doesn't work. For example:

import pandas as pd 

df = pd.read_csv('test.csv') 

for col in df.columns:
    if df[col].dtype == 'object':
        try:
            df[col] = pd.to_datetime(df[col])
        except ValueError:
            pass

df.dtypes 
# (object, datetime64[ns], int64)

I believe this is about as close to "automatic" as you can get for this application.

2015-10-24 15:04:39 jakevdp

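You can avoid the for loop and use the parameter errors='ignore' to avoid modifying values that should not be converted. In the code below, we apply the to_datetime conversion (ignoring errors) to all object columns; other columns are returned as is.

If 'ignore', then invalid parsing will return the input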

df = df.apply(lambda col: pd.to_datetime(col, errors='ignore') 
       if col.dtypes == object 
       else col, 
       axis=0) 

df.dtypes 

# 0   object 
# 1 datetime64[ns] 
# 2    int64
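As far as I can tell, recent pandas releases (2.2 onward) deprecate `errors='ignore'` and suggest catching the exception yourself. A minimal sketch of the same apply-based idea under that assumption follows; the helper name `to_datetime_or_same` is hypothetical, not part of pandas.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype

def to_datetime_or_same(col):
    """Return col parsed as datetimes, or col unchanged if parsing fails.

    Hypothetical helper: it stands in for errors='ignore'.
    """
    # Leave numeric and other non-string columns untouched.
    if not (is_object_dtype(col) or is_string_dtype(col)):
        return col
    try:
        return pd.to_datetime(col)
    except (ValueError, TypeError):
        return col

df = pd.DataFrame({
    '0': ['a', 'b', 'c'],
    '1': ['2015-12-27', '2015-12-28', '2015-12-29'],
    '2': [11, 12, 13],
})

# Apply column by column; each column keeps or changes its dtype independently.
converted = df.apply(to_datetime_or_same, axis=0)
print(converted.dtypes)
```

Like the original one-liner, this converts a column only when every value parses; a date column with a few malformed entries is left as strings unless you switch to errors='coerce'.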

2016-12-19 20:51:30 Romain
