2017-10-15

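Pandas to_datetime multiindex

How can I drop one level of the MultiIndex columns when converting three columns to datetime? The example below has only three columns; my real DataFrame of course has more columns, and those other columns use both levels of names.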

>>> import pandas as pd
>>> df = pd.DataFrame([[2010, 1, 2], [2011, 1, 3], [2012, 2, 3]])
>>> df.columns = [['year', 'month', 'day'], ['y', 'm', 'd']]
>>> print(df)
   year  month  day
      y      m    d
0  2010      1    2
1  2011      1    3
2  2012      2    3
>>> pd.to_datetime(df[['year', 'month', 'day']])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/site-packages/pandas/core/tools/datetimes.py", line 512, in to_datetime
    result = _assemble_from_unit_mappings(arg, errors=errors)
  File "/usr/lib64/python2.7/site-packages/pandas/core/tools/datetimes.py", line 582, in _assemble_from_unit_mappings
    unit = {k: f(k) for k in arg.keys()}
  File "/usr/lib64/python2.7/site-packages/pandas/core/tools/datetimes.py", line 582, in <dictcomp>
    unit = {k: f(k) for k in arg.keys()}
  File "/usr/lib64/python2.7/site-packages/pandas/core/tools/datetimes.py", line 577, in f
    if value.lower() in _unit_map:
AttributeError: 'tuple' object has no attribute 'lower'
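
The traceback shows pandas calling `.lower()` on every column label while looking it up in its table of unit names. Selecting `df[['year', 'month', 'day']]` keeps both column levels, so each label is a tuple rather than a string. A small sketch that makes this visible, using the same frame as above:

```python
import pandas as pd

df = pd.DataFrame([[2010, 1, 2], [2011, 1, 3], [2012, 2, 3]])
df.columns = [['year', 'month', 'day'], ['y', 'm', 'd']]

# Selecting by the top level keeps both levels, so every label is a tuple
labels = list(df[['year', 'month', 'day']].columns)
print(labels)  # [('year', 'y'), ('month', 'm'), ('day', 'd')]

# to_datetime expects plain string labels such as 'year', so this call fails
failed = False
try:
    pd.to_datetime(df[['year', 'month', 'day']])
except (AttributeError, TypeError, ValueError) as exc:
    failed = True
    print(type(exc).__name__, exc)
```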

Edit: here is an example with more columns to illustrate the problem better:

>>> df = pd.DataFrame([[2010, 1, 2, 10, 2], [2011, 1, 3, 11, 3], [2012, 2, 3, 12, 2]])
>>> df.columns = [['year', 'month', 'day', 'temp', 'wind_speed'], ['', '', '', 'degc', 'm/s']]
>>> print(df)
   year  month  day  temp  wind_speed
                     degc         m/s
0  2010      1    2    10           2
1  2011      1    3    11           3
2  2012      2    3    12           2

What I need is to combine the first three columns into a datetime index and keep the last two columns as data.

Can you add more data together with the desired output? – jezrael

Thanks, I added a solution for that case too. – jezrael

Answer

Use droplevel to remove the second level:

df.columns = df.columns.droplevel(1)
dates = pd.to_datetime(df[['year', 'month', 'day']])
print(dates)
0   2010-01-02
1   2011-01-03
2   2012-02-03
dtype: datetime64[ns]
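
`droplevel` accepts either a level position or a level name. The question's column levels are unnamed, so the names `field` and `unit` below are purely illustrative. A short sketch comparing both forms:

```python
import pandas as pd

df = pd.DataFrame([[2010, 1, 2], [2011, 1, 3], [2012, 2, 3]])
df.columns = [['year', 'month', 'day'], ['y', 'm', 'd']]
# Illustrative only: give the two column levels names
df.columns.names = ['field', 'unit']

by_position = df.columns.droplevel(1)    # drop the second level by position
by_name = df.columns.droplevel('unit')   # drop the same level by its name
print(list(by_position))  # ['year', 'month', 'day']
print(by_position.equals(by_name))  # True
```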

If there are only 3 columns:

df.columns = df.columns.droplevel(1)
dates = pd.to_datetime(df)
print(dates)
0   2010-01-02
1   2011-01-03
2   2012-02-03
dtype: datetime64[ns]
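
If you would rather leave the two-level header on `df` untouched, `DataFrame.droplevel` with `axis=1` (available since pandas 0.24) returns a new frame instead of reassigning `df.columns`. A minimal sketch for the three-column case:

```python
import pandas as pd

df = pd.DataFrame([[2010, 1, 2], [2011, 1, 3], [2012, 2, 3]])
df.columns = [['year', 'month', 'day'], ['y', 'm', 'd']]

# DataFrame.droplevel returns a new frame; df itself keeps both column levels
dates = pd.to_datetime(df.droplevel(1, axis=1))
print(dates)
print(df.columns.nlevels)  # 2
```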

If there are more columns:

df = pd.DataFrame([[2010, 1, 2, 3], [2011, 1, 3, 5], [2012, 2, 3, 7]])
df.columns = [['year', 'month', 'day', 'a'], ['y', 'm', 'd', 'b']]
print(df)
   year  month  day  a
      y      m    d  b
0  2010      1    2  3
1  2011      1    3  5
2  2012      2    3  7

# select the date-part columns only
df1 = df[['year', 'month', 'day']]
df1.columns = df1.columns.droplevel(1)
print(df1)
   year  month  day
0  2010      1    2
1  2011      1    3
2  2012      2    3

# convert to a Series
s1 = pd.to_datetime(df1)
# give the Series a tuple name so it becomes a two-level column label
s1.name = ('date', 'dat')
print(s1)
0   2010-01-02
1   2011-01-03
2   2012-02-03
Name: (date, dat), dtype: datetime64[ns]

# remove the original columns (matched on the first level) and join the datetime Series
df = df.drop(['year', 'month', 'day'], axis=1, level=0).join(s1)
print(df)
   a       date
   b        dat
0  3 2010-01-02
1  5 2011-01-03
2  7 2012-02-03
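
The same steps can be wrapped in a reusable function. `combine_date_columns` below is a hypothetical helper, not a pandas API; it assumes a two-level column header and pandas 0.24 or later (for `DataFrame.droplevel`):

```python
import pandas as pd


def combine_date_columns(df, parts=('year', 'month', 'day'), name=('date', 'dat')):
    """Hypothetical helper: replace the date-part columns of a frame with a
    two-level column header by a single datetime column called `name`."""
    parts = list(parts)
    # Select the parts by their top-level label and flatten the header
    flat = df[parts].droplevel(1, axis=1)
    dates = pd.to_datetime(flat)
    # A tuple name makes the Series line up with the two-level columns on join
    dates.name = name
    # Drop the original parts (matched on level 0) and append the new column
    return df.drop(columns=parts, level=0).join(dates)


df = pd.DataFrame([[2010, 1, 2, 3], [2011, 1, 3, 5], [2012, 2, 3, 7]])
df.columns = [['year', 'month', 'day', 'a'], ['y', 'm', 'd', 'b']]
out = combine_date_columns(df)
print(out)
```

The input `df` is not modified, because `droplevel`, `drop` and `join` all return new objects.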

Another solution uses transposing; it should be slower on large DataFrames:

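# starting from the question's DataFrame (with temp and wind_speed)
df1 = df[['year', 'month', 'day']]
# transpose, drop the second level of the (now row) labels, then transpose back
s1 = pd.to_datetime(df1.T.reset_index(drop=True, level=1).T).rename(('date', 'dat'))
print(s1)
0   2010-01-02
1   2011-01-03
2   2012-02-03
Name: (date, dat), dtype: datetime64[ns]

# this join keeps the original year/month/day columns
df1 = df.join(s1)
print(df1)
   year  month  day  temp  wind_speed       date
                     degc         m/s        dat
0  2010      1    2    10           2 2010-01-02
1  2011      1    3    11           3 2011-01-03
2  2012      2    3    12           2 2012-02-03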
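
The question asks for the dates as the row index with only the two measurement columns left as data. Assuming pandas 0.24 or later (for `DataFrame.droplevel`), a sketch that builds that layout in one chained expression; the index name `date` is an illustrative choice:

```python
import pandas as pd

df = pd.DataFrame([[2010, 1, 2, 10, 2], [2011, 1, 3, 11, 3], [2012, 2, 3, 12, 2]])
df.columns = [['year', 'month', 'day', 'temp', 'wind_speed'], ['', '', '', 'degc', 'm/s']]

parts = ['year', 'month', 'day']
out = (
    # keep only the non-date columns, still with two header levels
    df.drop(columns=parts, level=0)
    # use the assembled dates (flattened header first) as the row index
    .set_index(pd.DatetimeIndex(pd.to_datetime(df[parts].droplevel(1, axis=1)), name='date'))
)
print(out)
```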

Thanks for the help, @jezrael. It works; somehow I just thought it could be done in one step. But this is fine too. – crayxt