2017-09-18

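Calculating the difference between dates: ValueError: Length of values does not match length of index

I have a dataset (a pandas df) with ~50 columns; the columns are a mix of text, numbers, and dates. Five of the columns are dates, labeled Meeting1 to Meeting5, and I'm trying to calculate the number of days between the meeting dates.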

My df generally looks like this:

ID_number Meeting1 Meeting2 Meeting3 Meeting4 Meeting5 Comments … 
123456789 2014-09-17 2015-04-22 2015-05-30 NaN   NaN   text text … 
987654321 2015-09-22 NaN   2016-02-20 NaN   NaN   text text … 
456789123 2016-10-22 2017-05-29 NaN   NaN   NaN   text text … 

In SQL I would normally use SELECT DATEDIFF(dd, Meeting1, Meeting2) AS diff_mt1_mt2. In Python I tried:

from datetime import datetime 
from datetime import date 

df['diff_mt1_mt2'] = (df['Meeting2']-df['Meeting1']) 

but got a ValueError: Length of values does not match length of index (full error below).

Is there an easier or better way to do this in Python?
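For comparison with the SQL DATEDIFF(dd, Meeting1, Meeting2) mentioned in the question, here is a minimal sketch of a pandas equivalent. It assumes the Meeting columns hold ISO date strings like the sample (the three rows are copied from it). Subtracting two datetime64 columns yields a timedelta64 column, and .dt.days converts that to whole days:

```python
import numpy as np
import pandas as pd

# A small frame shaped like the sample in the question (IDs and dates copied from it).
df = pd.DataFrame({
    "ID_number": [123456789, 987654321, 456789123],
    "Meeting1": ["2014-09-17", "2015-09-22", "2016-10-22"],
    "Meeting2": ["2015-04-22", np.nan, "2017-05-29"],
})

# Parse the text dates into datetime64; anything unparseable (or missing) becomes NaT.
for col in ["Meeting1", "Meeting2"]:
    df[col] = pd.to_datetime(df[col], errors="coerce")

# datetime64 - datetime64 -> timedelta64; .dt.days gives whole days, like DATEDIFF(dd, ...).
df["diff_mt1_mt2"] = (df["Meeting2"] - df["Meeting1"]).dt.days
print(df[["ID_number", "diff_mt1_mt2"]])
```

Missing dates propagate as NaN instead of raising, so the row without a Meeting2 gets NaN, and the resulting column is float rather than int.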

Full error:

ValueError        Traceback (most recent call last) 
<ipython-input-9-055085bc04d7> in <module>() 
     3 from datetime import date 
     4 
----> 5 df['diff_mt1_mt2'] = (df['Meeting2']-df['Meeting1']), 

C:\Users\lmgagne\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value) 
    2427   else: 
    2428    # set column 
-> 2429    self._set_item(key, value) 
    2430 
    2431  def _setitem_slice(self, key, value): 

C:\Users\lmgagne\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\frame.py in _set_item(self, key, value) 
    2493 
    2494   self._ensure_valid_index(value) 
-> 2495   value = self._sanitize_column(key, value) 
    2496   NDFrame._set_item(self, key, value) 
    2497 

C:\Users\lmgagne\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\frame.py in _sanitize_column(self, key, value, broadcast) 
    2664 
    2665    # turn me into an ndarray 
-> 2666    value = _sanitize_index(value, self.index, copy=False) 
    2667    if not isinstance(value, (np.ndarray, Index)): 
    2668     if isinstance(value, list) and len(value) > 0: 

C:\Users\lmgagne\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\series.py in _sanitize_index(data, index, copy) 
    2877 
    2878  if len(data) != len(index): 
-> 2879   raise ValueError('Length of values does not match length of ' 'index') 
    2880 
    2881  if isinstance(data, PeriodIndex): 

ValueError: Length of values does not match length of index 
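The traceback shows a detail that the code quoted in the question does not: the failing line 5 ends with a trailing comma. In Python, x = (expr), builds a one-element tuple, so pandas is asked to fill a multi-row column with a value of length 1, which is exactly the length check that fails in _sanitize_index. A minimal reproduction, assuming a current pandas version (the exact wording of the error message differs between versions):

```python
import pandas as pd

df = pd.DataFrame({
    "Meeting1": pd.to_datetime(["2014-09-17", "2015-09-22", "2016-10-22"]),
    "Meeting2": pd.to_datetime(["2015-04-22", None, "2017-05-29"]),
})

# The trailing comma turns the right-hand side into a 1-element tuple.
value = (df["Meeting2"] - df["Meeting1"]),
print(type(value), len(value))  # <class 'tuple'> 1

try:
    # A length-1 value cannot fill a 3-row column -> ValueError, as in the traceback.
    df["diff_mt1_mt2"] = value
except ValueError as exc:
    print("ValueError:", exc)
    error_raised = True
else:
    error_raised = False

# Without the trailing comma the result is a Series aligned on the index, and it works.
df["diff_mt1_mt2"] = df["Meeting2"] - df["Meeting1"]
print(df["diff_mt1_mt2"])
```

Dropping the trailing comma is enough; the surrounding parentheses are harmless.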

I'm using:

Python 3.6.1 and pandas 0.20.1


Can you add a data sample? Because it should work. – jezrael


@jezrael I added some data – LMGagne

Answers


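I think you need to first convert the Meeting columns to datetimes with to_datetime, using the parameter errors='coerce' so that non-datetime values become NaT (the datetime missing value):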

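import pandas as pd 

# select only the Meeting* columns 
cols = df.columns[df.columns.str.startswith('Meeting')] 
# convert them to datetime64; values that cannot be parsed become NaT 
df[cols] = df[cols].apply(lambda x: pd.to_datetime(x, errors='coerce')) 

df['diff_mt1_mt2'] = (df['Meeting2']-df['Meeting1']) 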
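A self-contained run of this conversion step. The sample values are hypothetical, including a non-date placeholder ('TBD') standing in for whatever unparseable text the real columns may contain:

```python
import pandas as pd

# Hypothetical sample: Meeting columns stored as text, one cell holding a non-date placeholder.
df = pd.DataFrame({
    "ID_number": [123456789, 987654321, 456789123],
    "Meeting1": ["2014-09-17", "2015-09-22", "2016-10-22"],
    "Meeting2": ["2015-04-22", "TBD", "2017-05-29"],
    "Comments": ["text", "text", "text"],
})

# Select only the Meeting* columns, as in the answer.
cols = df.columns[df.columns.str.startswith("Meeting")]
# errors="coerce" turns unparseable entries such as "TBD" into NaT instead of raising.
df[cols] = df[cols].apply(lambda x: pd.to_datetime(x, errors="coerce"))
print(df.dtypes)

# Both columns are now datetime64, so the subtraction yields a timedelta64 column.
df["diff_mt1_mt2"] = df["Meeting2"] - df["Meeting1"]
print(df["diff_mt1_mt2"])
```

The 'TBD' cell becomes NaT, so the difference for that row is NaT rather than an exception.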

I added this code, but it still produces the same error – LMGagne


It's hard to answer without data, but maybe you're using an old version of pandas. – jezrael

import pandas as pd 
import numpy as np 

# one DatetimeIndex per row (Meeting 1 .. Meeting 5); np.nan becomes NaT 
# (np.nan rather than np.NaN, which was removed in NumPy 2.0) 
d1 = pd.to_datetime(['2014-09-17','2015-04-22','2015-05-30',np.nan,np.nan]) 
d2 = pd.to_datetime(['2015-09-22',np.nan,'2016-02-20',np.nan,np.nan]) 
d3 = pd.to_datetime(['2016-10-22','2017-05-29',np.nan,np.nan,np.nan]) 
data = [d1,d2,d3] 
index_serie = np.array((123456789,987654321,456789123)) 

df = pd.DataFrame(data=data,index=index_serie,columns=['Meeting 1','Meeting 2','Meeting 3','Meeting 4','Meeting 5']) 
df.index.name = 'ID_number' 
# datetime64 minus datetime64 gives a timedelta64 column 
df['diff_mt1_mt2'] = (df['Meeting 2']-df['Meeting 1']) 

It works for me with the latest versions of Python and pandas.
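Since the question asks about the gaps between all five meetings, the same subtraction can be repeated for each consecutive pair. A sketch assuming the column names Meeting1 to Meeting5 from the question; the diff_mtX_mtY names simply extend the asker's diff_mt1_mt2 pattern, and .dt.days converts each gap to whole days like DATEDIFF(dd, ...):

```python
import numpy as np
import pandas as pd

# Same shape as the question's sample: five meeting-date columns, many of them missing.
df = pd.DataFrame(
    {
        "Meeting1": ["2014-09-17", "2015-09-22", "2016-10-22"],
        "Meeting2": ["2015-04-22", np.nan, "2017-05-29"],
        "Meeting3": ["2015-05-30", "2016-02-20", np.nan],
        "Meeting4": [np.nan, np.nan, np.nan],
        "Meeting5": [np.nan, np.nan, np.nan],
    },
    index=pd.Index([123456789, 987654321, 456789123], name="ID_number"),
)

meeting_cols = [f"Meeting{i}" for i in range(1, 6)]
# Convert every Meeting column to datetime64 (missing values become NaT).
df[meeting_cols] = df[meeting_cols].apply(lambda x: pd.to_datetime(x, errors="coerce"))

# One column per consecutive pair: diff_mt1_mt2, diff_mt2_mt3, diff_mt3_mt4, diff_mt4_mt5.
for i in range(1, 5):
    earlier, later = f"Meeting{i}", f"Meeting{i + 1}"
    df[f"diff_mt{i}_mt{i + 1}"] = (df[later] - df[earlier]).dt.days

print(df.filter(like="diff_"))
```

Any pair with a missing date yields NaN; for example, row 987654321 has no Meeting2, so both diff_mt1_mt2 and diff_mt2_mt3 are NaN for that row even though Meeting1 and Meeting3 exist.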
