pandas pd.cut() - binning a datetime column/Series

I am trying to build bins with pd.cut(), but it is turning out to be rather elaborate.

A colleague sends me multiple files whose report dates look like:

'03-16-2017 to 03-22-2017' 
'03-23-2017 to 03-29-2017' 
'03-30-2017 to 04-05-2017'

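They are all combined into a single DataFrame with a column named df['Filedate'], so that every record carries the file date range of the file it came from.

The last day of each range is a cutoff point, so I created a new column, df['Filedate_bin'], which holds that last day as a string: 3/22/2017, 3/29/2017, 4/5/2017.

Then I created a list: Filedate_bin_list = df.Filedate_bin.unique(). This gives me a unique list of string cutoff dates that I want to use as bins.

I then import different data into the DataFrame, with a column of transaction dates: 3/28/2017, 3/29/2017, 3/30/2017, 4/1/2017, 4/4/2017, etc. Assigning them to bins is proving difficult. I tried: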

df['bin'] = pd.cut(df.Processed_date, Filedate_bin_list)

and received TypeError: unsupported operand type for -: 'str' and 'str'

I went back and tried converting Filedate_bin to datetime with format='%m/%d/%Y', and got

TypeError: Cannot cast ufunc less input from dtype('<m8[ns]') to dtype ('<m8') with casting rule 'same_kind'.

Is there a better way to bin my processed_date values into the text bins?

For example, I want a processed date of 3/27/2017 to match the bin '03-23-2017 to 03-29-2017'.

2017-04-19 Arthur D. Howland

Consider using this approach:

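import numpy as np
import pandas as pd

# Sample data: 15 consecutive days
df = pd.DataFrame(pd.date_range('2000-01-02', freq='1D', periods=15), columns=['Date'])

# Bin edges every 3 days, plus their string form for building the labels
bins_dt = pd.date_range('2000-01-01', freq='3D', periods=6)
bins_str = bins_dt.astype(str).values

labels = ['({}, {}]'.format(bins_str[i-1], bins_str[i]) for i in range(1, len(bins_str))]

# Cut on integer epoch seconds instead of on the datetimes themselves
df['cat'] = pd.cut(df.Date.astype(np.int64)//10**9,
                   bins=bins_dt.astype(np.int64)//10**9,
                   labels=labels)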

Result:

In [59]: df 
Out[59]: 
     Date      cat 
0 2000-01-02 (2000-01-01, 2000-01-04] 
1 2000-01-03 (2000-01-01, 2000-01-04] 
2 2000-01-04 (2000-01-01, 2000-01-04] 
3 2000-01-05 (2000-01-04, 2000-01-07] 
4 2000-01-06 (2000-01-04, 2000-01-07] 
5 2000-01-07 (2000-01-04, 2000-01-07] 
6 2000-01-08 (2000-01-07, 2000-01-10] 
7 2000-01-09 (2000-01-07, 2000-01-10] 
8 2000-01-10 (2000-01-07, 2000-01-10] 
9 2000-01-11 (2000-01-10, 2000-01-13] 
10 2000-01-12 (2000-01-10, 2000-01-13] 
11 2000-01-13 (2000-01-10, 2000-01-13] 
12 2000-01-14 (2000-01-13, 2000-01-16] 
13 2000-01-15 (2000-01-13, 2000-01-16] 
14 2000-01-16 (2000-01-13, 2000-01-16] 

In [60]: df.dtypes 
Out[60]: 
Date datetime64[ns] 
cat   category 
dtype: object

Explanation:

df.Date.astype(np.int64)//10**9 converts the datetime values into UNIX epoch timestamps (the number of seconds since 1970-01-01 00:00:00):

In [65]: df.Date.astype(np.int64)//10**9 
Out[65]: 
0  946771200 
1  946857600 
2  946944000 
3  947030400 
4  947116800 
5  947203200 
6  947289600 
7  947376000 
8  947462400 
9  947548800 
10 947635200 
11 947721600 
12 947808000 
13 947894400 
14 947980800 
Name: Date, dtype: int64
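
As a cross-check of the conversion above, here is a minimal sketch that derives the same epoch seconds by subtracting the epoch and dividing by one second. The variable names are illustrative, not part of the answer, and this is an alternative spelling rather than the answer's own method; it avoids relying on the integer representation of the datetime column.

```python
import pandas as pd

# The first three dates from the example frame above.
dates = pd.Series(pd.date_range('2000-01-02', freq='1D', periods=3))

# Subtract the epoch, then floor-divide by one second to get whole seconds.
epoch_seconds = (dates - pd.Timestamp('1970-01-01')) // pd.Timedelta('1s')

# Convert back to datetimes to check the round trip.
round_trip = pd.to_datetime(epoch_seconds, unit='s')
```

The resulting integers should match the first rows of the output shown above.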

The same is applied to the bins:

In [66]: bins_dt.astype(np.int64)//10**9 
Out[66]: Int64Index([946684800, 946944000, 947203200, 947462400, 947721600, 947980800], dtype='int64')

Labels:

In [67]: labels 
Out[67]: 
['(2000-01-01, 2000-01-04]', 
'(2000-01-04, 2000-01-07]', 
'(2000-01-07, 2000-01-10]', 
'(2000-01-10, 2000-01-13]', 
'(2000-01-13, 2000-01-16]']
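
To connect this back to the question, the following is a rough sketch of the same idea applied to the file date ranges, using the range strings themselves as labels. The sample transaction dates and the names `file_dates`, `starts`, `ends` and `edges` are assumptions made for illustration; the epoch seconds are computed by subtracting the epoch instead of with astype(np.int64)//10**9, which should give the same integers.

```python
import pandas as pd

# File date ranges from the question; the last day of each is the cutoff.
file_dates = ['03-16-2017 to 03-22-2017',
              '03-23-2017 to 03-29-2017',
              '03-30-2017 to 04-05-2017']

# Parse the first and last day of every range.
starts = pd.to_datetime([s.split(' to ')[0] for s in file_dates], format='%m-%d-%Y')
ends = pd.to_datetime([s.split(' to ')[1] for s in file_dates], format='%m-%d-%Y')

# Bin edges: the day before the first range starts, then every cutoff,
# so each right-closed bin (previous cutoff, cutoff] covers one file.
edges = pd.DatetimeIndex([starts[0] - pd.Timedelta(days=1)]).append(ends)

# Hypothetical transaction data standing in for df.Processed_date.
df = pd.DataFrame({'Processed_date': pd.to_datetime(
    ['3/27/2017', '3/28/2017', '3/29/2017', '3/30/2017', '4/1/2017', '4/4/2017'],
    format='%m/%d/%Y')})

# Convert both sides to epoch seconds and cut on the integers.
epoch = pd.Timestamp('1970-01-01')
one_second = pd.Timedelta('1s')
df['bin'] = pd.cut((df['Processed_date'] - epoch) // one_second,
                   bins=(edges - epoch) // one_second,
                   labels=file_dates)
```

Because pd.cut closes bins on the right by default, a transaction dated exactly on a cutoff day (3/29/2017 here) falls into the file that ends on that day.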

2017-04-19 16:45:13 MaxU

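It looks like I need to establish a frequency and enter how many periods (bins) there are. I got TypeError: cannot astype a datetimelike from [datetime64[ns]] to [uint64]. Are any libraries needed other than pandas, numpy, and datetime? –

@ArthurD.Howland, try using 'np.int64' instead of 'np.uint64'. No, you only need pandas and NumPy for this task. What are your pandas and NumPy versions? – MaxU

np.int64 worked and reproduced the results. Looks like my first successful binning. It will probably take a day to adapt it to my setup. Using Python 3.5.2. It appears my bins are 7D, and I can use the length of my unique list to determine the number of bins. Getting closer :) –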
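
The 7D idea from the last comment could be sketched roughly as follows. This assumes a pandas version newer than the one used in this thread, where pd.cut accepts datetime bin edges directly; the cutoff strings are the last day of each file range in the question, and the processed dates and variable names are hypothetical.

```python
import pandas as pd

# Cutoff strings, as they might come from df.Filedate_bin.unique().
filedate_bin_list = ['3/22/2017', '3/29/2017', '4/5/2017']
cutoffs = pd.to_datetime(filedate_bin_list, format='%m/%d/%Y')

# Weekly edges starting 7 days before the first cutoff. The number of bins
# equals the number of unique cutoffs, so one more edge than that is needed.
edges = pd.date_range(cutoffs.min() - pd.Timedelta(days=7),
                      freq='7D', periods=len(cutoffs) + 1)

# Hypothetical processed dates.
processed = pd.Series(pd.to_datetime(['3/27/2017', '3/30/2017'], format='%m/%d/%Y'))

# Datetime edges passed directly; each result is an interval (left, right].
binned = pd.cut(processed, bins=edges)
```

Without a labels argument the categories are intervals of timestamps, so the right edge of each assigned interval is the cutoff date of the matching file.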
