Calculating date differences in a pandas GroupBy object

I have a pandas DataFrame in the following format:

In [0]: df
Out[0]:
   col1  col2       date
0     1     1 2015-01-01
1     1     2 2015-01-09
2     1     3 2015-01-10
3     2     1 2015-02-10
4     2     2 2015-02-10
5     2     3 2015-02-25

In [1]: df.dtypes
Out[1]:
col1             int64
col2             int64
date    datetime64[ns]
dtype: object

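I want to find, grouped by col1, the value of col2 that corresponds to the largest difference in date between consecutive elements (ordered by date). Assume there are no groups of size 1.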

Desired output

In [2]: output 
Out[2]: 
col1 col2 
1  1   # This is because the difference between 2015-01-09 and 2015-01-01 is the greatest 
2  2   # This is because the difference between 2015-02-25 and 2015-02-10 is the greatest
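To make the requirement concrete: within each col1 group, each date is compared with the next date in the same group, and the wanted col2 sits on the row that starts the largest gap. The sketch below computes those gaps for the example data; it assumes a recent pandas, and the helper column name gap_to_next is made up purely for illustration.

```python
import pandas as pd

# The example frame from the question
df = pd.DataFrame({
    'col1': [1, 1, 1, 2, 2, 2],
    'col2': [1, 2, 3, 1, 2, 3],
    'date': pd.to_datetime(['2015-01-01', '2015-01-09', '2015-01-10',
                            '2015-02-10', '2015-02-10', '2015-02-25']),
})

# Gap from each row to the next row of the same col1 group
# (NaT on the last row of each group, which has no successor)
df['gap_to_next'] = df.groupby('col1')['date'].shift(-1) - df['date']
print(df)
# gap_to_next: 8 days, 1 days, NaT, 0 days, 15 days, NaT
```

Row 0 (8 days) and row 4 (15 days) start the largest gap in their groups, which matches col2 = 1 and col2 = 2 in the desired output.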

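The real df has many values of col1, so the calculation has to be done per group with groupby. Is this possible by applying a function like the following? Note that the dates are already in ascending order.

gb = df.groupby('col1')
gb.apply(right_maximum_date_difference)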


2015-06-08 invoker

So, as I pointed out in my answer, I think there's a mistake in the question: "2015-01-09 - 2015-01-01" is *not* the greatest. –

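The difference between 2015-01-09 and 2015-01-01 is 8 days. The difference between 2015-01-10 and 2015-01-09 is 1 day. In this case, I want the value of 'col2' that corresponds to the date 2015-01-01, because that difference is the greatest. – invoker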

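Oh, so you mean relative to the previous row in the same group. I have to say the question is quite unclear. Also, the result is undefined for groups of size 1. –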

Here's something that is almost your DataFrame (I avoided copying the dates, so they are plain integers here):

import pandas as pd

df = pd.DataFrame({
    'col1': [1, 1, 1, 2, 2, 2],
    'col2': [1, 2, 3, 1, 2, 3],
    'date': [1, 9, 10, 10, 10, 25]  # integers standing in for dates
})

With this, define:

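def max_diff_date(g):
    # Sort by date and renumber rows 0..n-1 so that labels equal positions
    g = g.sort_values('date').reset_index(drop=True)
    # diff() leaves NaN/NaT in row 0, which idxmax skips; the largest gap ends at
    # label idxmax(), so the col2 we want is on the row before it
    return g['col2'].iloc[g['date'].diff().idxmax() - 1]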

and you get:

>>> df.groupby(df.col1).apply(max_diff_date)
col1 
1 1 
2 2 
dtype: int64


2015-06-08 18:23:40

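The 'lambda' function you wrote finds the **maximum date** and returns the value of 'col2' corresponding to that date. That's not what we want (if it were, I could just look at the last row of each group, since the dates are already sorted). **I need an operation that computes the date differences between rows and then finds the value of 'col2' at the maximum difference.** – invoker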

My earlier comment is outdated; the code has changed since then. Ami, I think you did it! Thanks a bunch! – invoker

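Ami, I'm having a lot of trouble getting this to work with the datetime type. Can you help me? It works for integers, but for me that is only a first step. – invoker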
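One way to sidestep apply entirely, and to work directly on a datetime64 column, is to compute the gap to the next row with a grouped shift and then pick each group's largest gap with a grouped idxmax. This is a swapped-in, vectorized technique, not the apply-based answer above; the helper name col2_before_largest_gap is hypothetical, and the sketch assumes a reasonably recent pandas.

```python
import pandas as pd


def col2_before_largest_gap(df):
    """Hypothetical helper: for each col1 group, return the col2 value on the
    row that starts the largest gap between consecutive dates."""
    # Order rows by group, then by date (harmless if already sorted)
    ordered = df.sort_values(['col1', 'date'])
    # Timedelta from each row to the next row of the same group; NaT on the last row
    gap = ordered.groupby('col1')['date'].shift(-1) - ordered['date']
    # Index label of the row with the largest gap in each group (NaT is skipped)
    start_rows = gap.groupby(ordered['col1']).idxmax()
    # Look up col2 on those rows and index the result by col1
    return pd.Series(ordered.loc[start_rows.to_numpy(), 'col2'].to_numpy(),
                     index=start_rows.index, name='col2')


df = pd.DataFrame({
    'col1': [1, 1, 1, 2, 2, 2],
    'col2': [1, 2, 3, 1, 2, 3],
    'date': pd.to_datetime(['2015-01-01', '2015-01-09', '2015-01-10',
                            '2015-02-10', '2015-02-10', '2015-02-25']),
})
result = col2_before_largest_gap(df)
print(result)
```

If two gaps in a group tie, idxmax keeps the first one. As in the question, groups of size 1 are assumed away: such a group has only a NaT gap, so there is no meaningful answer for it.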

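I would try a slightly different approach: pivot the table so that it has a column of dates for each value in col1, with the values of col2 as the index. Then you can use the .diff method to get the differences between consecutive cells. This might not work if there are duplicate col1, col2 pairs, but the question doesn't make clear whether there are.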

import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 1, 2, 2, 2],
                   'col2': [1, 2, 3, 1, 2, 3],
                   'date': pd.to_datetime(['2015-01-01', '2015-01-09', '2015-01-10',
                                           '2015-02-10', '2015-02-10', '2015-02-25'])})
p = df.pivot(columns='col1', index='col2', values='date')
p
col1           1          2
col2
1     2015-01-01 2015-02-10
2     2015-01-09 2015-02-10
3     2015-01-10 2015-02-25

p.diff().shift(-1).idxmax() 

col1 
1  1 
2  2

The .shift(-1) is needed because you want the first of the two consecutive dates with the largest difference.
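To see what the .shift(-1) step does, this sketch rebuilds the pivot from the answer and prints the intermediate frames. It also illustrates the caveat about duplicate col1, col2 pairs: pivot refuses to reshape when an (index, column) pair occurs twice and raises a ValueError. The names diffs, aligned, best and dup are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 1, 2, 2, 2],
                   'col2': [1, 2, 3, 1, 2, 3],
                   'date': pd.to_datetime(['2015-01-01', '2015-01-09', '2015-01-10',
                                           '2015-02-10', '2015-02-10', '2015-02-25'])})
p = df.pivot(columns='col1', index='col2', values='date')

# Row k holds date[k] - date[k-1]: the gap that ends at col2 == k (NaT in the first row)
diffs = p.diff()
# After shift(-1), row k holds date[k+1] - date[k]: the gap that starts at col2 == k
aligned = diffs.shift(-1)
print(diffs, aligned, sep='\n\n')

# idxmax now reports the earlier col2 of the widest gap in each col1 column
best = aligned.idxmax()
print(best)

# A repeated (col2, col1) pair makes pivot fail
dup = pd.concat([df, df.iloc[[0]]])
try:
    dup.pivot(columns='col1', index='col2', values='date')
    pivot_failed = False
except ValueError:
    pivot_failed = True
print('pivot failed on duplicates:', pivot_failed)
```

Without the shift, idxmax on diffs would return the later col2 of each widest gap (2 and 3 here) instead of the earlier one.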


2015-06-08 19:14:18 JoeCondron
