Using groupby("1d") and first_valid_index together

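This post shows how to use first_valid_index to find the first occurrence of a value in a DataFrame column. How can I combine first_valid_index with a daily groupby, so that I get the first occurrence for each day, using the same example DataFrame shown in the linked post?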

This is the groupby code I need to use:

grouper = pd.TimeGrouper("1d")

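Edit:

When I use a lambda with the apply method, it gives the correct output. However, I can't assign this output to a new column ['test_output']; the column only shows NaT: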

df['test_output'] = df.groupby(grouper)['test_1'].apply(lambda x: x.first_valid_index()) 

df 
Out[9]: 
                     test_1 test_output
2014-03-04 09:00:00     NaN         NaT
2014-03-04 10:00:00     NaN         NaT
2014-03-04 11:00:00     NaN         NaT
2014-03-04 12:00:00     NaN         NaT
2014-03-04 13:00:00     NaN         NaT
2014-03-04 14:00:00     1.0         NaT
2014-03-04 15:00:00     1.0         NaT
2014-03-04 16:00:00     1.0         NaT
2014-03-05 09:00:00     1.0         NaT

Source

2016-07-04 ade1e

With 'groupby' you can do 'df.groupby(grouper).first()' – EdChum

IIUC, you can use first on the groupby object:

In [95]: 
df.groupby(grouper).first() 

Out[95]: 
      test_1 
2014-03-04  1.0 
2014-03-05  1.0

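This should work; the output above was produced using the same data as in your linked question.

Edit

I think the above is actually correct, since first() returns the first non-null value in each group and so differs from calling head(1), which simply returns the first row of each group. For example: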

In [3]: 
df.groupby(grouper).head(1) 

Out[3]: 
        test_1 test_output 
2014-03-04 09:00:00  NaN   NaN 
2014-03-05 09:00:00  1   1
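
To make the difference between first() and head(1) concrete, here is a minimal sketch. The five-row sample frame is a made-up stand-in for the data in the question, and it uses pd.Grouper(freq="D") rather than pd.TimeGrouper, on the assumption that a recent pandas version is installed (TimeGrouper was deprecated and later removed).

```python
import numpy as np
import pandas as pd

# Hypothetical sample: NaN in the morning of day one, valid values afterwards.
idx = pd.to_datetime([
    "2014-03-04 09:00", "2014-03-04 10:00", "2014-03-04 14:00",
    "2014-03-05 09:00", "2014-03-05 10:00",
])
df = pd.DataFrame({"test_1": [np.nan, np.nan, 1.0, 1.0, 1.0]}, index=idx)

grouper = pd.Grouper(freq="D")  # daily bins on the DatetimeIndex

# first() skips NaN: one row per day holding the first non-null value.
first_values = df.groupby(grouper).first()

# head(1) keeps the first row of each day as-is, NaN included,
# and preserves the original timestamps as the index.
head_rows = df.groupby(grouper).head(1)

print(first_values)
print(head_rows)
```

Note that first() gives you the value but not the time at which it occurred, which is why first_valid_index is still needed for the timestamp.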

But you can also use a lambda with apply to call first_valid_index:

In [6]: 
df.groupby(grouper)['test_1'].apply(lambda x: x.first_valid_index()) 

Out[6]: 
2014-03-04 2014-03-04 14:00:00 
2014-03-05 2014-03-05 09:00:00 
Name: test_1, dtype: datetime64[ns]
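
For anyone trying this on a newer pandas, the same per-day lookup can be sketched as below. This is only an illustration under assumptions: the sample series is invented, and pd.Grouper(freq="D") stands in for the pd.TimeGrouper("1d") used in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: no valid reading on day one until 14:00.
idx = pd.to_datetime([
    "2014-03-04 09:00", "2014-03-04 13:00", "2014-03-04 14:00",
    "2014-03-04 15:00", "2014-03-05 09:00", "2014-03-05 10:00",
])
s = pd.Series([np.nan, np.nan, 1.0, 1.0, 1.0, 1.0], index=idx, name="test_1")

grouper = pd.Grouper(freq="D")  # assumed replacement for pd.TimeGrouper("1d")

# For each daily group, return the index label of the first non-NaN value.
first_valid = s.groupby(grouper).apply(lambda x: x.first_valid_index())

print(first_valid)
```

The result has one entry per day, indexed by the day itself (midnight), with the timestamp of the first valid observation as the value. A day with no valid values at all would get None from first_valid_index.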

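Edit

Adding it back as a column is a little tricky. You are trying to align the original index with the index of the daily-grouped groupby result, and the two don't line up (the original labels carry times of day, the grouped labels are plain days), which is why you get NaT. What you can do is call to_series on the index, so that we can access just the date attribute and then call map. map performs a lookup, so it matches each row's date against the groupby result and returns the desired first valid timestamp: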

In [136]: 
df['first'] = df.index.to_series().dt.date.map(df.groupby(grouper)['test_1'].apply(lambda x: x.first_valid_index())) 
df 

Out[136]: 
        test_1 test_output    first 
2014-03-04 09:00:00  NaN   NaN 2014-03-04 14:00:00 
2014-03-04 10:00:00  NaN   NaN 2014-03-04 14:00:00 
2014-03-04 11:00:00  NaN   NaN 2014-03-04 14:00:00 
2014-03-04 12:00:00  NaN   NaN 2014-03-04 14:00:00 
2014-03-04 13:00:00  NaN   NaN 2014-03-04 14:00:00 
2014-03-04 14:00:00  1.0   1.0 2014-03-04 14:00:00 
2014-03-04 15:00:00  1.0   1.0 2014-03-04 14:00:00 
2014-03-04 16:00:00  1.0   1.0 2014-03-04 14:00:00 
2014-03-05 09:00:00  1.0   1.0 2014-03-05 09:00:00 
2014-03-05 10:00:00  1.0   1.0 2014-03-05 09:00:00 
2014-03-05 11:00:00  1.0   1.0 2014-03-05 09:00:00 
2014-03-05 12:00:00  1.0   1.0 2014-03-05 09:00:00 
2014-03-05 13:00:00  1.0   1.0 2014-03-05 09:00:00 
2014-03-05 14:00:00  1.0   1.0 2014-03-05 09:00:00 
2014-03-05 15:00:00  1.0   1.0 2014-03-05 09:00:00 
2014-03-05 16:00:00  1.0   1.0 2014-03-05 09:00:00
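
Here is a small sketch of both the alignment problem and the map-based fix, in case it helps on a current pandas. It is an assumption-laden variant of the code above rather than the same code: the sample frame is made up, pd.Grouper(freq="D") replaces pd.TimeGrouper, and the lookup keys are built with index.normalize() (midnight timestamps) instead of .dt.date, so that the keys have the same type as the grouped result's index.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame with a DatetimeIndex.
idx = pd.to_datetime([
    "2014-03-04 09:00", "2014-03-04 13:00", "2014-03-04 14:00",
    "2014-03-05 09:00", "2014-03-05 10:00",
])
df = pd.DataFrame({"test_1": [np.nan, np.nan, 1.0, 1.0, 1.0]}, index=idx)

# One entry per day: the timestamp of the first non-NaN value.
per_day = df.groupby(pd.Grouper(freq="D"))["test_1"].apply(
    lambda x: x.first_valid_index()
)

# Column assignment aligns on index labels. The daily labels (midnight)
# never match the intraday labels, so the aligned result is all NaT.
aligned = per_day.reindex(df.index)

# Fix: give every row its day as a lookup key, then map it through per_day.
day_keys = pd.Series(df.index.normalize(), index=df.index)
df["first"] = day_keys.map(per_day)

print(df)
```

The reindex call is only there to show what the direct assignment in the question effectively does; the map line is the part that fills the column.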

Source

2016-07-04 14:47:39 EdChum

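Thanks again. The problem here is my understanding of groupby; it may not be the right approach for my task. I'm looking to get back the first time of day at which this occurs, as in my previous post, but groupby only gives me the day rather than the time of day. – ade1e

See the updated answer – EdChum

Thanks EdChum. The lambda/apply approach returns the desired output. Unfortunately, I can't assign this data to another column, 'test_output', because it only shows NaT values when I use: 'df['test_output'] = df.groupby(grouper)['test_1'].apply(lambda x: x.first_valid_index())' – ade1e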
