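How do I count the rows per day in a MultiIndex DataFrame?

I have a DataFrame with a two-level MultiIndex. The first level, date, is a DatetimeIndex; the second level, name, is just some strings. The data is spaced at 10-minute intervals.

How can I group by date on the first level of the MultiIndex and count the number of rows per day?

I suspect that the DatetimeIndex being coupled into a MultiIndex is what is causing my problem, because doing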

data.groupby(pd.TimeGrouper(freq='D')).count()

gives me

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'MultiIndex'

I also tried writing

data.groupby(data.index.levels[0].date).count()

which results in

ValueError: Grouper and axis must be same length

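How can I make the grouper longer (i.e. include the duplicate index values; at the moment they are ignored, which makes it shorter than the axis)?

Thanks!

Source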

2017-08-03 basse

Can you provide a sample of your DataFrame in the question, with the names removed? –

You can use the level keyword in pd.Grouper. (Also note that TimeGrouper is deprecated.) This parameter is "the level for the target index."

Example DataFrame:

import numpy as np
import pandas as pd

dates = pd.date_range('2017-01', freq='10min', periods=1000)
strs = ['aa'] * 1000
df = pd.DataFrame(np.random.rand(1000, 2), index=pd.MultiIndex.from_arrays((dates, strs)))

Solution:

print(df.groupby(pd.Grouper(freq='D', level=0)).count())
              0    1
2017-01-01  144  144
2017-01-02  144  144
2017-01-03  144  144
2017-01-04  144  144
2017-01-05  144  144
2017-01-06  144  144
2017-01-07  136  136
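
As a side note on the ValueError in the question: index.levels[0] stores only the unique values of a level, whereas index.get_level_values(0) returns one value per row and therefore has the same length as the axis. The sketch below assumes a small made-up frame with two names per timestamp (the names 'aa' and 'bb' and the variable names are illustrative, not taken from the question) and groups on the per-row dates instead of using a Grouper:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: 500 timestamps, each paired with two names -> 1000 rows.
dates = pd.date_range('2017-01-01', freq='10min', periods=500)
index = pd.MultiIndex.from_product((dates, ['aa', 'bb']))
df = pd.DataFrame(np.random.rand(1000, 2), index=index)

# levels[0] keeps each timestamp once; get_level_values(0) repeats it per row.
unique_dates = df.index.levels[0]
per_row_dates = df.index.get_level_values(0)

# normalize() floors every timestamp to midnight, giving one key per row.
daily = df.groupby(per_row_dates.normalize()).count()
print(len(unique_dates), len(per_row_dates))
print(daily)
```

Because the keys come from the rows that actually exist, days with no data should not appear in the result at all, unlike with a frequency-based Grouper.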

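Update: you noted in the comments that the resulting counts include zeros that you would like to drop. For example, say your DataFrame is actually missing some days: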

df = df.drop(df.index[140:400])
print(df.groupby(pd.Grouper(freq='D', level=0)).count())
              0    1
2017-01-01  140  140
2017-01-02    0    0
2017-01-03   32   32
2017-01-04  144  144
2017-01-05  144  144
2017-01-06  144  144
2017-01-07  136  136

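As far as I know, there is no way to exclude zero counts within .count itself. Instead, you can take the result above and remove the zeros from it.

First solution (probably less preferable, because introducing np.nan converts the int result to float):

res = df.groupby(pd.Grouper(freq='D', level=0)).count()
res = res.replace(0, np.nan).dropna()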

Second and, in my opinion, better solution, from here:

res = res[(res.T != 0).any()]
print(res)  # notice - excludes 2017-01-02
              0    1
2017-01-01  140  140
2017-01-03   32   32
2017-01-04  144  144
2017-01-05  144  144
2017-01-06  144  144
2017-01-07  136  136

.any was ported to pandas from NumPy; it returns True when any element along the requested axis is True.

Source
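
The same filter can also be chained onto the groupby call, so that everything stays in one expression. This is only a sketch of one possible way to do it: it relies on .loc accepting a callable, and the lambda argument name counts is made up for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame and drop rows so that 2017-01-02 is empty.
dates = pd.date_range('2017-01-01', freq='10min', periods=1000)
index = pd.MultiIndex.from_arrays((dates, ['aa'] * 1000))
df = pd.DataFrame(np.random.rand(1000, 2), index=index)
df = df.drop(df.index[140:400])

# Count per day, then keep only the days where some column is non-zero.
res = (df.groupby(pd.Grouper(freq='D', level=0))
         .count()
         .loc[lambda counts: (counts != 0).any(axis=1)])
print(res)
```

Since no NaN is introduced, the counts should keep their integer dtype. If a single row count per day is enough, .size() may be a simpler starting point, because it returns one number per group rather than one per column.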

2017-08-03 15:47:59

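Thanks, Brad, you answered my question perfectly. As a learning opportunity: I noticed that I get rows with zero counts, and appending '.dropna()' to the '.groupby().count()' statement does not remove those rows. Is there any way to make the 'Grouper' drop the zero counts directly in the same line? – basse

Suppose the DataFrame looks like this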

d = pd.DataFrame([['Mon', 'foo', 3], ['Tue', 'bar', 6], ['Wed', 'qux', 9]],
                 columns=['date', 'name', 'amount'])\
      .set_index(['date', 'name'])

You can then group on the date index level alone with this operation

d.reset_index('name', drop=True)\
 .groupby('date')\
 ['amount'].count()

Source
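
A slightly shorter variant, offered as a sketch rather than a replacement: groupby can take level= directly, which avoids resetting the index first. The variable name counts is made up for illustration.

```python
import pandas as pd

# Same toy frame as above, indexed by (date, name).
d = pd.DataFrame([['Mon', 'foo', 3], ['Tue', 'bar', 6], ['Wed', 'qux', 9]],
                 columns=['date', 'name', 'amount']).set_index(['date', 'name'])

# Group on the 'date' index level without touching the 'name' level.
counts = d.groupby(level='date')['amount'].count()
print(counts)
```

With one row per date in this toy frame, every count is 1.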

2017-08-03 16:01:14
