2016-09-27 65 views

Pandas: aggregating data in a DataFrame

I have a DataFrame:

ID,"url","app_name","used_at","active_seconds","device_connection","device_os","device_type","device_usage" 
1ca9bb884462c3ba2391bf669c22d4bd,"",VK Client,2016-01-01 00:00:13,5,3g,ios,smartphone,home 
b8f4df3f99ad786a77897c583d98f615,"",VKontakte,2016-01-01 00:01:45,107,wifi,android,smartphone,home 
1ca9bb884462c3ba2391bf669c22d4bd,"",Twitter,2016-01-01 00:02:48,20,3g,ios,smartphone,home 
1ca9bb884462c3ba2391bf669c22d4bd,"",VK Client,2016-01-01 00:03:08,796,3g,ios,smartphone,home 
b8f4df3f99ad786a77897c583d98f615,"",WhatsApp Messenger,2016-01-01 00:03:32,70,wifi,android,smartphone,home 
b8f4df3f99ad786a77897c583d98f615,"",VKontakte,2016-01-01 00:04:42,27,wifi,android,smartphone,home 
b8f4df3f99ad786a77897c583d98f615,"",VKontakte,2016-01-01 00:05:30,5,wifi,android,smartphone,home 
b8f4df3f99ad786a77897c583d98f615,"",WhatsApp Messenger,2016-01-01 00:05:36,47,wifi,android,smartphone,home 
b8f4df3f99ad786a77897c583d98f615,"",VKontakte,2016-01-01 00:06:23,20,wifi,android,smartphone,home 
a703114aa8a03495c3e042647212fa63,"",Instagram,2016-01-01 00:06:41,118,3g,android,smartphone,home 
1637ce5a4c4868e694004528c642d0ac,"",Camera,2016-01-01 00:06:43,16,wifi,android,smartphone,home 
1637ce5a4c4868e694004528c642d0ac,"",VKontakte,2016-01-01 00:07:00,45,wifi,android,smartphone,home 
a703114aa8a03495c3e042647212fa63,"",VKontakte,2016-01-01 00:08:40,99,3g,android,smartphone,home 
1637ce5a4c4868e694004528c642d0ac,"",VKontakte,2016-01-01 00:10:05,1,wifi,android,smartphone,home 

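I need to calculate the share of each app_name for each ID. What I can't work out is the next step: the sum for each app within an ID should be divided by the sum over all apps for that ID and multiplied by 100 (to get a percentage). What I have so far: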

short = df.groupby(['ID', 'app_name']).agg({'app_name': len, 'active_seconds': sum}).rename(columns={'active_seconds': 'count_sec', 'app_name': 'sum_app'}).reset_index() 

But this only returns the totals for each app. When I try

short = df.groupby(['ID', 'app_name']).agg({'app_name': len, 'active_seconds': sum/df.ID.app_name.sum() * 100}).rename(columns={'active_seconds': 'count_sec', 'app_name': 'sum_app'}).reset_index() 

it raises an error.

How can I fix this?


Can you show the expected output? –

Answers


IIUC you need:

# the chain is wrapped in parentheses so that it can span several lines
short = (df.groupby(['ID', 'app_name'])
           .agg({'app_name': len,
                 'active_seconds': lambda x: 100 * x.sum() / df.active_seconds.sum()})
           .rename(columns={'active_seconds': 'count_sec', 'app_name': 'sum_app'})
           .reset_index())

print (short) 

                                 ID            app_name  count_sec  sum_app
0  1637ce5a4c4868e694004528c642d0ac              Camera   1.162791        1
1  1637ce5a4c4868e694004528c642d0ac           VKontakte   3.343023        2
2  1ca9bb884462c3ba2391bf669c22d4bd             Twitter   1.453488        1
3  1ca9bb884462c3ba2391bf669c22d4bd           VK Client  58.212209        2
4  a703114aa8a03495c3e042647212fa63           Instagram   8.575581        1
5  a703114aa8a03495c3e042647212fa63           VKontakte   7.194767        1
6  b8f4df3f99ad786a77897c583d98f615           VKontakte  11.555233        4
7  b8f4df3f99ad786a77897c583d98f615  WhatsApp Messenger   8.502907        2

Another solution:

# store the result under a different name, e.g. short1
short1 = (df.groupby(['ID', 'app_name'])
            .agg({'app_name': len, 'active_seconds': sum})
            .rename(columns={'active_seconds': 'count_sec', 'app_name': 'sum_app'})
            .reset_index())
# convert the summed seconds to a percentage of the overall total
short1.count_sec = 100 * short1.count_sec / df.active_seconds.sum()
print (short1) 
                                 ID            app_name  count_sec  sum_app
0  1637ce5a4c4868e694004528c642d0ac              Camera   1.162791        1
1  1637ce5a4c4868e694004528c642d0ac           VKontakte   3.343023        2
2  1ca9bb884462c3ba2391bf669c22d4bd             Twitter   1.453488        1
3  1ca9bb884462c3ba2391bf669c22d4bd           VK Client  58.212209        2
4  a703114aa8a03495c3e042647212fa63           Instagram   8.575581        1
5  a703114aa8a03495c3e042647212fa63           VKontakte   7.194767        1
6  b8f4df3f99ad786a77897c583d98f615           VKontakte  11.555233        4
7  b8f4df3f99ad786a77897c583d98f615  WhatsApp Messenger   8.502907        2
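
Note that both snippets above divide by the total over the whole frame, while the question describes dividing by the sum over all apps of the same ID. If the per-ID share is what is wanted, one possible sketch uses a second groupby with transform. The IDs "A" and "B" below are shortened placeholders for the hashes in the sample, and the share_pct column name is made up for illustration:

```python
import io
import pandas as pd

# A few rows from the sample, trimmed to the relevant columns.
# "A" and "B" are placeholder IDs standing in for the long hashes.
csv_text = """ID,app_name,active_seconds
A,VK Client,5
B,VKontakte,107
A,Twitter,20
A,VK Client,796
B,WhatsApp Messenger,70
B,VKontakte,27
"""
df = pd.read_csv(io.StringIO(csv_text))

# Number of sessions and total seconds per (ID, app_name) pair,
# using the same column names as the thread.
short = (
    df.groupby(["ID", "app_name"])["active_seconds"]
      .agg(["count", "sum"])
      .rename(columns={"count": "sum_app", "sum": "count_sec"})
      .reset_index()
)

# Total seconds per ID, broadcast back to every row of that ID.
per_id_total = short.groupby("ID")["count_sec"].transform("sum")

# Share of each app within its own ID, in percent (hypothetical column name).
short["share_pct"] = 100.0 * short["count_sec"] / per_id_total
print(short)
```

With this variant the shares of each ID add up to 100, which is not the case when dividing by the overall total.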

My df is larger, and it returns all '0' in the 'count_sec' column. I tried multiplying by 10000, but that doesn't change anything. –


I think it returns 'int'. How can I convert it to float? –


Use '.astype(float)' – jezrael
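
An all-zero column is what integer division would produce (for example under Python 2, where dividing two integers truncates), though that is only a guess about the commenter's setup. A minimal sketch of casting to float before dividing, using two of the sums from the sample and its overall total of 1376 seconds:

```python
import pandas as pd

# Summed seconds for two (ID, app_name) pairs from the sample data.
short1 = pd.DataFrame({"app_name": ["Camera", "VKontakte"],
                       "count_sec": [16, 46]})
total = 1376  # sum of active_seconds over the 14 sample rows

# Cast to float first so the division cannot truncate to an integer.
short1["count_sec"] = 100 * short1["count_sec"].astype(float) / total
print(short1)
```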