Python pandas count

I have a DataFrame of "sentences" that I want to search for a keyword. Suppose my keyword is just the letter 'A'. Sample data:

year | sentence | index 
----------------------- 
2015 | AAX  | 0 
2015 | BAX  | 1 
2015 | XXY  | -1 
2016 | AWY  | 0 
2017 | BWY  | -1

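That is, the "index" column shows the position of the first occurrence of "A" in each sentence (-1 if not found). I want to group the rows by year and show, in one column, the share of records in each year in which "A" occurs. That is: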

year | index 
------------- 
2015 | 0.667 
2016 | 1.0 
2017 | 0
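
For anyone who wants to reproduce this, here is a minimal sketch of how the sample frame could be built. Deriving the `index` column with `Series.str.find` is an assumption; the post does not say how that column was actually computed.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "year": [2015, 2015, 2015, 2016, 2017],
    "sentence": ["AAX", "BAX", "XXY", "AWY", "BWY"],
})

# Position of the first 'A' in each sentence (assumed derivation);
# str.find returns -1 when the substring is absent.
df["index"] = df["sentence"].str.find("A")

print(df)
```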

I have a feeling this needs agg or groupby in some way, but I'm not clear on how to string them together. This is as far as I've got:

df.groupby("index").count()

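But the problem here is that it needs some kind of conditional count(): first count the rows containing "A" in year 201X, then divide by the total number of rows in year 201X.

Source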

2017-07-10 AndreyIto

You can use value_counts or GroupBy.size with boolean indexing:

What is the difference between size and count in pandas?

df2 = df['year'].value_counts() 
print (df2) 
2015 3 
2017 1 
2016 1 
Name: year, dtype: int64 

df1 = df.loc[df['index'] != -1, 'year'].value_counts() 
print (df1) 
2015 2 
2016 1 
Name: year, dtype: int64

Or:

df2 = df.groupby('year').size() 
print (df2) 
year 
2015 3 
2016 1 
2017 1 
dtype: int64 

df1 = df.loc[df['index'] != -1, ['year']].groupby('year').size() 
print (df1) 
year 
2015 2 
2016 1 
dtype: int64

And finally divide with div:

print (df1.div(df2, fill_value=0)) 
2015 0.666667 
2016 1.000000 
2017 0.000000 
Name: year, dtype: float64
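
The `fill_value=0` matters because 2017 has no matching rows and is therefore missing from the filtered counts. A small sketch of the difference, using the question's data (the variable names `total`, `hits`, `plain`, and `filled` are mine, not from the answer):

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2015, 2015, 2015, 2016, 2017],
    "index": [0, 1, -1, 0, -1],
})

# Rows per year.
total = df.groupby("year").size()
# Rows per year where 'A' was found; 2017 has none, so it is absent here.
hits = df.loc[df["index"] != -1].groupby("year").size()

# Plain division aligns on year and leaves NaN where hits has no entry.
plain = hits / total
# div with fill_value=0 treats the missing 2017 count as 0 before dividing.
filled = hits.div(total, fill_value=0)

print(plain)
print(filled)
```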

Source

2017-07-10 05:21:08 jezrael

As far as I know, there are different ways to do this, but no "native" way. Here is an example that uses only one groupby:

g = df.groupby('year')['index'].agg([lambda x: x[x>=0].count(), 'count']) 
g['<lambda>']/g['count']
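
The label pandas gives a lambda column is auto-generated and, as far as I can tell, may differ between pandas versions. A variant that avoids depending on it is named aggregation, sketched here under the assumption of a pandas version that supports keyword arguments in `agg` (the names `hits`, `total`, and `share` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2015, 2015, 2015, 2016, 2017],
    "index": [0, 1, -1, 0, -1],
})

# One groupby, two explicitly named aggregates.
g = df.groupby("year")["index"].agg(
    hits=lambda s: (s >= 0).sum(),  # rows where 'A' was found
    total="count",                  # all non-null rows in the group
)

share = g["hits"] / g["total"]
print(share)
```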

Source

2017-07-10 05:35:33 Alex

from __future__ import division  # only needed on Python 2
import pandas as pd

x_df = df  # your DataFrame

# Share of sentences per year that contain 'A'
y = x_df.groupby('year')['sentence'].apply(lambda x: sum('A' in s for s in x) / len(x))

# or, using the existing index column
y = x_df.groupby('year')['index'].apply(lambda x: sum(i >= 0 for i in x) / len(x))

Source

2017-07-10 06:09:39

Check using sentence

df.sentence.str.contains('A').groupby(df.year).mean() 

year 
2015 0.666667 
2016 1.000000 
2017 0.000000 
Name: sentence, dtype: float64

Use index, where the check has already been done

df['index'].ne(-1).groupby(df.year).mean() 

year 
2015 0.666667 
2016 1.000000 
2017 0.000000 
Name: index, dtype: float64
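
The mean works here because `True` counts as 1 and `False` as 0, so the mean of a boolean group equals matches divided by rows. A short sketch that shows the pieces side by side (the names `found` and `stats` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2015, 2015, 2015, 2016, 2017],
    "index": [0, 1, -1, 0, -1],
})

# Boolean Series: True where 'A' occurs in the sentence.
found = df["index"].ne(-1)

# sum = number of matches, size = number of rows, mean = sum / size.
stats = found.groupby(df["year"]).agg(["sum", "size", "mean"])
print(stats)
```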

Source

2017-07-10 06:26:57 piRSquared
