2017-03-09 162 views
2

Python pandas filtering and groupby

I have this CSV loaded in pandas. The first ten rows:

print frame1.head(10) 

     alert   Subject filetype type  country status 
0 33965790 44676 aba  Attachment doc RU,RU,RU,RU deleted 
1 33965786 44676 rcrump Attachment zip   NaN deleted 
2 33965771   3aba Attachment zip   NaN deleted 
3 33965770    NaN Attachment js   ,, deleted 
4 33965766    NaN Attachment js   ,, deleted 
5 33965761    NaN Attachment zip   NaN deleted 
6 33965760    NaN Attachment zip   NaN deleted 
7 33965757    NaN Attachment zip   NaN deleted 
8 33965751 35200  3aba Attachment doc  RU,RU,RU deleted 
9 33965747 35200 INVaba Attachment zip   NaN deleted 

I need to work with the Subject column and count all rows that have 'aba' as a substring, giving something like:

Occurrences of aba- 512 

or even a per-subject result like this:

aba 12 
3aba 5 
INVaba 2 

Here is my code:

targeted = frame1[frame1['Subject'].str.contains('aba', case=False , na=False)].groupby('Subject') 
print (targeted.to_string(header=False)) 

The error I get: AttributeError: Cannot access callable attribute 'to_string' of 'DataFrameGroupBy' objects, try using the 'apply' method

Note: I got this working earlier for a count of the different file types. This works:

filetype = frame1.groupby('filetype').size() 
###clean up the printing 
print "Delivered in Email" 
print (filetype.to_string(header=False)) 

and it gave me:

Delivered in Email 
Attachment 32647 
Header   131 
URL   9236 

Answers

2

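To get the total count, use str.contains and then sum the resulting boolean mask, since each True counts as 1:

>>> df.Subject.str.contains('aba', case=False, na=False).sum() 
3 

Note that count() is not a substitute here: it counts non-null entries, so on this mask it returns the number of rows:

>>> df.Subject.str.contains('aba', case=False, na=False).count() 
10 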

Then, to get a count for each unique string containing 'aba', select the values found by contains and use value_counts:

>>> df.loc[df.Subject.str.contains('aba', case=False, na=False), 'Subject'].value_counts() 

3aba  1 
INVaba 1 
aba  1 
Name: Subject, dtype: int64 
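
As a self-contained illustration of both steps, here is a minimal sketch. The frame and its Subject values are made up for this example and are not the asker's data; only str.contains, sum and value_counts come from the answer above.

```python
import pandas as pd

# Hypothetical sample data; the Subject values are invented for illustration.
frame = pd.DataFrame({
    "Subject": ["aba", "3aba", None, "INVaba", "aba", "rcrump", "ABA report"],
})

# Boolean mask: True where Subject contains "aba" (case-insensitive).
# na=False treats missing subjects as non-matches.
mask = frame["Subject"].str.contains("aba", case=False, na=False)

# Summing a boolean mask counts the True values, i.e. the matching rows.
total = int(mask.sum())
print("Occurrences of aba-", total)

# Count how often each distinct matching Subject appears.
per_subject = frame.loc[mask, "Subject"].value_counts()
print(per_subject.to_string(header=False))
```

The per-subject counts always add up to the total, which is a quick sanity check on the two numbers.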
0

For the first output you asked for, you can do the following:

contains_aba = frame1[frame1['Subject'].str.contains('aba', case=False, na=False)] 
print("Occurrences of aba-",len(contains_aba)) 

This creates another DataFrame based on your condition. The length of that DataFrame is the number of occurrences, which you can then print.
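
The Subject column in the question contains NaN values, so the role of na=False is worth spelling out. A small sketch, assuming made-up data: with na=False, missing subjects become False in the mask, so the mask holds only True/False values. Depending on the pandas version, a mask that still contains missing values may be rejected by boolean indexing.

```python
import pandas as pd

# Hypothetical data with one missing Subject, like the NaN subjects in the question.
frame = pd.DataFrame({"Subject": ["aba", None, "3aba", "rcrump"]})

# na=False makes rows with a missing Subject count as "no match",
# so the mask contains only True/False and is safe for boolean indexing.
mask = frame["Subject"].str.contains("aba", case=False, na=False)

# Keep only the matching rows; the row count is the number of occurrences.
contains_aba = frame[mask]
print("Occurrences of aba-", len(contains_aba))
```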

0
targeted = frame1[frame1['Subject'].str.contains('aba', case=False , na=False)].groupby('Subject').size() 
print (targeted.to_string(header=False)) 

gives:

3aba  1 
INVaba 1 
aba  1
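
The error in the question comes from calling to_string on the result of groupby alone, which is a DataFrameGroupBy object and not yet a table of results. Aggregating with size() turns it into a Series, and a Series does have to_string. A minimal sketch, again with invented sample data; the sort step is an optional addition, not part of the answer above:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
frame = pd.DataFrame({"Subject": ["aba", "3aba", "aba", "INVaba", "rcrump", None]})

# Keep only rows whose Subject contains "aba" (missing subjects are skipped).
matches = frame[frame["Subject"].str.contains("aba", case=False, na=False)]

grouped = matches.groupby("Subject")  # DataFrameGroupBy: groups only, no result yet
counts = grouped.size()               # Series: number of rows per Subject

# Optional: show the most frequent subjects first.
counts = counts.sort_values(ascending=False)
print(counts.to_string(header=False))
```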