Count the number of unique values per group

This is my DataFrame df:

CITY ID_C 
abc 123 
abc 123 
abc 456 
def 123 
def 456 
def 789 
def 789

I need to group by CITY and count the number of unique ID_C values in each group:

CITY TOTAL_UNIQUE_COUNT 
abc 2 
def 3

I tried this code, but got the error ValueError: cannot insert ID_CITIZEN, already exists:

df.groupby('CITY').ID_C.value_counts().reset_index()

Source

2017-04-20 Dinosaurius

There is a direct method for this, nunique:

df.groupby('CITY')['ID_C'].nunique() 
Out: 
CITY 
abc 2 
def 3 
Name: ID_C, dtype: int64
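
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The frame is rebuilt by hand from the table in the question, and the variable name unique_counts is only illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "CITY": ["abc", "abc", "abc", "def", "def", "def", "def"],
    "ID_C": [123, 123, 456, 123, 456, 789, 789],
})

# nunique counts the distinct ID_C values within each CITY group.
unique_counts = df.groupby("CITY")["ID_C"].nunique()
print(unique_counts)
```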

To get the requested format:

df.groupby('CITY')['ID_C'].nunique().to_frame('TOTAL_UNIQUE_COUNT') 
Out: 
     TOTAL_UNIQUE_COUNT 
CITY      
abc     2 
def     3 

df.groupby('CITY')['ID_C'].nunique().to_frame('TOTAL_UNIQUE_COUNT').reset_index() 
Out: 
    CITY TOTAL_UNIQUE_COUNT 
0 abc     2 
1 def     3
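
As for the error in the question, a likely cause is that on pandas versions before 2.0 the result of groupby(...).value_counts() carries the column's name both as its Series name and as an index level, so reset_index() cannot insert that level as a new column. The sketch below works around this by naming the counts column explicitly. The names pair_counts and N_ROWS are illustrative, not from the original post, and nunique remains the simpler route.

```python
import pandas as pd

# Example frame from the question, rebuilt by hand.
df = pd.DataFrame({
    "CITY": ["abc", "abc", "abc", "def", "def", "def", "def"],
    "ID_C": [123, 123, 456, 123, 456, 789, 789],
})

# Give the counts column an explicit name so it cannot clash with the
# ID_C index level when the index is moved into columns.
pair_counts = df.groupby("CITY")["ID_C"].value_counts().reset_index(name="N_ROWS")

# Each row of pair_counts is one distinct (CITY, ID_C) pair, so counting
# rows per CITY gives the number of unique IDs in that city.
result = pair_counts.groupby("CITY").size().reset_index(name="TOTAL_UNIQUE_COUNT")
print(result)
```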

Source

2017-04-20 11:21:23 ayhan

What if I just want the total count, including non-unique values? `df.groupby('CITY')['ID_C'].count()`? – Dinosaurius

@Dinosaurius Yes. If you want to exclude `NaN` values (if there are any), you can use `groupby.count`. Otherwise, you can use `groupby.size`. – ayhan
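
To illustrate the difference described in the comment above, here is a small sketch. The data is hypothetical: it uses the same column names as the question, but with missing IDs added, since the original frame has none.

```python
import numpy as np
import pandas as pd

# Hypothetical data: same columns as the question, with missing IDs added.
df_nan = pd.DataFrame({
    "CITY": ["abc", "abc", "def", "def", "def"],
    "ID_C": [123, np.nan, 456, 456, np.nan],
})

grouped = df_nan.groupby("CITY")["ID_C"]
summary = pd.DataFrame({
    "size": grouped.size(),        # all rows in the group, NaN included
    "count": grouped.count(),      # non-NaN rows only
    "nunique": grouped.nunique(),  # distinct non-NaN values
})
print(summary)
```

With this data, size and count differ by one in each city because each group contains exactly one missing ID.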
