Counting duplicate values in a Pandas DataFrame

There must be a simple way to do this, but I couldn't find an elegant solution on SO or work one out myself.

I want to count the number of duplicate values, where a duplicate is defined by a subset of the DataFrame's columns.

Example:

print(df)

    Month LSOA code Longitude Latitude Crime type 
0 2015-01 E01000916 -0.106453 51.518207 Bicycle theft 
1 2015-01 E01000914 -0.111497 51.518226 Burglary 
2 2015-01 E01000914 -0.111497 51.518226 Burglary 
3 2015-01 E01000914 -0.111497 51.518226 Other theft 
4 2015-01 E01000914 -0.113767 51.517372 Theft from the person
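To make the snippets in this thread runnable on their own, the five sample rows above can be rebuilt by hand. This is a minimal sketch that assumes the values are exactly as printed; the real data from data.police.uk is normally loaded from CSV instead.

```python
import pandas as pd

# Rebuild the five sample rows printed above (values typed in by hand)
df = pd.DataFrame({
    'Month': ['2015-01'] * 5,
    'LSOA code': ['E01000916', 'E01000914', 'E01000914', 'E01000914', 'E01000914'],
    'Longitude': [-0.106453, -0.111497, -0.111497, -0.111497, -0.113767],
    'Latitude': [51.518207, 51.518226, 51.518226, 51.518226, 51.517372],
    'Crime type': ['Bicycle theft', 'Burglary', 'Burglary', 'Other theft',
                   'Theft from the person'],
})
print(df)
```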

My workaround:

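counts = dict()
for i, row in df.iterrows():
    # Build the key from the columns that define a duplicate
    key = (
        row['Longitude'],
        row['Latitude'],
        row['Crime type'],
    )

    # dict.has_key() was removed in Python 3; use the `in` operator instead
    if key in counts:
        counts[key] = counts[key] + 1
    else:
        counts[key] = 1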

And the counts I get:

{(-0.11376700000000001, 51.517371999999995, 'Theft from the person'): 1, 
(-0.111497, 51.518226, 'Burglary'): 2, 
(-0.111497, 51.518226, 'Other theft'): 1, 
(-0.10645299999999999, 51.518207000000004, 'Bicycle theft'): 1}
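Staying in plain Python, the loop above can be shortened with collections.Counter, which does the "increment or initialise" bookkeeping itself. This is a sketch, not a pandas-native answer; it assumes only the three key columns shown in the sample and avoids iterrows() by zipping the columns directly.

```python
from collections import Counter

import pandas as pd

# Hand-typed copy of the three key columns from the sample data
df = pd.DataFrame({
    'Longitude': [-0.106453, -0.111497, -0.111497, -0.111497, -0.113767],
    'Latitude': [51.518207, 51.518226, 51.518226, 51.518226, 51.517372],
    'Crime type': ['Bicycle theft', 'Burglary', 'Burglary', 'Other theft',
                   'Theft from the person'],
})

# zip yields one (Longitude, Latitude, Crime type) tuple per row;
# Counter tallies how often each tuple occurs
counts = Counter(zip(df['Longitude'], df['Latitude'], df['Crime type']))
print(counts)
```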

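This code could obviously be improved a lot.

Apart from that (feel free to comment on how), what is the pandas way to do this?

For those interested, the dataset I'm working with is from https://data.police.uk/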

Source: 2015-11-30, tales

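You can use groupby with size. Then call reset_index to turn the group keys back into columns, naming the count column count (otherwise it would be named 0).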

print(df)
    Month LSOA  code Longitude Latitude    Crime type 
0 2015-01 E01000916 -0.106453 51.518207   Bicycle theft 
1 2015-01 E01000914 -0.111497 51.518226    Burglary 
2 2015-01 E01000914 -0.111497 51.518226    Burglary 
3 2015-01 E01000914 -0.111497 51.518226   Other theft 
4 2015-01 E01000914 -0.113767 51.517372 Theft from the person 

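# Group by the key columns, count the rows in each group, and turn the result back into a DataFrame
df = df.groupby(['Longitude', 'Latitude', 'Crime type']).size().reset_index(name='count')
print(df)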
    Longitude Latitude    Crime type count 
0 -0.113767 51.517372 Theft from the person  1 
1 -0.111497 51.518226    Burglary  2 
2 -0.111497 51.518226   Other theft  1 
3 -0.106453 51.518207   Bicycle theft  1 

print(df['count'])
0 1 
1 2 
2 1 
3 1 
Name: count, dtype: int64
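On newer pandas versions (DataFrame.value_counts was added in 1.1), the same counts can be obtained with value_counts over a column subset; the result is sorted by count, descending, rather than by the key columns. The sketch below also keeps only the combinations that occur more than once and, as an alternative, uses duplicated(keep=False) to flag every row that belongs to a repeated combination. The sample frame is a hand-typed copy of the question's data.

```python
import pandas as pd

# Hand-typed copy of the three key columns from the sample data
df = pd.DataFrame({
    'Longitude': [-0.106453, -0.111497, -0.111497, -0.111497, -0.113767],
    'Latitude': [51.518207, 51.518226, 51.518226, 51.518226, 51.517372],
    'Crime type': ['Bicycle theft', 'Burglary', 'Burglary', 'Other theft',
                   'Theft from the person'],
})
cols = ['Longitude', 'Latitude', 'Crime type']

# Count each unique combination of the key columns (requires pandas >= 1.1)
counts = df.value_counts(subset=cols).reset_index(name='count')

# Keep only combinations that actually repeat
dupes = counts[counts['count'] > 1]
print(dupes)

# Alternative: mark every row whose key combination appears more than once
mask = df.duplicated(subset=cols, keep=False)
print(df[mask])
```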

Source: 2015-11-30 07:47:28, jezrael
