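Get pandas duplicate row counts with the original index
I need to find duplicate rows in a pandas DataFrame and then add an extra column holding the count. Say we have this DataFrame: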
>>> print(df)
+----+-----+-----+-----+-----+-----+-----+-----+-----+
| | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|----+-----+-----+-----+-----+-----+-----+-----+-----|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 2 | 4 | 3 | 4 | 1 | 1 | 4 | 4 |
| 3 | 4 | 3 | 4 | 0 | 0 | 0 | 0 | 0 |
| 4 | 2 | 3 | 4 | 3 | 4 | 0 | 0 | 0 |
| 5 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | 4 | 5 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7 | 1 | 1 | 4 | 0 | 0 | 0 | 0 | 0 |
| 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9 | 4 | 3 | 4 | 0 | 0 | 0 | 0 | 0 |
| 10 | 3 | 3 | 4 | 3 | 5 | 5 | 5 | 0 |
| 11 | 5 | 4 | 0 | 0 | 0 | 0 | 0 | 0 |
| 12 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 13 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 |
| 14 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 15 | 1 | 3 | 5 | 0 | 0 | 0 | 0 | 0 |
| 16 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 17 | 3 | 3 | 4 | 4 | 0 | 0 | 0 | 0 |
| 18 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+----+-----+-----+-----+-----+-----+-----+-----+-----+
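The frame above should then become the following, with an additional column of counts. Note that the original index values are still kept.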
+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|----+-----+-----+-----+-----+-----+-----+-----+-----+-----|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
| 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
| 2 | 2 | 4 | 3 | 4 | 1 | 1 | 4 | 4 | 1 |
| 3 | 4 | 3 | 4 | 0 | 0 | 0 | 0 | 0 | 2 |
| 4 | 2 | 3 | 4 | 3 | 4 | 0 | 0 | 0 | 1 |
| 5 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 |
| 6 | 4 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 7 | 1 | 1 | 4 | 0 | 0 | 0 | 0 | 0 | 1 |
| 10 | 3 | 3 | 4 | 3 | 5 | 5 | 5 | 0 | 1 |
| 11 | 5 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 13 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 15 | 1 | 3 | 5 | 0 | 0 | 0 | 0 | 0 | 1 |
| 16 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 17 | 3 | 3 | 4 | 4 | 0 | 0 | 0 | 0 | 1 |
+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
I have seen other solutions, such as:
df.groupby(list(df.columns.values)).size()
However, that returns a result with gaps and without the original index.
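One possible way to get the desired shape is sketched below. It is not necessarily the approach the comments refer to. It assumes every column takes part in the duplicate check and that there are no NaN values (groupby drops NaN keys by default). The small sample frame and the column name `count` are illustrative only; the question uses `10` as the new column label.

```python
import pandas as pd

# Illustrative sample: rows 0/2 and rows 1/4 are duplicates of each other.
df = pd.DataFrame(
    [[0, 0, 0],
     [2, 0, 0],
     [0, 0, 0],
     [2, 4, 3],
     [2, 0, 0]],
    columns=[2, 3, 4],
)

cols = list(df.columns)

# transform("size") broadcasts each group's size back onto the original
# rows, so the result is aligned with df's index (unlike .size()).
counts = df.groupby(cols)[cols[0]].transform("size")

# Attach the counts, then keep only the first occurrence of each row.
# "count" is a hypothetical column name chosen for this sketch.
result = df.assign(count=counts).drop_duplicates(subset=cols)

print(result)
```

Because `drop_duplicates` keeps the first occurrence by default, the surviving rows carry their original index labels (here 0, 1 and 3) rather than a fresh 0..n-1 range.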
Thank you, that worked very well. – kPow989
Glad I could help! – jezrael