Grouping time series data based on state changes in columns of a Python pandas DataFrame

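I need to group some data in a pandas DataFrame, but the standard groupby method doesn't quite do what I need. The grouping must be such that every change in 'loc' and/or every change in 'name' is treated as a separate group.

Example: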

import pandas as pd

x = pd.DataFrame([['john','abc',1],['john','abc',2],['john','abc',3],['john','xyz',4],['john','xyz',5],['john','abc',6],['john','abc',7],['matt','abc',8]])
x.columns = ['name','loc','time']

name loc time 
john abc 1 
john abc 2 
john abc 3 
john xyz 4 
john xyz 5 
john abc 6 
john abc 7 
matt abc 8

I need to group these values so that the resulting data is:

name loc first last 
john abc 1  3 
john xyz 4  5 
john abc 6  7 
matt abc 8  8

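The default groupby (correctly) groups all matching 'loc' and 'name' values together, so we are left with only 3 groups (john/abc is 1 group). Does anyone know how to force the groupby to work the way I need?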
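To make the problem concrete, here is a minimal sketch of what a plain groupby does with the sample data. The variable name `plain` and the choice of `agg(['first', 'last'])` are illustrative assumptions, not part of the original question.

```python
import pandas as pd

x = pd.DataFrame(
    [['john', 'abc', 1], ['john', 'abc', 2], ['john', 'abc', 3],
     ['john', 'xyz', 4], ['john', 'xyz', 5], ['john', 'abc', 6],
     ['john', 'abc', 7], ['matt', 'abc', 8]],
    columns=['name', 'loc', 'time'],
)

# A plain groupby keys on the values only, so the two separate runs of
# ('john', 'abc') are merged into a single group.
plain = (
    x.groupby(['name', 'loc'], sort=False)['time']
     .agg(['first', 'last'])
     .reset_index()
)
print(plain)
```

This yields three rows rather than four: the john/abc group spans time 1 to 7 and hides the xyz interval in between.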

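I was able to generate the required table using a for loop (iterrows), but if there is a nice pandas/pythonic way to do the same thing, I would love to know it.

Thanks in advance.

Matt

Source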

2014-01-16 Matt

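Just to make sure: you do want the second-to-last row in the result, the second `('john', 'abc')` row? I know there's an issue on GitHub about a consecutive `groupby`; I'll see if I can find it. – TomAugspurger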

This isn't really a job for groupby, since the order of the rows matters. Instead, use shift to compare consecutive rows.

In [37]: cols = ['name', 'loc']

In [38]: change = (x[cols] != x[cols].shift()).any(axis=1)  # True where name or loc differs from the previous row

In [39]: groups = x[change].copy()

In [40]: groups.columns = ['name', 'loc', 'first']

In [41]: groups['last'] = (groups['first'].shift(-1) - 1).fillna(len(x)).astype(int)

In [42]: groups
Out[42]:
   name  loc  first  last
0  john  abc      1     3
3  john  xyz      4     5
5  john  abc      6     7
7  matt  abc      8     8

[4 rows x 4 columns]
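Note that the `last` column above is derived from the next group's `first` minus one, which fits this sample because `time` increases by one per row. If that does not hold, one possible variation (a sketch, not part of the original answer) is to turn the change flags into a run id with `cumsum` and aggregate `time` per run. The names `run_id` and `result` are assumptions made for illustration.

```python
import pandas as pd

x = pd.DataFrame(
    [['john', 'abc', 1], ['john', 'abc', 2], ['john', 'abc', 3],
     ['john', 'xyz', 4], ['john', 'xyz', 5], ['john', 'abc', 6],
     ['john', 'abc', 7], ['matt', 'abc', 8]],
    columns=['name', 'loc', 'time'],
)
cols = ['name', 'loc']

# True on every row where name or loc differs from the previous row
# (the first row always counts as a change because it is compared with NaN).
change = (x[cols] != x[cols].shift()).any(axis=1)

# Running count of changes: rows in the same consecutive run share one id.
run_id = change.cumsum()

# Aggregate each run: keep its name/loc and the first and last time seen.
result = (
    x.groupby(run_id)
     .agg(name=('name', 'first'), loc=('loc', 'first'),
          first=('time', 'first'), last=('time', 'last'))
     .reset_index(drop=True)
)
print(result)
```

Because `first` and `last` are read directly from the `time` values in each run, this version should also cope with gaps or irregular steps in `time`.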

Source

2014-01-16 15:47:00

You can pass a function to groupby:

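import pandas as pd

x = pd.DataFrame([['john','abc',1],['john','abc',2],['john','abc',3],['john','xyz',4],['john','xyz',5],['john','abc',6],['john','abc',7],['matt','abc',8]])
x.columns = ['name','loc','time']

last_group = None
c = 0
def f(y):
    # groupby calls f once per index label, in order; with the default
    # RangeIndex the label y is also the row position.
    global c, last_group
    g = x.iloc[y]['name'], x.iloc[y]['loc']
    if last_group != g:
        # name or loc changed: start a new group
        c += 1
        last_group = g
    return c

print(x.groupby(f).head())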
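The function above relies on global state and on being called in row order. As a hedged alternative sketch (not part of the original answer), the standard library's `itertools.groupby` already groups only consecutive items with equal keys, so it can build the first/last table directly. The names `records`, `rows` and `result` are assumptions made for illustration.

```python
from itertools import groupby

import pandas as pd

x = pd.DataFrame(
    [['john', 'abc', 1], ['john', 'abc', 2], ['john', 'abc', 3],
     ['john', 'xyz', 4], ['john', 'xyz', 5], ['john', 'abc', 6],
     ['john', 'abc', 7], ['matt', 'abc', 8]],
    columns=['name', 'loc', 'time'],
)

# Iterate over the rows as plain (name, loc, time) tuples, in row order.
records = zip(x['name'], x['loc'], x['time'])

rows = []
# itertools.groupby starts a new group whenever the key changes between
# consecutive items, so no global counter is needed.
for (name, loc), run in groupby(records, key=lambda r: (r[0], r[1])):
    times = [t for _, _, t in run]
    rows.append((name, loc, times[0], times[-1]))

result = pd.DataFrame(rows, columns=['name', 'loc', 'first', 'last'])
print(result)
```

This loops in Python rather than using vectorised operations, so it is likely slower on large frames; it is mainly useful when the per-run logic is more involved than taking the first and last value.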

Source

2014-01-16 16:19:31
