2015-10-19

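Pandas: identifying changing timestamps and calculating group sums

I have a DataFrame that I am processing row by row. I currently use iterrows(), which I know is slow, and I would rather use apply(). However, I'm not sure how to approach this with apply (if that is even possible).

The 'edges' data: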

time raw_signal amp_change edge edge_dir 
2.73105 499.878 -22.583 TRUE decr 
2.7311 477.295 -24.414 TRUE decr 
2.73115 452.881 -25.025 TRUE decr 
2.7312 427.856 -21.362 TRUE decr 
2.7315 412.598 28.076 TRUE incr 
2.73155 440.674 25.024 TRUE incr 
8.5267 490.112 -24.414 TRUE decr 
8.52675 465.698 -30.517 TRUE decr 
8.5268 435.181 -25.635 TRUE decr 
8.70805 413.208 21.362 TRUE incr 
8.7081 434.57 24.414 TRUE incr 
10.7113 487.671 -20.752 TRUE decr 
10.71135 466.919 -34.79 TRUE decr 
10.7114 432.129 -37.842 TRUE decr 
10.71145 394.287 -24.414 TRUE decr 
10.9586 367.432 25.634 TRUE incr 
10.95865 393.066 34.79 TRUE incr 
10.9587 427.856 32.349 TRUE incr 
10.95875 460.205 20.142 TRUE incr 
12.35745 477.295 -23.193 TRUE decr 

The logic applied to each row:

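# assumption: edges is indexed by time, since index values are subtracted below
start = None      # index (time) of the first row in the current run
direction = None  # edge_dir of the current run
sum_amp = 0       # running amp_change total for the current run
for index, row in edges.iterrows():

    # this will collapse the multiple incr/decr together by taking only the first one seen
    # the others will get their edge set to False
    # it also assumes that the distance between multiple incr/decr is less than some threshold
    if start is None:
        start = index
        direction = row.edge_dir
        sum_amp = row.amp_change
    else:
        if row.edge_dir == direction and abs(start - index) < 0.01:
            edges.loc[index, 'edge'] = False
            sum_amp += row.amp_change  # sum amp increase so we can get an overall for this edge
        else:
            edges.loc[start, 'amp_change'] = sum_amp
            sum_amp = row.amp_change
            start = index
            direction = row.edge_dir

# write back the total for the final run, which the loop body never does
if start is not None:
    edges.loc[start, 'amp_change'] = sum_amp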

The function should produce:

time raw_signal amp_change edge edge_dir 
2.73105 499.878 -93.384 TRUE decr 
2.7311 477.295 -24.414 FALSE decr 
2.73115 452.881 -25.025 FALSE decr 
2.7312 427.856 -21.362 FALSE decr 
2.7315 412.598 53.1 TRUE incr 
2.73155 440.674 25.024 FALSE incr 
8.5267 490.112 -80.566 TRUE decr 
8.52675 465.698 -30.517 FALSE decr 
8.5268 435.181 -25.635 FALSE decr 
8.70805 413.208 45.776 TRUE incr 
8.7081 434.57 24.414 FALSE incr 
10.7113 487.671 -117.798 TRUE decr 
10.71135 466.919 -34.79 FALSE decr 
10.7114 432.129 -37.842 FALSE decr 
10.71145 394.287 -24.414 FALSE decr 
10.9586 367.432 112.915 TRUE incr 
10.95865 393.066 34.79 FALSE incr 
10.9587 427.856 32.349 FALSE incr 
10.95875 460.205 20.142 FALSE incr 
12.35745 477.295 -23.193 TRUE decr 
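
As a quick sanity check on the totals in the expected output, the per-run sums can be recomputed by hand from the input values. This is only an arithmetic sketch: the grouping of values into runs is typed in manually here, and the names `runs` and `totals` are made up for illustration.

```python
# amp_change values from the input table, grouped by hand into
# consecutive runs that share the same edge_dir
runs = [
    [-22.583, -24.414, -25.025, -21.362],  # decr
    [28.076, 25.024],                      # incr
    [-24.414, -30.517, -25.635],           # decr
    [21.362, 24.414],                      # incr
    [-20.752, -34.79, -37.842, -24.414],   # decr
    [25.634, 34.79, 32.349, 20.142],       # incr
    [-23.193],                             # decr
]

# Round to three decimals to hide floating-point noise
totals = [round(sum(run), 3) for run in runs]
print(totals)
```

Each total should match the amp_change shown on the first row of the corresponding run in the expected output.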

Answers

How about this one-liner:

In [16]: 

df['New_amp_change'] = np.hstack((np.diff(~(np.sign(df.amp_change.shift(1))<0)), True)) 

In [40]: 

# .ix was removed in pandas 1.0; .loc performs the same boolean-mask assignment
df.loc[df.New_amp_change,'amp_change'] = df.groupby(df.New_amp_change.cumsum()).amp_change.sum().values 

In [42]: 

print(df) 
     time raw_signal amp_change edge edge_dir New_amp_change 
0 2.73105  499.878  -93.384 True  decr   True 
1 2.73110  477.295  -24.414 True  decr   False 
2 2.73115  452.881  -25.025 True  decr   False 
3 2.73120  427.856  -21.362 True  decr   False 
4 2.73150  412.598  53.100 True  incr   True 
5 2.73155  440.674  25.024 True  incr   False 
6 8.52670  490.112  -80.566 True  decr   True 
7 8.52675  465.698  -30.517 True  decr   False 
8 8.52680  435.181  -25.635 True  decr   False 
9 8.70805  413.208  45.776 True  incr   True 
10 8.70810  434.570  24.414 True  incr   False 
11 10.71130  487.671 -117.798 True  decr   True 
12 10.71135  466.919  -34.790 True  decr   False 
13 10.71140  432.129  -37.842 True  decr   False 
14 10.71145  394.287  -24.414 True  decr   False 
15 10.95860  367.432  112.915 True  incr   True 
16 10.95865  393.066  34.790 True  incr   False 
17 10.95870  427.856  32.349 True  incr   False 
18 10.95875  460.205  20.142 True  incr   False 
19 12.35745  477.295  -23.193 True  decr   True 

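1. Shift amp_change by one position (.shift(1)).

2. Check whether the sign is negative.

3. Check where that flag has changed (np.diff() returns True at those positions).

4. Finally, pad a True at the end, because np.diff() returns a vector that is one element shorter.

5. Use groupby to get the group sums, with the cumulative sum of the newly created New_amp_change column as the group key.

6. Assign the group sums back to the rows where the sign changes (the edges?) in the original DataFrame.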
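
The steps above can also be sketched with pandas operations only. This is a minimal sketch under stated assumptions, not the answer's exact code. The function name `collapse_edges` and the parameter `max_gap` are hypothetical, and `time` is assumed to be an ordinary column. Unlike the one-liner, which looks only at the sign of amp_change, this version compares edge_dir and also starts a new run when the time gap to the previous row reaches the 0.01 threshold used in the question's loop. The loop measures that gap from the first row of the run, not the previous row, so the two can differ on other data. It also sets edge to False on the collapsed rows.

```python
import pandas as pd


def collapse_edges(edges: pd.DataFrame, max_gap: float = 0.01) -> pd.DataFrame:
    """Collapse consecutive same-direction edges into the first row of each run."""
    out = edges.copy()

    # A run starts where edge_dir differs from the previous row
    # (always true for the first row, because shift() yields NaN there) ...
    direction_changed = out["edge_dir"].ne(out["edge_dir"].shift())
    # ... or where the gap to the previous row reaches max_gap (assumption:
    # gap to the previous row, not to the first row of the run).
    gap_too_large = out["time"].diff().abs().ge(max_gap)
    run_start = direction_changed | gap_too_large

    # Label each run with an increasing integer, then sum amp_change per run
    run_id = run_start.cumsum()
    run_total = out.groupby(run_id)["amp_change"].transform("sum")

    # Only the first row of each run receives the total; the rest keep their
    # original amp_change and have edge set to False.
    out.loc[run_start, "amp_change"] = run_total[run_start]
    out.loc[~run_start, "edge"] = False
    return out


# First seven rows of the question's data (raw_signal omitted for brevity)
edges = pd.DataFrame({
    "time": [2.73105, 2.7311, 2.73115, 2.7312, 2.7315, 2.73155, 8.5267],
    "amp_change": [-22.583, -24.414, -25.025, -21.362, 28.076, 25.024, -24.414],
    "edge": [True] * 7,
    "edge_dir": ["decr", "decr", "decr", "decr", "incr", "incr", "decr"],
})
result = collapse_edges(edges)
print(result)
```

Using transform("sum") keeps the per-run totals aligned with the original index, so the assignment does not depend on the ordering of the grouped result.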

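Wow :) Could you adjust your answer to incorporate the summing of amp_change? For example, the first four "decr" rows are summed together to reach a new amp_change of -93.384 – Constantino

Oh, I forgot that part; I will add it in a few seconds. –

You're the man, although now it's no longer a one-liner :P – Constantino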