2017-09-06 100 views
-3

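Combine rows in a DataFrame and add up values

I have a DataFrame (named table) with 6 columns labelled [price1, price2, price3, time, type, volume].

For type, I have 'Q' and 'T', arranged like this: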

Q

T

Q

T

Q

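Now I want to combine the rows with consecutive T's and add up their volume values. The price and time values are the same within a run of consecutive T's.

That is, I want

price...: time: type: volume: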

10000 2012.05 Q 10

10000 2012.05 T 20

10000 2012.05 Q 10

10000 2012.06 T 20

10000 2012.06 T 30

10000 2012.07 Q 10

to become:

10000 2012.05 Q 10

10000 2012.05 T 20

10000 2012.05 Q 10

10000 2012.06 T 20 + 30 = 50

10000 2012.07 Q 10

Here is my code, but it does not return the desired result. Can someone help me find my mistake?

import pandas as pd

def combine(df):
    combined = []  # Init empty list
    length = len(df.iloc[:, 0])  # Get the number of rows in DataFrame
    i = 0
    while i < length:
        num_elements = num_elements_equal(df, i, 0, 'T')  # Get the number of consecutive 'T's
        if num_elements <= 1:  # If there is at most one T, append only that row to combined, with the same type
            combined.append([df.iloc[i, 0], df.iloc[i, 1], df.iloc[i, 2], df.iloc[i, 3], df.iloc[i, 4], df.iloc[i, 5]])
        else:  # Otherwise, append the sum of all the elements to combined, with 'T' type
            combined.append(['T', sum_elements(df, i, i + num_elements, 5)])
        i += max(num_elements, 1)  # Increment i by the number of elements combined, with a min increment of 1
    return pd.DataFrame(combined, columns=df.columns)  # Return as DataFrame

def num_elements_equal(df, start, column, value):  # Counts the number of consecutive elements
    i = start
    num = 0
    while i < len(df.iloc[:, column]):
        if df.iloc[i, column] == value:
            num += 1
            i += 1
        else:
            return num
    return num

def sum_elements(df, start, end, column):  # Sums the elements from start to end
    return sum(df.iloc[start:end, column])

tableT = combine(table)
tableT
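For reference, two things in the posted loop look suspicious: `num_elements_equal` is called with column 0 (price1) rather than the type column, and the `else` branch appends a two-item list even though the DataFrame has six columns. The following is a minimal sketch of one way the loop could be repaired. It assumes the column order given in the question (type at position 4, volume at position 5); the helper name `count_run` and the sample values are made up for illustration.

```python
import pandas as pd

TYPE_COL = 4    # position of 'type' (assumed from the column order in the question)
VOLUME_COL = 5  # position of 'volume' (same assumption)


def count_run(df, start, column, value):
    """Count consecutive rows, beginning at `start`, whose `column` equals `value`."""
    n = 0
    while start + n < len(df) and df.iloc[start + n, column] == value:
        n += 1
    return n


def combine(df):
    rows = []
    i = 0
    while i < len(df):
        run = count_run(df, i, TYPE_COL, "T")
        row = df.iloc[i].tolist()  # keep all six values of the first row of the run
        if run > 1:
            # Replace the volume with the total over the whole run of T rows.
            row[VOLUME_COL] = df.iloc[i:i + run, VOLUME_COL].sum()
        rows.append(row)
        i += max(run, 1)  # skip the rows that were merged
    return pd.DataFrame(rows, columns=df.columns)


# Illustrative data only; the real table is not shown in the question.
table = pd.DataFrame({
    "price1": [10000] * 6,
    "price2": [10000] * 6,
    "price3": [10000] * 6,
    "time": [2012.05, 2012.05, 2012.05, 2012.06, 2012.06, 2012.07],
    "type": ["Q", "T", "Q", "T", "T", "Q"],
    "volume": [10, 20, 10, 20, 30, 10],
})

result = combine(table)
print(result)
```

With this sample input the two T rows at 2012.06 collapse into one row with volume 50, and every other row is carried over unchanged.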

raw data (Table) looks like this

+0

Could you share the raw data via a Gist? I need it for testing, and it would also make the question clearer. –

+0

It is similar to this question, but with more than two columns: https://stackoverflow.com/questions/46059157/python-combine-rows-in-dataframe-and-add-up-values – bing

Answers

1

IIUC:

Input DataFrame, df:

Price  Time Type Volume 
0 10000 2012.05 Q  10 
1 10000 2012.05 T  20 
2 10000 2012.05 Q  10 
3 10000 2012.06 T  20 
4 10000 2012.06 T  30 
5 10000 2012.07 Q  10 

Combine the T records and sum Volume:

df.groupby(by=[df.Type.ne('T').cumsum(),'Price','Time','Type'], as_index=False)['Volume'].sum() 

Output:

Price  Time Type Volume 
0 10000 2012.05 Q  10 
1 10000 2012.05 T  20 
2 10000 2012.05 Q  10 
3 10000 2012.06 T  50 
4 10000 2012.07 Q  10 
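To try this end to end, here is a self-contained sketch of the same idea. It uses the same cumulative-sum key but stores it in a named helper column (`block`, a name chosen here for illustration) so that every grouping key is an ordinary column. How `as_index=False` treats a grouping key that is passed as a separate Series may differ between pandas versions, so this variant avoids relying on it.

```python
import pandas as pd

df = pd.DataFrame({
    "Price": [10000] * 6,
    "Time": [2012.05, 2012.05, 2012.05, 2012.06, 2012.06, 2012.07],
    "Type": ["Q", "T", "Q", "T", "T", "Q"],
    "Volume": [10, 20, 10, 20, 30, 10],
})

# Every non-T row increments the counter, so a run of T rows shares one value.
# Grouping on Type as well keeps each Q row apart from the T rows that follow it.
out = (
    df.assign(block=df["Type"].ne("T").cumsum())
      .groupby(["block", "Price", "Time", "Type"], as_index=False, sort=False)["Volume"]
      .sum()
      .drop(columns="block")
)
print(out)
```

Because Price and Time are part of the key, consecutive T rows are merged only when those values match, which is the situation described in the question.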
+0

Thanks. However, this code returns an empty DataFrame – bing

+0

What happens if you use the given input DataFrame? –

+0

If I use the given input DataFrame, it works fine. However, with my own DataFrame, Type 'T' disappears after the rows are combined and only 'Q' remains – bing
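One possible explanation, offered only as a guess because the real data is not shown in the thread: by default `groupby` drops every row whose grouping key contains NaN. If the T rows have missing values in one of the key columns, they would vanish from the result, and if every row had a missing key value the result would be empty. The sketch below uses hypothetical data in which the T rows have no Price, and compares the default behaviour with `dropna=False` (available since pandas 1.1).

```python
import numpy as np
import pandas as pd

# Hypothetical data: the T rows carry NaN in Price.
df = pd.DataFrame({
    "Price": [10000, np.nan, 10000, np.nan, np.nan, 10000],
    "Time": [2012.05, 2012.05, 2012.05, 2012.06, 2012.06, 2012.07],
    "Type": ["Q", "T", "Q", "T", "T", "Q"],
    "Volume": [10, 20, 10, 20, 30, 10],
})

keys = ["block", "Price", "Time", "Type"]
keyed = df.assign(block=df["Type"].ne("T").cumsum())

# Count missing values in the key columns before grouping.
missing = keyed[keys].isna().sum()
print(missing)

# Default: rows with NaN in any key are silently dropped.
default = keyed.groupby(keys, as_index=False)["Volume"].sum()

# dropna=False keeps NaN as a valid key value, so the T rows survive.
kept = keyed.groupby(keys, as_index=False, dropna=False)["Volume"].sum()

print(default["Type"].unique())
print(kept[["Type", "Volume"]])
```

If the missing-value count is non-zero for a key column, either pass `dropna=False` or leave that column out of the grouping keys.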