Groupby and using a custom function

Following on from this question, I want to do row-wise calculations with groupby and a custom function on the following data:

col_1 col_2 col_3 col_4 
a  X  5  1 
a  Y  3  2 
a  Z  6  4 
b  X  7  8 
b  Y  4  3 
b  Z  6  5

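For each value in col_1, I want to apply a function to the rows where col_2 is X and Z, using the values in col_3 and col_4 (and possibly more columns), and create a new row from the results. The output would look like this: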

col_1 col_2 col_3 col_4 
a  X  5  1 
a  Y  3  2 
a  Z  6  4 
a  NEW  *  * 
b  X  7  8 
b  Y  4  3 
b  Z  6  5 
b  NEW  *  *

where * is the output of the function.

The original question (which only needed a simple addition) received this answer:

new = (df[df.col_2.isin(['X', 'Z'])]
       .groupby(['col_1'], as_index=False).sum()
       .assign(col_2='NEW'))

df = pd.concat([df, new]).sort_values('col_1')
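For anyone who wants to reproduce this, here is a minimal self-contained sketch of the addition case. The sample data is copied from the table above. Two details are my own assumptions rather than part of the quoted answer: the numeric columns are selected explicitly before summing, and a stable sort is used so that each NEW row lands after the rows of its group.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    'col_1': ['a', 'a', 'a', 'b', 'b', 'b'],
    'col_2': ['X', 'Y', 'Z', 'X', 'Y', 'Z'],
    'col_3': [5, 3, 6, 7, 4, 6],
    'col_4': [1, 2, 4, 8, 3, 5],
})

# Sum the X and Z rows of each col_1 group. The numeric columns are
# selected explicitly so the string column col_2 is not summed.
new = (
    df[df.col_2.isin(['X', 'Z'])]
    .groupby('col_1', as_index=False)[['col_3', 'col_4']]
    .sum()
    .assign(col_2='NEW')
)

# Append the NEW rows, then sort with a stable algorithm (mergesort) so the
# rows of each group keep their relative order and NEW comes last.
result = (
    pd.concat([df, new], ignore_index=True)
    .sort_values('col_1', kind='mergesort', ignore_index=True)
)
print(result)
```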

Now I am looking for a way to use a custom function, such as (X/Y) or ((X+Y)*2), instead of X+Y. How can I modify this code to meet my new requirement?

Source

2017-09-27 Saturate

Possible duplicate of [python - Group by and add new row which is calculation of other rows](https://stackoverflow.com/questions/46446863/) – zipa

I had the solution you were looking for before I saw coldspeed's answer. – Dark

Not a duplicate, @zipa; this follows on from that question. Coldspeed answered that one and suggested opening a new question for the added difficulty. – Saturate

I don't know if this is what you're looking for, but here goes:

def f(x): 
    y = x.values 
    return y[0]/y[1] # replace with your function

Then change new to:

new = (df[df.col_2.isin(['X', 'Z'])]
       .groupby(['col_1'], as_index=False)[['col_3', 'col_4']]
       .agg(f).assign(col_2='NEW'))

    col_1  col_3 col_4 col_2 
0  a 0.833333 0.25 NEW 
1  b 1.166667 1.60 NEW 

df = pd.concat([df, new]).sort_values('col_1') 

df 
    col_1 col_2  col_3 col_4 
0  a  X 5.000000 1.00 
1  a  Y 3.000000 2.00 
2  a  Z 6.000000 4.00 
0  a NEW 0.833333 0.25 
3  b  X 7.000000 8.00 
4  b  Y 4.000000 3.00 
5  b  Z 6.000000 5.00 
1  b NEW 1.166667 1.60

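I'm taking a leap of faith in f and assuming that the rows are already sorted (X before Z within each group) before they reach the function. If that is not the case, an extra call to sort_values is needed first:

df = df.sort_values(['col_1', 'col_2'])

That should do it.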
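If relying on row order feels fragile, one alternative is to look the X and Z rows up by their col_2 label instead of by position. The sketch below is my own variation, not part of the answer above. The helper name ratio_x_over_z is hypothetical, and it assumes each label appears at most once per col_1 group. The rows are deliberately reversed to show that order no longer matters.

```python
import pandas as pd

# Sample data from the question, reversed to show the result is order-independent.
df = pd.DataFrame({
    'col_1': ['a', 'a', 'a', 'b', 'b', 'b'],
    'col_2': ['X', 'Y', 'Z', 'X', 'Y', 'Z'],
    'col_3': [5, 3, 6, 7, 4, 6],
    'col_4': [1, 2, 4, 8, 3, 5],
}).iloc[::-1]

def ratio_x_over_z(group):
    # Hypothetical helper: index the group by col_2 so rows are picked by
    # label, not by position. Assumes X and Z occur once per group.
    by_label = group.set_index('col_2')[['col_3', 'col_4']]
    return by_label.loc['X'] / by_label.loc['Z']  # replace with your function

# One result Series per col_1 value; groupby iterates groups in sorted key order.
rows = {key: ratio_x_over_z(group) for key, group in df.groupby('col_1')}

# Turn the dict of Series into one row per group and label it NEW.
new = (
    pd.DataFrame(rows).T
    .rename_axis('col_1')
    .reset_index()
    .assign(col_2='NEW')
)
print(new)
```

The values agree with the output shown earlier in this answer (5/6 and 1/4 for a, 7/6 and 8/5 for b), but without needing a prior sort.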

Source

2017-09-27 15:14:45

This is great. I knew you'd come up with this. – Dark

@Bharathshetty Yes, your original answer was good, but it didn't address the OP's new requirement :-) –

Now you know why it was deleted, haha. – Dark

def foo(df): 
    # Expand variables into dictionary. 
    d = {v: df.loc[df['col_2'] == v, ['col_3', 'col_4']] for v in df['col_2'].unique()} 

    # Example function: (X + Y) * 2 
    result = (d['X'].values + d['Y'].values) * 2 

    # Convert result to a new dataframe row. 
    result = result.tolist()[0] 
    df_new = pd.DataFrame(
     {'col_1': [df['col_1'].iat[0]], 
     'col_2': ['NEW'], 
     'col_3': result[0], 
     'col_4': result[1]}) 
    # Concatenate result with original dataframe for group and return. 
    return pd.concat([df, df_new]) 

>>> df.groupby('col_1').apply(lambda x: foo(x)).reset_index(drop=True) 
    col_1 col_2 col_3 col_4 
0  a  X  5  1 
1  a  Y  3  2 
2  a  Z  6  4 
3  a NEW  16  6 
4  b  X  7  8 
5  b  Y  4  3 
6  b  Z  6  5 
7  b NEW  22  22
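To swap in a different formula without editing the body of foo, the calculation could be passed in as an argument. This is only a sketch of that idea. The name add_row and its func and value_cols parameters are hypothetical, and it assumes each col_2 label appears once per group. It also loops over the groups explicitly rather than using groupby.apply.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'col_1': ['a', 'a', 'a', 'b', 'b', 'b'],
    'col_2': ['X', 'Y', 'Z', 'X', 'Y', 'Z'],
    'col_3': [5, 3, 6, 7, 4, 6],
    'col_4': [1, 2, 4, 8, 3, 5],
})

def add_row(group, func, value_cols=('col_3', 'col_4')):
    # Map each col_2 label to that row's values as a Series
    # (assumes labels are unique within the group).
    values = group.set_index('col_2')[list(value_cols)]
    d = {label: values.loc[label] for label in values.index}

    # func receives the dict and returns a Series of new values.
    new_row = func(d).to_dict()
    new_row.update({'col_1': group['col_1'].iat[0], 'col_2': 'NEW'})

    # Append the new row to the rows of this group.
    return pd.concat([group, pd.DataFrame([new_row])], ignore_index=True)

# Example function: (X + Y) * 2, applied group by group.
out = pd.concat(
    [add_row(g, lambda d: (d['X'] + d['Y']) * 2) for _, g in df.groupby('col_1')],
    ignore_index=True,
)
print(out)
```

With the sample data, the NEW rows hold 16 and 6 for a, and 22 and 22 for b, matching the output above. Passing a different lambda, for example lambda d: d['X'] / d['Z'], changes the formula without touching add_row.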

Source

2017-09-27 15:22:52 Alexander

I edited the question because it was quite unclear. Please take a look and revisit it. –

Looks good. :-) –
