2015-05-15 148 views

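Subtracting a group-specific value from rows in pandas

In pandas, I have a DataFrame made up of two groups, each containing several samples. Each group has an internal reference value that I want to subtract from every sample value within that group.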

import io
import pandas as pd

s = u"""Group sample value
group1 ref1 18.1
group1 smp1 NaN
group1 smp2 20.3
group1 smp3 30.0
group2 ref2 16.1
group2 smp4 29.2
group2 smp5 19.9
group2 smp6 28.9
"""
# columns are separated by runs of whitespace
df = pd.read_csv(io.StringIO(s), sep=r'\s+')
df = df.set_index(['Group', 'sample'])
df

Out[82]:
               value
Group  sample
group1 ref1     18.1
       smp1      NaN
       smp2     20.3
       smp3     30.0
group2 ref2     16.1
       smp4     29.2
       smp5     19.9
       smp6     28.9

What I want to do is add a new column in which each group's reference (ref) is subtracted from all of the samples (smp) in that group. Like this:

               value  deltaValue
Group  sample
group1 ref1     18.1         0.0
       smp1      NaN         NaN
       smp2     20.3         2.2
       smp3     30.0        11.9
group2 ref2     16.1         0.0
       smp4     29.2        13.1
       smp5     19.9         3.8
       smp6     28.9        12.8

Does anyone know how to do this? Thanks!

Answers


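Group your DataFrame by the Group column. Then iterate over each group, look up its ref sample value, and subtract that value from the group's whole value column.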

# `s` is the sample text defined in the question
df = pd.read_csv(io.StringIO(s), sep=r'\s+')
df['diff'] = 0.0
for name, group in df.groupby('Group'):
    # the reference sample is named 'ref' + the group's number, e.g. group1 -> ref1
    ref_name = 'ref' + name.split('group')[1]
    ref = group.loc[group['sample'] == ref_name, 'value'].iloc[0]
    # subtract the group's reference from every value in that group
    df.loc[group.index, 'diff'] = group['value'] - ref
print(df)
    Group sample  value  diff
0  group1   ref1   18.1   0.0
1  group1   smp1    NaN   NaN
2  group1   smp2   20.3   2.2
3  group1   smp3   30.0  11.9
4  group2   ref2   16.1   0.0
5  group2   smp4   29.2  13.1
6  group2   smp5   19.9   3.8
7  group2   smp6   28.9  12.8

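Here is a way to do this without a loop.

First, create a function func that finds the row whose sample name starts with ref and then computes the delta values for the group.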

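In [33]: def func(grp):
    # value of the row whose sample name starts with 'ref'
    ref = grp.loc[grp['sample'].str.startswith('ref'), 'value']
    # work on a copy so the caller's frame is not modified
    grp = grp.copy()
    grp['delta'] = grp['value'] - ref.values[0]
    return grp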

Then apply func to dff.groupby('Group'). Passing group_keys=False keeps the original row index in the result.

In [34]: dff.groupby('Group', group_keys=False).apply(func)
Out[34]:
    Group sample  value  delta
0  group1   ref1   18.1    0.0
1  group1   smp1    NaN    NaN
2  group1   smp2   20.3    2.2
3  group1   smp3   30.0   11.9
4  group2   ref2   16.1    0.0
5  group2   smp4   29.2   13.1
6  group2   smp5   19.9    3.8
7  group2   smp6   28.9   12.8
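
The same result can probably be reached without apply by looking up each row's reference value through its group name. The following is a minimal sketch, not the answerer's method. It assumes every group has exactly one row whose sample name starts with ref, and the names is_ref and ref_by_group are made up for illustration.

```python
import io

import pandas as pd

s = """Group sample value
group1 ref1 18.1
group1 smp1 NaN
group1 smp2 20.3
group1 smp3 30.0
group2 ref2 16.1
group2 smp4 29.2
group2 smp5 19.9
group2 smp6 28.9
"""
dff = pd.read_csv(io.StringIO(s), sep=r"\s+")

# Assumption: rows whose sample name starts with 'ref' hold the reference,
# and there is exactly one such row per group.
is_ref = dff["sample"].str.startswith("ref")

# Series indexed by group name, holding that group's reference value.
ref_by_group = dff.loc[is_ref].set_index("Group")["value"]

# Look up the reference for every row by its group and subtract it.
dff["delta"] = dff["value"] - dff["Group"].map(ref_by_group)
print(dff)
```

Because map needs a unique index on ref_by_group, a group with two ref rows would raise an error here rather than silently picking one of them.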

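Here dff is expected to look like the frame below to begin with. You can get it from the question's indexed frame with dff = df.reset_index().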

In [35]: dff
Out[35]:
    Group sample  value
0  group1   ref1   18.1
1  group1   smp1    NaN
2  group1   smp2   20.3
3  group1   smp3   30.0
4  group2   ref2   16.1
5  group2   smp4   29.2
6  group2   smp5   19.9
7  group2   smp6   28.9
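
To get back to the layout asked for in the question, the index can be restored after the new column is added. The sketch below shows that round trip under one extra assumption that is not stated in the thread: the reference row comes first in each group and is not missing, so transform('first') picks it up. The column name deltaValue follows the question, and the variable result is illustrative.

```python
import io

import pandas as pd

s = """Group sample value
group1 ref1 18.1
group1 smp1 NaN
group1 smp2 20.3
group1 smp3 30.0
group2 ref2 16.1
group2 smp4 29.2
group2 smp5 19.9
group2 smp6 28.9
"""
# The question's frame, indexed by (Group, sample).
df = pd.read_csv(io.StringIO(s), sep=r"\s+").set_index(["Group", "sample"])

# Turn both index levels back into ordinary columns.
dff = df.reset_index()

# Assumption: the first non-missing value in each group is its reference.
ref_per_row = dff.groupby("Group")["value"].transform("first")
dff["deltaValue"] = dff["value"] - ref_per_row

# Restore the (Group, sample) index used in the question.
result = dff.set_index(["Group", "sample"])
print(result)
```

If the reference row is not guaranteed to be first, selecting it by name, as func does with startswith('ref'), is the safer choice.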