Group and average based on two header rows in pandas

I have the following CSV data, which I read into a DataFrame:

id,gene,celltype,stem,stem,stem,bcell,bcell,tcell 
id,gene,organs,bm,bm,fl,pt,pt,bm 
134,foo,about_foo,20,10,11,23,22,79 
222,bar,about_bar,17,13,55,12,13,88

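Note that it contains two header rows. What I want to do is group the columns, from the second row onward, by organ and cell type and average them, so that it creates a hierarchical DataFrame like this: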

bm    stem        bcell   tcell
foo   (20+10)/2   0       79/1=79
bar   (17+13)/2   0       88/1=88

fl    stem        bcell   tcell
foo   11/1=11     0       0
bar   55/1=55     0       0

pt    stem   bcell       tcell
foo   0      (23+22)/2   0
bar   0      (12+13)/2   0

How can I do this?

I'm stuck with the following code:

import pandas as pd 
df = pd.read_csv("http://dpaste.com/1X74TNP.txt")

Update

import pandas as pd 
df = pd.read_csv("http://dpaste.com/1X74TNP.txt",header=None,index_col=[1,2]).iloc[:, 1:] 
df.columns = pd.MultiIndex.from_arrays(df.ix[:2].values) 
df = df.ix[2:] 
df.index.names = ['cell', 'organ'] 
df = df.reset_index('organ', drop=True) 
result = df.groupby(level=[0, 1], axis=1).mean().stack().replace(np.nan, 0).unstack().swaplevel(0,1, axis=1).sort_index(axis=1)

This gives:

DataError: No numeric types to aggregate
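
The error is consistent with how the file is read: with header=None both header lines end up as ordinary rows, so every column mixes text and digits and none of them is parsed as a number. A minimal sketch of that check, assuming the sample CSV above is pasted inline as a string instead of being downloaded (the names csv_text, raw and values are only illustrative):

```python
import io

import pandas as pd

# The sample data from the question, inlined so the snippet runs offline.
csv_text = """id,gene,celltype,stem,stem,stem,bcell,bcell,tcell
id,gene,organs,bm,bm,fl,pt,pt,bm
134,foo,about_foo,20,10,11,23,22,79
222,bar,about_bar,17,13,55,12,13,88
"""

raw = pd.read_csv(io.StringIO(csv_text), header=None, index_col=[1, 2]).iloc[:, 1:]

# Both header lines are still in the data, so no column is numeric yet.
no_numeric_before = not any(pd.api.types.is_numeric_dtype(t) for t in raw.dtypes)

# Dropping the two header rows and converting makes every column numeric.
values = raw.iloc[2:].apply(pd.to_numeric)
all_numeric_after = all(pd.api.types.is_numeric_dtype(t) for t in values.dtypes)

print(no_numeric_before, all_numeric_after)
```

Once the data rows are numeric, mean() has something to aggregate.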

Source

2015-12-01 neversaint

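import numpy as np
import pandas as pd
from os.path import join

# DESKTOP is the folder that holds bio.csv (defined elsewhere)
df = pd.read_csv(join(DESKTOP, 'bio.csv'), header=None, index_col=[1,2]).iloc[:, 1:]

# the first two rows are the header lines; use them as column levels
df.columns = pd.MultiIndex.from_arrays(df.iloc[:2].values)
# keep only the data rows and cast them to int so that mean() has numeric input
df = df.iloc[2:].astype(int)
df.index.names = ['cell', 'organ']
df = df.reset_index('organ', drop=True)

# average columns sharing the same (celltype, organ) pair; grouping on the
# transpose avoids groupby(axis=1), which recent pandas versions deprecate
avg = df.T.groupby(level=[0, 1]).mean().T
result = avg.stack().replace(np.nan, 0).unstack()
result = result.swaplevel(0,1, axis=1).sort_index(axis=1)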

        bm                 fl                 pt
      bcell stem tcell   bcell stem tcell   bcell stem tcell
cell
foo       0   15    79       0   11     0    22.5    0     0
bar       0   15    88       0   55     0    12.5    0     0
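
For anyone who wants to reproduce this without the file, here is a self-contained sketch of the same idea. It is a variant, not the exact code above: it inlines the sample CSV, builds the column levels with the organ on top, and fills the missing organ/cell type combinations by reindexing against the full product of labels. The names csv_text, data, avg and full are only illustrative.

```python
import io

import pandas as pd

# The sample data from the question, inlined so the snippet runs offline.
csv_text = """id,gene,celltype,stem,stem,stem,bcell,bcell,tcell
id,gene,organs,bm,bm,fl,pt,pt,bm
134,foo,about_foo,20,10,11,23,22,79
222,bar,about_bar,17,13,55,12,13,88
"""

# Read everything as text; rows 0 and 1 are the two header lines.
raw = pd.read_csv(io.StringIO(csv_text), header=None, dtype=str)
celltype = raw.iloc[0, 3:].to_numpy()
organ = raw.iloc[1, 3:].to_numpy()

# Measurements start at row 2, column 3; genes sit in column 1.
data = raw.iloc[2:, 3:].astype(float)
data.index = pd.Index(raw.iloc[2:, 1].to_numpy(), name="gene")
data.columns = pd.MultiIndex.from_arrays([organ, celltype], names=["organ", "celltype"])

# Average the columns that share the same (organ, celltype) pair.
avg = data.T.groupby(level=["organ", "celltype"]).mean().T

# Add the combinations that never occur in the file and set them to 0.
full = pd.MultiIndex.from_product(
    [sorted(set(organ)), sorted(set(celltype))], names=["organ", "celltype"]
)
result = avg.reindex(columns=full, fill_value=0)
print(result)
```

Whether a missing combination should really be 0 rather than NaN depends on what the table is used for; 0 is chosen here only because the expected output in the question uses it.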

To access one of the groups, use:

print(result.loc[:, 'bm']) 

      bcell  stem  tcell
cell
foo       0    15     79
bar       0    15     88
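
The other direction works as well. A small sketch, assuming a result frame shaped like the one printed above (rebuilt by hand here from those printed values so the snippet runs on its own): DataFrame.xs picks one cell type across all organs, and a tuple selects a single cell.

```python
import pandas as pd

# Rebuild the printed result by hand: organs on top, cell types below.
columns = pd.MultiIndex.from_product(
    [["bm", "fl", "pt"], ["bcell", "stem", "tcell"]]
)
result = pd.DataFrame(
    [[0, 15, 79, 0, 11, 0, 22.5, 0, 0],
     [0, 15, 88, 0, 55, 0, 12.5, 0, 0]],
    index=pd.Index(["foo", "bar"], name="cell"),
    columns=columns,
)

# All cell types for one organ (same as the selection shown above).
bm = result.loc[:, "bm"]

# One cell type across all organs: cross-section on the second column level.
stem = result.xs("stem", axis=1, level=1)

# A single value: row label plus an (organ, celltype) tuple.
value = result.loc["foo", ("pt", "bcell")]

print(bm, stem, value, sep="\n\n")
```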

Source

2015-12-01 05:41:39 Stefan

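No, this won't do. I need to keep track of the genes for each organ. – neversaint

OK. So it sounds like you need a MultiIndex that contains the gene and organ columns (the latter is called "celltype" in the first row). Next, it looks like you need to organize your data separately by bm, fl and pt, with stem, bcell and tcell handled differently in each. What are the rules for averaging, keeping as is, or setting to zero? – Stefan

Isn't the example above clear enough? – neversaint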
