2014-04-30 25 views
3

Pandas groupby: add a column from an apply operation

Given a DataFrame like this:

chrom first_bp_intron last_bp_intron unique_junction_reads 
chr1 100 200 10 
chr1 100 150 40 
chr1 110 200 90 

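What is an elegant way to do the following? Group by first_bp_intron and divide each value in unique_junction_reads by its group's sum to get a new column, phi5. Then do the same for last_bp_intron to get a new column, phi3: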

chrom first_bp_intron last_bp_intron unique_junction_reads phi5 phi3 
chr1 100 200 10 0.2 0.1 
chr1 100 150 40 0.8 1.0 
chr1 110 200 90 1.0 0.9 

My slow, working solution is:

json = '{"chrom":{"4010":"chr2","4011":"chr2","4012":"chr2","4013":"chr2","4014":"chr2","4015":"chr2","4016":"chr2","4017":"chr2","4018":"chr2","4019":"chr2","4020":"chr2","4021":"chr2","4022":"chr2","4023":"chr2","4024":"chr2","4025":"chr2"},"first_bp_intron":{"4010":50149390,"4011":50170930,"4012":50280729,"4013":50318633,"4014":50464109,"4015":50692700,"4016":50693626,"4017":50699610,"4018":50723234,"4019":50724853,"4020":50733756,"4021":50755790,"4022":50758569,"4023":50765775,"4024":51012497,"4025":51015345},"last_bp_intron":{"4010":50170841,"4011":50280408,"4012":50318460,"4013":50463926,"4014":50692579,"4015":50693598,"4016":50699435,"4017":50723042,"4018":50724470,"4019":50733632,"4020":50755762,"4021":50758364,"4022":50765390,"4023":50779724,"4024":51017681,"4025":51017681},"unique_junction_reads":{"4010":1,"4011":3,"4012":6,"4013":6,"4014":15,"4015":8,"4016":8,"4017":5,"4018":40,"4019":86,"4020":85,"4021":64,"4022":81,"4023":53,"4024":12,"4025":9}}' 

import io
import pandas as pd

# Wrap the string in StringIO: newer pandas versions warn on a literal JSON string
sj = pd.read_json(io.StringIO(json))

# Total reads per 5' site (first_bp_intron) and per 3' site (last_bp_intron)
five_prime_reads = sj.groupby(['chrom', 'first_bp_intron'])['unique_junction_reads'].sum()
three_prime_reads = sj.groupby(['chrom', 'last_bp_intron'])['unique_junction_reads'].sum()

# Slow part: visit every junction and divide by the matching site total
for (chrom, first_bp_intron, last_bp_intron), df in sj.groupby(['chrom', 'first_bp_intron', 'last_bp_intron']):
    print(chrom, last_bp_intron, end=' ')
    print('\tphi3', (df.unique_junction_reads / three_prime_reads[(chrom, last_bp_intron)]).values, end=' ')
    print('\tphi5', (df.unique_junction_reads / five_prime_reads[(chrom, first_bp_intron)]).values)

but I'm sure there is a more elegant way to express this in pandas.

Here is a complete IPython notebook of what I'm trying to do: http://nbviewer.ipython.org/11418657

Answers

10

I would do something like the following, using groupby and transform:

In [9]: by_first = df.groupby('first_bp_intron') 
In [10]: df['phi5'] = by_first['unique_junction_reads'].transform(lambda x: x/x.sum()) 

In [11]: by_last = df.groupby('last_bp_intron') 
In [12]: df['phi3'] = by_last['unique_junction_reads'].transform(lambda x: x/x.sum()) 

In [13]: df 
Out[13]: 
   chrom  first_bp_intron  last_bp_intron  unique_junction_reads  phi5  phi3
0   chr1              100             200                     10   0.2   0.1
1   chr1              100             150                     40   0.8   1.0
2   chr1              110             200                     90   1.0   0.9
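
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The hand-built `df` (the three-row example from the question), the added `chrom` grouping key, and the built-in `'sum'` transform in place of the lambda are assumptions of this sketch, not part of the original answer.

```python
import pandas as pd

# Three-row example frame from the question, built by hand
df = pd.DataFrame({
    'chrom': ['chr1', 'chr1', 'chr1'],
    'first_bp_intron': [100, 100, 110],
    'last_bp_intron': [200, 150, 200],
    'unique_junction_reads': [10, 40, 90],
})

reads = df['unique_junction_reads']

# transform('sum') broadcasts each group's total back onto that group's rows,
# so the division below is row-aligned with the original frame
five_prime_total = df.groupby(['chrom', 'first_bp_intron'])['unique_junction_reads'].transform('sum')
three_prime_total = df.groupby(['chrom', 'last_bp_intron'])['unique_junction_reads'].transform('sum')

df['phi5'] = reads / five_prime_total
df['phi3'] = reads / three_prime_total

print(df)
```

Grouping on `chrom` as well keeps identical coordinates on different chromosomes from being pooled, which matches what the question's own code does.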

Awesome, `transform()` is exactly what I needed! But would you mind explaining the difference between `transform` and `apply`?


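`apply` would also work fine here. If you replace `transform` with `apply`, you should get the same output. `apply` is the more general method; `transform` is appropriate when you want to return something that is indexed like the input.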
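
To make "indexed like the input" concrete, here is a small sketch that compares a plain aggregation with `transform` on the same groups. The tiny frame and the variable names are made up for illustration. The sketch deliberately avoids `apply`, because the index `apply` returns may depend on the pandas version and the `group_keys` setting.

```python
import pandas as pd

df = pd.DataFrame({
    'first_bp_intron': [100, 100, 110],
    'unique_junction_reads': [10, 40, 90],
})
grouped = df.groupby('first_bp_intron')['unique_junction_reads']

# Aggregation: one value per group, indexed by the group key
totals = grouped.sum()

# transform: one value per original row, indexed like df
per_row_totals = grouped.transform('sum')

print(totals.index.tolist())
print(per_row_totals.index.tolist())
print(per_row_totals.tolist())
```

Because `per_row_totals` shares the index of `df`, it can be assigned directly as a new column or used in element-wise arithmetic with other columns.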