How to apply a function with multiple arguments to selected columns of a pandas DataFrame
I have the following DataFrame:
import pandas as pd
data = {'gene':['a','b','c','d','e'],
        'count':[61,320,34,14,33],
        'gene_length':[152,86,92,170,111]}
df = pd.DataFrame(data)
df = df[["gene","count","gene_length"]]
which looks like this:
In [9]: df
Out[9]:
  gene  count  gene_length
0    a     61          152
1    b    320           86
2    c     34           92
3    d     14          170
4    e     33          111
What I want to do is apply the function:
def calculate_RPKM(theC,theN,theL):
    """
    theC == Total reads mapped to a feature (gene/linc)
    theL == Length of feature (gene/linc)
    theN == Total reads mapped
    """
    rpkm = float((10**9) * theC)/(theN * theL)
    return rpkm
to the count and gene_length columns, with the constant N=12345, and store the result in a new column named 'rpkm'. But why does this fail?
N=12345
df["rpkm"] = calculate_RPKM(df['count'],N,df['gene_length'])
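One way to keep the scalar function exactly as written is to call it once per row with `DataFrame.apply(..., axis=1)`, so that each argument is a single number rather than a whole column. A minimal sketch, assuming Python 3 and a recent pandas:

```python
import pandas as pd

def calculate_RPKM(theC, theN, theL):
    """Scalar RPKM as in the question; float() requires single numbers."""
    rpkm = float((10**9) * theC) / (theN * theL)
    return rpkm

data = {'gene': ['a', 'b', 'c', 'd', 'e'],
        'count': [61, 320, 34, 14, 33],
        'gene_length': [152, 86, 92, 170, 111]}
df = pd.DataFrame(data)[["gene", "count", "gene_length"]]
N = 12345

# With axis=1 each row is passed to the lambda as a Series, so row['count']
# and row['gene_length'] are scalars and float() inside calculate_RPKM works.
df["rpkm"] = df.apply(
    lambda row: calculate_RPKM(row['count'], N, row['gene_length']), axis=1)
print(df)
```

This calls Python code once per row, which is simple but can be slow on large frames.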
What is the correct way to do this? The first row should look like this:
gene  count  gene_length       rpkm
   a     61          152  32508.366
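A minimal vectorized sketch that should reproduce this first row, assuming Python 3 (true division) and a recent pandas: it drops the `float()` call and lets pandas apply `*` and `/` element-wise to the Series. `calculate_rpkm_vectorized` is a hypothetical name used only for this illustration.

```python
import pandas as pd

def calculate_rpkm_vectorized(theC, theN, theL):
    """Hypothetical element-wise RPKM helper (not from the question).

    theC: reads mapped to each feature (scalar or Series)
    theN: total reads mapped (scalar)
    theL: length of each feature (scalar or Series)
    """
    # 1e9 is a float, so the result is floating point even for integer
    # inputs; with Series arguments pandas computes this element-wise.
    return (1e9 * theC) / (theN * theL)

data = {'gene': ['a', 'b', 'c', 'd', 'e'],
        'count': [61, 320, 34, 14, 33],
        'gene_length': [152, 86, 92, 170, 111]}
df = pd.DataFrame(data)[["gene", "count", "gene_length"]]
N = 12345

df["rpkm"] = calculate_rpkm_vectorized(df['count'], N, df['gene_length'])
print(df.head(1))
```

Because the helper only uses arithmetic operators, it also accepts plain numbers, e.g. `calculate_rpkm_vectorized(61, 12345, 152)`, and it avoids calling Python code once per row.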
Update: this is the error I get:
--------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-4-6270e1d19b89> in <module>()
----> 1 df["rpkm"] = calculate_RPKM(df['count'],N,df['gene_length'])
<ipython-input-1-48e311ca02f3> in calculate_RPKM(theC, theN, theL)
13 theN == Total reads mapped
14 """
---> 15 rpkm = float((10**9) * theC)/(theN * theL)
16 return rpkm
/u21/coolme/.anaconda/lib/python2.7/site-packages/pandas/core/series.pyc in wrapper(self)
74 return converter(self.iloc[0])
75 raise TypeError(
---> 76 "cannot convert the series to {0}".format(str(converter)))
77 return wrapper
78
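The traceback ends inside pandas' conversion wrapper: `float(...)` on a Series only succeeds when the Series holds exactly one element (the `converter(self.iloc[0])` branch) and otherwise raises `TypeError`. A small sketch reproducing this, assuming Python 3 and a recent pandas, where the message should contain "cannot convert the series":

```python
import pandas as pd

counts = pd.Series([61, 320, 34])
error_message = None

try:
    # float() needs one scalar; a multi-element Series cannot be converted,
    # which is what happens inside calculate_RPKM when whole columns are passed.
    float(10**9 * counts)
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)
```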
If it fails, please post the exact error message you are getting. That makes it easier for people to help you.