Pivot a pandas DataFrame using masking

I have a non-indexed df in which each row holds a gene, a cell that carries a mutation in that gene, and the type of that mutation:

import pandas as pd

df = pd.DataFrame({'gene': ['one', 'one', 'one', 'two', 'two', 'two', 'three'],
                   'cell': ['A', 'A', 'C', 'A', 'B', 'C', 'A'],
                   'mutation': ['frameshift', 'missense', 'nonsense', '3UTR', '3UTR', '3UTR', '3UTR']})

df:

  cell   gene    mutation
0    A    one  frameshift
1    A    one    missense
2    C    one    nonsense
3    A    two        3UTR
4    B    two        3UTR
5    C    two        3UTR
6    A  three        3UTR

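I would like to pivot this df so that it is indexed by gene and the columns are the cells. The problem is that a cell can have multiple entries: any one gene in a given cell may carry more than one mutation (cell A has two different mutations in gene one). So when I run: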

df.pivot_table(index='gene', columns='cell', values='mutation')

this happens:

DataError: No numeric types to aggregate

I would like to do the pivot with masking, so that each entry only records whether at least one mutation is present:

       A  B  C
gene
one    1  1  1
two    0  1  0
three  1  1  0


2016-12-16 Thomas Matthew

A solution with drop_duplicates and pivot_table:

# The parentheses let the method chain span several lines.
df = (df.drop_duplicates(['cell', 'gene'])
        .pivot_table(index='gene',
                     columns='cell',
                     values='mutation',
                     aggfunc=len,
                     fill_value=0))
print(df)
cell   A  B  C
gene
one    1  0  1
three  1  0  0
two    1  1  1

Another solution uses drop_duplicates, then groupby with the size aggregation, and finally reshapes with unstack:

# Start again from the original df; the parentheses let the chain span several lines.
df = (df.drop_duplicates(['cell', 'gene'])
        .groupby(['cell', 'gene'])
        .size()
        .unstack(0, fill_value=0))
print(df)
cell   A  B  C
gene
one    1  0  1
three  1  0  0
two    1  1  1
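Both variants depend on drop_duplicates collapsing repeated (cell, gene) pairs before anything is counted. The following is a minimal sketch of that effect, assuming the df from the question; the names dupes, raw and deduped are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'gene': ['one', 'one', 'one', 'two', 'two', 'two', 'three'],
                   'cell': ['A', 'A', 'C', 'A', 'B', 'C', 'A'],
                   'mutation': ['frameshift', 'missense', 'nonsense',
                                '3UTR', '3UTR', '3UTR', '3UTR']})

# Rows whose (cell, gene) pair occurs more than once.
dupes = df[df.duplicated(['cell', 'gene'], keep=False)]
print(dupes)

# Without drop_duplicates the table holds mutation counts per pair.
raw = df.groupby(['cell', 'gene']).size().unstack(0, fill_value=0)

# With drop_duplicates each pair is counted at most once.
deduped = (df.drop_duplicates(['cell', 'gene'])
             .groupby(['cell', 'gene'])
             .size()
             .unstack(0, fill_value=0))

print(raw.loc['one', 'A'], deduped.loc['one', 'A'])
```

Only the two rows for cell A in gene one are duplicates, so that is the only entry that differs between raw and deduped.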


2016-12-16 06:12:45 jezrael

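That error message is not an index error from pivot_table. pivot_table accepts multiple values for the same index entry; I don't believe that is true of the pivot method. You can fix your problem by changing the aggregation to something that works on strings rather than numbers. Most aggregation functions operate on numeric columns, so the code you wrote produces an error about the data type of the column, not about the index.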

df.pivot_table(index='gene',
               columns='cell',
               values='mutation',
               aggfunc='count', fill_value=0)

If you only want a single value of 1 per cell, you can do a groupby, aggregate everything to 1, and then unstack a level.

df.groupby(['cell', 'gene']).agg(lambda x: 1).unstack(fill_value=0)
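To get the 0/1 table from the question with gene as the index, one option is to count first and then mask the counts. This is a minimal sketch under the assumption that the df is the one defined in the question; counts and presence are names chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({'gene': ['one', 'one', 'one', 'two', 'two', 'two', 'three'],
                   'cell': ['A', 'A', 'C', 'A', 'B', 'C', 'A'],
                   'mutation': ['frameshift', 'missense', 'nonsense',
                                '3UTR', '3UTR', '3UTR', '3UTR']})

# Count mutations per (gene, cell); 'count' works on string columns.
counts = df.pivot_table(index='gene', columns='cell', values='mutation',
                        aggfunc='count', fill_value=0)

# Mask: True where at least one mutation was seen, then cast to 0/1.
presence = counts.gt(0).astype(int)
print(presence)
```

The boolean frame counts.gt(0) is the mask itself; the astype(int) step is only needed if 0/1 integers are preferred over True/False.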


2016-12-16 05:53:28
