Pandas: Split and edit a file based on a dictionary

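I am new to pandas and am having some trouble solving the following problem. I have two files that I need to use to create the output. The first file contains a list of functions and their associated genes. An example of the file (with obviously completely made-up data):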

File 1: 

Function Genes 
Emotions HAPPY,SAD,GOOFY,SILLY 
Walking LEG,MUSCLE,TENDON,BLOOD 
Singing VOCAL,NECK,BLOOD,HAPPY

I read it into a dictionary using:

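from collections import defaultdict

FunctionsWithGenes = defaultdict(list)

def read_functions_file(File):
    Header = File.readline()  # skip the "Function Genes" header row
    for Line in File:
        Fields = Line.split()  # split the row into the function name and the gene list
        if len(Fields) < 2:
            continue  # skip blank or incomplete rows
        Function, Genes = Fields[0], Fields[1]
        FunctionsWithGenes[Function] = Genes.split(",")  # the genes for each function are in the same row and separated by commas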

The second table contains all the information I need. It is a .txt file that includes a column of genes, for example:

chr start end Gene Value MoreData 
chr1 123 123 HAPPY 41.1 3.4 
chr1 342 355 SAD 34.2 9.0 
chr1 462 470 LEG 20.0 2.7

which I read in using:

import pandas as pd 

df = pd.read_table(File)

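The dataframe contains multiple columns, one of which is "Gene". This column can contain a variable number of entries. I would like to split the dataframe by the "Function" key in the FunctionsWithGenes dictionary. So far I have: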

df = df[df["Gene"].isin(FunctionsWithGenes.keys())] # to remove all rows with no matching entries

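Now I need to somehow split the dataframe based on gene function. I thought about adding a new column holding the gene function, but I am not sure that would work, because some genes can have more than one function.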

Source

2014-09-23 user2165857

Could you add a short demo of what the contents of each file look like? It would help us understand what you want. – Ajean 2014-09-23 20:06:20

I am a bit confused by the last line of your code:

df = df[df["Gene"].isin(FunctionsWithGenes.keys())]

The keys of FunctionsWithGenes are the actual functions (Emotions, etc.), but the Gene column holds gene names, so the resulting DataFrame will always be empty.
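A small check makes this concrete. The sketch below rebuilds the sample data by hand (the dictionary and DataFrame literals are assumptions that mirror the example, not values read from the real files) and compares filtering on the dictionary keys with filtering on the gene names themselves.

```python
import pandas as pd

# Hand-built stand-ins for the two sample files (assumed, not read from disk).
FunctionsWithGenes = {
    "Emotions": ["HAPPY", "SAD", "GOOFY", "SILLY"],
    "Walking": ["LEG", "MUSCLE", "TENDON", "BLOOD"],
    "Singing": ["VOCAL", "NECK", "BLOOD", "HAPPY"],
}
df = pd.DataFrame({"Gene": ["HAPPY", "SAD", "LEG"], "Value": [3.40, 4.30, 5.55]})

# Filtering on the keys compares gene names with function names: no row matches.
by_keys = df[df["Gene"].isin(list(FunctionsWithGenes.keys()))]

# Filtering on the union of all gene lists keeps every row whose gene
# belongs to at least one function.
all_genes = sorted({gene for genes in FunctionsWithGenes.values() for gene in genes})
by_genes = df[df["Gene"].isin(all_genes)]

print(len(by_keys), len(by_genes))
```

If the goal of that line was only to drop rows with no matching function, filtering on the gene lists (as in `by_genes`) is probably what was intended.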

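If I understand you correctly, you want to split the table so that all the genes belonging to one function end up in one table. If so, you can use a simple dictionary comprehension. I set up some variables similar to yours: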

>>> for function, genes in FunctionsWithGenes.items():
...     print(function, genes)
...
Walking ['LEG', 'MUSCLE', 'TENDON', 'BLOOD']
Singing ['VOCAL', 'NECK', 'BLOOD', 'HAPPY']
Emotions ['HAPPY', 'SAD', 'GOOFY', 'SILLY']
>>> df
    Gene  Value
0  HAPPY   3.40
1    SAD   4.30
2    LEG   5.55

Then I split the DataFrame like this:

>>> FunctionsWithDf = {function: df[df['Gene'].isin(genes)]
...                    for function, genes in FunctionsWithGenes.items()}

Now FunctionsWithDf is a dictionary that maps each Function to a DataFrame containing all the rows whose Gene value is in FunctionsWithGenes[Function]. For example:

>>> FunctionsWithDf['Emotions'] 
    Gene Value 
0 HAPPY 3.4 
1 SAD 4.3 
>>> FunctionsWithDf['Singing'] 
    Gene Value 
0 HAPPY 3.4

Source

2014-09-24 00:40:09 Mike

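Is there actually a way to get the result in a single dataframe, but with the entries repeated for rows that belong to more than one function? – user2165857 2014-09-29 16:08:55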
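One possible way to get a single table, offered as a sketch and not as the answerer's method: flatten the dictionary into a long (Function, Gene) lookup table and merge it with the data. The sample dictionary and DataFrame below are assumptions that mirror the example above, and the names `pairs` and `combined` are made up for illustration.

```python
import pandas as pd

# Hand-built stand-ins that mirror the example data (assumed values).
FunctionsWithGenes = {
    "Emotions": ["HAPPY", "SAD", "GOOFY", "SILLY"],
    "Walking": ["LEG", "MUSCLE", "TENDON", "BLOOD"],
    "Singing": ["VOCAL", "NECK", "BLOOD", "HAPPY"],
}
df = pd.DataFrame({"Gene": ["HAPPY", "SAD", "LEG"], "Value": [3.40, 4.30, 5.55]})

# Long-form lookup table: one row per (Function, Gene) pair.
pairs = pd.DataFrame(
    [(function, gene)
     for function, genes in FunctionsWithGenes.items()
     for gene in genes],
    columns=["Function", "Gene"],
)

# An inner merge repeats each data row once per function its gene belongs to
# and drops rows whose gene is not listed under any function.
combined = pairs.merge(df, on="Gene", how="inner")
print(combined)
```

With this data, HAPPY appears twice in `combined`, once under Emotions and once under Singing. Calling `combined.groupby("Function")` afterwards gives back the per-function tables if they are still needed.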
