2011-05-26 36 views
1

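I am working with 9 files that hold different data (protein data per tissue). Each file represents a different tissue and contains protein expression values (as numbers). I am trying to combine the data into a single data.frame, which means merging several data frames with different column lengths and manipulating their columns. I used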

read.delim("fileName.txt") 

on all the files. After that, I put all the data frames in a list:

l <- list(data.frame1,..etc) 

Then I used the plyr library and do.call(rbind.fill, l).
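To make the stacking step concrete, here is a minimal sketch of what row-binding data frames with different columns amounts to. It uses base R only as a stand-in for plyr's rbind.fill, and the two small data frames and their values are hypothetical, not taken from the real files.

```r
# Hypothetical stand-ins for two of the nine per-tissue data frames.
lung <- data.frame(protein = c("ZYX", "ZP2"),
                   expression_value = c(1.5, 0.2),
                   tissue = "Lung",
                   stringsAsFactors = FALSE)
blood <- data.frame(protein = c("ZYX", "ZWINT"),
                    expression_value = c(2.5, 0.7),
                    stringsAsFactors = FALSE)  # no tissue column here

l <- list(Lung = lung, Blood = blood)

# Base-R stand-in for rbind.fill: add every missing column as NA,
# put the columns in a common order, then row-bind.
all_cols <- unique(unlist(lapply(l, names)))
filled <- lapply(l, function(df) {
  for (col in setdiff(all_cols, names(df))) df[[col]] <- NA
  df[all_cols]
})
combined <- do.call(rbind, filled)
rownames(combined) <- NULL
```

With plyr loaded, do.call(rbind.fill, l) should give the same shape of result: one long data frame, with NA wherever a file lacked a column.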

My questions:

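1) I want to go through the list of 9 data.frames, find the unique entries in them, and plot them in a histogram. If I find several entries with the same name but from different tissues, each one should be added to the histogram above the correct tissue label. That is: I go to the first data.frame in the list, take its first entry, search whether that entry appears in the other data.frames, and if it does, add it to the histogram.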
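The search described here can be done without an explicit loop over entries. The following is a rough sketch, assuming each data frame has a protein column (as in the dput output) and that the list is named by tissue; the data values are made up for illustration.

```r
# Hypothetical per-tissue data frames, named by tissue.
l <- list(
  Lung  = data.frame(protein = c("ZYX", "ZP2"),   expression_value = c(1.5, 0.2)),
  Blood = data.frame(protein = c("ZYX", "ZWINT"), expression_value = c(2.5, 0.7)),
  liver = data.frame(protein = c("ZYX", "ZP2"),   expression_value = c(0.9, 0.4))
)

# Every distinct protein name seen in any tissue.
all_proteins <- unique(unlist(lapply(l, function(df) as.character(df$protein))))

# For each protein, the names of the tissues whose data frame contains it.
tissues_with <- lapply(all_proteins, function(p) {
  names(l)[vapply(l, function(df) p %in% df$protein, logical(1))]
})
names(tissues_with) <- all_proteins

# Proteins present in every tissue.
in_all <- Reduce(intersect, lapply(l, function(df) as.character(df$protein)))
```

tissues_with[["ZP2"]] then lists the tissues in which that protein was found, which is the information needed to decide where its bars go.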

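The histogram has the 9 tissues on the x-axis, and the y-axis shows the values from my files. I don't know how to make the histogram (and the code) change the names correctly, or how to show the bars in the right positions.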

I also don't know how to set up the axis so that each bar has its tissue name underneath.
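One bar per tissue with a label under each bar is what barplot() draws, whereas hist() bins a single numeric vector into a distribution. Below is a minimal sketch, assuming the data has already been stacked into one long data frame with protein, tissue, and expression_value columns. The helper values_by_tissue is hypothetical, the numbers are invented, and only three of the nine tissues are used for brevity.

```r
tissues <- c("Lung", "Blood", "liver")  # fixed bar order on the x-axis

# Hypothetical long-format data: one row per protein per tissue.
combined <- data.frame(
  protein = c("ZYX", "ZP2", "ZYX", "ZWINT", "ZYX"),
  tissue = c("Lung", "Lung", "Blood", "Blood", "liver"),
  expression_value = c(1.5, 0.2, 2.5, 0.7, 0.9),
  stringsAsFactors = FALSE
)

# One value per tissue in the given order. Tissues where the protein is
# absent get 0 (an assumption; pick another convention if 0 is meaningful).
# If a protein occurs more than once in a tissue, the first row is used.
values_by_tissue <- function(df, protein_name, tissues) {
  rows <- df[df$protein == protein_name, ]
  vals <- rows$expression_value[match(tissues, rows$tissue)]
  vals[is.na(vals)] <- 0
  names(vals) <- tissues
  vals
}

zyx <- values_by_tissue(combined, "ZYX", tissues)

# names.arg puts a tissue name under each bar; las = 2 rotates long labels.
barplot(zyx, names.arg = names(zyx), las = 2, col = "blue",
        main = "ZYX", ylab = "expression_value")
```

Because the tissue order is fixed up front, every protein's plot has its bars in the same positions, whether or not the protein was found in all tissues.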

I have some basic code that does not do what I want:

i=1 

for(val in list2[1:9]) 
{ 
    if(val appears in one of the other data.frames) 
      plot a bar over the correct tissue. 

    hist(val[i,8],breaks=11,col="blue",density=13,angle=45, 
      labels=c("Lung","ErythroleukemicCellLine","TCells","Blood","liver", 
      "BLimpho","pancreas","prostate","Bladder"), main=fileName[i,1]) 
    dev.new() #each hist in a new window 
    i = i + 1 

} 

Thanks, yigeal

Here are the last few lines of the output of the code below, run after reading the file with read.delim("nameOfFile.txt"):

dput(BloodErythroleukemicCellLineFile) 
"Tax_Id=9606 Gene_Symbol=ZNF589 Uncharacterized protein", 
    "Tax_Id=9606 Gene_Symbol=ZNF598 Isoform 1 of Zinc finger protein 598", 
    "Tax_Id=9606 Gene_Symbol=ZNF609 Zinc finger protein 609", 
    "Tax_Id=9606 Gene_Symbol=ZNF610 Isoform 1 of Zinc finger protein 610", 
    "Tax_Id=9606 Gene_Symbol=ZNF613 Isoform 1 of Zinc finger protein 613", 
    "Tax_Id=9606 Gene_Symbol=ZNF614 Zinc finger protein 614", 
    "Tax_Id=9606 Gene_Symbol=ZNF622 Zinc finger protein 622", 
    "Tax_Id=9606 Gene_Symbol=ZNF625 Zinc finger protein 625", 
    "Tax_Id=9606 Gene_Symbol=ZNF638 Isoform 1 of Zinc finger protein 638", 
    "Tax_Id=9606 Gene_Symbol=ZNF638 Isoform 4 of Zinc finger protein 638", 
    "Tax_Id=9606 Gene_Symbol=ZNF646 Isoform 1 of Zinc finger protein 646", 
    "Tax_Id=9606 Gene_Symbol=ZNF658B Zinc finger protein 658B", 
    "Tax_Id=9606 Gene_Symbol=ZNF667 Zinc finger protein 667, isoform CRA_a", 
    "Tax_Id=9606 Gene_Symbol=ZNF671 Zinc finger protein 671", 
    "Tax_Id=9606 Gene_Symbol=ZNF687 Isoform 1 of Zinc finger protein 687", 
    "Tax_Id=9606 Gene_Symbol=ZNF687 Zinc finger protein 687", 
    "Tax_Id=9606 Gene_Symbol=ZNF691 cDNA FLJ56317, highly similar to Zinc finger protein 691", 
    "Tax_Id=9606 Gene_Symbol=ZNF700 Zinc finger protein 700", 
    "Tax_Id=9606 Gene_Symbol=ZNF714 Isoform 1 of Zinc finger protein 714", 
    "Tax_Id=9606 Gene_Symbol=ZNF72 Zinc finger protein 72 (Fragment)", 
    "Tax_Id=9606 Gene_Symbol=ZNF721 zinc finger protein 721", 
    "Tax_Id=9606 Gene_Symbol=ZNF76 Isoform 2 of Zinc finger protein 76", 
    "Tax_Id=9606 Gene_Symbol=ZNF782 Zinc finger protein 782", 
    "Tax_Id=9606 Gene_Symbol=ZNF787 Zinc finger protein 787", 
    "Tax_Id=9606 Gene_Symbol=ZNF800 Zinc finger protein 800", 
    "Tax_Id=9606 Gene_Symbol=ZNF827 21 kDa protein", "Tax_Id=9606 Gene_Symbol=ZNF828 Zinc finger protein 828", 
    "Tax_Id=9606 Gene_Symbol=ZNF837 Zinc finger protein 837", 
    "Tax_Id=9606 Gene_Symbol=ZNF878 Zinc finger protein 878", 
    "Tax_Id=9606 Gene_Symbol=ZNF891 Zinc finger protein 891", 
    "Tax_Id=9606 Gene_Symbol=ZNHIT2 Zinc finger HIT domain-containing protein 2", 
    "Tax_Id=9606 Gene_Symbol=ZP2 Zona pellucida sperm-binding protein 2", 
    "Tax_Id=9606 Gene_Symbol=ZRANB2 Isoform 1 of Zinc finger Ran-binding domain-containing protein 2", 
    "Tax_Id=9606 Gene_Symbol=ZSWIM6 Zinc finger SWIM domain-containing protein 6", 
    "Tax_Id=9606 Gene_Symbol=ZUFSP 32 kDa protein", "Tax_Id=9606 Gene_Symbol=ZW10 Centromere/kinetochore protein zw10 homolog", 
    "Tax_Id=9606 Gene_Symbol=ZWINT ZW10 interactor", "Tax_Id=9606 Gene_Symbol=ZYG11B Isoform 1 of Protein zyg-11 homolog B", 
    "Tax_Id=9606 Gene_Symbol=ZYX cDNA FLJ53160, highly similar to Zyxin", 
    "Tax_Id=9606 Gene_Symbol=ZYX Uncharacterized protein", "Tax_Id=9606 Gene_Symbol=ZYX Zyxin" 
    ), class = "factor")), .Names = c("proteinIdentifier", "protein", 
"spectra", "unique_peptides", "FDR", "local_FDR", "sequence_coverage", 
"expression_value", "expression_percentile", "organism", "tissue", 
"localization", "condition", "experiment", "annotation"), class = "data.frame", row.names = c(NA, 
-4802L)) 

It is much longer in the console.

+1

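I edited your question to make it more readable. Please ask only one question per post. For everything about the plyr library, see the manual. '?rbind.fill' will tell you all you need to know. – 2011-05-26 14:41:36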

+2

Could you provide the dput output for two of your data.frames (or at least their top rows), so that we have something to work with? – 2011-05-26 15:00:48

Answers

1

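It is not easy to find the core of the problem in your question. To merge data frames on some common fields, you can use the merge() function, for example: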

merge(dataframe1, dataframe2, by=c('column_name1','column_name2'), suffixes=c('.from_df1','.from_df2')) 
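As a small illustration of that call, here is a sketch with two made-up data frames that share a protein column. The column names and the .lung/.blood suffixes are assumptions chosen to resemble the question's data.

```r
# Hypothetical data for two tissues.
df1 <- data.frame(protein = c("ZYX", "ZP2"),   expression_value = c(1.5, 0.2))
df2 <- data.frame(protein = c("ZYX", "ZWINT"), expression_value = c(2.5, 0.7))

# Default merge keeps only proteins found in both data frames.
inner <- merge(df1, df2, by = "protein", suffixes = c(".lung", ".blood"))

# all = TRUE keeps proteins found in either one, with NA where missing.
outer <- merge(df1, df2, by = "protein", all = TRUE,
               suffixes = c(".lung", ".blood"))
```

The suffixes are what keep the two expression_value columns apart in the result, so each value stays tied to its tissue.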

If you want to select rows or columns, you can do it like this:

dataframe1[dataframe1$column1 == 'some_value', c('col1', 'col2')]

And so on. Does this help you?
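For reference, a tiny runnable version of the row and column selection, with a made-up data frame whose column names simply mirror the placeholders used above:

```r
# Hypothetical data frame using the placeholder column names.
dataframe1 <- data.frame(
  column1 = c("some_value", "other", "some_value"),
  col1 = 1:3,
  col2 = c("a", "b", "c"),
  col3 = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Rows where column1 matches, restricted to two columns.
picked <- dataframe1[dataframe1$column1 == "some_value", c("col1", "col2")]

# The same selection written with subset().
picked2 <- subset(dataframe1, column1 == "some_value", select = c(col1, col2))
```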