Grouping rows and selecting specific values (R)

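I want to form groups of rows based on the interconnection (in both directions) between the "type1" and "type2" columns. The logic is: a string in "type1" belongs to the same group as the string in "type2" on the same row. In addition, if a "type2" string appears on more than one row, all of those rows belong to the same group.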

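Look at the first 3 rows: "gain_765" and "loss_1136" are related. But "loss_1136" is also related to "gain_766", and "gain_766" is in turn related to "loss_765". So the members of this group are: 1- "gain_765", 2- "loss_1136", 3- "gain_766", 4- "loss_765".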

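Within each group, I want to build a new row from the string in "chrx" on the group's first row, the minimum value across "startx" and "starty", and the maximum value across "endx" and "endy". Here is an example of my data: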

type1     chrx   startx   endx     chry   starty   endy     type2
gain_765  chr15  9681969  9685418  chr15  9660912  9712719  loss_1136
gain_766  chr15  9706682  9852347  chr15  9660912  9712719  loss_1136
gain_766  chr15  9706682  9852347  chr15  9765125  9863990  loss_765
gain_780  chr20  9706682  9852347  ch20   9765125  9863990  loss_769
gain_760  chr15  9706682  9852347  chr15  9660912  9712719  loss_1137
gain_760  chr15  9706682  9852347  chr15  9765125  9863990  loss_763

For the first group (rows 1 to 3), this is the expected output:

chr  start  end 
chr15 9660912 9863990

Now look at row 4: "gain_780" is related only to "loss_769". The expected output for this group (just row 4) is:

chr  start  end 
chr20  9706682 9863990

Finally, rows 5 and 6 form a group made up of "gain_760", "loss_1137" and "loss_763". In this last case, the expected output is:

chr  start  end 
chr15  9660912 9863990

However, I have many such cases across thousands of rows, so I need a single output containing all the results, like this:

chr  start  end 
chr15 9660912 9863990 
chr20 9706682 9863990 
chr15 9660912 9863990

Cheers.

Source

2014-02-11 user3091668

In your small example, the first group also seems to include 'gain_760', because it is connected to 'loss_1136', which is in the first group... am I wrong? – digEmAll

You are right! That was my mistake, sorry about that. I have fixed the example; please take another look. Thanks – user3091668

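All repeated type1 strings are on consecutive rows. So if "gain_765" is related to more than one "type2" string, it always appears on the rows immediately below. Does that answer your question? – user3091668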

You can do it as follows:

library(igraph)

# example data from the question (no trailing blanks, so the last column is named 'type2')
DF <- read.csv(text =
"type1,chrx,startx,endx,chry,starty,endy,type2
gain_765,chr15,9681969,9685418,chr15,9660912,9712719,loss_1136
gain_766,chr15,9706682,9852347,chr15,9660912,9712719,loss_1136
gain_766,chr15,9706682,9852347,chr15,9765125,9863990,loss_765
gain_780,chr20,9706682,9852347,ch20,9765125,9863990,loss_769
gain_760,chr15,9706682,9852347,chr15,9660912,9712719,loss_1137
gain_760,chr15,9706682,9852347,chr15,9765125,9863990,loss_763",
stringsAsFactors = FALSE)

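# create a graph with the relations type1 --> type2
# (graph_from_data_frame() is the current name of graph.data.frame())
# you can visualize it using: plot(g)
g <- graph_from_data_frame(DF[, c('type1', 'type2')])

# decompose it into its weakly connected components, one subgraph per group
# (decompose() is the current name of decompose.graph())
subgraphs <- decompose(g, mode = "weak")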

# create the sub data.frames using the subgraphs' vertices
subDFs <- lapply(subgraphs, function(sg) {
  v <- V(sg)$name
  DF[DF$type1 %in% v | DF$type2 %in% v, ]
})

# create the single-line data.frame for each group
subRes <- lapply(subDFs, function(sd) {
  data.frame(chrx  = sd$chrx[1],
             start = min(c(sd$startx, sd$starty)),
             end   = max(c(sd$endx, sd$endy)))
})

# merge the results into one single data.frame
res <- do.call(rbind.data.frame, subRes)

res
#    chrx   start     end
# 1 chr15 9660912 9863990
# 2 chr20 9706682 9863990
# 3 chr15 9660912 9863990
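
As a cross-check that does not depend on igraph, the same grouping can be sketched in base R. This is not the method used above: it is a plain label-propagation loop in which every row starts in its own group and rows sharing a type1 or type2 string repeatedly take over the smallest group id among them. The names grp and res_base are made up for this illustration, and the output columns are named chr, start and end as in the question's expected output.

```r
# Example data from the question.
DF <- read.csv(text =
"type1,chrx,startx,endx,chry,starty,endy,type2
gain_765,chr15,9681969,9685418,chr15,9660912,9712719,loss_1136
gain_766,chr15,9706682,9852347,chr15,9660912,9712719,loss_1136
gain_766,chr15,9706682,9852347,chr15,9765125,9863990,loss_765
gain_780,chr20,9706682,9852347,ch20,9765125,9863990,loss_769
gain_760,chr15,9706682,9852347,chr15,9660912,9712719,loss_1137
gain_760,chr15,9706682,9852347,chr15,9765125,9863990,loss_763",
stringsAsFactors = FALSE)

# Every row starts in its own group (hypothetical helper vector 'grp').
grp <- seq_len(nrow(DF))

# Rows that share a type1 or a type2 string adopt the smallest group id
# among them; repeat until no id changes any more.
repeat {
  old <- grp
  grp <- ave(grp, DF$type1, FUN = min)
  grp <- ave(grp, DF$type2, FUN = min)
  if (all(grp == old)) break
}

# One summary row per group: chrx of the first row, smallest start, largest end.
res_base <- do.call(rbind, lapply(split(DF, grp), function(sd) {
  data.frame(chr   = sd$chrx[1],
             start = min(sd$startx, sd$starty),
             end   = max(sd$endx, sd$endy))
}))
rownames(res_base) <- NULL
res_base
```

On the six example rows the loop settles after three passes with group ids 1, 1, 1, 4, 5, 5. On thousands of rows with long chains the loop may need many passes, so the graph-based decomposition is likely the better choice there.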

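Steps 2 and 3 (creating the subgraphs and the subDFs) can be done in a single step by moving the code of the function from step 3 into step 2.
I kept them separate for clarity.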
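
A minimal sketch of what such a merged version might look like, assuming the igraph package is installed. It uses graph_from_data_frame() and decompose(), the current names of graph.data.frame() and decompose.graph(), and does the row selection and the summary inside one lapply call. The name res_merged is made up for this illustration.

```r
library(igraph)

# Example data from the question.
DF <- read.csv(text =
"type1,chrx,startx,endx,chry,starty,endy,type2
gain_765,chr15,9681969,9685418,chr15,9660912,9712719,loss_1136
gain_766,chr15,9706682,9852347,chr15,9660912,9712719,loss_1136
gain_766,chr15,9706682,9852347,chr15,9765125,9863990,loss_765
gain_780,chr20,9706682,9852347,ch20,9765125,9863990,loss_769
gain_760,chr15,9706682,9852347,chr15,9660912,9712719,loss_1137
gain_760,chr15,9706682,9852347,chr15,9765125,9863990,loss_763",
stringsAsFactors = FALSE)

# Graph of the type1 --> type2 relations.
g <- graph_from_data_frame(DF[, c("type1", "type2")])

# For each weakly connected component: pick its rows, then summarise them.
res_merged <- do.call(rbind, lapply(decompose(g, mode = "weak"), function(sg) {
  v  <- V(sg)$name
  sd <- DF[DF$type1 %in% v | DF$type2 %in% v, ]
  data.frame(chrx  = sd$chrx[1],
             start = min(sd$startx, sd$starty),
             end   = max(sd$endx, sd$endy))
}))
res_merged
```

The intermediate lists subgraphs, subDFs and subRes are never stored, which saves memory on large inputs but makes the individual groups harder to inspect.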

Source

2014-02-11 20:19:39 digEmAll
