R - Exiting a loop prematurely

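I am re-asking a question; I have tried to simplify my data set and give an example of the output I want. If it is still complicated, please feel free to comment, which may help me clarify it.

I have a table in which I have grouped features with similar rt and mz.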

  orig_feat mz_mid rt_mid similar_feature 
1   f_1 685.4350 466.5    f_1 
2   f_2 260.1655 245.0    f_2 
185   f_2 260.1665 256.5   f_185 
408   f_2 260.1670 239.0   f_408 
2334  f_2 260.1650 250.0   f_2334 
3   f_3 288.1980 276.0    f_3 
7   f_3 288.1990 289.0    f_7 
414   f_3 288.1970 275.0   f_414 
2181  f_3 288.1980 270.0   f_2181 
2969  f_3 288.1965 297.5   f_2969 
4   f_4 537.3915 454.5    f_4 
2271  f_4 537.3965 435.5   f_2271 
5   f_5 439.2990 153.5    f_5 
6   f_6 325.0690 210.5    f_6 
10   f_6 325.0685 227.0   f_10 
747   f_6 325.0685 184.5   f_747 
2068  f_6 325.0695 225.0   f_2068 
2929  f_6 325.0685 218.0   f_2929 
2970  f_6 325.0680 237.0   f_2970 
31   f_7 288.1980 276.0    f_3 
71   f_7 288.1990 289.0    f_7 
4141  f_7 288.1970 275.0   f_414 
21811  f_7 288.1980 270.0   f_2181 
29691  f_7 288.1965 297.5   f_2969

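I want to build a list with one entry per group. All rows with the same $orig_feat should be "grouped", and for each of these groups I need a vector containing all of its features. See the example output below.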

$grf_1 
[1] "f_1" 

$grf_2 
[1] "f_2" "f_185" "f_408" "f_2334" 

$grf_3 
[1] "f_3" "f_7" "f_414" "f_2181" "f_2969" 

$grf_4 
[1] "f_4" "f_2271" 

$grf_5 
[1] "f_5" 

$grf_6 
[1] "f_6" "f_10" "f_747" "f_2068" "f_2929" "f_2970"

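Importantly, I want the result to be non-redundant. For example, grf_3 contains f_7, f_414, f_2181 and f_2969, so when I get to f_7 I do not want to create a group for f_7, because the f_3 group already contains all the features of the f_7 group.

Below is my code as it stands. Currently, the output stops after grf_3. I don't know why it seems to exit the loop prematurely.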

mkFeatGroupsList <- function(simFeatsTab) {
    features_seen <- vector()
    GroupingList <- list()
    counter <- 1
    for (i in 1:length(unique(simFeatsTab$orig_feat))) {
        orig_feat2Grp <- simFeatsTab$orig_feat[i]
        if (orig_feat2Grp %in% features_seen) next
        matchingFeats <- subset(simFeatsTab, orig_feat == orig_feat2Grp)$similar_feature
        grFeatNm <- paste("grf_", counter, sep = "")
        GroupingList[[grFeatNm]] <- matchingFeats
        features_seen <- c(features_seen, matchingFeats)
        counter <- counter + 1
    }
    return(GroupingList)
}

In case you need test data:

> dput(simFeatsTab.10.30.test) 
structure(list(orig_feat = structure(c(1L, 2L, 2L, 2L, 2L, 3L, 
3L, 3L, 3L, 3L, 4L, 4L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 
7L, 7L), .Label = c("f_1", "f_2", "f_3", "f_4", "f_5", "f_6", 
"f_7"), class = "factor"), mz_mid = c(685.435, 260.1655, 260.1665, 
260.167, 260.165, 288.198, 288.199, 288.197, 288.198, 288.1965, 
537.3915, 537.3965, 439.299, 325.069, 325.0685, 325.0685, 325.0695, 
325.0685, 325.068, 288.198, 288.199, 288.197, 288.198, 288.1965 
), rt_mid = c(466.5, 245, 256.5, 239, 250, 276, 289, 275, 270, 
297.5, 454.5, 435.5, 153.5, 210.5, 227, 184.5, 225, 218, 237, 
276, 289, 275, 270, 297.5), similar_feature = c("f_1", "f_2", 
"f_185", "f_408", "f_2334", "f_3", "f_7", "f_414", "f_2181", 
"f_2969", "f_4", "f_2271", "f_5", "f_6", "f_10", "f_747", "f_2068", 
"f_2929", "f_2970", "f_3", "f_7", "f_414", "f_2181", "f_2969" 
)), .Names = c("orig_feat", "mz_mid", "rt_mid", "similar_feature" 
), class = "data.frame", row.names = c("1", "2", "185", "408", 
"2334", "3", "7", "414", "2181", "2969", "4", "2271", "5", "6", 
"10", "747", "2068", "2929", "2970", "31", "71", "4141", "21811", 
"29691"))

Source

2015-11-16 user2814482

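I would proceed like this:

Split your data frame by orig_feat (I called it feat)
Use sapply to get the similar features of each group
Loop through the groups and remove the features that already appeared in an earlier group

Which translates to: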

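feat.split <- split(feat, feat$orig_feat)

# One character vector of similar features per orig_feat.
# simplify = FALSE keeps the result a list even if all groups have the same size.
sim.feat <- sapply(feat.split, function(x) x$similar_feature, simplify = FALSE)

for (i in 2:length(sim.feat)) {
    # Get all of the previous features
    prev.feat <- do.call("c", sim.feat[1:(i-1)])

    # Remove features already used
    sim.feat[[i]] <- sim.feat[[i]][!sim.feat[[i]] %in% prev.feat]
}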
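
As for why the loop in the question stops after grf_3: `for (i in 1:length(unique(simFeatsTab$orig_feat)))` runs i from 1 to 7, but `simFeatsTab$orig_feat[i]` then reads rows 1 to 7 of the table, and those rows only hold f_1, f_2 and f_3. Below is a minimal sketch of a repaired loop that iterates over the unique values instead of row positions. It assumes the feature column is named similar_feature, as in the dput output; the toy data frame `tab` and the function name `mk_feat_groups` are made up for illustration.

```r
# Toy table (hypothetical, modelled on a few rows of the question's data).
tab <- data.frame(
  orig_feat       = c("f_1", "f_3", "f_3", "f_4", "f_7", "f_7"),
  similar_feature = c("f_1", "f_3", "f_7", "f_4", "f_3", "f_7"),
  stringsAsFactors = FALSE
)

mk_feat_groups <- function(tab) {
  features_seen <- character(0)
  groups <- list()
  # Iterate over the unique values themselves, not over row positions.
  for (feat in unique(as.character(tab$orig_feat))) {
    # Skip a group whose original feature already belongs to an earlier group.
    if (feat %in% features_seen) next
    members <- tab$similar_feature[tab$orig_feat == feat]
    groups[[paste0("grf_", length(groups) + 1)]] <- members
    features_seen <- c(features_seen, members)
  }
  groups
}

groups <- mk_feat_groups(tab)
str(groups)
```

Note that this variant skips a whole group once its orig_feat has been seen, whereas the split-based code above removes individual features that were already used; on data where groups overlap only partially the two can give different results.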

Source

2015-11-16 14:02:13 nico

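Thanks, this is great. Now, after removing the previously used features, some elements are empty. I tried to remove them, but it doesn't work. Any other suggestions? sim.feat <- lapply(sim.feat, function(f) f[length(f) > 0]) – user2814482

@user2814482: Try sim.feat[sapply(sim.feat, length) > 0] – nico

Another solution could use the igraph package: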
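
The lapply attempt leaves the empty elements in place because f[length(f) > 0] subsets inside each vector: an empty vector stays an empty vector, and the list keeps its length. The filtering has to be applied to the list itself. A small sketch, using a made-up `sim.feat` as a stand-in for the real result:

```r
# Hypothetical stand-in for sim.feat after duplicates were removed.
sim.feat <- list(f_3 = c("f_3", "f_7"), f_4 = "f_4", f_7 = character(0))

# Subsetting inside each element keeps all three list entries.
unchanged <- lapply(sim.feat, function(f) f[length(f) > 0])
length(unchanged)

# Subsetting the list itself drops the empty entry.
# lengths(x) is equivalent to sapply(x, length) here.
non.empty <- sim.feat[lengths(sim.feat) > 0]

# Optionally rename the groups grf_1, grf_2, ... as in the desired output.
names(non.empty) <- paste0("grf_", seq_along(non.empty))
non.empty
```

`lengths()` needs R 3.2.0 or later; on older versions the sapply form from the comment does the same job.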

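library(igraph)
# df is the data frame from the question; columns 1 and 4 are orig_feat and similar_feature
x <- graph_from_data_frame(df[, c(1, 4)])  # graph.data.frame() in older igraph versions
# You can also take a look with plot(x)
res <- components(x)                       # clusters() in older igraph versions
split(names(res$membership), res$membership)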
#$`1` 
#[1] "f_1" 
#$`2` 
#[1] "f_2" "f_185" "f_408" "f_2334" 
#$`3` 
#[1] "f_3" "f_7" "f_414" "f_2181" "f_2969" 
#$`4` 
#[1] "f_4" "f_2271" 
#$`5` 
#[1] "f_5" 
#$`6` 
#[1] "f_6" "f_10" "f_747" "f_2068" "f_2929" "f_2970"

Source

2015-11-16 14:08:21 nicola
