2011-05-18

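R: match data frames and filter the results

I have a data frame like the variable k. The column all_possible_names contains additional identifiers for the ILMN codes. Now I want to search the all_possible_names column for the identifiers listed in the data frame identifier.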

z <- matrix(c(0,0,1,1,0,0, 1,1,0,0,0,0, 1,0,1,1,0,1, 1,1,1,0,0,0,
              "RND1 | AB849382 | uc001aeu.1", "WDR | AB361738 | uc001aif.1", "PLAC8 | AB271612 | uc001amd.1",
              "TYBSA | AB859482", "GRA | AB758392 | uc001aph.1", "TAF | AB142353"), nrow = 6,
            dimnames = list(c("ILMN_1651838", "ILMN_1652371", "ILMN_1652464", "ILMN_1652952", "ILMN_1653026", "ILMN_1653103"),
                            c("A", "B", "C", "D", "all_possible_names")))
k <- as.data.frame(z, stringsAsFactors = FALSE)

search<-c("AB361738","RND1", "LIS") 
identifier <- as.data.frame(search) 

The result should look like this:

search Names 
1 AB361738 WDR | AB361738 | uc001aif.1 
2  RND1 RND1 | AB849382 | uc001aeu.1 
3  LIS NA 

Once this data frame exists, the final output can be built from it: the Names column should keep only the name that starts with uc0.
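The uc0 filtering step described above can be sketched in base R with `regexpr()` and `regmatches()`, as an alternative to splitting on " | ". This is only a sketch: it assumes the intermediate Names column is a character vector with NA where nothing matched, and the pattern `uc0[^ |]*` assumes the uc0 identifiers contain no spaces or pipes. The vector `names_full` is a hypothetical stand-in for that column.

```r
# Hypothetical stand-in for the intermediate Names column (NA = no match)
names_full <- c("WDR | AB361738 | uc001aif.1", "RND1 | AB849382 | uc001aeu.1", NA)

# Position of the first token starting with "uc0" (-1 if absent, NA for NA input)
m <- regexpr("uc0[^ |]*", names_full)

# Start with all NA, then fill in only the elements that actually matched;
# regmatches() returns the matched substrings for those elements only
uc_ids <- rep(NA_character_, length(names_full))
uc_ids[!is.na(m) & m > 0] <- regmatches(names_full, m)

print(uc_ids)
```

Because `regexpr()` works on the whole vector at once, no explicit loop over rows is needed.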

The final result would then be:

search Names 
1 AB361738 uc001aif.1 
2  RND1 uc001aeu.1 
3  LIS NA 

Can anyone help me?

Many thanks, Lisanne

Answer


Probably not the best way, but it is one way:

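# For each search term, find the full names that contain it (fixed string, not a regex)
firstStep <- lapply(search, grep, k$all_possible_names, fixed = TRUE, value = TRUE)
# Split each matching name on " | " and keep only the parts starting with "uc0"
res <- lapply(firstStep, function(subres) {
  prts <- unlist(strsplit(subres, " | ", fixed = TRUE))
  prts[which(substr(prts, 1, 3) == "uc0")]
})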

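The result is returned as a list, because you cannot be sure that there is only one match per search string. A search term with no match (such as "LIS") gives an empty element.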
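To turn that list into the two-column data frame from the question, one option is to keep the first uc0 identifier per search term and use NA for empty elements. This is a minimal, self-contained sketch: it rebuilds only the all_possible_names column of k, reuses the `search` vector from the question, and assumes that taking the first match is acceptable when a term matches several names.

```r
# Only the column needed here, rebuilt from the question's data
k <- data.frame(
  all_possible_names = c("RND1 | AB849382 | uc001aeu.1", "WDR | AB361738 | uc001aif.1",
                         "PLAC8 | AB271612 | uc001amd.1", "TYBSA | AB859482",
                         "GRA | AB758392 | uc001aph.1", "TAF | AB142353"),
  stringsAsFactors = FALSE
)
search <- c("AB361738", "RND1", "LIS")

# The answer's two steps: find matching names, then keep the "uc0" parts
firstStep <- lapply(search, grep, k$all_possible_names, fixed = TRUE, value = TRUE)
res <- lapply(firstStep, function(subres) {
  prts <- unlist(strsplit(subres, " | ", fixed = TRUE))
  prts[substr(prts, 1, 3) == "uc0"]
})

# Flatten: first uc0 identifier per search term, NA when the element is empty
Names <- vapply(res, function(ids) if (length(ids) > 0) ids[1] else NA_character_,
                character(1))
result <- data.frame(search = search, Names = Names, stringsAsFactors = FALSE)
print(result)
```

`vapply()` is used instead of `sapply()` so that the output is guaranteed to be a character vector of the same length as `search`, even when some elements are empty.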


Thanks! This works for me. – Lisann 2011-05-18 10:12:45