2015-12-15 97 views
5

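Ignoring NA values while pasting two column values in R

I have a data frame named dd2. I need to paste together the values in Left.Gene.Symbols and Right.Gene.Symbols. I can do this with the simple code below, but when a value is missing I don't want NA to be pasted. I want the output to look like the combination column shown in the result.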

My code

# convert the string 'NA' to a real missing value
dd2[dd2 == 'NA'] <- NA
# paste the values together (a missing value still comes out as the text "NA")
result <- cbind(dd2, combination = paste(dd2[,"Left.Gene.Symbols"], dd2[,"Right.Gene.Symbols"], sep="*"))

Data

dd2<- structure(c("AMLM12001KP", "AMLM12001KP", "AMLM12001KP", "AMLM12001KP", 
"AMLM12001KP", "AK2", "HFM1", "HFM1", "HFM1", "HFM1", NA, "PPT", 
NA, "GGT", NA), .Dim = c(5L, 3L), .Dimnames = list(NULL, c("customer_sample_id", 
"Left.Gene.Symbols", "Right.Gene.Symbols"))) 

Desired result

customer_sample_id Left.Gene.Symbols Right.Gene.Symbols combination 
[1,] "AMLM12001KP"  "AK2"    NA     AK2* 
[2,] "AMLM12001KP"  "HFM1"   "PPT"     HFM1*PPT 
[3,] "AMLM12001KP"  "HFM1"   NA     HFM1* 
[4,] "AMLM12001KP"  "HFM1"   "GGT"     HFM1*GGT 
[5,] "AMLM12001KP"  "HFM1"   NA     HFM1* 
+1

@RonakShah Sorry, I just corrected it. – MAPK

Answers

3

You can do this by temporarily replacing the NA values with the empty string "".

cbind(
    dd2, 
    combination = paste(dd2[,2], replace(dd2[,3], is.na(dd2[,3]), ""), sep = "*") 
) 
#  customer_sample_id Left.Gene.Symbols Right.Gene.Symbols combination 
# [1,] "AMLM12001KP"  "AK2"    NA     "AK2*"  
# [2,] "AMLM12001KP"  "HFM1"   "PPT"    "HFM1*PPT" 
# [3,] "AMLM12001KP"  "HFM1"   NA     "HFM1*"  
# [4,] "AMLM12001KP"  "HFM1"   "GGT"    "HFM1*GGT" 
# [5,] "AMLM12001KP"  "HFM1"   NA     "HFM1*"  

Of course, you can replace the column numbers above with your column names. I didn't write them out because they are too long.

+0

Thanks a lot. So if I also have NAs in column 2, I can simply use replace(dd2[,2], is.na(dd2[,2]))? – MAPK

+1

@MAPK - Yes, but the call is `replace(dd2[,2], is.na(dd2[,2]), "")`. You could do it for the entire matrix if you wanted. –
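
As a minimal sketch of the "entire matrix" suggestion in the comment above: the snippet below rebuilds dd2 from the question, blanks every NA in a copy, and pastes by column name. The names clean and combination are only illustrative, and working on a copy is an assumption made here so that the NAs in the original columns are preserved.

```r
# Rebuild the example matrix from the question.
dd2 <- structure(c("AMLM12001KP", "AMLM12001KP", "AMLM12001KP", "AMLM12001KP",
                   "AMLM12001KP", "AK2", "HFM1", "HFM1", "HFM1", "HFM1",
                   NA, "PPT", NA, "GGT", NA),
                 .Dim = c(5L, 3L),
                 .Dimnames = list(NULL, c("customer_sample_id",
                                          "Left.Gene.Symbols",
                                          "Right.Gene.Symbols")))

# Replace every NA in a copy of the matrix with the empty string.
clean <- replace(dd2, is.na(dd2), "")

# Paste the two gene columns by name; blanks contribute nothing.
combination <- paste(clean[, "Left.Gene.Symbols"],
                     clean[, "Right.Gene.Symbols"],
                     sep = "*")

# Attach the new column to the original matrix, which still holds its NAs.
result <- cbind(dd2, combination = combination)
print(combination)
```

Note that with this approach a missing left-hand gene would give a value such as "*PPT", since the separator is always inserted.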

2

One way, using ifelse:

ifelse(is.na(dd2[,3]),paste0(dd2[,2],"*"),paste(dd2[,2],dd2[,3],sep="*")) 

#[1] "AK2*"  "HFM1*PPT" "HFM1*" "HFM1*GGT" "HFM1*" 
+0

I can't use sub here, because some gene names contain NA. For example, in MPNA or TTNA, wouldn't it remove the NA part? – MAPK

+1

@MAPK Updated the answer. –
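
To illustrate the concern raised in the comments, here is a hedged sketch of a sub call that is anchored to the separator and the end of the string, so an NA inside a gene name is left alone. The vectors left and right are hypothetical data built from the gene names mentioned in the comment, and this is not necessarily what the answer originally used.

```r
# Hypothetical input: gene names that contain "NA" as part of the name.
left  <- c("AK2", "MPNA", "TTNA")
right <- c(NA, "TTNA", NA)

# paste() turns a missing value into the literal text "NA".
pasted <- paste(left, right, sep = "*")

# Remove only a trailing "*NA": "\\*" matches the literal separator
# and "$" anchors the match at the end of the string.
cleaned <- sub("\\*NA$", "*", pasted)
print(cleaned)
```

One limitation: a right-hand gene whose name is literally "NA" would also be stripped, which is why testing with is.na before pasting, as in the ifelse answer, is the safer choice.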

2

We can use NAer from the qdap package together with sprintf:

library(qdap) 
sprintf('%s*%s', dd2[,2],NAer(dd2[,3],'')) 
#[1] "AK2*"  "HFM1*PPT" "HFM1*" "HFM1*GGT" "HFM1*"