2017-04-13

Arranging data rows in R

I have a list of OMIM genes with their associated diseases (about 15,000 genes), which looks like this:

SLC6A8,CRTR,CCDS1 Cerebral creatine deficiency syndrome 1, 300352 (3) 
BCAP31,BAP31,DXS1357E,DDCH Deafness, dystonia, and cerebral hypomyelination 
ABCD1,ALD,AMN Adrenoleukodystrophy, 300100 (3), X-linked recessive 
PLXNB3,PLXN6 NA 

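For some diseases, several gene names are associated with the same disease. I want to reorganize this so that each row has only one gene name and its associated disease: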

SLC6A8 Cerebral creatine deficiency syndrome 1, 300352 (3) 
CRTR Cerebral creatine deficiency syndrome 1, 300352 (3) 
CCDS1 Cerebral creatine deficiency syndrome 1, 300352 (3) 

Is there a way to do this in R?


*"Is there a way to do this in R?"* Most likely, but what have you tried so far? This is not a code-writing service. – nrussell


I have used R for a few things, but in this case I don't know exactly what I should be looking for. I just want a hint, not necessarily code! – VasoGene

Answer


I'm not entirely sure what data structure you have. Here is a quick solution that will hopefully help with what you are looking for:

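library(plyr); library(magrittr)  # provide ldply() and the %>% pipe
# For row x, split the comma-separated gene names in column a and pair each one with the disease in column b
splitFn <- function(x) expand.grid(df[x, "a"] %>% as.character %>% strsplit(., ",") %>% unlist, df[x, "b"])
ldply(1:nrow(df), splitFn)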

       Var1 Var2
1    SLC6A8 Cerebral creatine deficiency syndrome 1, 300352(3)
2      CRTR Cerebral creatine deficiency syndrome 1, 300352(3)
3     CCDS1 Cerebral creatine deficiency syndrome 1, 300352(3)
4    BCAP31 Deafness, dystonia, and cerebral hypomyelination
5     BAP31 Deafness, dystonia, and cerebral hypomyelination
6  DXS1357E Deafness, dystonia, and cerebral hypomyelination
7      DDCH Deafness, dystonia, and cerebral hypomyelination
8     ABCD1 Adrenoleukodystrophy, 300100(3), X-linked recessive
9       ALD Adrenoleukodystrophy, 300100(3), X-linked recessive
10      AMN Adrenoleukodystrophy, 300100(3), X-linked recessive
11   PLXNB3 <NA>
12    PLXN6 <NA>

The data I used:

df <- structure(list(a = structure(c(4L, 2L, 1L, 3L), .Label = c("ABCD1,ALD,AMN", 
"BCAP31,BAP31,DXS1357E,DDCH", "PLXNB3,PLXN6", "SLC6A8,CRTR,CCDS1"
), class = "factor"), b = structure(c(1L, 3L, 2L, NA), .Label = c(" Cerebral creatine deficiency syndrome 1, 300352(3)", 
"Adrenoleukodystrophy, 300100(3), X-linked recessive", "Deafness, dystonia, and cerebral hypomyelination"
), class = "factor")), .Names = c("a", "b"), row.names = c(NA, 
-4L), class = "data.frame")
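For comparison, the same reshaping can probably be done in base R without any extra packages. This is only a minimal sketch under assumptions: the data frame name `omim` and the column names `genes` and `disease` are hypothetical, and the two sample rows are copied from the question rather than from the real file.

```r
# Hypothetical sample data; `omim`, `genes` and `disease` are assumed names.
omim <- data.frame(
  genes = c("SLC6A8,CRTR,CCDS1", "PLXNB3,PLXN6"),
  disease = c("Cerebral creatine deficiency syndrome 1, 300352 (3)", NA),
  stringsAsFactors = FALSE
)

# Split each comma-separated gene string into a character vector.
gene_list <- strsplit(omim$genes, ",", fixed = TRUE)

# Flatten the gene list and repeat each disease once per gene in its row.
long <- data.frame(
  gene = unlist(gene_list),
  disease = rep(omim$disease, times = lengths(gene_list)),
  stringsAsFactors = FALSE
)

print(long)
```

If the columns were read in as factors, as in the `df` above, they would need `as.character()` before `strsplit()`. Building the two columns in one vectorized step avoids a row-by-row loop, which may matter with around 15,000 genes.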