2016-01-18

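Splitting column contents into two rows for conversion to STRUCTURE format

I want to split the contents of each column across two rows, duplicating the row name. Each variable contains only two digits (11, 12, 13, 14, 21, 22, etc.) or NA. This is for conversion to STRUCTURE format, a common population genetics format.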

I have this:

population  X354045 X430045 X995019 
Crater   <NA>  11  22 
Teton   11  31  11 

I would like to get this:

population  X354045 X430045 X995019 
Crater   <NA>  1   2 
Crater   <NA>  1   2 
Teton   1  3   1 
Teton   1  1   1 

Answers

2

This is tagged as a data.table question, so I will just suggest the built-in tstrsplit function for this, which should be quite fast and efficient.

Reading your data:

library(data.table) 
DT <- fread('population  X354045 X430045 X995019 
Crater   NA  11  22 
       Teton   11  31  11') 

Solution (if you have a data.frame, use setDT(DT) to convert it to a data.table):

DT[, lapply(.SD, function(x) unlist(tstrsplit(x, ""))), by = population] 
# population X354045 X430045 X995019 
# 1:  Crater  NA  1  2 
# 2:  Crater  NA  1  2 
# 3:  Teton  1  3  1 
# 4:  Teton  1  1  1 
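
To see why this works, it may help to look at what tstrsplit returns on its own. The following is only a minimal sketch, assuming data.table is installed; the vector `genotypes` is made up for illustration and is not part of the question's data.

```r
library(data.table)

# Hypothetical two-digit genotype codes, including a missing value
genotypes <- c(11, 31, NA)

# Splitting on "" breaks each value into single characters;
# tstrsplit() then transposes the result, so the first list element
# holds all first digits and the second holds all second digits
parts <- tstrsplit(genotypes, "")
parts[[1]]  # first digits, one per input value
parts[[2]]  # second digits, one per input value
```

When each population occupies a single row, unlist() on this list returns the first digit followed by the second, which is what produces two rows per group with by = population. If a population appeared in several rows, unlist() would return all first digits of the group before the second digits, so the grouping would likely need to be by a unique row identifier instead.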
1

OK, here is how I would do it. Let's create some data:

vector <- c(10, 11, 12, NA, 13, 14, 15) 

First, we need an expression that breaks each two-digit number into its two digits (and each NA into two NAs):

as.numeric(sapply(vector, function(x) (x %% c(1e2,1e1)) %/% c(1e1,1e0))) 
# 1 0 1 1 1 2 NA NA 1 3 1 4 1 5 
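
The arithmetic can be checked on a single value. This is a small sketch; the helper name `split_digits` is hypothetical and only wraps the expression used above.

```r
# Hypothetical helper name; the arithmetic is the same as in the line above
split_digits <- function(x) (x %% c(100, 10)) %/% c(10, 1)

# 31 %% c(100, 10) is c(31, 1); integer division by c(10, 1) gives 3 and 1
split_digits(31)

# NA propagates through both operations, giving two NAs
split_digits(NA)
```

Note that this assumes values between 0 and 99: a single-digit value such as 5 would come out as 0 and 5.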

Now all we need to do is apply this to all the relevant columns:

DF <- data.frame(population = c("Crater", "Teton"), X354045 = c(NA, 11), X430045 = c(11, 31), X995019 = c(22, 11)) 
DF2 <- apply(DF[-1], 2, function(y) as.numeric(sapply(y, function(x) (x %% c(1e2,1e1)) %/% c(1e1,1e0)))) 

Finally, we combine the result with the new population column:

population <- as.character(rep(DF$population, each = 2)) 
DF3 <- cbind(population, data.frame(DF2)) 
1
dd <- read.table(header = TRUE, text = 'population  X354045 X430045 X995019 
Crater   NA  11  22 
Teton   11  31  11') 

nr <- nrow(dd) 
# repeat every row index twice (1, 1, 2, 2, ...), whatever the number of rows
dd <- dd[rep(seq_len(nr), each = 2), ] 

#  population X354045 X430045 X995019 
# 1  Crater  NA  11  22 
# 1.1  Crater  NA  11  22 
# 2  Teton  11  31  11 
# 2.1  Teton  11  31  11 


dd[, -1] <- lapply(dd[, -1], function(x) { 
    idx <- (seq_along(x) %% 2 == 0) + 1L 
    substr(x, idx, idx) 
}) 

#  population X354045 X430045 X995019 
# 1  Crater <NA>  1  2 
# 1.1  Crater <NA>  1  2 
# 2  Teton  1  3  1 
# 2.1  Teton  1  1  1 
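
The index trick above may be easier to follow in isolation. A minimal sketch with a made-up column `x`, assumed to be already duplicated so that each value appears twice in a row:

```r
# Made-up column, already duplicated so that each value appears twice
x <- c(31, 31, 12, 12)

# Odd positions get 1, even positions get 2
idx <- (seq_along(x) %% 2 == 0) + 1L
idx

# substr() coerces x to character and extracts one character per element:
# the first digit on odd rows, the second digit on even rows
substr(x, idx, idx)
```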

Or simply

dd <- dd[rep(seq_len(nr), each = 2), ] 
dd[, -1] <- lapply(dd[, -1], function(x) 
    Vectorize(substr)(x, rep(1:2, nr), rep(1:2, nr))) 

would also work.


And in data.table, thanks to @DavidArenburg:

library('data.table') 
dd <- read.table(header = TRUE, text = 'population  X354045 X430045 X995019 
    Crater   NA  11  22 
       Teton   11  31  11') 


setDT(dd)[rep(seq_len(.N), each = 2), lapply(.SD, substr, 1:2, 1:2), by = population] 

# population X354045 X430045 X995019 
# 1:  Crater  NA  1  2 
# 2:  Crater  NA  1  2 
# 3:  Teton  1  3  1 
# 4:  Teton  1  1  1 

Or similarly, the same idea but avoiding the by part, in case you are working with a large data set:

dd <- setDT(dd)[rep(seq_len(.N), each = 2)] 
dd[, 2:4 := dd[, lapply(.SD, substr, 1:2, 1:2), .SDcols = -1]] 

+1

Thanks @DavidArenburg for the update, now everyone will surely think I use data.table! – rawr

+0

You made it clear that this was a good idea of mine –