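Manipulating two data frames based on strings with different length

I asked a question here, "Finding the index based on two data frames of strings", and got a perfect answer. Now I am facing another problem that I cannot solve. If my second data set has several columns, I can handle it with: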
setDT(strs)[, c('colids1','colids2') := lapply(.SD, function(x) toString(which(colSums(lut == x, na.rm=TRUE) > 0))), by = 1:nrow(strs)][]
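This is fine as long as all columns of my second data set (strs) have the same length, but if the lengths differ, it does not work and gives me an error.
So let's say my first data set is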
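To make the inner expression of the one-liner above concrete, here is a minimal sketch that evaluates it step by step for a single value. The table `lut_demo` and the value `x` are made up for illustration and are not the data from this question.

```r
# Toy lookup table (hypothetical, much smaller than the real `lut`)
lut_demo <- data.frame(
  V1 = c("O75663", "O95400", NA),
  V2 = c("O95456", NA, NA),
  V3 = c("O75663", "O95456", "O95670"),
  stringsAsFactors = FALSE
)

x <- "O75663"                           # the single string to look up

hits   <- lut_demo == x                 # logical matrix; NA wherever lut_demo is NA
counts <- colSums(hits, na.rm = TRUE)   # number of matches in each column
idx    <- which(counts > 0)             # positions of the columns that contain x

toString(idx)                           # "1, 3": x occurs in the 1st and 3rd columns
```

So for each cell of strs, the expression returns the positions of the lut columns in which that cell's value occurs, collapsed into one comma-separated string.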
lut <- structure(list(V1 = c("O75663", "O95400", "O95433", NA, NA),
V2 = c("O95456", "O95670", NA, NA, NA), V3 = c("O75663",
"O95400", "O95433", "O95456", "O95670"), V4 = c("O95456",
"O95670", "O95801", "P00352", NA), V1 = c("O75663", "O95400",
"O95433", NA, NA), V2 = c("O95456", "O95670", NA, NA, NA),
V3 = c("O75663", "O95400", "O95433", "O95456", "O95670"),
V4 = c("O95456", "O95670", "O95801", "P00352", NA)), .Names = c("V1",
"V2", "V3", "V4", "V1", "V2", "V3", "V4"), row.names = c(NA,
-5L), class = "data.frame")
and my second data set is
strs <- structure(list(strings = structure(c(2L, 3L, 4L, 5L, 6L, 7L,
1L, 1L), .Label = c("", "O75663", "O95400", "O95433", "O95456",
"O95670", "O95801"), class = "factor"), strings2 = structure(c(4L,
2L, 6L, 5L, 3L, 1L, 1L, 1L), .Label = c("", "O75663", "O95433",
"O95456", "P00352", "P00492"), class = "factor"), strings3 = structure(c(4L,
6L, 7L, 8L, 2L, 3L, 5L, 1L), .Label = c("", "O75663", "O95400",
"O95456", "O95670", "O95801", "P00352", "P00492"), class = "factor"),
strings4 = structure(c(2L, 5L, 3L, 4L, 1L, 1L, 1L, 1L), .Label = c("",
"O95400", "O95456", "O95801", "P00492"), class = "factor"),
strings5 = structure(c(8L, 2L, 7L, 1L, 3L, 6L, 5L, 4L), .Label = c("O75663",
"O95400", "O95433", "O95456", "O95670", "O95801", "P00352",
"P00492"), class = "factor")), .Names = c("strings", "strings2",
"strings3", "strings4", "strings5"), class = "data.frame", row.names = c(NA,
-8L))
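This is what I tried:
df <- setDT(strs)[, paste0('colids_', seq_along(strs)) := lapply(.SD, function(x) toString(which(colSums(lut == x, na.rm=TRUE) > 0))), by = 1:nrow(strs)][]
It works if the lengths in strs are all the same, but it does not work when the lengths vary, as in the example I have given here.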
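The error is pretty clear. Try 'strs[c(1:3,5)] <- lapply(strs[c(1:3,5)], as.character)' and then run your 'data.table' statement. Is the resulting 'df' what you expect? – Sumedh
@Sumedh Thanks for your message, but it does not solve the problem. I did what you said, then ran df <- setDT(strs)[, paste0('colids_', seq_along(strs)) := lapply(.SD, function(x) toString(which(colSums(lut == x, na.rm = TRUE) > 0))), by = 1:nrow(strs)][] and got the same error. – nik
@Sumedh I have been trying every suggestion I could find on the web, but I don't know why it is not working! – nik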
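One possible way to avoid the row-wise `:=` call altogether is sketched below in base R. It is not taken from any answer to this question and makes several assumptions: every factor column is converted to character first (extending the suggestion in the comment above to all columns), empty strings are treated as "no value", and `find_cols`, `lut_demo` and `strs_demo` are hypothetical names and toy data rather than the real `lut` and `strs`.

```r
# Toy data (hypothetical); the factor columns have different level sets,
# as in the question's `strs`
lut_demo <- data.frame(
  V1 = c("O75663", "O95400", NA),
  V2 = c("O95456", NA, NA),
  V3 = c("O75663", "O95456", "O95670"),
  stringsAsFactors = FALSE
)
strs_demo <- data.frame(
  strings  = factor(c("O75663", "O95456", "")),
  strings2 = factor(c("O95670", "", ""))
)

# Hypothetical helper: positions of the lut columns that contain value x,
# returned as one comma-separated string ("" when nothing matches)
find_cols <- function(x, lut) {
  if (is.na(x) || x == "") return("")
  hits <- vapply(lut, function(col) any(col == x, na.rm = TRUE), logical(1))
  toString(unname(which(hits)))
}

# Convert every factor column to plain character before comparing
strs_chr <- data.frame(lapply(strs_demo, as.character), stringsAsFactors = FALSE)

# Look up each cell, one column of strs at a time
colids <- lapply(strs_chr, function(col) {
  vapply(col, find_cols, character(1), lut = lut, USE.NAMES = FALSE)
})
names(colids) <- paste0("colids_", seq_along(colids))

result <- cbind(strs_chr, as.data.frame(colids, stringsAsFactors = FALSE))
result
```

Because each column of strs is processed independently and always yields one string per row, this sketch does not depend on the columns sharing factor levels or on how many non-empty entries each column has. Whether it gives the output expected for the real data would need to be checked against `lut` and `strs`.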