2017-02-22

In R, how can I compare two data tables and produce the desired output shown below?

Sample1
  col1                        col2
1 123001 124003 125699        348348 3493493 1230404 99930202
2 29238822 293831232 992922   348348 3493493 1230404 99930202

Each cell of Sample1 holds a group of numbers separated by spaces.

Sample2 
     col1 col2 
1  123001 CAT 
2 99930202 PIG 
3  124003 ODG 
4 1230404 CHAIN 
5 29238822 BAT 
6 293831232 MOUSE 
7 3493493 KIWI 
8  125699 JIN 
9  992922 ANIME 
10 348348 UPPE 

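If a number in one of these groups matches a number in Sample2's col1, the corresponding value from Sample2's col2 should be pulled in. (The real data is a huge data frame.)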
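The lookup described above comes down to three steps: split a cell on spaces, find each number's position in the lookup column with `match()`, and paste the corresponding labels back together. Here is a minimal base-R sketch for a single cell. It assumes the lookup columns are passed in as two plain vectors (below, a few id/label pairs copied from Sample2), and `lookup_cell` is a hypothetical helper name. Numbers with no match come back as `NA`.

```r
# Hypothetical helper: translate one space-separated cell of numbers into
# the matching labels from a lookup table given as two parallel vectors.
lookup_cell <- function(cell, ids, labels) {
  # "123001 124003" -> c("123001", "124003")
  tokens <- strsplit(cell, " ", fixed = TRUE)[[1]]
  # match() coerces the integer ids and the character tokens to a common
  # type (character) before comparing; unmatched tokens give NA.
  paste(labels[match(tokens, ids)], collapse = " ")
}

ids    <- c(123001L, 99930202L, 124003L, 125699L)
labels <- c("CAT", "PIG", "ODG", "JIN")
lookup_cell("123001 124003 125699", ids, labels)
#> [1] "CAT ODG JIN"
lookup_cell("123001 555", ids, labels)
#> [1] "CAT NA"
```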

The final desired output is shown below.

Output
  col1                        col2                              col3             col4
1 123001 124003 125699        348348 3493493 1230404 99930202   CAT ODG JIN      UPPE KIWI CHAIN PIG
2 29238822 293831232 992922   348348 3493493 1230404 99930202   BAT MOUSE ANIME  UPPE KIWI CHAIN PIG

I have tried merge, join and sqldf in various ways but could not get this output. Can anyone help?


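You should create data frames with the column names specified. Please also explain in detail what you intend to do. I'd also suggest posting the printed output of your data frames Sample1 and Sample2; it helps you see how they actually look and frame your question properly. – user5249203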


Sorry, I have edited the question, thanks! – ETA123


The delimiter is a space. – ETA123

Answers

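# Split each col1 cell on spaces, look each number up in sample2$id,
# and paste the matching sample2$text labels back together.
sample1$col3 = sapply(strsplit(sample1$col1, " "), function(a)
        paste(sample2$text[match(a, sample2$id)], collapse = " "))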

If you want to use this on multiple columns, wrap the code above in a function and then apply it to the columns you need.

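# Same lookup wrapped in a function so it can be applied to any column
myf = function(x){
    return(sapply(strsplit(x, " "), function(a)
     paste(sample2$text[match(a, sample2$id)], collapse = " ")))
}

# Apply only to the number columns (col3 in DATA below already holds labels)
sapply(sample1[c("col1", "col2")], myf)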
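To get the exact layout from the question, with the original columns followed by one label column per input column, the per-column results can be bound next to the input. This sketch assumes that `sample1` holds only the number columns, unlike the DATA below, which already contains the answer's `col3`. It also assumes the new columns are numbered on from the existing ones. `translate_columns` is a hypothetical name, and the lookup is the same `strsplit()` + `match()` idea as `myf`.

```r
# Hypothetical helper: for every column of `df`, translate each space-separated
# cell via the lookup vectors and append the results as new columns.
translate_columns <- function(df, ids, labels) {
  translated <- lapply(df, function(x)
    vapply(strsplit(as.character(x), " ", fixed = TRUE),
           function(a) paste(labels[match(a, ids)], collapse = " "),
           character(1)))
  # Name the new columns col<k+1>, col<k+2>, ... after the k existing ones
  names(translated) <- paste0("col", ncol(df) + seq_along(translated))
  cbind(df, as.data.frame(translated, stringsAsFactors = FALSE))
}

sample1 <- data.frame(
  col1 = c("123001 124003 125699", "29238822 293831232 992922"),
  col2 = c("348348 3493493 1230404 99930202", "348348 3493493 1230404 99930202"),
  stringsAsFactors = FALSE
)
sample2 <- data.frame(
  id   = c(123001L, 99930202L, 124003L, 1230404L, 29238822L,
           293831232L, 3493493L, 125699L, 992922L, 348348L),
  text = c("CAT", "PIG", "ODG", "CHAIN", "BAT", "MOUSE", "KIWI",
           "JIN", "ANIME", "UPPE"),
  stringsAsFactors = FALSE
)
translate_columns(sample1, sample2$id, sample2$text)
```

Using `vapply()` with `character(1)` guarantees one string per cell, so the result always lines up row by row with the input.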

DATA

sample1 = structure(list(col1 = c("123001 124003 125699", "29238822 293831232 992922" 
     ), col2 = c("348348 3493493 1230404 99930202", "348348 3493493 1230404 99930202" 
     ), col3 = c("CAT ODG JIN", "BAT MOUSE ANIME")), .Names = c("col1", 
     "col2", "col3"), row.names = c(NA, -2L), class = "data.frame") 

sample2 = structure(list(id = c(123001L, 99930202L, 124003L, 1230404L, 
     29238822L, 293831232L, 3493493L, 125699L, 992922L, 348348L), 
      text = c("CAT", "PIG", "ODG", "CHAIN", "BAT", "MOUSE", "KIWI", 
      "JIN", "ANIME", "UPPE")), .Names = c("id", "text"), row.names = c(NA, 
     -10L), class = "data.frame") 

This is a nice base R one-liner for transforming one column. How would you apply it to a data frame with many columns (more than the 2 in the OP)? – Uwe


There are probably duplicates of this question, but I didn't spend much time searching for them.

You can try this solution using melt() and dcast() from the data.table package:

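library(data.table)
# Long format: one row per (row, column) cell
molten <- melt(Sample1, measure.vars = c("col1", "col2"))
# One row per number, keeping the original row index and column name
splitted <- molten[, strsplit(value, " "), by = .(rowid(variable), variable)]
splitted[, V1 := as.integer(V1)]
# Look up each number in Sample2
joined <- Sample2[splitted, on = c(id = "V1")]
# Back to wide format, pasting ids and labels together per cell
dcast(joined, rowid ~ variable, paste, collapse = " ", value.var = c("id", "text"))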
# rowid     id_col1       id_col2  text_col1 
#1:  1  123001 124003 125699 348348 3493493 1230404 99930202  CAT ODG JIN 
#2:  2 29238822 293831232 992922 348348 3493493 1230404 99930202 BAT MOUSE ANIME 
#    text_col2 
#1: UPPE KIWI CHAIN PIG 
#2: UPPE KIWI CHAIN PIG 

This approach works regardless of how many columns Sample1 has and how many numbers each cell contains.
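For readers without data.table, the same long-format idea can be sketched in base R: one entry per number, a single vectorised lookup, then a collapse back per cell. This is a translation of the approach above, not the answerer's code. `translate_long` is a hypothetical name, and the third column `col3` in the usage example is made-up input built from existing Sample2 ids. It is there only to show that the column count and group sizes do not matter.

```r
# Hypothetical base-R translation of the melt -> strsplit -> join -> dcast idea.
translate_long <- function(wide, ids, labels) {
  cols <- names(wide)
  # "melt": stack every cell into one vector, column by column
  value <- unlist(lapply(wide, as.character), use.names = FALSE)
  # "strsplit": one entry per number, remembering which cell it came from
  tokens <- strsplit(value, " ", fixed = TRUE)
  cell   <- rep(seq_along(value), lengths(tokens))
  # "join": one vectorised lookup for all numbers at once (NA if no match)
  text <- labels[match(unlist(tokens), ids)]
  # "dcast": collapse labels back per cell; an empty cell gives ""
  grouped   <- split(text, factor(cell, levels = seq_along(value)))
  collapsed <- vapply(grouped, paste, character(1), collapse = " ",
                      USE.NAMES = FALSE)
  # Cells were stacked column by column, so a column-major matrix restores the shape
  out <- matrix(collapsed, nrow = nrow(wide),
                dimnames = list(NULL, paste0("text_", cols)))
  cbind(wide, as.data.frame(out, stringsAsFactors = FALSE))
}

ids    <- c(123001L, 99930202L, 124003L, 1230404L, 29238822L,
            293831232L, 3493493L, 125699L, 992922L, 348348L)
labels <- c("CAT", "PIG", "ODG", "CHAIN", "BAT", "MOUSE", "KIWI",
            "JIN", "ANIME", "UPPE")
s1 <- data.frame(
  col1 = c("123001 124003 125699", "29238822 293831232 992922"),
  col2 = c("348348 3493493 1230404 99930202", "348348 3493493 1230404 99930202"),
  col3 = c("992922", "123001 99930202"),  # made-up extra column for illustration
  stringsAsFactors = FALSE
)
translate_long(s1, ids, labels)
```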

Data

Sample1 <- 
structure(list(col1 = c("123001 124003 125699", "29238822 293831232 992922" 
), col2 = c("348348 3493493 1230404 99930202", "348348 3493493 1230404 99930202" 
)), .Names = c("col1", "col2"), row.names = c(NA, -2L), class = c("data.table", 
"data.frame")) 
Sample2 <- 
structure(list(id = c(123001L, 99930202L, 124003L, 1230404L, 
29238822L, 293831232L, 3493493L, 125699L, 992922L, 348348L), 
    text = c("CAT", "PIG", "ODG", "CHAIN", "BAT", "MOUSE", "KIWI", 
    "JIN", "ANIME", "UPPE")), .Names = c("id", "text"), row.names = c(NA, 
-10L), class = c("data.table", "data.frame")) 