R issue with merging/binding/joining two data frames

I'm an R beginner, so I apologize in advance if this has been asked elsewhere. Here is my problem:

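I have two data frames, df1 and df2, with different numbers of rows and columns. The two frames have only one variable (column) in common, called "customer_no". I want the merged frame to match records on "customer_no" only, and to keep only the customers that appear in df2. Both data.frames have multiple rows for each customer_no.

I tried the following: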

merged.df <- merge(df1, df2, by="customer_no", all.y=TRUE)

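The problem is that this fills in df1 values in the df2 rows, where those cells should be empty. My questions are:

1) How do I tell the command to leave the non-matching columns empty?
2) How can I see in the merged file which row came from which df? I guess that if I solve the problem above, this should be easy to see from the empty columns.

I'm missing something in my command but don't know what. If the question has already been answered elsewhere, would you still be so kind as to rephrase it here in plain English for an R beginner?

Thanks!

Example data: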

df1: 
customer_no country year 
    10   UK  2001 
    10   UK  2002 
    10   UK  2003 
    20   US  2007 
    30   AU  2006 


df2:   
customer_no income 
    10   700 
    10   800 
    10   900 
    30   1000
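
For anyone who wants to run the answers below, the two example tables can be recreated roughly as follows. This is only a sketch: the column types (numeric `customer_no`, `year` and `income`, character `country`) are an assumption, since the post shows the printed values but not the types.

```r
# Recreate the example data shown above (column types are assumed)
df1 <- data.frame(
  customer_no = c(10, 10, 10, 20, 30),
  country     = c("UK", "UK", "UK", "US", "AU"),
  year        = c(2001, 2002, 2003, 2007, 2006),
  stringsAsFactors = FALSE
)

df2 <- data.frame(
  customer_no = c(10, 10, 10, 30),
  income      = c(700, 800, 900, 1000)
)

str(df1)
str(df2)
```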

The merged file should look like this:

merged.df:
customer_no income country year
         10             UK 2001
         10             UK 2002
         10             UK 2003
         10    700
         10    800
         10    900
         30             AU 2006
         30   1000

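So: it puts the columns together, adds the df2 rows right after the last df1 row with the same customer_no, and keeps only the customer_no values that occur in df2 (merged.df has no customer_no 20). It also leaves all other cells empty.

In Stata I would use append, but I'm not sure what to use in R... maybe join?

Thanks!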

Source

2014-10-08 Billaus

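Data added. I hope it is clear enough... thanks for your help! – Billaus 2014-10-08 14:16:10

This looks more like a merge/join. Is there a reason the US entry is dropped? – DMT 2014-10-08 14:22:10

DMT, yes, the reason is that it is not in df2. The merged df excludes values that are only in df1 (not in df2). – Billaus 2014-10-08 14:27:38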
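
For context, here is a small sketch of why a plain `merge()` on `customer_no` does not give the stacked layout asked for. It assumes the attempt in the question was a `merge()` call and that the columns have the types used below. Because both frames have several rows per customer, every df1 row for a customer is paired with every df2 row for that customer, so no cell is left empty.

```r
# Example data (column types are assumed)
df1 <- data.frame(
  customer_no = c(10, 10, 10, 20, 30),
  country     = c("UK", "UK", "UK", "US", "AU"),
  year        = c(2001, 2002, 2003, 2007, 2006),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  customer_no = c(10, 10, 10, 30),
  income      = c(700, 800, 900, 1000)
)

# Right join on the shared key
merged.df <- merge(df1, df2, by = "customer_no", all.y = TRUE)

# Customer 10 yields 3 x 3 = 9 row pairs, customer 30 yields 1 x 1 = 1
nrow(merged.df)                # 10
table(merged.df$customer_no)
anyNA(merged.df)               # FALSE: every row has both df1 and df2 values
```

Stacking rows, as the answers do, avoids this pairing entirely.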

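Try:

# Tag each row with a source-specific id so that no df1 row can match a df2 row
df1$id <- paste(df1$customer_no, 1, sep="_") 
df2$id <- paste(df2$customer_no, 2, sep="_") 

# Full outer join on both keys, then drop the helper id column
res <- merge(df1, df2, by=c('id', 'customer_no'), all=TRUE)[,-1] 
# Keep only the customers that occur in df2
res1 <- res[res$customer_no %in% df2$customer_no,] 
res1 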
# customer_no country year income 
#1   10  UK 2001  NA 
#2   10  UK 2002  NA 
#3   10  UK 2003  NA 
#4   10 <NA> NA 700 
#5   10 <NA> NA 800 
#6   10 <NA> NA 900 
#8   30  AU 2006  NA 
#9   30 <NA> NA 1000

If you want to change the NA values to '':

res1[is.na(res1)] <- '' #But, I would leave it as `NA` as there are `numeric` columns.

Or, use rbindlist from data.table (with the original datasets):

library(data.table) 
indx <- df1$customer_no %in% df2$customer_no 
rbindlist(list(df1[indx,], df2),fill=TRUE)[order(customer_no)] 

# customer_no country year income 
#1:   10  UK 2001  NA 
#2:   10  UK 2002  NA 
#3:   10  UK 2003  NA 
#4:   10  NA NA 700 
#5:   10  NA NA 800 
#6:   10  NA NA 900 
#7:   30  AU 2006  NA 
#8:   30  NA NA 1000
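
On the second question (which row came from which data frame), one possible extension of the `rbindlist` approach is its `idcol` argument, which records the name of the list element each row came from. This is only a sketch: the column name `source` and the list names `df1`/`df2` are choices made here for illustration, and the column types of the example data are assumed.

```r
library(data.table)

# Example data (column types are assumed)
df1 <- data.frame(
  customer_no = c(10, 10, 10, 20, 30),
  country     = c("UK", "UK", "UK", "US", "AU"),
  year        = c(2001, 2002, 2003, 2007, 2006),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  customer_no = c(10, 10, 10, 30),
  income      = c(700, 800, 900, 1000)
)

# Keep only the df1 rows whose customer also appears in df2
indx <- df1$customer_no %in% df2$customer_no

# Naming the list elements lets idcol label every row with its origin
res <- rbindlist(list(df1 = df1[indx, ], df2 = df2),
                 fill = TRUE, idcol = "source")
res <- res[order(customer_no)]
res
```

The `source` column then makes the origin explicit instead of relying on which cells happen to be `NA`.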

Source

2014-10-08 14:36:24 akrun

Awesome! Thank you!! This was such a nightmare... what a relief! :))) – Billaus 2014-10-08 15:02:23

@Billaus No problem. Glad to help. – akrun 2014-10-08 15:02:51

You can also use the smartbind function from the gtools package.

require(gtools) 
res <- smartbind(df1[df1$customer_no %in% df2$customer_no, ], df2) 
res[order(res$customer_no), ] 
#  customer_no country year income 
# 1:1   10  UK 2001  NA 
# 1:2   10  UK 2002  NA 
# 1:3   10  UK 2003  NA 
# 2:1   10 <NA> NA 700 
# 2:2   10 <NA> NA 800 
# 2:3   10 <NA> NA 900 
# 1:4   30  AU 2006  NA 
# 2:4   30 <NA> NA 1000

Source

2014-10-08 14:46:37 shadow

This works too! Thank you!! – Billaus 2014-10-08 15:02:39

Try:

df1$income = df2$country = df2$year = NA 
rbind(df1, df2) 
    customer_no country year income 
1   10  UK 2001  NA 
2   10  UK 2002  NA 
3   10  UK 2003  NA 
4   20  US 2007  NA 
5   30  AU 2006  NA 
6   10 <NA> NA 700 
7   10 <NA> NA 800 
8   10 <NA> NA 900 
9   30 <NA> NA 1000
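
Note that the output above still contains customer_no 20 and is not grouped by customer. A base-R variant of the same idea that drops customers missing from df2, sorts the rows, and records where each row came from might look like the sketch below. The helper names `keep` and `extra` and the `source` column are illustrative only, and the column types of the example data are assumed.

```r
# Example data (column types are assumed)
df1 <- data.frame(
  customer_no = c(10, 10, 10, 20, 30),
  country     = c("UK", "UK", "UK", "US", "AU"),
  year        = c(2001, 2002, 2003, 2007, 2006),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  customer_no = c(10, 10, 10, 30),
  income      = c(700, 800, 900, 1000)
)

# df1 rows for customers that also occur in df2, padded with an NA income
keep <- df1[df1$customer_no %in% df2$customer_no, ]
keep$income <- NA_real_
keep$source <- "df1"

# df2 rows, padded with the columns that only df1 has
extra <- df2
extra$country <- NA_character_
extra$year    <- NA_real_
extra$source  <- "df2"

# rbind() on data frames matches columns by name, so column order may differ
res <- rbind(keep, extra)

# Group by customer, df1 rows first within each customer
res <- res[order(res$customer_no, res$source), ]
rownames(res) <- NULL
res
```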

Source

2014-10-08 15:12:37 rnso
