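Append a data frame to a master data frame if some columns are common

I want to append one data frame to another (the master data frame). The problem is that only a subset of their columns are common. In addition, the order of their columns may differ.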

Master data frame:

    a b  c
r1  1 2 -2
r2  2 4 -4
r3  3 6 -6
r4  4 8 -8

New data frame:

      d  a   c
r1 -120 10 -20
r2 -140 20 -40

Expected result:

    a   b   c
r1  1   2  -2
r2  2   4  -4
r3  3   6  -6
r4  4   8  -8
r5 10 NaN -20
r6 20 NaN -40

Is there a smart way of doing this? This is a similar question, but the setup is different.

2015-12-14 Szilard

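Check out the bind_rows function in the dplyr package. By default it does some nice things for you, such as filling columns that exist in one data.frame but not in the other with NAs instead of simply failing. Here is an example: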

# Use the dplyr package for binding rows and for selecting columns 
library(dplyr) 

# Generate some example data 
a <- data.frame(a = rnorm(10), b = rnorm(10)) 
b <- data.frame(a = rnorm(5), c = rnorm(5)) 

# Stack data frames 
bind_rows(a, b) 

Source: local data frame [15 x 3] 

            a          b          c
1   2.2891895  0.1940835         NA
2   0.7620825 -0.2441634         NA
3   1.8289665  1.5280338         NA
4  -0.9851729 -0.7187585         NA
5   1.5829853  1.6609695         NA
6   0.9231296  1.8052112         NA
7  -0.58      -0.6928449         NA
8   0.2033514 -0.6673596         NA
9  -0.8576628  0.5163021         NA
10  0.6296633 -1.2445280         NA
11  2.1693068         NA -0.2556584
12 -0.1048966         NA -0.3132198
13  0.2673514         NA -1.1181995
14  1.0937759         NA -2.5750115
15 -0.8147180         NA -1.5525338

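To solve the problem in your question, you would first select the columns that appear in the master data.frame. If a is the master data.frame and b contains the data you want to add, you can use the select function from dplyr to keep only the columns you need before binding.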

# Select all columns in b with the same names as in the master data, a.
# any_of() (dplyr >= 1.0.0) skips names that are absent from b.
b <- select(b, any_of(names(a)))

# Combine 
bind_rows(a, b) 

Source: local data frame [15 x 2] 

            a          b
1   2.2891895  0.1940835
2   0.7620825 -0.2441634
3   1.8289665  1.5280338
4  -0.9851729 -0.7187585
5   1.5829853  1.6609695
6   0.9231296  1.8052112
7  -0.58      -0.6928449
8   0.2033514 -0.6673596
9  -0.8576628  0.5163021
10  0.6296633 -1.2445280
11  2.1693068         NA
12 -0.1048966         NA
13  0.2673514         NA
14  1.0937759         NA
15 -0.8147180         NA
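
The same idea can be applied to the data from the question. The following is only a minimal sketch, assuming dplyr 1.0.0 or later (for any_of()); the names master, new and result are illustrative and row names are left out for brevity.

```r
library(dplyr)

# Data from the question (illustrative object names, default row names).
master <- data.frame(a = c(1, 2, 3, 4),
                     b = c(2, 4, 6, 8),
                     c = c(-2, -4, -6, -8))
new <- data.frame(d = c(-120, -140),
                  a = c(10, 20),
                  c = c(-20, -40))

# Keep only the columns of `new` that the master also has, then stack.
# any_of() skips master columns (here `b`) that `new` does not contain,
# and bind_rows() fills those columns with NA for the appended rows.
result <- bind_rows(master, select(new, any_of(names(master))))
result
```

Because bind_rows() matches columns by name, the differing column order in the new data frame does not matter.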

2015-12-14 21:53:08 ialm

Try this:

library(plyr) # thanks to comment @ialm 
df <- data.frame(a=1:4,b=seq(2,8,2),c=seq(-2,-8,-2)) 
new <- data.frame(d=c(-120,-140),a=c(10,20),c=c(-20,-40))

# we use %in% to pull the columns that are the same in the master
# then we use rbind.fill to put in this dataframe below the master
# filling any missing data with NA values
res <- rbind.fill(df,new[,colnames(new) %in% colnames(df)])

> res
   a  b   c
1  1  2  -2
2  2  4  -4
3  3  6  -6
4  4  8  -8
5 10 NA -20
6 20 NA -40

2015-12-14 21:56:37

Another option is to use rbind.fill from the plyr package.

Given sample data like yours:

toread <- " 
a b c 
1 2 -2 
2 4 -4 
3 6 -6 
4 8 -8" 
master <- read.table(textConnection(toread), header = TRUE) 
toread <- " 
d a c 
-120 10 -20 
-140 20 -40" 
to.append <- read.table(textConnection(toread), header = TRUE)

Bind the data:

library(plyr) 
rbind.fill(master, to.append)

2015-12-14 21:58:03 Wyldsoul

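If you are using 'dplyr', why not just use 'bind_rows()'? – ialm

@ialm On a closer read, this answer does *not* use any 'dplyr' functions (it only loads the package). It is worth noting that loading 'plyr' *after* 'dplyr' masks 'dplyr::summarize' and 'dplyr::mutate' with the 'plyr' versions, which is not recommended. – Gregor

@Gregor Yes, I see that now. For the reason you highlight in your comment, a warning is issued if you load 'plyr' after 'dplyr'. I believe Hadley recommends loading 'plyr' before 'dplyr' if you need to use both packages. – ialm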
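
One way to sidestep the load-order issue raised in these comments is to call rbind.fill through its namespace rather than attaching plyr at all. This is a sketch under the assumption that plyr is installed; the object names master, to.append and combined are illustrative.

```r
# Data from the question (illustrative object names, default row names).
master <- data.frame(a = c(1, 2, 3, 4),
                     b = c(2, 4, 6, 8),
                     c = c(-2, -4, -6, -8))
to.append <- data.frame(d = c(-120, -140),
                        a = c(10, 20),
                        c = c(-20, -40))

# Call rbind.fill() via plyr:: instead of library(plyr), so plyr is not
# attached to the search path and cannot mask dplyr functions.
combined <- plyr::rbind.fill(master, to.append)

# rbind.fill() keeps the union of all columns, so `d` is still present here.
# Subset to the master's columns to get the layout asked for in the question.
combined <- combined[, names(master)]
combined
```

The final subsetting step matters only when the extra column should be dropped, as in the expected result of the question.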

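The dplyr and plyr solutions posted here are very natural for this task, using bind_rows and rbind.fill respectively, although it can also be done as a one-liner in base R. Basically, I would loop through the column names of the first data frame, grabbing the corresponding column of the second data frame if it exists and otherwise returning all NaN values.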

rbind(A, sapply(names(A), function(x) if (x %in% names(B)) B[,x] else rep(NaN, nrow(B)))) 
#  a b c 
# r1 1 2 -2 
# r2 2 4 -4 
# r3 3 6 -6 
# r4 4 8 -8 
# 5 10 NaN -20 
# 6 20 NaN -40
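
For readability, the same base R approach can be spelled out as a small function. This is only a sketch: append_common is a hypothetical helper name, and it fills missing columns with NA rather than NaN.

```r
# Hypothetical helper: append `new` to `master`, keeping only master's columns.
append_common <- function(master, new) {
  # Create every column the master has but `new` lacks, filled with NA.
  for (col in setdiff(names(master), names(new))) {
    new[[col]] <- NA
  }
  # Reorder to the master's column order (dropping extras), then stack.
  rbind(master, new[, names(master), drop = FALSE])
}

# Data from the question (default row names).
A <- data.frame(a = c(1, 2, 3, 4),
                b = c(2, 4, 6, 8),
                c = c(-2, -4, -6, -8))
B <- data.frame(d = c(-120, -140),
                a = c(10, 20),
                c = c(-20, -40))

res <- append_common(A, B)
res
```

Selecting the columns by name before calling rbind() is what makes the result independent of the column order in the second data frame.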

2015-12-14 22:51:43 josliber
