2016-10-28 95 views

Merging messy data frames [R]

I have two data frames:

df1=data.frame(Col1=c('2','4','CN','CANADA',NA),Col2=c('s1','s2','s3','s4','s5')) 
> df1
    Col1 Col2
1      2   s1
2      4   s2
3     CN   s3
4 CANADA   s4
5   <NA>   s5
df2=data.frame(index=1:5,code=c('AB','CA','US','CN','UK'),name=c('ALBERTA','CANADA','USA','CHINA','UK'),REGION=c('NA','NA','NA','FE','EU')) 
> df2
  index code    name REGION
1     1   AB ALBERTA     NA
2     2   CA  CANADA     NA
3     3   US     USA     NA
4     4   CN   CHINA     FE
5     5   UK      UK     EU

I want this result:

df3=data.frame(df1,code=c('CA','CN','CN','CA',NA),name=c('CANADA','CHINA','CHINA','CANADA',NA),REGION=c('NA','FE','FE','NA',NA)) 
    Col1 Col2 code   name REGION
1      2   s1   CA CANADA     NA
2      4   s2   CN  CHINA     FE
3     CN   s3   CN  CHINA     FE
4 CANADA   s4   CA CANADA     NA
5   <NA>   s5 <NA>   <NA>   <NA>

I have tried looking it up by value:

df1$code=df2[df2$index[df1$Col1],2] 

which fills it in incorrectly, and I have also tried merging twice:

m1=merge(df1,df2,by.x='Col1',by.y='index',all.x=TRUE) 
m2=merge(m1,df2,by.x='Col1',by.y='name',all.x=1) 

I am sure I am missing something here. Thanks for your help.


Oh yes, my data contains ~500k rows and 45 columns, but this is the gist of it. – alex

Answers

1

Maybe not a very elegant solution, but it works for this example:

ind <- sapply(df1$Col1, function(x) which(df2[, c("index", "code", "name")] == as.character(x), arr.ind = TRUE)[1])
cbind(df1, df2[ind,]) 
      Col1 Col2 index code   name REGION
2        2   s1     2   CA CANADA     NA
4        4   s2     4   CN  CHINA     FE
4.1     CN   s3     4   CN  CHINA     FE
2.1 CANADA   s4     2   CA CANADA     NA
NA    <NA>   s5    NA <NA>   <NA>   <NA>
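
Since the real data is said to have about 500k rows, calling which() once per row may be slow. The same lookup can be sketched in a vectorized way with match(), trying the index, code and name columns in that order. This is only a sketch under assumptions: the names key, row_by_index, row_by_code, row_by_name and hit are made up for illustration, both data frames are rebuilt with stringsAsFactors = FALSE, and each value of Col1 is assumed to identify at most one row of df2.

```r
# Rebuild the example data as plain character columns (assumption: no factors).
df1 <- data.frame(Col1 = c('2', '4', 'CN', 'CANADA', NA),
                  Col2 = c('s1', 's2', 's3', 's4', 's5'),
                  stringsAsFactors = FALSE)
df2 <- data.frame(index = 1:5,
                  code = c('AB', 'CA', 'US', 'CN', 'UK'),
                  name = c('ALBERTA', 'CANADA', 'USA', 'CHINA', 'UK'),
                  REGION = c('NA', 'NA', 'NA', 'FE', 'EU'),
                  stringsAsFactors = FALSE)

key <- as.character(df1$Col1)

# One vectorized lookup per candidate column; match() gives NA when nothing is found.
row_by_index <- match(key, as.character(df2$index))
row_by_code  <- match(key, df2$code)
row_by_name  <- match(key, df2$name)

# Take the first successful lookup: index first, then code, then name.
hit <- ifelse(!is.na(row_by_index), row_by_index,
              ifelse(!is.na(row_by_code), row_by_code, row_by_name))

# An NA row index yields a row of NAs, so unmatched keys stay in the result.
df3 <- cbind(df1, df2[hit, c("code", "name", "REGION")])
rownames(df3) <- NULL
```

Note that REGION in df2 holds the string 'NA', not a missing value, so only the last row of df3 ends up with a real NA there.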

This solution works brilliantly! I adapted it to my whole dataset. Thanks! :) – alex

-1

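As I understand the problem, Col1 of df1 contains mixed information: row indices, two-letter codes and full names. So my approach is to separate these kinds of values into their own columns. After that it should be easy to merge correctly.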

chr <- as.character(df1$Col1) 

index_df1 <- chr 
index_df1[!grepl("^[0-9]*$", chr)] <- NA 
index_df1 <- as.numeric(index_df1) 

code_df1 <- chr 
code_df1[!grepl("^[A-Z]{2}$", chr)] <- NA 

name_df1 <- chr 
name_df1[!grepl("^[A-Z]{3,}$", chr)] <- NA 

df1 <- data.frame(df1, index_df1, code_df1, name_df1)
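
The merge step itself is not shown above. One way it could be done, as a rough sketch rather than the answerer's own code: classify each value with the same three patterns, merge each group against the matching column of df2, stack the pieces and join them back to df1. The helper names row_id, by_index, by_code, by_name, cols and matched are hypothetical, and the data is rebuilt with stringsAsFactors = FALSE so the block runs on its own.

```r
df1 <- data.frame(Col1 = c('2', '4', 'CN', 'CANADA', NA),
                  Col2 = c('s1', 's2', 's3', 's4', 's5'),
                  stringsAsFactors = FALSE)
df2 <- data.frame(index = 1:5,
                  code = c('AB', 'CA', 'US', 'CN', 'UK'),
                  name = c('ALBERTA', 'CANADA', 'USA', 'CHINA', 'UK'),
                  REGION = c('NA', 'NA', 'NA', 'FE', 'EU'),
                  stringsAsFactors = FALSE)

chr <- as.character(df1$Col1)
row_id <- seq_len(nrow(df1))  # remembers the original row order

# Split Col1 by kind of value; grepl() is FALSE for NA, so NA rows fall out of every group.
index_df1 <- ifelse(grepl("^[0-9]+$", chr), chr, NA)
index_df1 <- as.integer(index_df1)
code_df1  <- ifelse(grepl("^[A-Z]{2}$", chr), chr, NA)
name_df1  <- ifelse(grepl("^[A-Z]{3,}$", chr), chr, NA)

# Merge each group against the df2 column it refers to.
by_index <- merge(data.frame(row_id, index = index_df1)[!is.na(index_df1), ],
                  df2, by = "index")
by_code  <- merge(data.frame(row_id, code = code_df1,
                             stringsAsFactors = FALSE)[!is.na(code_df1), ],
                  df2, by = "code")
by_name  <- merge(data.frame(row_id, name = name_df1,
                             stringsAsFactors = FALSE)[!is.na(name_df1), ],
                  df2, by = "name")

# Stack the three partial results and join them back, keeping unmatched rows of df1.
cols <- c("row_id", "code", "name", "REGION")
matched <- rbind(by_index[cols], by_code[cols], by_name[cols])
df3 <- merge(data.frame(row_id, df1, stringsAsFactors = FALSE),
             matched, by = "row_id", all.x = TRUE)
df3 <- df3[order(df3$row_id), ]
df3$row_id <- NULL
rownames(df3) <- NULL
```

The regular expressions only fit data shaped like this example (digits, two capital letters, or three or more capital letters); real data with lowercase text or names containing spaces would need different patterns.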