Merging messy data frames [R]

I have 2 data frames:
df1=data.frame(Col1=c('2','4','CN','CANADA',NA),Col2=c('s1','s2','s3','s4','s5'))
> df1
    Col1 Col2
1      2   s1
2      4   s2
3     CN   s3
4 CANADA   s4
5   <NA>   s5
df2=data.frame(index=1:5,code=c('AB','CA','US','CN','UK'),name=c('ALBERTA','CANADA','USA','CHINA','UK'),REGION=c('NA','NA','NA','FE','EU'))
> df2
  index code    name REGION
1     1   AB ALBERTA     NA
2     2   CA  CANADA     NA
3     3   US     USA     NA
4     4   CN   CHINA     FE
5     5   UK      UK     EU
Col1 in df1 is a mix of values that correspond to index, code, or name in df2. I want:
df3=data.frame(df1,code=c('CA','CN','CN','CA',NA),name=c('CANADA','CHINA','CHINA','CANADA',NA),REGION=c('NA','FE','FE','NA',NA))
    Col1 Col2 code   name REGION
1      2   s1   CA CANADA     NA
2      4   s2   CN  CHINA     FE
3     CN   s3   CN  CHINA     FE
4 CANADA   s4   CA CANADA     NA
5   <NA>   s5 <NA>   <NA>   <NA>
I have tried looking the values up directly:
df1$code=df2[df2$index[df1$Col1],2]
which fills it in incorrectly, and merging twice:
m1=merge(df1,df2,by.x='Col1',by.y='index',all.x=TRUE)
m2=merge(m1,df2,by.x='Col1',by.y='name',all.x=TRUE)
I believe I am missing something here. Thanks for your help.
Oh yes, my data contains ~500k rows and 45 columns, but this is the basics of it. – alex
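For reference, here is a minimal sketch of one way the lookup described above could be done with `match()` instead of repeated merges. It assumes that every value in Col1 matches at most one row of df2 through its index, code, or name, and that the columns are character vectors rather than factors (hence the explicit `stringsAsFactors = FALSE`). The helper names `keys`, `rows`, and `hit` are introduced here only for illustration.

```r
# Example data copied from the question, with factors disabled explicitly
df1 <- data.frame(Col1 = c('2', '4', 'CN', 'CANADA', NA),
                  Col2 = c('s1', 's2', 's3', 's4', 's5'),
                  stringsAsFactors = FALSE)
df2 <- data.frame(index = 1:5,
                  code = c('AB', 'CA', 'US', 'CN', 'UK'),
                  name = c('ALBERTA', 'CANADA', 'USA', 'CHINA', 'UK'),
                  REGION = c('NA', 'NA', 'NA', 'FE', 'EU'),
                  stringsAsFactors = FALSE)

# Stack every candidate key (index, code, name) into one character vector
# and record which df2 row each key came from.
keys <- c(as.character(df2$index), df2$code, df2$name)
rows <- rep(seq_len(nrow(df2)), times = 3)

# match() returns the position of the first matching key,
# or NA when Col1 is NA or has no counterpart in df2.
hit <- rows[match(df1$Col1, keys)]

# Indexing df2 with NA yields an all-NA row, which is the desired result
# for unmatched values.
df3 <- cbind(df1, df2[hit, c('code', 'name', 'REGION')])
rownames(df3) <- NULL
```

Because `match()` is vectorized, this should scale reasonably to a table of around 500k rows. If a value could match different rows through different columns, only the first hit in `keys` is used, so the order in which the key columns are stacked would matter.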