2017-03-06
0

Merge 2 data frames, returning rows with matching Measurement names

I have 2 data frames like this:

ID <- c("A","B","C") 
Type <- c("PASS","PASS","FAIL") 
Measurement <- c("Length","Height","Breadth") 
Function <- c("Volume","Area","Circumference") 
df1 <- data.frame(ID,Type,Measurement,Function) 

ID <- c("A","B","C","C") 
Type <- c("PASS","PASS","FAIL","FAIL") 
Measurement <- c("Length","Height","Breadth","Breadth_DSPT") 
df2 <- data.frame(ID,Type,Measurement) 

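I want to merge these 2 data frames so that the result contains the rows whose Measurement matches exactly, and also every row whose Measurement is a matching name with another string concatenated to it.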

Desired output:

ID  Type  Measurement   Function
A   PASS  Length        Volume
B   PASS  Height        Area
C   FAIL  Breadth       Circumference
C   FAIL  Breadth_DSPT  Circumference

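Using the merge function like this gets me the first 3 rows, but how can I match on the Measurement names in the data frames so that all matching rows are returned?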

df <- merge(df1, df2, by = c("ID", "Type", "Measurement"), all.x = TRUE)
+0

It looks like you are trying to merge data frames based on a partial match? See if this helps: http://stackoverflow.com/questions/10617377/merge-data-with-partial-match-in-r

+0

If I understand correctly, you need a fuzzy join: https://cran.r-project.org/web/packages/fuzzyjoin/fuzzyjoin.pdf, because "Breadth" and "Breadth_DSPT" are similar but not identical. – Mislav

+0

Is the concatenation always just "string_newpart"? – thelatemail

Answers

3

One way to do it is with the sqldf package:

library(sqldf) 

sqldf("select df1.ID, df1.Type, df2.Measurement, df1.Function 
     from df1 left join df2 on (df1.ID = df2.ID and 
           df1.Type = df2.Type and 
           df2.Measurement like df1.Measurement||'%')") 

# ID Type Measurement  Function 
# 1 A PASS  Length  Volume 
# 2 B PASS  Height   Area 
# 3 C FAIL  Breadth Circumference 
# 4 C FAIL Breadth_DSPT Circumference 

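The last condition of the join, (df2.Measurement like df1.Measurement||'%'), means that df2$Measurement must equal df1$Measurement followed by any string (including the empty string). You can use the SQL wildcards % (any sequence of characters) and _ (exactly one character) to express more flexible conditions.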
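For comparison, the same prefix condition can be sketched in base R without sqldf. This is only an illustration, not part of the answer above: it joins on ID and Type first and then filters with startsWith(). The suffixes ".df1" and ".df2" and the names joined, keep and result are chosen here for the example. Unlike the left join in the SQL version, this inner join drops df1 rows that have no match in df2.

```r
# Rebuild the example data from the question.
df1 <- data.frame(
  ID = c("A", "B", "C"),
  Type = c("PASS", "PASS", "FAIL"),
  Measurement = c("Length", "Height", "Breadth"),
  Function = c("Volume", "Area", "Circumference"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  ID = c("A", "B", "C", "C"),
  Type = c("PASS", "PASS", "FAIL", "FAIL"),
  Measurement = c("Length", "Height", "Breadth", "Breadth_DSPT"),
  stringsAsFactors = FALSE
)

# Join on the exact keys; the two Measurement columns get distinct suffixes.
joined <- merge(df1, df2, by = c("ID", "Type"), suffixes = c(".df1", ".df2"))

# Keep pairs where df2's name starts with df1's name (the LIKE 'x%' condition).
keep <- startsWith(joined$Measurement.df2, joined$Measurement.df1)
result <- joined[keep, c("ID", "Type", "Measurement.df2", "Function")]
names(result)[3] <- "Measurement"
rownames(result) <- NULL
result
```

On the example data every joined pair passes the filter, so result has the four rows of the desired output.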

2

If the extra part is only ever appended to the end of the string, you can do this:

merge(
    transform(df2, tmpmeas = sub("_.+$", "", Measurement)), 
    df1, 
    by.x=c("ID","Type","tmpmeas"), by.y=c("ID","Type","Measurement") 
)[-3] 
# ID Type Measurement  Function 
#1 A PASS  Length  Volume 
#2 B PASS  Height   Area 
#3 C FAIL  Breadth Circumference 
#4 C FAIL Breadth_DSPT Circumference 
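To see what the temporary key tmpmeas contains, the sub() call can be run on its own. The vector below reuses the Measurement values from the question plus two made-up names, "A_B_C" and "X_", added here only to show the edge cases.

```r
# Measurement names from the question, plus two hypothetical edge cases.
measurement <- c("Length", "Height", "Breadth", "Breadth_DSPT", "A_B_C", "X_")

# Remove everything from the first underscore to the end of the string.
# The pattern needs at least one character after the underscore.
stripped <- sub("_.+$", "", measurement)
stripped
# "Length" "Height" "Breadth" "Breadth" "A" "X_"
```

Because the match starts at the first underscore, this approach assumes that the base names in df1 contain no underscores themselves.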
-1

You can do this with the data.table library. First convert your data frames to data tables, then use setkey to set the key of each table, and then merge.

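library(data.table)

# Convert both data frames to data.tables.
dt1 <- data.table(df1)
dt2 <- data.table(df2)
# Key both tables on ID; merge() then joins on that shared key.
setkey(dt1, ID)
setkey(dt2, ID)
merge(dt1, dt2)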

# ID Type.x Measurement.x  Function Type.y Measurement.y 
# 1: A PASS  Length  Volume PASS  Length 
# 2: B PASS  Height   Area PASS  Height 
# 3: C FAIL  Breadth Circumference FAIL  Breadth 
# 4: C FAIL  Breadth Circumference FAIL Breadth_DSPT 
+0

Your solution only works for this specific example, not in the general case. – Scarabee
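One way to read this objection: a join on ID alone pairs every df1 row with every df2 row that shares the ID, whether or not the measurement names are related. The sketch below uses base merge() with by = "ID" as a stand-in that pairs rows in the same way, so it runs without data.table. The second df1 row ("Width") is hypothetical and added only to expose the problem.

```r
# Two df1 rows share ID "C"; the "Width" row is a made-up addition.
df1 <- data.frame(
  ID = c("C", "C"),
  Type = c("FAIL", "FAIL"),
  Measurement = c("Breadth", "Width"),
  Function = c("Circumference", "Area"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  ID = c("C", "C"),
  Type = c("FAIL", "FAIL"),
  Measurement = c("Breadth", "Breadth_DSPT"),
  stringsAsFactors = FALSE
)

# Joining on ID only yields all 2 x 2 combinations for ID "C".
by_id <- merge(df1, df2, by = "ID")
by_id[, c("ID", "Measurement.x", "Measurement.y")]

# Pairs whose names are unrelated, e.g. "Width" with "Breadth_DSPT".
false_match <- !startsWith(by_id$Measurement.y, by_id$Measurement.x)
sum(false_match)
```

Two of the four rows pair "Width" with a Breadth measurement, which is why the Measurement column has to take part in the join condition.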