2014-01-30

R - Compare rows of two data frames in succession and return a value

I have the following two data frames:

df1 <- data.frame(month=c("1","1","1","1","2","2","2","3","3","3","3","3"), 
      temp=c("10","15","16","25","13","17","20","5","16","25","30","37")) 


df2 <- data.frame(period=c("1","1","1","1","1","1","1","1","2","2","2","2","2","2","3","3","3","3","3","3","3","3","3","3","3","3"), 
       max_temp=c("9","13","16","18","30","37","38","39","10","15","16","25","30","32","8","10","12","14","16","18","19","25","28","30","35","40"), 
       group=c("1","1","1","2","2","2","3","3","3","3","4","4","5","5","5","5","5","6","6","6","7","7","7","7","8","8")) 

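I would like to:

  1. Row by row, in succession, check whether the value in the month column of df1 matches the value in the period column of df2, i.e. df1$month == df2$period.

  2. If step 1 is not TRUE, i.e. df1$month != df2$period, repeat step 1 and compare the value in df1 with the value in the next row of df2, and so on until df1$month == df2$period.

  3. If df1$month == df2$period, check whether the value in the temp column of df1 is less than or equal to the value in the max_temp column of df2, i.e. df1$temp <= df2$max_temp.

  4. If df1$temp <= df2$max_temp, return the value of the group column in that row of df2 and add it to df1 in a new column called "new_group".

  5. If step 3 is not TRUE, i.e. df1$temp > df2$max_temp, go back to step 1 and compare the same row of df1 with the next row of df2.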
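Read literally, the steps above amount to a single forward scan through df2 that is never rewound. The following is only a minimal sketch of that reading, assuming the columns are numeric (in the sample data they are strings) and using a hypothetical helper name, scan_groups:

```r
# Sample data from the question, rebuilt with numeric columns (an assumption;
# the original columns are strings).
df1 <- data.frame(
  month = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3),
  temp  = c(10, 15, 16, 25, 13, 17, 20, 5, 16, 25, 30, 37)
)
df2 <- data.frame(
  period   = rep(c(1, 2, 3), times = c(8, 6, 12)),
  max_temp = c(9, 13, 16, 18, 30, 37, 38, 39,
               10, 15, 16, 25, 30, 32,
               8, 10, 12, 14, 16, 18, 19, 25, 28, 30, 35, 40),
  group    = c(1, 1, 1, 2, 2, 2, 3, 3,
               3, 3, 4, 4, 5, 5,
               5, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8)
)

# Hypothetical helper: walk df1 and df2 in parallel, never moving back in df2.
scan_groups <- function(df1, df2) {
  new_group <- rep(NA, nrow(df1))
  j <- 1  # current row of df2
  for (i in seq_len(nrow(df1))) {
    # Steps 1-3 and 5: advance in df2 until the period matches and temp fits.
    while (j <= nrow(df2) &&
           !(df2$period[j] == df1$month[i] &&
             df1$temp[i] <= df2$max_temp[j])) {
      j <- j + 1
    }
    if (j > nrow(df2)) break      # df2 exhausted: remaining rows stay NA
    new_group[i] <- df2$group[j]  # step 4
  }
  new_group
}

df1$new_group <- scan_groups(df1, df2)
```

Because the position in df2 only moves forward, each row of df2 is visited at most once, so the scan is linear in the combined number of rows.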

An example of the output data frame I would like is:

df3 <- data.frame(month=c("1","1","1","1","2","2","2","3","3","3","3","3"), 
      temp=c("10","15","16","25","13","17","20","5","16","25","30","37"), 
      new_group=c("1","1","1","2","3","4","4","5","6","7","7","8")) 

I have been playing around with the ifelse function and need some help or redirection. Thanks!

+0

Did you intentionally make your data strings? –

+0

The data files are actually tab-delimited text files, which I loaded into R as data frames using read.table. As an R novice, I did not know the data were strings. – user3201532

+0

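The quotes around the numbers tell you that you have strings. Also, be careful about strings masquerading as factors, which is what you get with 'read.table(..., stringsAsFactors = TRUE)' (which, annoyingly, was the default before R 4.0.0) –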
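To illustrate the point about strings and factors, here is a small sketch using made-up vectors (temp_chr and temp_fct are hypothetical names, not part of the question's data). It shows why converting a factor directly with as.numeric returns level codes rather than the original numbers:

```r
temp_chr <- c("10", "15", "5")   # numbers stored as strings
temp_fct <- factor(temp_chr)     # the same values stored as a factor

as.numeric(temp_chr)                 # 10 15 5
as.numeric(temp_fct)                 # 1 2 3: level codes, not the values
as.numeric(as.character(temp_fct))   # 10 15 5

# Comparing strings is not a numeric comparison either:
"5" <= "10"                          # FALSE
5 <= 10                              # TRUE
```

Going through as.character first is safe for both character and factor input, which is why it is a common defensive idiom.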

Answers

1

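I found the process for computing new_group hard to follow as stated. As far as I can tell, you are trying to create a variable called new_group in df1. For row i of df1, the new_group value is the group value of the first row of df2 that:

  1. Is at index i or higher
  2. Has a period value matching df1$month[i]
  3. Has a max_temp value no lower than df1$temp[i]

I did this by using sapply to call a function on each row index of df1: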

fxn = function(idx) { 
    # Potentially matching indices in df2 
    pm = idx:nrow(df2) 

    # Matching indices in df2 
    m = pm[df2$period[pm] == df1$month[idx] & 
     as.numeric(as.character(df1$temp[idx])) <= 
     as.numeric(as.character(df2$max_temp[pm]))] 

    # Return the group associated with the first matching index 
    return(df2$group[m[1]]) 
} 
df1$new_group = sapply(seq(nrow(df1)), fxn) 
df1 
#    month temp new_group
# 1      1   10         1
# 2      1   15         1
# 3      1   16         1
# 4      1   25         2
# 5      2   13         3
# 6      2   17         4
# 7      2   20         4
# 8      3    5         5
# 9      3   16         6
# 10     3   25         7
# 11     3   30         7
# 12     3   37         8
+0

Thanks for the helpful code. I am not computing a value for new_group, just placing the value of df2$group into the df1$new_group column. I hope that is clearer. Cheers. – user3201532

+0

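Yes, the code I posted does this. It looks like you are pretty new to SO and have not accepted the good answers you received to your previous questions. If my solution or @RicardoSaporta's solution solved your problem, please remember to accept it by clicking the green check mark. – josliber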

1
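library(data.table) 
dt1 <- data.table(df1, key="month") 
dt2 <- data.table(df2, key="period") 

## compare temperatures as numbers rather than as strings or factor codes 
dt1[, temp := as.numeric(as.character(temp))] 
dt2[, max_temp := as.numeric(as.character(max_temp))] 

## add a row index 
dt1[, rn1 := seq(nrow(dt1))] 

## join each df1 row to every df2 row of the same period, then keep, for each 
## df1 row, the group of the first df2 row whose max_temp is at least temp 
dt3 <- 
unique(dt1[dt2, allow.cartesian=TRUE][, new_group := group[min(which(temp <= max_temp))], by="rn1"], by="rn1") 

## Keep only the columns you want 
## (max_temp is simply the first max_temp listed for the matching period) 
dt3[, c("month", "temp", "max_temp", "new_group"), with=FALSE] 

    month temp max_temp new_group
 1:     1   10        9         1
 2:     1   15        9         1
 3:     1   16        9         1
 4:     1   25        9         2
 5:     2   13       10         3
 6:     2   17       10         4
 7:     2   20       10         4
 8:     3    5        8         5
 9:     3   16        8         6
10:     3   25        8         7
11:     3   30        8         7
12:     3   37        8         8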
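A possible alternative to the cartesian join is a rolling join, which looks up, for each row of df1, the next max_temp at or above temp within the matching period. This is only a sketch under two assumptions that are not part of the answer above: the columns have been converted to numbers, and max_temp increases within each period (as it does in the sample data), so that "next larger max_temp" and "first matching row" coincide.

```r
library(data.table)

# Sample data from the question, rebuilt with numeric columns (an assumption).
dt1 <- data.table(
  month = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3),
  temp  = c(10, 15, 16, 25, 13, 17, 20, 5, 16, 25, 30, 37)
)
dt2 <- data.table(
  period   = rep(c(1, 2, 3), times = c(8, 6, 12)),
  max_temp = c(9, 13, 16, 18, 30, 37, 38, 39,
               10, 15, 16, 25, 30, 32,
               8, 10, 12, 14, 16, 18, 19, 25, 28, 30, 35, 40),
  group    = c(1, 1, 1, 2, 2, 2, 3, 3,
               3, 3, 4, 4, 5, 5,
               5, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8)
)

# Match period exactly and roll on max_temp: roll = -Inf carries the next
# observation backward, i.e. it picks the smallest max_temp >= temp.
# The result has one row per row of dt1, in dt1's order.
matched <- dt2[dt1, on = .(period = month, max_temp = temp), roll = -Inf]

dt1[, new_group := matched$group]
dt1
```

Rows of dt1 whose temp exceeds every max_temp in their period would get NA here, which may or may not be the behaviour you want.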