R - Compare rows of two data frames sequentially and return a value

I have the following two data frames:

df1 <- data.frame(month=c("1","1","1","1","2","2","2","3","3","3","3","3"), 
      temp=c("10","15","16","25","13","17","20","5","16","25","30","37")) 


df2 <- data.frame(period=c("1","1","1","1","1","1","1","1","2","2","2","2","2","2","3","3","3","3","3","3","3","3","3","3","3","3"), 
       max_temp=c("9","13","16","18","30","37","38","39","10","15","16","25","30","32","8","10","12","14","16","18","19","25","28","30","35","40"), 
       group=c("1","1","1","2","2","2","3","3","3","3","4","4","5","5","5","5","5","6","6","6","7","7","7","7","8","8"))

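I would like to:

1. For each row, sequentially, check whether the value in the month column of df1 matches the value in the period column of df2, i.e. df1$month == df2$period.
2. If step 1 is not TRUE, i.e. df1$month != df2$period, repeat step 1, comparing the value in df1 with the value in the next row of df2, and so on until df1$month == df2$period.
3. If df1$month == df2$period, check whether the value in the temp column of df1 is less than or equal to the value in the max_temp column of df2, i.e. df1$temp <= df2$max_temp.
4. If df1$temp <= df2$max_temp, return the value in the group column of that row of df2 and add it to df1 in a new column called "new_group".
5. If step 3 is not TRUE, i.e. df1$temp > df2$max_temp, go back to step 1 and compare the same row of df1 with the next row of df2.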

An example of the output data frame I would like is:

df3 <- data.frame(month=c("1","1","1","1","2","2","2","3","3","3","3","3"), 
      temp=c("10","15","16","25","13","17","20","5","16","25","30","37"), 
      new_group=c("1","1","1","2","3","4","4","5","6","7","7","8"))

I have been playing around with the ifelse function and need some help or redirection. Thanks!

Source

2014-01-30 user3201532

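Did you intentionally make your data strings? –

The data files are actually tab-delimited text files that I loaded into R as data frames using read.table. As an R novice, I did not realize the data were strings. – user3201532

The quotes around the numbers tell you that you have strings. Also, watch out for strings masquerading as factors, which you get with read.table(...., stringsAsFactors = TRUE) (annoyingly, that is the default) –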
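To illustrate the point made in the comments above, here is a minimal sketch of why text or factor columns need converting before numeric comparison. The vector x is a made-up example, not data from the question. Note also that the stringsAsFactors default mentioned above applies to R releases before 4.0.0; later releases default to FALSE.

```r
# Hypothetical example values, stored as a factor the way older R
# versions would store a text column by default
x <- factor(c("10", "5", "37"))

# as.numeric() on a factor returns the internal level codes,
# not the numbers that the labels spell out
codes <- as.numeric(x)

# Going through as.character() first recovers the real values
values <- as.numeric(as.character(x))

# Plain strings compare alphabetically, so "10" sorts before "9"
string_cmp <- "10" <= "9"

print(codes)
print(values)
print(string_cmp)
```

The double conversion as.numeric(as.character(...)) works whether the column is a factor or a character vector, which makes it a safe choice when the column type is uncertain.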

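I found the procedure for computing new_group hard to follow as stated. As I understand it, you are trying to create a variable called new_group in df1. For row i of df1, the new_group value is the group value of the first row of df2 that:

is indexed i or higher
has a period value matching df1$month[i]
has a max_temp value no lower than df1$temp[i]

I did this by using sapply to call a function on each row index of df1: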

fxn = function(idx) { 
    # Potentially matching indices in df2 
    pm = idx:nrow(df2) 

    # Matching indices in df2 
    m = pm[df2$period[pm] == df1$month[idx] & 
     as.numeric(as.character(df1$temp[idx])) <= 
     as.numeric(as.character(df2$max_temp[pm]))] 

    # Return the group associated with the first matching index 
    return(df2$group[m[1]]) 
} 
df1$new_group = sapply(seq(nrow(df1)), fxn) 
df1 
#    month temp new_group
# 1      1   10         1
# 2      1   15         1
# 3      1   16         1
# 4      1   25         2
# 5      2   13         3
# 6      2   17         4
# 7      2   20         4
# 8      3    5         5
# 9      3   16         6
# 10     3   25         7
# 11     3   30         7
# 12     3   37         8
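For comparison, the five steps listed in the question can also be followed literally with an explicit loop. The sketch below is only one possible reading of those steps, not code from the thread: assign_groups is a hypothetical helper name, and it assumes that the position in df2 only ever moves forward, as the steps describe. The data are repeated so that the block runs on its own.

```r
# Same data as in the question, kept as character strings
df1 <- data.frame(
  month = c("1","1","1","1","2","2","2","3","3","3","3","3"),
  temp  = c("10","15","16","25","13","17","20","5","16","25","30","37"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  period   = c("1","1","1","1","1","1","1","1","2","2","2","2","2","2",
               "3","3","3","3","3","3","3","3","3","3","3","3"),
  max_temp = c("9","13","16","18","30","37","38","39","10","15","16","25",
               "30","32","8","10","12","14","16","18","19","25","28","30",
               "35","40"),
  group    = c("1","1","1","2","2","2","3","3","3","3","4","4","5","5",
               "5","5","5","6","6","6","7","7","7","7","8","8"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from the thread): walk through df2 with a
# pointer j that never moves backwards
assign_groups <- function(df1, df2) {
  temp     <- as.numeric(df1$temp)
  max_temp <- as.numeric(df2$max_temp)
  out <- rep(NA_character_, nrow(df1))
  j <- 1
  for (i in seq_len(nrow(df1))) {
    # Steps 1, 2 and 5: advance in df2 while the period differs
    # or max_temp is too small for the current df1 row
    while (j <= nrow(df2) &&
           (df2$period[j] != df1$month[i] || temp[i] > max_temp[j])) {
      j <- j + 1
    }
    # df2 is exhausted, so the remaining df1 rows stay NA
    if (j > nrow(df2)) break
    # Steps 3 and 4: record the group; j is kept for the next df1 row
    out[i] <- df2$group[j]
  }
  out
}

new_group <- assign_groups(df1, df2)
print(new_group)
```

On the question's data this gives the same groups as the desired df3. The loop makes the forward-only scan explicit, at the cost of being longer than the sapply version.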

Source

2014-01-30 03:36:32 josliber

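Thanks for the helpful code. I am not calculating the values of new_group, merely putting the values of df2$group into the df1$new_group column. Hope this is clearer. Cheers. – user3201532

Yes, the code I posted does this. It looks like you are fairly new to SO and have not accepted the good answers you received to your previous questions. If my solution or @RicardoSaporta's solution solved your problem, please remember to accept it by clicking the green check mark. – josliber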

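library(data.table) 
dt1 <- data.table(df1, key="month") 
dt2 <- data.table(df2, key="period") 

## convert the text columns to numbers so that the comparison below is numeric 
dt1[, temp := as.numeric(as.character(temp))] 
dt2[, max_temp := as.numeric(as.character(max_temp))] 

## add a row index 
dt1[, rn1 := seq(nrow(dt1))] 

## join each df1 row to all df2 rows of the same period, then take the 
## group of the first joined row whose max_temp is large enough 
dt3 <- 
unique(dt1[dt2, allow.cartesian=TRUE][, new_group := group[min(which(temp <= max_temp))], by="rn1"], by="rn1") 

## Keep only the columns you want 
## (max_temp is the first value for the period, not the matched row) 
dt3[, c("month", "temp", "max_temp", "new_group"), with=FALSE] 

    month temp max_temp new_group 
 1:     1   10        9         1 
 2:     1   15        9         1 
 3:     1   16        9         1 
 4:     1   25        9         2 
 5:     2   13       10         3 
 6:     2   17       10         4 
 7:     2   20       10         4 
 8:     3    5        8         5 
 9:     3   16        8         6 
10:     3   25        8         7 
11:     3   30        8         7 
12:     3   37        8         8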

Source

2014-01-30 04:39:56
