Get the minimum, grouped by unique combinations of two columns

What I want to achieve in R is the following: given a table (a data frame in my case), I want to get the lowest price for each unique combination of two columns.

For example, given the following table:

+-----+----------+-------+----------+----------+
| Key | Feature1 | Price | Feature2 | Feature3 |
+-----+----------+-------+----------+----------+
| AAA |        1 |   100 | whatever | whatever |
| AAA |        1 |   150 | whatever | whatever |
| AAA |        1 |   200 | whatever | whatever |
| AAA |        2 |   110 | whatever | whatever |
| AAA |        2 |   120 | whatever | whatever |
| BBB |        1 |   100 | whatever | whatever |
+-----+----------+-------+----------+----------+

The result I want looks like this:

+-----+----------+-------+----------+----------+
| Key | Feature1 | Price | Feature2 | Feature3 |
+-----+----------+-------+----------+----------+
| AAA |        1 |   100 | whatever | whatever |
| AAA |        2 |   110 | whatever | whatever |
| BBB |        1 |   100 | whatever | whatever |
+-----+----------+-------+----------+----------+

So I have been working on a solution along these lines:

s <- lapply(split(data, list(data$Key, data$Feature1)), function(chunk) { 
     chunk[which.min(chunk$Price),]})

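But the result is a 1 x n matrix, so I need to unsplit the result. It also seems very slow. How can I improve this logic? I have seen solutions pointing in the direction of the data.table package. Should I rewrite it using that package?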
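A minimal sketch of how the split/lapply result could be stacked back into one data frame with do.call(rbind, ...). It assumes `data` is the table from the question, rebuilt here by hand with placeholder values for Feature2 and Feature3. The `drop = TRUE` argument is an addition, used so that empty Key/Feature1 combinations are skipped.

```r
# Sample data rebuilt from the question; Feature2/Feature3 values are placeholders.
data <- data.frame(
  Key      = c("AAA", "AAA", "AAA", "AAA", "AAA", "BBB"),
  Feature1 = c(1, 1, 1, 2, 2, 1),
  Price    = c(100, 150, 200, 110, 120, 100),
  Feature2 = "whatever",
  Feature3 = "whatever",
  stringsAsFactors = FALSE
)

# Split by the Key/Feature1 combination; drop = TRUE skips empty combinations.
chunks <- split(data, list(data$Key, data$Feature1), drop = TRUE)

# Keep the cheapest row of each chunk (a one-row data frame with all columns).
cheapest <- lapply(chunks, function(chunk) chunk[which.min(chunk$Price), ])

# Stack the one-row data frames back into a single data frame.
result <- do.call(rbind, cheapest)
rownames(result) <- NULL
print(result)
```

This keeps every original column, but it still builds one small data frame per group, so it is unlikely to fix the speed problem on large inputs.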

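Update

Great answers guys, thanks! However, my original data frame contains more columns (Feature2 ...) and I need all of them back after the filtering. The rows that do not have the lowest price (for their Key/Feature1 combination) can be discarded, so I am not interested in their Feature2 / Feature3 values.

Source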

2015-07-10 Jochen van Wylick

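By what logic should the values of the other columns be chosen? For example, if 'Feature2' has different values for the same Key-Feature1 pair, which value must be included in the output? – nicola

The value belonging to the lowest price. So this needs to act as a row filter: keep the 'whatever' values of the AAA-1, AAA-2 and BBB-1 rows. The rest of the rows can be discarded. –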

You can use the dplyr package:

library(dplyr) 

data %>% group_by(Key, Feature1) %>% 
     slice(which.min(Price))
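As a possible variant, assuming dplyr 1.0.0 or later is installed, slice_min() expresses the same row filter and also keeps all columns. The sample data below is rebuilt from the question with placeholder Feature2/Feature3 values; `with_ties = FALSE` is chosen here so that exactly one row per group comes back, mirroring which.min().

```r
library(dplyr)

# Sample data rebuilt from the question; Feature2/Feature3 values are placeholders.
data <- data.frame(
  Key      = c("AAA", "AAA", "AAA", "AAA", "AAA", "BBB"),
  Feature1 = c(1, 1, 1, 2, 2, 1),
  Price    = c(100, 150, 200, 110, 120, 100),
  Feature2 = "whatever",
  Feature3 = "whatever",
  stringsAsFactors = FALSE
)

result <- data %>%
  group_by(Key, Feature1) %>%
  # Lowest-priced row per group; with_ties = FALSE returns a single row on ties.
  slice_min(Price, n = 1, with_ties = FALSE) %>%
  ungroup()

print(result)
```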

Source

2015-07-10 15:22:47 jeremycg

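Works great, but I need to get all columns in the result. I simplified the example a bit. In reality the data contains more columns, and I need those in the result. –

OK, see the edit – jeremycg

Since you mentioned the data.table package, here is a solution using that package: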

library(data.table) 
setDT(df)[,.(Price=min(Price)),.(Key, Feature1)] #initial question 
setDT(df)[,.SD[which.min(Price)],.(Key, Feature1)] #updated question

Here df is your sample data.frame.

Update: tested using the mtcars data
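A related data.table idiom, sketched here as an alternative rather than as part of the answer above: collect the row numbers of the per-group minimum with .I and subset once. This is often reported to be faster than .SD[which.min(...)] on large tables, although that is not measured here. The table `dt` is a hypothetical rebuild of the question's data with placeholder Feature2/Feature3 values.

```r
library(data.table)

# Sample data rebuilt from the question; Feature2/Feature3 values are placeholders.
dt <- data.table(
  Key      = c("AAA", "AAA", "AAA", "AAA", "AAA", "BBB"),
  Feature1 = c(1, 1, 1, 2, 2, 1),
  Price    = c(100, 150, 200, 110, 120, 100),
  Feature2 = "whatever",
  Feature3 = "whatever"
)

# .I holds the row numbers within dt; per group, keep the row number
# of the lowest price. The unnamed result column is called V1.
idx <- dt[, .I[which.min(Price)], by = .(Key, Feature1)]$V1

# Subset the original table once, which keeps every column.
result <- dt[idx]
print(result)
```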

df <- mtcars
library(data.table)
setDT(df)[, .SD[which.min(mpg)], by = am]
#    am  mpg cyl disp  hp drat   wt  qsec vs gear carb
# 1:  1 15.0   8  301 335 3.54 3.57 14.60  0    5    8
# 2:  0 10.4   8  472 205 2.93 5.25 17.98  0    3    4

Source

2015-07-10 15:24:28 user227710

A base R solution is aggregate(Price ~ Key + Feature1, data, FUN = min)

Source

2015-07-10 15:25:01 christoph

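Very elegant, but I need all columns returned in the result. I simplified the example a bit. In reality the data contains more columns, and I need those in the result. –

Do you mean you want the minimum value returned alongside your original data frame? If that is the case, use 'ave(data$Price, data$Key, data$Feature1, FUN = min)'. – christoph

No, see the updated question. I only want the rows with the lowest value (for each unique combination of Key + Feature1), but with all of their original values. I tried your code and it only returns 3 columns: Key, Feature1 and Price. I also need all the other original columns. –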
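For what it is worth, the ave() call from the comment above can be turned into the row filter being asked for, by comparing each price with its group minimum. This is only a sketch on sample data rebuilt from the question (Feature2/Feature3 values are placeholders), and unlike which.min() it keeps every tied row when several rows share the lowest price.

```r
# Sample data rebuilt from the question; Feature2/Feature3 values are placeholders.
data <- data.frame(
  Key      = c("AAA", "AAA", "AAA", "AAA", "AAA", "BBB"),
  Feature1 = c(1, 1, 1, 2, 2, 1),
  Price    = c(100, 150, 200, 110, 120, 100),
  Feature2 = "whatever",
  Feature3 = "whatever",
  stringsAsFactors = FALSE
)

# ave() returns, for every row, the minimum price of its Key/Feature1 group.
group_min <- ave(data$Price, data$Key, data$Feature1, FUN = min)

# Keep the rows whose price equals the group minimum; all columns survive.
result <- data[data$Price == group_min, ]
print(result)
```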

Using base R aggregate:

> aggregate(Price~Key+Feature1, min, data=data) 
    Key Feature1 Price 
1 AAA  1 100 
2 BBB  1 100 
3 AAA  2 110

See this post for other approaches.
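One such approach in base R that returns all columns, shown as a sketch on sample data rebuilt from the question (Feature2/Feature3 values are placeholders): sort by price within each Key/Feature1 group, then keep the first row of every group with duplicated().

```r
# Sample data rebuilt from the question; Feature2/Feature3 values are placeholders.
data <- data.frame(
  Key      = c("AAA", "AAA", "AAA", "AAA", "AAA", "BBB"),
  Feature1 = c(1, 1, 1, 2, 2, 1),
  Price    = c(100, 150, 200, 110, 120, 100),
  Feature2 = "whatever",
  Feature3 = "whatever",
  stringsAsFactors = FALSE
)

# Sort so that the cheapest row of each Key/Feature1 group comes first.
sorted <- data[order(data$Key, data$Feature1, data$Price), ]

# duplicated() flags every repeat of a Key/Feature1 pair after its first
# occurrence, so negating it keeps exactly one (the cheapest) row per group.
result <- sorted[!duplicated(sorted[c("Key", "Feature1")]), ]
rownames(result) <- NULL
print(result)
```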

Source

2015-07-10 15:27:51
