2013-01-20 21 views
4

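How to calculate order statistics by group in R

I want to aggregate results by a column and then return only one row per group. That row should be the n-th element of the group according to some ordering. Ideally, I would like to use only base functions.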

x <- data.frame(Group=c("A","A", "A", "C", "C"), 
       Name=c("v", "u", "w", "x", "y"), 
       Quantity=c(3,3,4,2,0)) 
> x 
    Group Name Quantity 
1  A v  3 
2  A u  3 
3  A w  4 
4  C x  2 
5  C y  0 

I want to take the n-th row according to an ordering on Quantity and Name. For N = 2 this is

Group Name Quantity 
1  A u  3 
5  C y  0 

For N=1 
    Group Name Quantity 
3  A w  4 
4  C x  2 

I tried the following, but I get an uninformative error message.

aggregate.data.frame(x, list(x$Group), function(y){ max(y[,'Quantity'])}) 
Error in `[.default`(y, , "Quantity") (from #1) : incorrect number of dimensions

Answers

0

I went with

do.call(rbind, by(x, x$Group, function(x) 
     x[order(-x$Quantity, x$Name),][1,])) 

per others' suggestions. I found that it fits my thought process a little better than the other posted solutions (which I appreciate).
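The snippet above always takes the first row of each group. A minimal sketch of how the same idea might be extended to an arbitrary N follows; `nth_by_group` is a hypothetical helper name, and falling back to the last row when a group has fewer than N rows is an assumption, not something the question specifies.

```r
x <- data.frame(Group = c("A", "A", "A", "C", "C"),
                Name = c("v", "u", "w", "x", "y"),
                Quantity = c(3, 3, 4, 2, 0))

# nth_by_group is a hypothetical name used only for this sketch
nth_by_group <- function(df, N) {
  pick <- function(g) {
    # sort within the group: Quantity descending, then Name ascending
    g <- g[order(-g$Quantity, g$Name), ]
    # assumption: if the group has fewer than N rows, return its last row
    g[min(N, nrow(g)), ]
  }
  do.call(rbind, by(df, df$Group, pick))
}

res1 <- nth_by_group(x, 1)
res2 <- nth_by_group(x, 2)
```

With the example data, `res2` holds the rows named `u` and `y`, and `res1` holds `w` and `x`, matching the outputs requested in the question.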

1

Some aggregate and merge magic:

f <- function(x, N) { 
    sel <- function(x) {         # Choose the N-th highest value from the set, or the lowest element if there are fewer than N unique elements. Is there a built-in for this? 
    z <- unique(x)          # This assumes that you want the N-th highest unique value. Simply don't filter by unique if not. 
    z[order(z, decreasing=TRUE)][min(N, length(z))] 
    } 

    xNq <- aggregate(Quantity ~ Group, data=x, sel)  # Choose the N-th highest quantity within each "Group" 
    xNm <- merge(x, xNq)         # Add the matching "Name" values 
    x <- aggregate(Name ~ Quantity + Group, data=xNm, sel) # Choose the N-th highest Name in each group 
    x[c('Group', 'Name', 'Quantity')]      # Put into original order 
} 


> f(x, 2) 
## Group Name Quantity 
## 1  A u  3 
## 2  C y  0 

> f(x, 1) 
## Group Name Quantity 
## 1  A w  4 
## 2  C x  2 
2
x <- 
    data.frame(
     Group = c("A","A", "A", "C", "C", "A", "A") , 
     Name = c("v", "u", "w", "x", "y" ,"v", "u") , 
     Quantity = c(3,3,4,2,0,4,1) 
    ) 

# sort your data to start.. 
# note that Quantity vs. Group and Name 
# are sorted in different directions, 
# so negating the Group and Name keys flips them. 
# xtfrm() is used instead of as.numeric() so that this also works 
# when Group and Name are character (the data.frame default in R >= 4.0) 
x <- 
    x[ 
     order( 
      -xtfrm(x$Group) , 
      x$Quantity , 
      -xtfrm(x$Name) , 
      decreasing = TRUE 
     ) , 
    ] 
# once your data frame is sorted the way you want your Ns to occur, the rest is easy 

# rank your data.. 
# just create the numerical order, 
# but within each group.. 
# (or you could add those ranks directly to the data frame if you like) 
ranks <- 
    unlist( 
     tapply( 
      order(x$Group) , 
      x$Group , 
      order 
     ) 
    ) 

# N = 1 
x[ ranks == 1 , ] 

# N = 2 
x[ ranks == 2 , ] 
+0

I think your `N` and `ranks` should agree. `x[ranks == 2, ]$Name` returns `c('v','y')` rather than `c('u','y')` as desired. I fell into the same trap at first.

+0

@MatthewLundberg good call :) thx

+0

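With the edit, you are taking the minimum of `Name` within each group. That happens to be correct for the example, because there is only one `Name` value in the rank-1 case, but it is not correct in general.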
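One way to make the ranks agree with N, as the comments ask, is to sort once and then number the rows within each group. This is only a sketch on the question's original five-row data; using `ave()` with `seq_along` is a substitute for the `tapply()` construction in the answer, not the answerer's own code.

```r
x <- data.frame(Group = c("A", "A", "A", "C", "C"),
                Name = c("v", "u", "w", "x", "y"),
                Quantity = c(3, 3, 4, 2, 0))

# sort by Group, then Quantity descending, then Name ascending
s <- x[order(x$Group, -x$Quantity, x$Name), ]

# number the rows 1, 2, 3, ... within each group, in the sorted order
s$rank <- ave(seq_len(nrow(s)), s$Group, FUN = seq_along)

# N = 2
second <- s[s$rank == 2, ]
```

Here `second$Name` is `u` and `y`, the values the first comment says the N = 2 case should return.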

1
# define ordering function, increasing on Quantity, decreasing on Name 
in.order <- function(group) with(group, group[order(Quantity, -rank(Name)), ]) 

# set desired rank for each Group 
N <- 2 

# get Nth row by Group, according to in.order 
group.rows <- by(x, x$Group, function(group) head(tail(in.order(group), N), 1)) 

# collapse rows into data.frame 
do.call(rbind, group.rows) 

# Group Name Quantity 
# A  A u  3 
# C  C y  0 

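The reason you see the error with aggregate.data.frame is that the function applies FUN to each column separately, within each subset defined by the by argument, rather than to each subset of the full data.frame (that is what the by function is for, as shown above). When using aggregate, whatever you supply as FUN should accept columns, not data.frames. In your example, you tried to index the vector y as if it were a data.frame, hence the dimension error.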
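A small sketch that illustrates the point on the question's data: the function passed to `aggregate` sees a plain vector, so two-dimensional indexing fails, while a vector-level summary such as `max` works. The variable names `seen`, `err`, and `max_q` are only for illustration.

```r
x <- data.frame(Group = c("A", "A", "A", "C", "C"),
                Name = c("v", "u", "w", "x", "y"),
                Quantity = c(3, 3, 4, 2, 0))

# FUN is called with one column of one group at a time, not with a data.frame
seen <- aggregate(x["Quantity"], list(Group = x$Group),
                  function(y) is.data.frame(y))

# indexing a plain vector with two dimensions raises the error from the question
err <- tryCatch(c(3, 3, 4)[, "Quantity"],
                error = function(e) conditionMessage(e))

# a function that works on a vector is fine
max_q <- aggregate(Quantity ~ Group, data = x, FUN = max)
```

`seen$Quantity` is `FALSE` for both groups, and `max_q` holds the per-group maximum of `Quantity` (4 for A, 2 for C).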

+0

+1 This is the simplest solution! I might suggest adding an argument to the in.order function for the decreasing/increasing order.. – agstudy

+0

@agstudy That is a valid suggestion. If I were using this myself, I would certainly do that. For the sake of brevity, though, I will leave it as is.
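For readers who want the variant suggested in the comments, here is one possible sketch. The argument names `quantity.decreasing` and `name.decreasing` are hypothetical, and the defaults are chosen to reproduce the behaviour of the answer's `in.order`.

```r
x <- data.frame(Group = c("A", "A", "A", "C", "C"),
                Name = c("v", "u", "w", "x", "y"),
                Quantity = c(3, 3, 4, 2, 0))

# hypothetical extension of in.order with a direction flag per sort key
in.order <- function(group, quantity.decreasing = FALSE, name.decreasing = TRUE) {
  q <- group$Quantity
  n <- xtfrm(group$Name)            # numeric sort key, works for character or factor
  if (quantity.decreasing) q <- -q
  if (name.decreasing) n <- -n
  group[order(q, n), ]
}

a <- x[x$Group == "A", ]
default.order <- in.order(a)        # same ordering as the answer's in.order
flipped.order <- in.order(a, quantity.decreasing = TRUE, name.decreasing = FALSE)
```

With the flipped ordering, the N-th row can be taken directly by position from the top, instead of going through `head(tail(...))`.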