How to add the count of unique values by group to an R data.frame

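I want to count the number of unique values of one variable, grouped by a second variable, and then add the count to the existing data.frame as a new column. For example, if the existing data frame looks like this: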

   color  type
1  black chair
2  black chair
3  black  sofa
4  green  sofa
5  green  sofa
6    red  sofa
7    red plate
8   blue  sofa
9   blue plate
10  blue chair

I want to add, for each color, the number of unique types present in the data:

   color  type unique_types
1  black chair            2
2  black chair            2
3  black  sofa            2
4  green  sofa            1
5  green  sofa            1
6    red  sofa            2
7    red plate            2
8   blue  sofa            3
9   blue plate            3
10  blue chair            3

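I was hoping to use ave, but can't find a straightforward way that doesn't take many lines. I have more than 100,000 rows, so I'm also not sure how much efficiency matters.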

This is somewhat similar to this question: Count number of observations/rows per group and add result to data frame
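To make the answers below reproducible, the sample data from the question can be rebuilt like this. This is a sketch: the name df matches the answers, and stringsAsFactors = FALSE is an assumption chosen so that both columns are character vectors, which the ave answer relies on.

```r
# Rebuild the example data from the question (both columns kept as character vectors)
df <- data.frame(
  color = c("black", "black", "black", "green", "green",
            "red", "red", "blue", "blue", "blue"),
  type  = c("chair", "chair", "sofa", "sofa", "sofa",
            "sofa", "plate", "sofa", "plate", "chair"),
  stringsAsFactors = FALSE
)
str(df)
```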


2013-07-02 Bryan

Using ave (since you asked for it explicitly):

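# ave() returns the same type as its first argument, so convert the counts back to integer
within(df, { count <- as.integer(ave(type, color, FUN = function(x) length(unique(x)))) })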

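Make sure type is a character vector, not a factor; with a factor, ave() tries to store the counts as factor levels and produces NAs with warnings.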
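A self-contained sketch of the ave approach on the sample data. Here type is deliberately created as a factor (an assumption, to show the pitfall above) and converted with as.character() before calling ave(); the result is converted back to integer because ave() hands back a character vector in that case.

```r
df <- data.frame(
  color = c("black", "black", "black", "green", "green",
            "red", "red", "blue", "blue", "blue"),
  type  = factor(c("chair", "chair", "sofa", "sofa", "sofa",
                   "sofa", "plate", "sofa", "plate", "chair"))
)

# Count distinct types within each color; ave() writes each group's
# single count back into every row of that group.
df$unique_types <- as.integer(
  ave(as.character(df$type), df$color, FUN = function(x) length(unique(x)))
)
df
```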

Since you also say your data is large, so speed/performance may matter, I'd suggest a data.table solution as well.

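library(data.table)
# Option 1 (v1.9.6+): convert df to a data.table in place and add the column by reference
setDT(df)[, count := uniqueN(type), by = color]
# Option 2: if you don't want df to be modified by reference, work on a copy
ans <- as.data.table(df)[, count := uniqueN(type), by = color]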

uniqueN was implemented in v1.9.6 and is a faster equivalent of length(unique(.)). It also works on data.frames/data.tables.

Other solutions:

Using plyr:

require(plyr) 
ddply(df, .(color), mutate, count = length(unique(type)))

Using aggregate:

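agg <- aggregate(type ~ color, data = df, FUN = function(x) length(unique(x)))
names(agg)[2] <- "count"  # avoid clashing with df's own type column (type.x / type.y)
merge(df, agg, by = "color", all = TRUE)  # note: merge() sorts the rows by color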
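Because merge() re-sorts the rows by the key column, the original row order is lost. If order matters, one option is to look the per-color counts up with match() instead of merging. A base-R sketch on the sample data; the column name unique_types is an assumption taken from the question:

```r
df <- data.frame(
  color = c("black", "black", "black", "green", "green",
            "red", "red", "blue", "blue", "blue"),
  type  = c("chair", "chair", "sofa", "sofa", "sofa",
            "sofa", "plate", "sofa", "plate", "chair"),
  stringsAsFactors = FALSE
)

# One row per color; the aggregated column keeps the name "type"
agg <- aggregate(type ~ color, data = df, FUN = function(x) length(unique(x)))

# Look up each row's color in agg, so df keeps its original row order
df$unique_types <- agg$type[match(df$color, agg$color)]
df
```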


2013-07-02 09:24:36 Arun

Here is a solution with the dplyr package, which provides n_distinct() as a wrapper for length(unique()):

library(dplyr)
df %>% 
    group_by(color) %>% 
    mutate(unique_types = n_distinct(type))


2015-04-27 12:50:35

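This can also be done in a vectorized way, without by-group operations, by combining unique with table or tabulate. Both variants assume df contains only the color and type columns, because unique(df) deduplicates whole rows.

If df$color is a factor, then either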

table(unique(df)$color)[as.character(df$color)] 
# black black black green green red red blue blue blue 
# 2  2  2  1  1  2  2  3  3  3

or

tabulate(unique(df)$color)[as.integer(df$color)] 
# [1] 2 2 2 1 1 2 2 3 3 3

If df$color is character, then simply

table(unique(df)$color)[df$color]

If df$color is integer, then simply

tabulate(unique(df)$color)[df$color]
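To store the vectorized result as the new column, the table lookup can be assigned directly; as.vector() drops the table class and names. A base-R sketch, assuming df holds only the color and type columns and that color is character:

```r
df <- data.frame(
  color = c("black", "black", "black", "green", "green",
            "red", "red", "blue", "blue", "blue"),
  type  = c("chair", "chair", "sofa", "sofa", "sofa",
            "sofa", "plate", "sofa", "plate", "chair"),
  stringsAsFactors = FALSE
)

# Distinct (color, type) pairs, then how many pairs each color has
counts <- table(unique(df)$color)

# Index the per-color counts by each row's color and attach as a plain vector
df$unique_types <- as.vector(counts[df$color])
df
```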


2016-03-24 11:27:57
