2015-12-21

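Concise R data.table syntax for the modal (most common) value by group

What is an efficient, elegant data.table syntax for finding the most common category for each id? I keep a boolean vector indicating NA positions (for other purposes).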

library(data.table)
dt = data.table(id=rep(1:2,7), category=rep(c("x","y",NA), length.out=14))
print(dt)

In this toy example, ignoring NA, x is the most common category for id==1 and y is the most common category for id==2.

Answer


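If you want to ignore the NAs, you first have to exclude them with !is.na(category), then group by id and category (by = .(id, category)) and create the .N frequency variable. This gives: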

dt[!is.na(category), .N, by = .(id, category)] 

   id category N
1:  1        x 3
2:  2        y 3
3:  2        x 2
4:  1        y 2
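A self-contained version of the counting step, as a sketch assuming the data.table package is installed. The toy data is built with rep(..., length.out = 14) rather than relying on a length-3 vector being partially recycled into 14 rows; the resulting data is the same as in the question.

```r
library(data.table)

# Toy data from the question: 14 rows, category cycles x, y, NA
dt <- data.table(id = rep(1:2, 7),
                 category = rep(c("x", "y", NA), length.out = 14))

# Drop NA categories, then count rows per (id, category) pair with .N
counts <- dt[!is.na(category), .N, by = .(id, category)]
print(counts)
```

The groups come out in order of first appearance, which is why the rows for id 1 and id 2 are interleaved.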

Ordering this by id gives you a clearer picture:

dt[!is.na(category), .N, by = .(id, category)][order(id)] 

which results in:

   id category N
1:  1        x 3
2:  1        y 2
3:  2        y 3
4:  2        x 2
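As an alternative to chaining [order(id)], data.table's keyby argument groups like by but also sorts the result by the grouping columns and sets them as the key. A minimal sketch, assuming data.table is installed; note that within each id the rows are then sorted by category (x before y), not by count.

```r
library(data.table)

dt <- data.table(id = rep(1:2, 7),
                 category = rep(c("x", "y", NA), length.out = 14))

# keyby = sorts the grouped result by id, then category, and keys it
counts_sorted <- dt[!is.na(category), .N, keyby = .(id, category)]
print(counts_sorted)
```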

If you only want the top row for each id:

dt[!is.na(category), .N, by = .(id, category)][order(id, -N), head(.SD,1), by = id] 

or:

dt[!is.na(category), .N, by = .(id, category)][, .SD[which.max(N)], by = id] 

Both give:

   id category N
1:  1        x 3
2:  2        y 3
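The which.max variant can be wrapped into a reusable function that takes column names as strings. This is a sketch only: mode_by_group is a hypothetical helper, not part of data.table, and it assumes data.table is installed. Because which.max() returns the first maximum, ties go to whichever value appeared first within the group.

```r
library(data.table)

# Hypothetical helper (not a data.table function): most common non-NA
# value of column `value` within each level of column `group`.
mode_by_group <- function(DT, group, value) {
  # Count non-NA values per (group, value) pair
  counts <- DT[!is.na(DT[[value]]), .N, by = c(group, value)]
  # Keep the row with the largest count per group (first one on ties)
  counts[, .SD[which.max(N)], by = group]
}

dt <- data.table(id = rep(1:2, 7),
                 category = rep(c("x", "y", NA), length.out = 14))
modes <- mode_by_group(dt, "id", "category")
print(modes)
```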

Doing it this way may drop groups that contain only NAs. To keep them, you could join them back in, e.g. `dt[!is.na(category)][, .N, by = .(id, category)][order(-N)][.(unique(dt$id)), on = .(id), .SD[1L], by = id]`, or just give non-NA categories priority in the sort: `dt[, .N, by = .(id, category)][order(is.na(category), -N), .SD[1L], by = id]` – Frank
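To see the difference between dropping NAs first and sorting non-NA categories ahead of NA ones, here is a sketch assuming data.table is installed. The extra id 3, whose category is always NA, is a hypothetical addition to the toy data for illustration.

```r
library(data.table)

# Toy data plus a hypothetical id 3 that only ever has NA categories
dt <- data.table(id = c(rep(1:2, 7), 3L, 3L),
                 category = c(rep(c("x", "y", NA), length.out = 14), NA, NA))

# Dropping NAs up front loses id 3 entirely
dropped <- dt[!is.na(category), .N, by = .(id, category)][, .SD[which.max(N)], by = id]

# Sorting on is.na(category) first puts non-NA rows ahead of NA rows, so an
# id falls back to NA only when it has no non-NA category at all
kept <- dt[, .N, by = .(id, category)][order(is.na(category), -N), .SD[1L], by = id]
print(kept)
```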