2017-03-22

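Quantile cut by group in data.table

I would like to do quantile cuts (cutting into n bins with an equal number of points in each) separately for each group: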

qcut = function(x, n) { 
    quantiles = seq(0, 1, length.out = n+1) 
    cutpoints = unname(quantile(x, quantiles, na.rm = TRUE)) 
    cut(x, cutpoints, include.lowest = TRUE) 
} 

library(data.table) 
dt = data.table(A = 1:10, B = c(1,1,1,1,1,2,2,2,2,2)) 
dt[, bin := qcut(A, 3)] 
dt[, bin2 := qcut(A, 3), by = B] 

dt
     A B    bin        bin2
 1:  1 1  [1,4]    [6,7.33]
 2:  2 1  [1,4]    [6,7.33]
 3:  3 1  [1,4] (7.33,8.67]
 4:  4 1  [1,4]   (8.67,10]
 5:  5 1  (4,7]   (8.67,10]
 6:  6 2  (4,7]    [6,7.33]
 7:  7 2  (4,7]    [6,7.33]
 8:  8 2 (7,10] (7.33,8.67]
 9:  9 2 (7,10]   (8.67,10]
10: 10 2 (7,10]   (8.67,10]

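Here the cut without grouping (bin) is correct: each value lies inside its bin. The result by group (bin2) is wrong, though: the rows with B = 1 are labelled with intervals that do not contain their values.

How can I fix this?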

'dt[, qcut(A, 3), by = B]' works, though – Cath

Answers

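This is a bug in the handling of factors. Please check whether it is already known (or fixed in the development version); if not, report it to the data.table bug tracker. As a workaround, have qcut return a character vector instead of a factor: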

qcut = function(x, n) { 
    quantiles = seq(0, 1, length.out = n+1) 
    cutpoints = unname(quantile(x, quantiles, na.rm = TRUE)) 
    as.character(cut(x, cutpoints, include.lowest = TRUE)) 
} 

dt[, bin2 := qcut(A, 3), by = B] 
#      A B    bin        bin2
#  1:  1 1  [1,4]    [1,2.33]
#  2:  2 1  [1,4]    [1,2.33]
#  3:  3 1  [1,4] (2.33,3.67]
#  4:  4 1  [1,4]    (3.67,5]
#  5:  5 1  (4,7]    (3.67,5]
#  6:  6 2  (4,7]    [6,7.33]
#  7:  7 2  (4,7]    [6,7.33]
#  8:  8 2 (7,10] (7.33,8.67]
#  9:  9 2 (7,10]   (8.67,10]
# 10: 10 2 (7,10]   (8.67,10]

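Without changing the function, 'dt[, bin2 := as.character(qcut(A, 3)), by = B]' also works. Trying to convert the result to a factor within each group ('dt[, bin2 := as.factor(as.character(qcut(A, 3))), by = B]') throws an error, though... – Cath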
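If a factor column is still wanted, one possible approach is to do the per-group cut as character and convert the whole column to a factor afterwards, outside of 'by'. This is only a sketch built on the workaround above; the two-step conversion is an assumption about what is wanted, not something proposed in the thread.

```r
library(data.table)

# Same qcut as in the question: returns a factor of quantile bins.
qcut <- function(x, n) {
  quantiles <- seq(0, 1, length.out = n + 1)
  cutpoints <- unname(quantile(x, quantiles, na.rm = TRUE))
  cut(x, cutpoints, include.lowest = TRUE)
}

dt <- data.table(A = 1:10, B = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2))

# Step 1: cut within each group, storing the labels as plain character.
dt[, bin2 := as.character(qcut(A, 3)), by = B]

# Step 2 (assumption: a factor is wanted): convert the complete column
# in one go, so a single set of levels covers all groups.
dt[, bin2 := factor(bin2)]
```

Note that 'factor()' sorts the levels alphabetically here, so they are not in interval order; pass an explicit 'levels' argument if the order matters.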

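Yes, if you define a factor per group, the final (combined) column will just take its attributes (such as levels) from a single group, from group 1 I think. See https://github.com/Rdatatable/data.table/issues/967 – Frank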
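The kind of mislabelling described here can be illustrated in base R, without data.table. A factor is stored as integer codes plus a 'levels' attribute, so if per-group factors are combined by their codes while only one group's levels are kept, the other group's rows end up with the wrong labels. The sketch below is an illustration under that assumption and does not show data.table's actual internals; it reuses group 2's levels only because those are the labels that appear in the question's output.

```r
# Base R only; same qcut as in the question.
qcut <- function(x, n) {
  quantiles <- seq(0, 1, length.out = n + 1)
  cutpoints <- unname(quantile(x, quantiles, na.rm = TRUE))
  cut(x, cutpoints, include.lowest = TRUE)
}

g1 <- qcut(1:5, 3)   # rows with B == 1
g2 <- qcut(6:10, 3)  # rows with B == 2

# Both groups carry the same integer codes; only their labels differ.
codes <- c(as.integer(g1), as.integer(g2))

# ASSUMPTION (illustration only): the combined column keeps a single
# group's levels, so every code is looked up in that one label set.
mislabeled <- levels(g2)[codes]

# Converting each group to character first preserves its own labels.
correct <- c(as.character(g1), as.character(g2))

print(data.frame(A = 1:10, mislabeled, correct))
```

This is why returning character from the function avoids the problem: there is no per-group 'levels' attribute left to be dropped when the groups are combined.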