2016-09-20

R: count and list the unique ids in each column that satisfy a condition

I have been going crazy over something basic...

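For each column, I would like to count the unique ids that come up, and list them as a comma-separated string, in a data frame such as: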

df<-data.frame(id = as.character(c("a", "a", "a", "b", "c", "d", "d", "e", "f")), x1=c(3,1,1,1,4,2,3,3,3), 
x2=c(6,1,1,1,3,2,3,3,1), 
x3=c(1,1,1,1,1,2,3,3,2)) 

> df
  id x1 x2 x3
1  a  3  6  1
2  a  1  1  1
3  a  1  1  1
4  b  1  1  1
5  c  4  3  1
6  d  2  2  2
7  d  3  3  3
8  e  3  3  3
9  f  3  1  2

For each column, I would like to get the number of unique ids whose value satisfies the condition > 1:

res = data.frame(x1_counts = 5, x1_names = "a,c,d,e,f", x2_counts = 4, x2_names = "a,c,d,e", x3_counts = 3, x3_names = "d,e,f")

> res
  x1_counts  x1_names x2_counts x2_names x3_counts x3_names
1         5 a,c,d,e,f         4  a,c,d,e         3    d,e,f

I tried with data.table, but it seems very convoluted, i.e.:

DT = as.data.table(df)
res <- DT[, list(x1 = length(unique(id[which(x1 > 1)])), x2 = length(unique(id[which(x2 > 1)]))), by = id]

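But I can't get it right, and I'm not sure data.table is what I need here, since what I'm looking for isn't really a grouping. Could you point me in the right direction? Thanks very much!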

Answer

You can reshape your data to long format and then summarize:

library(data.table) 
(melt(setDT(df), id.vars = "id")[value > 1] 
    [, .(counts = uniqueN(id), names = list(unique(id))), variable]) 
    # To get the names as a single string instead of a list column, replace list(...) with toString(...)

#    variable counts     names
# 1:       x1      5 a,c,d,e,f
# 2:       x2      4   a,c,d,e
# 3:       x3      3     d,e,f
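If you want the names as a plain comma-separated string rather than a list column, the same melt-then-summarize idea can be sketched as below. This is only a sketch under a few assumptions: it uses `as.data.table()` on a copy instead of `setDT()` so `df` is left untouched, the intermediate names `long` and `res_long` are made up for illustration, and it uses `paste(..., collapse = ",")` because `toString()` separates with ", " (comma plus space).

```r
library(data.table)

df <- data.frame(
  id = c("a", "a", "a", "b", "c", "d", "d", "e", "f"),
  x1 = c(3, 1, 1, 1, 4, 2, 3, 3, 3),
  x2 = c(6, 1, 1, 1, 3, 2, 3, 3, 1),
  x3 = c(1, 1, 1, 1, 1, 2, 3, 3, 2)
)

# Long format: one row per (id, original column) pair.
# as.data.table() makes a copy, so df itself is not modified.
long <- melt(as.data.table(df), id.vars = "id")

# Keep only values above the threshold, then summarize per original column.
res_long <- long[value > 1,
                 .(counts = uniqueN(id),
                   names  = paste(unique(id), collapse = ",")),
                 by = variable]

print(res_long)
```

Here `names` is an ordinary character column, which tends to be easier to write out to CSV than a list column.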

To get exactly the layout you asked for, reshape the result back to wide format:

dcast(1~variable, 
     data = (melt(setDT(df), id.vars = "id")[value > 1] 
       [, .(counts = uniqueN(id), names = list(unique(id))), variable]), 
     value.var = c('counts', 'names')) 

#    . counts_x1 counts_x2 counts_x3  names_x1 names_x2 names_x3
# 1: .         5         4         3 a,c,d,e,f  a,c,d,e    d,e,f
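Since the question notes that this is not really a grouping problem, here is a rough alternative that skips the reshaping and just loops over the value columns in base R. It is not part of the approach above, only a sketch: the vector `value_cols` and the threshold `1` are assumptions taken from the example, and the column names follow the `x1_counts` / `x1_names` pattern from the question.

```r
df <- data.frame(
  id = c("a", "a", "a", "b", "c", "d", "d", "e", "f"),
  x1 = c(3, 1, 1, 1, 4, 2, 3, 3, 3),
  x2 = c(6, 1, 1, 1, 3, 2, 3, 3, 1),
  x3 = c(1, 1, 1, 1, 1, 2, 3, 3, 2)
)

value_cols <- c("x1", "x2", "x3")  # columns to test (assumed from the example)

res <- list()
for (col in value_cols) {
  # Unique ids having at least one value > 1 in this column
  ids <- unique(as.character(df$id[df[[col]] > 1]))
  res[[paste0(col, "_counts")]] <- length(ids)
  res[[paste0(col, "_names")]]  <- paste(ids, collapse = ",")
}

# One-row data frame: x1_counts, x1_names, x2_counts, ...
res <- as.data.frame(res, stringsAsFactors = FALSE)
print(res)
```

For a handful of columns this is probably fine; with many columns or a large table, the melt-based version is likely the more convenient route.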
Thank you!!!! I was way off; it never occurred to me that I would have to melt the data! Thanks again, Psidon! – user971102