2016-11-13
0

Frequency counts for multiple columns of a data.table

I have a data.table like this:

require(data.table) 
dt <- data.table(a= c("a","a","b","b","b"), b= c("a","a","c","c","e"), c=c("d","d","b","b","b")) 

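I want to count how often each value occurs in every column of the data.table. I know how to do this one column at a time, but I would like to do it with a single instruction, because my data has many columns.

The result should be the same as what this produces: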

dt[,a1:=.N, by = c("a")] 
dt[,a2:=.N, by = c("b")] 
dt[,a3:=.N, by = c("c")] 
+1

Use a `for()` loop. –

+0

@RichScriven Could you show me an example? –

+3

Try `nm1 <- paste0("a", seq_along(dt)); for(j in seq_along(dt)){dt[, (nm1[j]) := .N, by = c(names(dt)[j])]}` – akrun

Answer

-1
require(data.table) 
dt <- data.table(a= c("a","a","b","b","b"), 
       b= c("a","a","c","c","e"), 
       c=c("d","d","b","b","b")) 
#dt 
# a b c 
#1: a a d 
#2: a a d 
#3: b c b 
#4: b c b 
#5: b e b 

# for each column, keep only that column and add x, the count of each of its values
l=lapply(seq_along(colnames(dt)), 
     function(i) dt[,eval(colnames(dt)[i]),with=FALSE][, x:=.N,by=eval(colnames(dt)[i])])
#l 
#[[1]] 
# a x 
#1: a 2 
#2: a 2 
#3: b 3 
#4: b 3 
#5: b 3 

#[[2]] 
# b x 
#1: a 2 
#2: a 2 
#3: c 2 
#4: c 2 
#5: e 1 

#[[3]] 
# c x 
#1: d 2 
#2: d 2 
#3: b 3 
#4: b 3 
#5: b 3 


df = as.data.frame(l) 

# rename every second column (the counts) to the preceding column's name plus "_count" 
colnames(df)[seq(2,length(colnames(df)),2)]= 
  paste0(colnames(df)[seq(1,length(colnames(df)),2)],"_count") 

#df 
# a a_count b b_count c c_count 
#1 a  2 a  2 d  2 
#2 a  2 a  2 d  2 
#3 b  3 c  2 b  3 
#4 b  3 c  2 b  3 
#5 b  3 e  1 b  3
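
If the goal is to add the count columns to dt itself, as in the question, the `for()` loop suggested in the comments may be simpler. Below is a minimal sketch of that idea, not a definitive implementation. The helper names `cols` and `new_cols` are my own, chosen for illustration; the target names a1, a2, a3 follow the question.

```r
library(data.table)

dt <- data.table(a = c("a", "a", "b", "b", "b"),
                 b = c("a", "a", "c", "c", "e"),
                 c = c("d", "d", "b", "b", "b"))

# hypothetical helper names: the columns to count, captured before any column is added
cols <- names(dt)
# names for the new count columns: a1, a2, a3, as in the question
new_cols <- paste0("a", seq_along(cols))

for (j in seq_along(cols)) {
  # the parentheses around new_cols[j] make := use its value as the column name;
  # .N is the number of rows in each group defined by the column cols[j]
  dt[, (new_cols[j]) := .N, by = c(cols[j])]
}

dt
#    a b c a1 a2 a3
# 1: a a d  2  2  2
# 2: a a d  2  2  2
# 3: b c b  3  2  3
# 4: b c b  3  2  3
# 5: b e b  3  1  3
```

Because `:=` modifies dt by reference, no intermediate list or data.frame is built, and the original row order is kept.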