2016-01-13

I would like to know how I can use a loop to apply the following function to each group:

apply(table(data$people,data$event),2,function(x) mean(x[x>0])) 

That is, I want to compute the function above separately for each level of 'Colour'.

people <-c("R1","R2","R2","R3","R3","R4","R4","R4","R4","R3","R3","R3","R3","R2","R2","R2","R5","R6") 
event<-c("a","b","b","M","s","f","y","b","a","a","a","a","s","c","c","b","m","a") 
Colour<-c("red","blue","green","pink","red","blue","grean","red","red","black","pink","blue","blue","green","blue","green","green","red") 

data<-data.frame(people,event,Colour) 
+1

Since this question has nothing to do with algorithm design, please leave off the 'algorithm' tag. – Gregor

+1

What is your desired output? It is not very clear what you are trying to do. –

+0

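Let me try putting words in your mouth, and you tell me whether I have it right: for each 'Colour', you want to count the number of times each of the 'people' appears at each 'event', and summarize that as the mean over only the people who *attended* the event (that is, the mean includes only non-zero attendance counts). Is that right? – Gregor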

Answer

0

To run your code on each group, let's first wrap it in a function:

your_function = function(data) { 
    apply(table(data$people,data$event),2,function(x) mean(x[x>0])) 
} 

Then we can split your data by colour and apply the function to each sub-data-frame:

dat_split = split(data, f = data$Colour) 
results = lapply(dat_split, your_function) 

results 
# $black 
# a b c f m M s y 
# 1 NaN NaN NaN NaN NaN NaN NaN 
# 
# $blue 
# a b c f m M s y 
# 1 1 1 1 NaN NaN 1 NaN 
# 
# $grean 
# a b c f m M s y 
# NaN NaN NaN NaN NaN NaN NaN 1 
# ... 
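
The same split-and-apply step can also be written with sapply, which simplifies the per-colour results into one matrix when every group returns a vector of the same length. The following is a minimal, self-contained sketch rather than part of the original answer. It assumes the columns are factors (set explicitly with stringsAsFactors = TRUE, which has not been the data.frame default since R 4.0.0), so that every subset keeps all event levels. The name results_mat is made up for the example.

```r
# Rebuild the example data from the question.
# Assumption: factor columns, so each colour subset keeps every event level.
people <- c("R1","R2","R2","R3","R3","R4","R4","R4","R4",
            "R3","R3","R3","R3","R2","R2","R2","R5","R6")
event  <- c("a","b","b","M","s","f","y","b","a",
            "a","a","a","s","c","c","b","m","a")
Colour <- c("red","blue","green","pink","red","blue","grean","red","red",
            "black","pink","blue","blue","green","blue","green","green","red")
data <- data.frame(people, event, Colour, stringsAsFactors = TRUE)

# Per event: mean number of rows per person, over people with at least one row.
your_function <- function(data) {
    apply(table(data$people, data$event), 2, function(x) mean(x[x > 0]))
}

# sapply binds the per-colour vectors into an event-by-colour matrix.
results_mat <- sapply(split(data, f = data$Colour), your_function)

# R2 is the only person at event "b" in green and appears there twice.
results_mat["b", "green"]
```

Events that nobody attended within a colour show up as NaN, exactly as in the list version.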

Personally, I don't find this very friendly. data.table and dplyr make it easy to work with subsets of a data frame. I would use dplyr from the start, like this:

library(dplyr) 
data %>% group_by(people, Colour, event) %>% 
    summarize(n = n()) %>% 
    group_by(Colour, event) %>% 
    summarize(mean = mean(n)) %>% 
    tidyr::spread(key = event, value = mean) 

# Source: local data frame [6 x 9] 
# 
# Colour  a  b  c  f  m  M  s  y 
# (fctr) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) 
# 1 black  1 NA NA NA NA NA NA NA 
# 2 blue  1  1  1  1 NA NA  1 NA 
# 3 grean NA NA NA NA NA NA NA  1 
# ... 
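
In current releases of tidyr, spread is superseded by pivot_wider, and dplyr's count can stand in for the first group_by/summarize pair. The following is a hedged sketch of the same pipeline under those assumptions (dplyr 1.0.0 or later for the .groups argument, tidyr 1.0.0 or later for pivot_wider). It is not the original answer's code, and the name wide is made up for the example.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question.
people <- c("R1","R2","R2","R3","R3","R4","R4","R4","R4",
            "R3","R3","R3","R3","R2","R2","R2","R5","R6")
event  <- c("a","b","b","M","s","f","y","b","a",
            "a","a","a","s","c","c","b","m","a")
Colour <- c("red","blue","green","pink","red","blue","grean","red","red",
            "black","pink","blue","blue","green","blue","green","green","red")
data <- data.frame(people, event, Colour)

wide <- data %>%
    count(people, Colour, event) %>%              # rows per person, colour, event
    group_by(Colour, event) %>%
    summarise(mean = mean(n), .groups = "drop") %>%  # mean over attendees only
    pivot_wider(names_from = event, values_from = mean)  # one column per event

wide
```

Because only combinations that actually occur are counted, colour and event pairs with no attendance come out as NA rather than NaN.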
+2

If you use 'sapply' instead of 'lapply' in the first version of 'results', you will get a nicer-looking table. – alistaire

+0

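@Gregor, another problem: when I apply the first solution to my dataset it works without error, but the second solution gives this error: Error: All columns must be named. Any ideas about this? – shoorideh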

+0

I haven't seen that error before. Do all of your columns have names? – Gregor