2016-04-19

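Getting counts of data in R

I'm using R for the first time. I have the following dataset (a mock-up of the much larger dataset I'm actually working with):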

Type  Date   Size  Color 
L shape 2008-04-14 161 blue  
L shape 2010-10-16 654 yellow 
L shape 2005-07-03 149 blue 
L shape 2006-08-16 657 yellow 
L shape 2007-04-08 229 yellow 
L shape 2004-03-17 784 green 
Y shape 2014-02-22 917 pink 
Y shape 2012-05-04 186 green 
Y shape 2006-11-25 641 yellow 
Y shape 2015-09-07 493 blue 
Y shape 2011-07-06 953 green 

For each type, I want to retrieve the number of colors that occur, the dates, and the minimum, maximum, and mean of the sizes. The output should look like this:

Type  Colors Dates   Mean Size Min Size Max Size 
L shape  3   2008-04-14 439   149   784 
       2010-10-16   
       2005-07-03   
       2006-08-16   
       2007-04-08   
       2004-03-17   

Y shape  4   2014-02-22 638   186   953 
       2012-05-04   
       2006-11-25   
       2015-09-07   
       2011-07-06   

This is what I have done so far:

cat <- big_catalog 

na.rm = TRUE 

library(plyr) 

mydata <-ddply(cat, c("Type", "Date", "Size", "Color"), summarize, 
       Colors = length(Color), 
       Dates = (Date), 
       Mean_Size = mean(Size), 
       Minimum_Size = min(Size), 
       Maximum_Size = max(Size) 
) 

But I ended up with this:

Type Date Size Color Colors Dates Mean Size Min Size Max Size 
L shape 2008-04-14 161 blue 2 2008-04-14 161 161 161 
L shape 2010-10-16 654 yellow 3 2010-10-16 654 654 654 
L shape 2005-07-03 149 blue 2 2005-07-03 149 149 149 
L shape 2006-08-16 657 yellow 3 2006-08-16 657 657 657 
L shape 2007-04-08 229 yellow 2 2007-04-08 229 229 229 
L shape 2004-03-17 784 green 1 2004-03-17 784 784 784 
Y shape 2014-02-22 917 pink 1 2014-02-22 917 917 917 
Y shape 2012-05-04 186 green 2 2012-05-04 186 186 186 
Y shape 2006-11-25 641 yellow 1 2006-11-25 641 641 641 
Y shape 2015-09-07 493 blue 1 2015-09-07 493 493 493 
Y shape 2011-07-06 953 green 2 2011-07-06 953 953 953 

I obviously need to loop over this, but I'm very new to R and I don't know how to do it.

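Don't group by every column; group by the 'Type' column only (since you want everything done "per type"). That said, your requirement that 'Date' span multiple rows while everything else is a single row complicates things... – Gregor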
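
The comment's suggestion can be sketched with the same plyr call as in the question, grouped on 'Type' alone. This is only an illustration, not the commenter's own code: the names `cat_df` and `per_type` are made up here, the sample rows are copied from the question, and the dates are collapsed into one "; "-separated string per type (an assumption, since a one-row-per-type data frame cannot hold the multi-row date layout shown in the desired output).

```r
library(plyr)

# Sample data copied from the question (cat_df is a hypothetical name).
cat_df <- data.frame(
  Type  = c(rep("L shape", 6), rep("Y shape", 5)),
  Date  = c("2008-04-14", "2010-10-16", "2005-07-03", "2006-08-16",
            "2007-04-08", "2004-03-17", "2014-02-22", "2012-05-04",
            "2006-11-25", "2015-09-07", "2011-07-06"),
  Size  = c(161, 654, 149, 657, 229, 784, 917, 186, 641, 493, 953),
  Color = c("blue", "yellow", "blue", "yellow", "yellow", "green",
            "pink", "green", "yellow", "blue", "green"),
  stringsAsFactors = FALSE
)

# Group by Type only, so every summary is computed over all rows of a type.
per_type <- ddply(cat_df, "Type", summarise,
                  Colors       = length(unique(Color)),        # distinct colors
                  Dates        = paste(Date, collapse = "; "), # assumed separator
                  Mean_Size    = mean(Size),
                  Minimum_Size = min(Size),
                  Maximum_Size = max(Size))

print(per_type)
```

Grouping on all four columns, as in the question, puts almost every row in its own group, which is why the mean, minimum, and maximum there all equal the row's own size.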

Answer

Something like....

df <- read.table(text= 
"Type  Date   Size  Color 
Lshape 2008-04-14 161 blue  
Lshape 2010-10-16 654 yellow 
Lshape 2005-07-03 149 blue 
Lshape 2006-08-16 657 yellow 
Lshape 2007-04-08 229 yellow 
Lshape 2004-03-17 784 green 
Yshape 2014-02-22 917 pink 
Yshape 2012-05-04 186 green 
Yshape 2006-11-25 641 yellow 
Yshape 2015-09-07 493 blue 
Yshape 2011-07-06 953 green", header=TRUE) 

by(df, df$Type, function(x){ 
    data.frame(Colors = length(unique(x$Color)), 
      Dates = paste(x$Date, collapse=";"), 
      Mean.size = mean(x$Size), 
      Min.size = min(x$Size), 
      Max.size = max(x$Size)) 
}) 
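
`by()` returns a list-like object with one small data frame per type, which prints as separate blocks. If a single table is more convenient, the pieces can be stacked with `do.call(rbind, ...)`. The following is a minimal base-R sketch under that assumption; the names `per_type` and `result` are illustrative and not part of the answer above, and a `Type` column is added so the group label survives as data rather than only as a row name.

```r
# Same sample data as in the answer, with the type labels written as one word.
df <- read.table(text =
"Type Date Size Color
Lshape 2008-04-14 161 blue
Lshape 2010-10-16 654 yellow
Lshape 2005-07-03 149 blue
Lshape 2006-08-16 657 yellow
Lshape 2007-04-08 229 yellow
Lshape 2004-03-17 784 green
Yshape 2014-02-22 917 pink
Yshape 2012-05-04 186 green
Yshape 2006-11-25 641 yellow
Yshape 2015-09-07 493 blue
Yshape 2011-07-06 953 green", header = TRUE, stringsAsFactors = FALSE)

# One single-row data frame per Type.
per_type <- by(df, df$Type, function(x) {
  data.frame(Type      = x$Type[1],
             Colors    = length(unique(x$Color)),
             Dates     = paste(x$Date, collapse = ";"),
             Mean.size = mean(x$Size),
             Min.size  = min(x$Size),
             Max.size  = max(x$Size),
             stringsAsFactors = FALSE)
})

# Stack the per-type pieces into one data frame and drop the group row names.
result <- do.call(rbind, per_type)
rownames(result) <- NULL

print(result)
```

If the dates are needed as separate values later, `strsplit(result$Dates, ";")` recovers them as a list with one character vector per type.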

Thank you very much, this really helps. – Mike