2014-02-23

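tapply and incorrect summary statistics for some factors

I am trying to find an explanation for the summary results I get when using tapply. In the following example, the summary statistics for the factor level "Reg2" are wrong: summary reports Min. 11120 and Max. 14250, while min gives 11123 and the data contain 14253. Can someone help us understand this behavior?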

> edf=data.frame(pri=c(8258, 14253, 11123, 11311), 
      reg=c("Reg1", "Reg2", "Reg2", "Reg1")) 
> tapply(edf$pri, edf$reg, sum)
 Reg1  Reg2
19569 25376
> tapply(edf$pri, edf$reg, length)
Reg1 Reg2
   2    2
> tapply(edf$pri, edf$reg, mean)
   Reg1    Reg2
 9784.5 12688.0
> tapply(edf$pri, edf$reg, min)
 Reg1  Reg2
 8258 11123
> tapply(edf$pri, edf$reg, summary)
$Reg1
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   8258    9021    9784    9784   10550   11310

$Reg2
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  11120   11910   12690   12690   13470   14250

> by(edf$pri, edf$reg, summary)
edf$reg: Reg1
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   8258    9021    9784    9784   10550   11310

edf$reg: Reg2
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  11120   11910   12690   12690   13470   14250
> do.call("rbind",tapply(edf$pri, edf$reg, summary))
      Min. 1st Qu. Median  Mean 3rd Qu.  Max.
Reg1  8258    9021   9784  9784   10550 11310
Reg2 11120   11910  12690 12690   13470 14250
> str(edf)
'data.frame':  4 obs. of  2 variables:
 $ pri: num  8258 14253 11123 11311
 $ reg: Factor w/ 2 levels "Reg1","Reg2": 1 2 2 1

Answer


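Nothing is wrong with the data: summary.default() rounds its results with signif() to digits significant digits, and digits defaults to max(3, getOption("digits") - 3), which is 4 with the standard options. That is why 11123 and 14253 are shown as 11120 and 14250 (Reg1 is rounded too: 11311 becomes 11310). From ?summary: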

digits: integer, used for number formatting with ‘signif()’ (for 
      ‘summary.default’) or ‘format()’ (for ‘summary.data.frame’). 
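The rounding can be reproduced directly with signif(). This minimal sketch recomputes the default digits value from getOption("digits") (assuming the standard setting of 7) and applies it to a few of the exact statistics. Note that R 3.4.0 and later moved this rounding from summary.default() into its print method, so newer versions may print these values differently from the 2014 transcript in the question.

```r
# Default used by summary.default(): max(3, getOption("digits") - 3), i.e. 4 when digits = 7
default_digits <- max(3, getOption("digits") - 3)

# Exact values: Reg2 Min. and Max., Reg1 Max. and Reg1 3rd Qu.
exact <- c(11123, 14253, 11311, 10547.75)

# Rounding to 4 significant digits yields the numbers summary() printed
rounded <- signif(exact, default_digits)
print(rounded)
```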


tapply(edf$pri, edf$reg, summary, digits = 42) 

## $Reg1
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
##  8258.00  9021.25  9784.50  9784.50 10547.75 11311.00

## $Reg2
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## 11123.0 11905.5 12688.0 12688.0 13470.5 14253.0
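If the exact values are needed as numbers (for example, to build a table like the do.call("rbind", ...) result in the question) rather than depending on summary()'s rounding, one alternative is to compute the same six statistics with quantile() and mean(), which do not round. This is a sketch, not part of the original answer; full_summary is a hypothetical helper name, and its quartiles should match summary()'s because both rely on quantile()'s default type 7.

```r
edf <- data.frame(pri = c(8258, 14253, 11123, 11311),
                  reg = c("Reg1", "Reg2", "Reg2", "Reg1"))

# Hypothetical helper: the six statistics reported by summary(), without rounding
full_summary <- function(x) {
  q <- quantile(x, names = FALSE)  # 0%, 25%, 50%, 75%, 100% (type 7)
  c(Min. = q[1], `1st Qu.` = q[2], Median = q[3],
    Mean = mean(x), `3rd Qu.` = q[4], Max. = q[5])
}

# One row per factor level, as in the do.call("rbind", ...) call from the question
res <- do.call("rbind", tapply(edf$pri, edf$reg, full_summary))
print(res)
```

Because res is a plain numeric matrix, its values (such as 11123 and 10547.75) can be used in further calculations at full precision.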

So we need to be careful when using summary! When we use it, we should set the digits argument. Thanks! – Robert