2014-02-06 35 views
1

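How to conditionally select columns in an R data frame

How can I find the mean/median (or any other such statistic) for the females? I have tried several pieces of code to access the female data, without success. Any help is really appreciated.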

> jalal <- read.csv("jalal.csv", header=TRUE,sep=",") 
> which(jalal$sex==F) 
integer(0) 
> jalal 
    age sex weight eye.color hair.color 
1 23 F 93.8  blue  black 
2 21 M 180.8  amber  gray 
3 22 F 196.5  hazel  gray 
4 22 M 256.2  amber  black 
5 21 M 219.6  blue  gray 
6 16 F 152.1  blue  gray 
7 21 F 183.3  gray chestnut 
8 18 M 179.1  brown  blond 
9 15 M 206.1  blue  white 
10 19 M 211.6  brown  blond 
11 20 F 209.4  blue  white 
12 21 M 194.0  brown  auburn 
13 22 F 204.1  green  black 
14 21 F 157.4  hazel  red 
15 15 F 238.0  green  gray 
16 20 F 154.8  gray  gray 
17 16 F 245.8  gray  gray 
18 23 M 198.2  gray  red 
19 19 M 169.1  green  brown 
20 24 M 198.0  green  gray 
> subset(jalal, subset=(sex =F)) -> females 
> females 
[1] age  sex  weight  eye.color hair.color 
<0 rows> (or 0-length row.names) 
> subset(jalal, subset=(sex ==F)) -> females 
> females 
[1] age  sex  weight  eye.color hair.color 
<0 rows> (or 0-length row.names) 

Here is what is in jalal.csv:

"age","sex","weight","eye.color","hair.color" 
23,"F",93.8,"blue","black" 
21,"M",180.8,"amber","gray" 
22,"F",196.5,"hazel","gray" 
22,"M",256.2,"amber","black" 
21,"M",219.6,"blue","gray" 
16,"F",152.1,"blue","gray" 
21,"F",183.3,"gray","chestnut" 
18,"M",179.1,"brown","blond" 
15,"M",206.1,"blue","white" 
19,"M",211.6,"brown","blond" 
20,"F",209.4,"blue","white" 
21,"M",194,"brown","auburn" 
22,"F",204.1,"green","black" 
21,"F",157.4,"hazel","red" 
15,"F",238,"green","gray" 
20,"F",154.8,"gray","gray" 
16,"F",245.8,"gray","gray" 
23,"M",198.2,"gray","red" 
19,"M",169.1,"green","brown" 
24,"M",198,"green","gray" 

Answers

5

You are looking for aggregate. Here is the formula that returns the median age and weight by sex:

aggregate(cbind(age, weight) ~ sex, data=jalal, FUN=median) 
## sex age weight 
## 1 F 20.5 189.9 
## 2 M 21.0 198.1 

To get a data frame containing only the females, here is the [ syntax:

jalal[jalal$sex == 'F',] 

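Note the quotes around 'F'. A bare F means FALSE, so sex == F compares every value against FALSE and matches nothing. That is why your second subset expression failed. (Your first one also used = where the comparison operator == was intended.)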

subset(jalal, subset=(sex =='F')) 
## age sex weight eye.color hair.color 
## 1 23 F 93.8  blue  black 
## 3 22 F 196.5  hazel  gray 
## 6 16 F 152.1  blue  gray 

...
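To see why the quotes matter, here is a minimal sketch that rebuilds the question's data inline and then computes the female statistics directly. The csv_text variable is only an assumption made to keep the example self-contained; in the question the data comes from jalal.csv.

```r
# Inline copy of jalal.csv so the example runs on its own (csv_text is illustrative).
csv_text <- '"age","sex","weight","eye.color","hair.color"
23,"F",93.8,"blue","black"
21,"M",180.8,"amber","gray"
22,"F",196.5,"hazel","gray"
22,"M",256.2,"amber","black"
21,"M",219.6,"blue","gray"
16,"F",152.1,"blue","gray"
21,"F",183.3,"gray","chestnut"
18,"M",179.1,"brown","blond"
15,"M",206.1,"blue","white"
19,"M",211.6,"brown","blond"
20,"F",209.4,"blue","white"
21,"M",194,"brown","auburn"
22,"F",204.1,"green","black"
21,"F",157.4,"hazel","red"
15,"F",238,"green","gray"
20,"F",154.8,"gray","gray"
16,"F",245.8,"gray","gray"
23,"M",198.2,"gray","red"
19,"M",169.1,"green","brown"
24,"M",198,"green","gray"'
jalal <- read.csv(text = csv_text)

identical(F, FALSE)      # TRUE: a bare F is shorthand for FALSE
any(jalal$sex == F)      # FALSE: no value of sex equals FALSE

# Quoting "F" compares against the string, which is what was intended.
females <- jalal[jalal$sex == "F", ]
nrow(females)            # 10
mean(females$weight)     # 183.52
median(females$weight)   # 189.9
median(females$age)      # 20.5
```

Once the rows are selected, any summary function (mean, median, sd, and so on) can be applied to a column of females in the same way.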

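In the comments, you asked for a way to get the mean for women with blue eyes. A first approach is to filter the data frame down to only the blue-eyed people: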

aggregate(cbind(age, weight) ~ sex, data=jalal[jalal$eye.color == 'blue',], FUN=mean) 
## sex  age weight 
## 1 F 19.66667 151.7667 
## 2 M 18.00000 212.8500 

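But this seems hackish; after all, we are not filtering the data frame for women. So here is a formula that gives the mean age and weight by sex and eye color. From it you can pick out blue-eyed women, green-eyed men, and so on: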

aggregate(cbind(age, weight) ~ sex + eye.color, data=jalal, FUN=mean) 
## sex eye.color  age weight 
## 1 M  amber 21.50000 218.5000 
## 2 F  blue 19.66667 151.7667 
## 3 M  blue 18.00000 212.8500 
## 4 M  brown 19.33333 194.9000 
## 5 F  gray 19.00000 194.6333 
## 6 M  gray 23.00000 198.2000 
## 7 F  green 18.50000 221.0500 
## 8 M  green 21.50000 183.5500 
## 9 F  hazel 21.50000 176.9500 

Note that the means in rows 2 and 3 here match the results of the previous expression.
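As a hedged illustration of picking one group out of such a grouped result, the sketch below stores the aggregate output and then selects the blue-eyed women from it. The names csv_text, means and blue_f are assumptions introduced for this example only.

```r
# Inline copy of jalal.csv so the example runs on its own (csv_text is illustrative).
csv_text <- '"age","sex","weight","eye.color","hair.color"
23,"F",93.8,"blue","black"
21,"M",180.8,"amber","gray"
22,"F",196.5,"hazel","gray"
22,"M",256.2,"amber","black"
21,"M",219.6,"blue","gray"
16,"F",152.1,"blue","gray"
21,"F",183.3,"gray","chestnut"
18,"M",179.1,"brown","blond"
15,"M",206.1,"blue","white"
19,"M",211.6,"brown","blond"
20,"F",209.4,"blue","white"
21,"M",194,"brown","auburn"
22,"F",204.1,"green","black"
21,"F",157.4,"hazel","red"
15,"F",238,"green","gray"
20,"F",154.8,"gray","gray"
16,"F",245.8,"gray","gray"
23,"M",198.2,"gray","red"
19,"M",169.1,"green","brown"
24,"M",198,"green","gray"'
jalal <- read.csv(text = csv_text)

# Mean age and weight for every sex / eye-color combination.
means <- aggregate(cbind(age, weight) ~ sex + eye.color, data = jalal, FUN = mean)

# Keep only the row for blue-eyed women.
blue_f <- means[means$sex == "F" & means$eye.color == "blue", ]
blue_f

# The same numbers, computed by filtering the raw rows on both conditions.
direct <- jalal[jalal$sex == "F" & jalal$eye.color == "blue", ]
c(age = mean(direct$age), weight = mean(direct$weight))
```

Both routes should agree with the F / blue row of the table above; combining conditions with & is the general way to filter on more than one column.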

+0

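Also, I would like to know whether 'FUN' can count, rather than only compute a mean/median/weighted mean! For example, how would I use aggregate to count the number of people with brown or black eyes? I could not find a counting function in '?aggregate'. Basically, I want to know how to find the list of functions that can be passed as 'FUN' in 'aggregate'. –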

+1

A count is the length of a vector in R. Pass 'FUN = length' for this. It is easiest to create a column of 1s ('jalal$count <- 1') and use 'count' in place of 'cbind(age, weight)' in the formula. –

+0

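@Mathew Lundberg: Can I find out how old the third-heaviest person is using the aggregate function? I tried this, but it did not help: '> aggregate(age ~ weight, data=jalal, FUN=rank)' –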
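The two comment questions above can be sketched as follows. The count part follows the suggestion of a helper column with FUN = length. The third-heaviest part uses order() instead of aggregate, which is an assumption of this sketch rather than something proposed in the thread, since aggregate summarises groups rather than ranking rows. The names csv_text, counts and heaviest_first are illustrative.

```r
# Inline copy of jalal.csv so the example runs on its own (csv_text is illustrative).
csv_text <- '"age","sex","weight","eye.color","hair.color"
23,"F",93.8,"blue","black"
21,"M",180.8,"amber","gray"
22,"F",196.5,"hazel","gray"
22,"M",256.2,"amber","black"
21,"M",219.6,"blue","gray"
16,"F",152.1,"blue","gray"
21,"F",183.3,"gray","chestnut"
18,"M",179.1,"brown","blond"
15,"M",206.1,"blue","white"
19,"M",211.6,"brown","blond"
20,"F",209.4,"blue","white"
21,"M",194,"brown","auburn"
22,"F",204.1,"green","black"
21,"F",157.4,"hazel","red"
15,"F",238,"green","gray"
20,"F",154.8,"gray","gray"
16,"F",245.8,"gray","gray"
23,"M",198.2,"gray","red"
19,"M",169.1,"green","brown"
24,"M",198,"green","gray"'
jalal <- read.csv(text = csv_text)

# Counting: add a helper column and take its length within each eye color.
jalal$count <- 1
counts <- aggregate(count ~ eye.color, data = jalal, FUN = length)
counts

# People with brown or black eyes (no one in this data has black eyes).
sum(jalal$eye.color %in% c("brown", "black"))   # 3

# Third-heaviest person: sort rows by decreasing weight, then take row 3.
heaviest_first <- jalal[order(jalal$weight, decreasing = TRUE), ]
heaviest_first$age[3]                           # 15
```

As far as I know there is no fixed list of FUN values: aggregate accepts any function that can be applied to the values of each group, such as length, sum, min, max, or a function you write yourself.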

1

Here is an alternative solution using the data.table package:

require(data.table) 
jalal <- as.data.table(jalal) 

To subset on the females:

jalal[sex == "F"] 

To compute the mean, median, and so on:

> jalal[sex == "F", mean(weight)] 
[1] 183.52 
> jalal[sex == "F", list(mean(weight), median(age))] 
     V1 V2 
1: 183.52 20.5 
+1

You can name the columns: 'list(MeanWeight = mean(weight), MedianAge = median(age))' –

+0

Thanks! I am still learning the data.table syntax. –
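Putting the naming suggestion together with grouping, a minimal sketch might look like the following. It assumes the data.table package is installed; the by = sex grouping and the names csv_text and by_sex are additions for illustration, not part of the answer above.

```r
library(data.table)

# Inline copy of jalal.csv so the example runs on its own (csv_text is illustrative).
csv_text <- '"age","sex","weight","eye.color","hair.color"
23,"F",93.8,"blue","black"
21,"M",180.8,"amber","gray"
22,"F",196.5,"hazel","gray"
22,"M",256.2,"amber","black"
21,"M",219.6,"blue","gray"
16,"F",152.1,"blue","gray"
21,"F",183.3,"gray","chestnut"
18,"M",179.1,"brown","blond"
15,"M",206.1,"blue","white"
19,"M",211.6,"brown","blond"
20,"F",209.4,"blue","white"
21,"M",194,"brown","auburn"
22,"F",204.1,"green","black"
21,"F",157.4,"hazel","red"
15,"F",238,"green","gray"
20,"F",154.8,"gray","gray"
16,"F",245.8,"gray","gray"
23,"M",198.2,"gray","red"
19,"M",169.1,"green","brown"
24,"M",198,"green","gray"'
jalal <- as.data.table(read.csv(text = csv_text))

# Named summary columns for the females only.
jalal[sex == "F", list(MeanWeight = mean(weight), MedianAge = median(age))]

# The same summary for every sex at once, using by =.
by_sex <- jalal[, list(MeanWeight = mean(weight), MedianAge = median(age)), by = sex]
by_sex
```

Grouping with by = avoids repeating the filter for each sex, and the named list gives readable column names in place of V1 and V2.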

0

Just so you see all of the main options, here is a dplyr solution:

library(dplyr) 
# %>% replaces the %.% operator of early dplyr versions, which has since been removed
jalal %>% 
    group_by(sex, eye.color) %>% 
    summarise(age = mean(age), weight = median(weight)) 