2013-02-01 33 views
2

How can I uniquely identify the groups of observations defined by a set of variables?

I have a sample data frame "z" as follows:

deaths sex race smokes pyears 
10 Female White 0 1410 
14 Male White 1 1974 
14 Female Black 0 1974 
16 Male Black 1 2256 
17 Male Black 0 2397 
18 Female NA 1 2538 
19 NA Black 0 2679 
20 Female White 1 2820 
20 Female Black 0 2820 
21 Male Black 1 2961 

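I would like to create a new variable "group" that combines the variables race and sex. This new variable should uniquely identify the groups of observations in the data frame "z". The expected output is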

group 
    1 
    2 
    3 
    4 
    4 
    6 
    5 
    1 
    3 
    4 

How can I code this in R?

+0

You may be looking for 'interaction()'. – joran

Answers

2

This is the sort of thing I had in mind:

dat <- read.table(text = "deaths sex race smokes pyears 
10 Female White 0 1410 
14 Male White 1 1974 
14 Female Black 0 1974 
16 Male Black 1 2256 
17 Male Black 0 2397 
18 Female NA 1 2538 
19 NA Black 0 2679 
20 Female White 1 2820 
20 Female Black 0 2820 
21 Male Black 1 2961",header = TRUE,sep = "") 

dat$sex <- factor(dat$sex,exclude = NULL) 
dat$race <- factor(dat$race,exclude = NULL) 

with(dat,interaction(sex,race)) 

[1] Female.White Male.White Female.Black Male.Black Male.Black Female.NA NA.Black  Female.White Female.Black 
[10] Male.Black 
Levels: Female.Black Male.Black NA.Black Female.White Male.White NA.White Female.NA Male.NA NA.NA 

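It looks like you want to include the NAs rather than drop them, hence the explicit factor calls. You can of course convert the resulting factor to integers with as.integer, but the actual numbers are unlikely to come out in the order you specified, because R orders the levels alphabetically rather than by how they appear in the data frame.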
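As a rough sketch of the integer conversion mentioned above: the block below rebuilds only the sex and race columns so it runs on its own. The `drop = TRUE` argument and the names `grp` and `group` are additions for illustration and were not part of the original answer.

```r
# Same sex/race values as the sample data, typed in directly
dat <- data.frame(
  sex  = c("Female", "Male", "Female", "Male", "Male",
           "Female", NA, "Female", "Female", "Male"),
  race = c("White", "White", "Black", "Black", "Black",
           NA, "Black", "White", "Black", "Black")
)

# Keep NA as its own level in each factor
dat$sex  <- factor(dat$sex,  exclude = NULL)
dat$race <- factor(dat$race, exclude = NULL)

# drop = TRUE discards sex/race combinations that never occur in the data
grp <- with(dat, interaction(sex, race, drop = TRUE))

# Integer codes follow the level order, not the order of appearance
dat$group <- as.integer(grp)
```

Rows with the same sex/race combination get the same code, but the codes will not match the numbering requested in the question.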

+0

@joran: Awesome. Thank you very much!! – Metrics

1

You could use:

dat <- read.table(text="deaths sex race smokes pyears 
10 Female White 0 1410 
14 Male White 1 1974 
14 Female Black 0 1974 
16 Male Black 1 2256 
17 Male Black 0 2397 
18 Female NA 1 2538 
19 NA Black 0 2679 
20 Female White 1 2820 
20 Female Black 0 2820 
21 Male Black 1 2961", header=TRUE) 

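library(qdap) 

# Paste the sex and race columns into one key per row;
# handle.na = FALSE keeps missing values as the text "NA" instead of returning NA
factor(paste2(dat[, 2:3], handle.na = FALSE)) 

# For numeric group codes: 
as.numeric(factor(paste2(dat[, 2:3], handle.na = FALSE))) 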

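But as joran pointed out, the numbers you expect are not the ones R will produce by default. You need to pass levels to factor to put the levels in the order you want.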
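A minimal base R sketch of that idea, assuming the level order is simply written out by hand to match the numbering in the question. It uses `paste` rather than qdap's `paste2`, and the names `key` and `wanted` are hypothetical.

```r
# Same sex/race values as the sample data, typed in directly
dat <- data.frame(
  sex  = c("Female", "Male", "Female", "Male", "Male",
           "Female", NA, "Female", "Female", "Male"),
  race = c("White", "White", "Black", "Black", "Black",
           NA, "Black", "White", "Black", "Black")
)

# One text key per row; paste() turns a missing value into the string "NA"
key <- paste(dat$sex, dat$race, sep = ".")

# Level order chosen by hand to match the numbering requested in the question
wanted <- c("Female.White", "Male.White", "Female.Black",
            "Male.Black", "NA.Black", "Female.NA")

# The integer code of each row is the position of its key in `wanted`
dat$group <- as.integer(factor(key, levels = wanted))
```

If numbering by order of first appearance is good enough, `factor(key, levels = unique(key))` avoids typing the levels. For this data, though, it would number Female.NA as 5 and NA.Black as 6, the reverse of the requested output.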

+0

Thanks, Tyler, for the alternative solution! – Metrics