2012-02-16

Data reshaping and logical indexing in R

I have the following (dummy) data:

d <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 5L, 
5L, 5L, 5L, 5L, 5L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 
2L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("apple", "grapefruit", 
"orange", "peach", "pear"), class = "factor"), type = structure(c(2L, 
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 
1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L), .Label = c("large", 
"small"), class = "factor"), location = structure(c(1L, 2L, 3L, 
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("P1", 
"P2", "P3"), class = "factor"), diameter = c(17.2, 19.1, 18.5, 
23.3, 22.9, 19.4, 11.1, 11.8, 6.8, 3.2, 7.9, 5.6, 8.4, 9.2, 9.7, 
17.1, 19.4, 18.9, 11.8, 10.6, 10.1, 18.8, 17.9, 13.2, 8.5, 8.9, 
7.2, 10.1, 8.7, 6.6)), .Names = c("group", "type", "location", 
"diameter"), class = "data.frame", row.names = c(NA, -30L)) 

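From this I want to create a new data frame of ratios of the "diameter" variable, computed for each combination of levels of the three factors "location", "type" and "group". For the "pear" group, for example: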

P3.P1.L <- with(d, diameter[group=="pear" & type=="large" & location=="P3"]/diameter[group=="pear" & type=="large" & location=="P1"]) 
P2.P1.L <- with(d, diameter[group=="pear" & type=="large" & location=="P2"]/diameter[group=="pear" & type=="large" & location=="P1"]) 
P3.P1.S <- with(d, diameter[group=="pear" & type=="small" & location=="P3"]/diameter[group=="pear" & type=="small" & location=="P1"]) 
P2.P1.S <- with(d, diameter[group=="pear" & type=="small" & location=="P2"]/diameter[group=="pear" & type=="small" & location=="P1"]) 

The final data.frame would look something like this:

group, type, P2.P1, P3.P1 
pear, large, 2.469, 1.75 
pear, small, 1.063, 0.613 
apple, large, ..., ... 
apple, small, ..., ... 

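Obviously, I can do this as shown above, by logically indexing the correct level of each of the 3 factors in every instance. The problem is that my real data has about 40 levels of the "group" factor (although "type" still has only 2 levels). I would like a solution where I use logical indexing on "location" and perhaps "type", and then iterate over all levels of "group". For example, something like: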

with(d, by(d, group, function(x) diameter[type=="large" & location=="P3"]/diameter[type=="large" & location=="P1"])) 

But this doesn't quite do what I want (and indexing with group==x doesn't work either).
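In the by() attempt above, the anonymous function never uses its argument x: because it is defined inside with(d, ...), the names diameter, type and location are looked up in the full data frame d, so every group gets the same result. A minimal sketch of the fix, assuming one P3/P1 ratio for type "large" per group is what is wanted, is to evaluate the expression inside with(x, ...). The data are rebuilt compactly here (same values as the dput() output above) so the block runs on its own; the name p3p1.large is only illustrative.

```r
# Dummy data: same values as the dput() output in the question, built compactly
d <- data.frame(
  group    = factor(rep(c("apple", "pear", "orange", "grapefruit", "peach"), each = 6)),
  type     = factor(rep(rep(c("small", "large"), each = 3), times = 5)),
  location = factor(rep(c("P1", "P2", "P3"), times = 10)),
  diameter = c(17.2, 19.1, 18.5, 23.3, 22.9, 19.4, 11.1, 11.8, 6.8, 3.2, 7.9, 5.6,
               8.4, 9.2, 9.7, 17.1, 19.4, 18.9, 11.8, 10.6, 10.1, 18.8, 17.9, 13.2,
               8.5, 8.9, 7.2, 10.1, 8.7, 6.6)
)

# by() splits d by group and calls the function on each subset x;
# with(x, ...) makes diameter/type/location refer to that subset, not to all of d
p3p1.large <- by(d, d$group, function(x)
  with(x, diameter[type == "large" & location == "P3"] /
          diameter[type == "large" & location == "P1"]))
print(p3p1.large)
```

The results come out in the order of levels(d$group), one value per group. This still yields only one ratio per call, so collecting both types and both ratios into a single data frame is easier with a reshaping approach.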

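A solution that keeps track of which "group" and "type" levels each ratio belongs to, and then puts the results into a new data frame like the desired output above, would be amazing. Any suggestions on how to approach this would be greatly appreciated.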

Answer


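You can use dcast to convert the data to a wider format.

library(reshape2) 
# one row per group/type combination, one column (P1, P2, P3) per location
d <- dcast(d, group + type ~ location, value.var = "diameter") 

It is then simple to compute the required ratios, for example:

transform(d, P2.P1=P2/P1, P3.P1=P3/P1) 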
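If reshape2 is not available, base R's stats::reshape() can do the same long-to-wide step. The sketch below is a swapped-in alternative, not part of the original answer: it rebuilds d compactly with the same values as the dput() output in the question, widens it, and keeps only the columns of the desired output. The names wide and ratios are illustrative.

```r
# Dummy data: same values as the dput() output in the question, built compactly
d <- data.frame(
  group    = factor(rep(c("apple", "pear", "orange", "grapefruit", "peach"), each = 6)),
  type     = factor(rep(rep(c("small", "large"), each = 3), times = 5)),
  location = factor(rep(c("P1", "P2", "P3"), times = 10)),
  diameter = c(17.2, 19.1, 18.5, 23.3, 22.9, 19.4, 11.1, 11.8, 6.8, 3.2, 7.9, 5.6,
               8.4, 9.2, 9.7, 17.1, 19.4, 18.9, 11.8, 10.6, 10.1, 18.8, 17.9, 13.2,
               8.5, 8.9, 7.2, 10.1, 8.7, 6.6)
)

# Long -> wide with base R: one row per group/type, one diameter column per location
wide <- reshape(d, idvar = c("group", "type"), timevar = "location",
                v.names = "diameter", direction = "wide")
# reshape() names the new columns diameter.P1, diameter.P2, diameter.P3; shorten them
names(wide) <- sub("^diameter\\.", "", names(wide))

# Ratios relative to P1, then keep only the columns of the desired output
ratios <- transform(wide, P2.P1 = P2 / P1, P3.P1 = P3 / P1)
ratios <- ratios[, c("group", "type", "P2.P1", "P3.P1")]
rownames(ratios) <- NULL
print(ratios)
```

Rows appear in the order in which each group/type combination first occurs in d; this works the same for 40 groups as for 5, since no level is named explicitly.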

That's great, thanks. ... I really need to take the time to learn Hadley's data-manipulation packages. – Steve 2012-02-16 04:07:06
