2013-05-14 27 views
2

Find the highest proportion in a data.frame

I have a data frame that looks like this:

x <- data.frame(sector=rep(1:5, each=2), 
       subspecies=rep(c("Type A", "Type B"), 5), 
       proportion= c(.2, 1-.2, .3, 1-.3, .4, 
           1-.4, .5, 1-.5, .6, 1-.6)) 

x$dominance <- NA 

x[,1] <- sort(x[,1]) 

x 
   sector subspecies proportion dominance
1       1     Type A        0.2        NA
2       1     Type B        0.8        NA
3       2     Type A        0.3        NA
4       2     Type B        0.7        NA
5       3     Type A        0.4        NA
6       3     Type B        0.6        NA
7       4     Type A        0.5        NA
8       4     Type B        0.5        NA
9       5     Type A        0.6        NA
10      5     Type B        0.4        NA

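Within each sector (1-5), if Type A has the highest proportion, I need to add 'A dominant' to the 'dominance' column; if Type B has the highest proportion, I need to add 'B dominant' to the 'dominance' column. If there is a tie, I need to add 'tie' to the 'dominance' column.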

This should be the resulting data frame:

x$dominance <- c("B dominant", "B dominant", "B dominant", "B dominant", "B dominant", 
       "B dominant", "tie", "tie", "A dominant", "A dominant") 
x 
   sector subspecies proportion  dominance
1       1     Type A        0.2 B dominant
2       1     Type B        0.8 B dominant
3       2     Type A        0.3 B dominant
4       2     Type B        0.7 B dominant
5       3     Type A        0.4 B dominant
6       3     Type B        0.6 B dominant
7       4     Type A        0.5        tie
8       4     Type B        0.5        tie
9       5     Type A        0.6 A dominant
10      5     Type B        0.4 A dominant

Thanks @Josh for taking the time to manually edit all the data and answers – 2013-05-14 16:27:51

Answers

3

Here is a base R solution:

compare <- function(x) {
    ## return the subspecies with the max proportion
    res <- x[which(x$proportion == max(x$proportion)), "subspecies"]
    if (length(res) > 1L) { ## if tied, length(res) == 2
        out <- "Tie"
    } else { ## simple string replacement
        out <- paste(sub("Type ", "", res), "Dominant")
        ## or you could use
        # out <- if (res == "Type A") "A Dominant" else "B Dominant"
    }
    out
}

x$dominance <- unsplit(lapply(split(x, x$sector), compare), x$sector) 

> x 
   sector subspecies proportion  dominance
1       1     Type A        0.2 B Dominant
2       1     Type B        0.8 B Dominant
3       2     Type A        0.3 B Dominant
4       2     Type B        0.7 B Dominant
5       3     Type A        0.4 B Dominant
6       3     Type B        0.6 B Dominant
7       4     Type A        0.5        Tie
8       4     Type B        0.5        Tie
9       5     Type A        0.6 A Dominant
10      5     Type B        0.4 A Dominant
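
The split/lapply/unsplit line works because `unsplit()` writes each group's single label back onto every row of that group. A minimal sketch on toy data may make that step easier to see; the names `grp` and `labels` are illustrative and not part of the answer:

```r
# Toy grouping vector: three groups of unequal size
grp <- c(1, 1, 2, 3, 3, 3)

# One value per group, in the order split() would return the groups
labels <- list(`1` = "a", `2` = "b", `3` = "c")

# unsplit() places each group's value at the positions of that group;
# a length-one value is recycled across all rows of the group
res <- unsplit(labels, grp)
res
```

In the answer, `lapply(split(x, x$sector), compare)` plays the role of `labels` and `x$sector` plays the role of `grp`.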

So, apparently "compare" is already a function in base R. – Frank 2013-05-14 16:45:11

@Frank Well, 'compare' seems to be free. – 2013-05-14 17:00:13

4
library(data.table) 
DT <- data.table(x) 

DT[, dominance := {
        p.a <- proportion[subspecies == "Type A"]
        p.b <- proportion[subspecies == "Type B"]
        if (p.a > p.b) "A dominant" else if (p.b > p.a) "B dominant" else "tie"
    }, by = sector]

    sector subspecies proportion  dominance
 1:      1     Type A        0.2 B dominant
 2:      1     Type B        0.8 B dominant
 3:      2     Type A        0.3 B dominant
 4:      2     Type B        0.7 B dominant
 5:      3     Type A        0.4 B dominant
 6:      3     Type B        0.6 B dominant
 7:      4     Type A        0.5        tie
 8:      4     Type B        0.5        tie
 9:      5     Type A        0.6 A dominant
10:      5     Type B        0.4 A dominant
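
One caveat about the tie branch: it is only reached when the two proportions are exactly equal, which holds for the example data but may not hold for proportions produced by floating-point arithmetic. A hedged sketch of a tolerance-based comparison follows; `dominance_label` and its `tol` argument are hypothetical names introduced here, not part of the answer:

```r
# Hypothetical helper: label one sector from its two proportions,
# treating values within `tol` of each other as a tie.
dominance_label <- function(p.a, p.b, tol = 1e-8) {
  if (abs(p.a - p.b) <= tol) {
    "tie"
  } else if (p.a > p.b) {
    "A dominant"
  } else {
    "B dominant"
  }
}

# Exact comparison misses this tie because 0.1 + 0.2 is not exactly 0.3
(0.1 + 0.2) == 0.3

# The tolerance-based helper reports it as a tie
dominance_label(0.1 + 0.2, 0.3)
```

The helper could be called inside the `j` expression in place of the `if`/`else` chain, assuming each sector has exactly one row per subspecies.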

This is back to front, isn't it? In sector 1, A is 0.2 and B is 0.6, so it should be "B dominant", yes? – 2013-05-14 15:45:33

Thanks @GavinSimpson. That was a sloppy mistake on my part (I copied and pasted the 'dominant' text, so both branches were the same). Edited and corrected. – 2013-05-14 15:51:08

Hi all, it looks like none of the data frames is what I'm after; see what I've added to the opening post – luciano 2013-05-14 16:03:26

2

With base R:

do.call(rbind,
        by(x, x$sector,
           FUN = function(sec)
               transform(sec,
                         dominance = if (anyDuplicated(proportion)) 'tie'
                                     else subspecies[which.max(proportion)])))
#      sector subspecies proportion dominance
# 1.1       1     Type A        0.2    Type B
# 1.2       1     Type B        0.8    Type B
# 2.3       2     Type A        0.3    Type B
# 2.4       2     Type B        0.7    Type B
# 3.5       3     Type A        0.4    Type B
# 3.6       3     Type B        0.6    Type B
# 4.7       4     Type A        0.5       tie
# 4.8       4     Type B        0.5       tie
# 5.9       5     Type A        0.6    Type A
# 5.10      5     Type B        0.4    Type A

You can break it into two parts if that improves readability.

f <- function(sec)
    transform(sec, dominance = if (anyDuplicated(proportion)) 'tie'
                               else subspecies[which.max(proportion)])
do.call(rbind, by(x, x$sector, f))
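
The output above carries the winning subspecies name ("Type A" / "Type B") rather than the "A dominant" / "B dominant" labels the question asks for. A minimal sketch of one way to adapt the same `by()` + `rbind` pattern, assuming `subspecies` is stored as character; `label_sector` is a hypothetical name, and the data frame is rebuilt here only so the example is self-contained:

```r
# Example data from the question, with character (not factor) subspecies
x <- data.frame(sector = rep(1:5, each = 2),
                subspecies = rep(c("Type A", "Type B"), 5),
                proportion = c(.2, .8, .3, .7, .4, .6, .5, .5, .6, .4),
                stringsAsFactors = FALSE)

# Hypothetical per-sector function: find the subspecies holding the maximum
# proportion and turn it into the requested label, or "tie" if shared
label_sector <- function(sec) {
  top <- sec$subspecies[sec$proportion == max(sec$proportion)]
  sec$dominance <- if (length(top) > 1L) {
    "tie"
  } else {
    paste(sub("Type ", "", top), "dominant")
  }
  sec
}

# Apply per sector and stack the pieces back into one data frame
res <- do.call(rbind, by(x, x$sector, label_sector))
res
```

Testing for more than one row at the maximum, instead of `anyDuplicated(proportion)`, keeps the tie check tied to the top value in case a sector ever has more than two rows.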