2017-09-04

Sum of the products of a variable

I have a data set like this:

test <- 
    data.frame(
     variable = c("A","A","B","B","C","D","E","E","E","F","F","G"), 
     confidence = c(1,0.6,0.1,0.15,1,0.3,0.4,0.5,0.2,1,0.4,0.9),   
     freq  = c(2,2,2,2,1,1,3,3,3,2,2,1), 
     weight  = c(2,2,0,0,1,3,5,5,5,0,0,4) 
    ) 

> test 
    variable confidence freq weight 
1   A  1.00 2  2 
2   A  0.60 2  2 
3   B  0.10 2  0 
4   B  0.15 2  0 
5   C  1.00 1  1 
6   D  0.30 1  3 
7   E  0.40 3  5 
8   E  0.50 3  5 
9   E  0.20 3  5 
10  F  1.00 2  0 
11  F  0.40 2  0 
12  G  0.90 1  4 

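For each variable, I want to compute the sum of weight times confidence: SWC_i = sum over the rows j of variable i of w[j]*c[j], where i is the variable (A, B, C, ...).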

Expanding the formula above:

w[1]c[1]+w[1]c[2]=2*1+2*0.6=3.2 
w[2]c[1]+w[2]c[2] 
w[3]c[3]+w[3]c[4] 
w[4]c[3]+w[4]c[4] 
w[5]c[5] 
w[6]c[6] 
w[7]c[7]+w[7]c[8]+w[7]c[9] 
w[8]c[7]+w[8]c[8]+w[8]c[9] 
w[9]c[7]+w[9]c[8]+w[9]c[9] 
… 

The result should look like this:

> test 
    variable confidence freq weight SWC 
1   A  1.00 2  2 3.2 
2   A  0.60 2  2 3.2 
3   B  0.10 2  0 0.0 
4   B  0.15 2  0 0.0 
5   C  1.00 1  1 1.0 
6   D  0.30 1  3 0.9 
7   E  0.40 3  5 5.5 
8   E  0.50 3  5 5.5 
9   E  0.20 3  5 5.5 
10  F  1.00 2  0 0.0 
11  F  0.40 2  0 0.0 
12  G  0.90 1  4 3.6 

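Note that the confidence value differs for each observation, but all observations of a variable share the same weight, so the sum I need is the same for every observation of the same variable.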

First, I tried to write a loop that iterates over each variable using its number of occurrences:

> table(test$variable) 

A B C D E F G 
2 2 1 1 3 2 1 

But I could not make it work. Then I computed the position where each variable starts, so that the for loop would iterate only over those values:

> tpos = cumsum(table(test$variable)) 
> tpos = tpos+1 
> tpos 
A B C D E F G 
3 5 6 7 10 12 13 
> tpos = shift(tpos, 1) 
> tpos 
[1] NA 3 5 6 7 10 12 
> tpos[1]=1 
> tpos 
[1] 1 3 5 6 7 10 12 

# tpos is a vector with the positions where each variable (A, B, C, ...) starts

> tposn = c(1:nrow(test))[-tpos] 
> tposn 
[1] 2 4 8 9 11 
> c(1:nrow(test))[-tposn] 
[1] 1 3 5 6 7 10 12 

# then I came up with this loop, but it doesn't give the correct result

for(i in 1:nrow(test)[-tposn]){ 
    a = test$freq[i]-1 
    test$SWC[i:i+a] = sum(test$weight[i]*test$confidence[i:i+a]) 
    } 
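
A likely reason the loop above misbehaves, offered as a sketch rather than a definitive diagnosis: in R, indexing binds tighter than `:`, and `:` binds tighter than `+`. So `i:i+a` is evaluated as `(i:i)+a`, which is the single index `i+a`, and `1:nrow(test)[-tposn]` is evaluated as `1:(nrow(test)[-tposn])`, which is just `1:12`. The version below adds the missing parentheses. The names `starts` and `idx` are introduced here for illustration, and `which(!duplicated(...))` stands in for the `tpos` computation on the assumption that the rows of each variable are contiguous.

```r
test <- data.frame(
  variable   = c("A","A","B","B","C","D","E","E","E","F","F","G"),
  confidence = c(1, 0.6, 0.1, 0.15, 1, 0.3, 0.4, 0.5, 0.2, 1, 0.4, 0.9),
  freq       = c(2, 2, 2, 2, 1, 1, 3, 3, 3, 2, 2, 1),
  weight     = c(2, 2, 0, 0, 1, 3, 5, 5, 5, 0, 0, 4)
)

# First row of each variable (assumes the rows of a variable are contiguous)
starts <- which(!duplicated(test$variable))

test$SWC <- NA_real_
for (i in starts) {
  # Parentheses matter: i:(i + a), not i:i + a
  idx <- i:(i + test$freq[i] - 1)
  # Same weight for the whole group, one confidence per row
  test$SWC[idx] <- sum(test$weight[i] * test$confidence[idx])
}
```

With the parentheses in place, each group's rows receive the same sum, which is the behaviour described in the expected result.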

Maybe there is a simpler way to do this? tapply?

Answers


Using dplyr:

library(dplyr) 

test %>% 
    group_by(variable) %>% 
    mutate(SWC=sum(confidence*weight)) 

# A tibble: 12 x 5 
# Groups: variable [7] 
variable confidence freq weight SWC 
<fctr>  <dbl> <dbl> <dbl> <dbl> 
1  A  1.00  2  2 3.2 
2  A  0.60  2  2 3.2 
3  B  0.10  2  0 0.0 
4  B  0.15  2  0 0.0 
5  C  1.00  1  1 1.0 
6  D  0.30  1  3 0.9 
7  E  0.40  3  5 5.5 
8  E  0.50  3  5 5.5 
9  E  0.20  3  5 5.5 
10  F  1.00  2  0 0.0 
11  F  0.40  2  0 0.0 
12  G  0.90  1  4 3.6 
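
For readers without dplyr, the same grouped sum can be sketched in base R. This is an alternative and not part of the dplyr answer. `ave()` returns a vector as long as its input with each group's result repeated on every row of the group, so it can be assigned straight into a new column. `tapply()` returns one value per variable, which then has to be looked up by `test$variable`. The names `swc_by_var` and `SWC_tapply` are introduced here for illustration.

```r
test <- data.frame(
  variable   = c("A","A","B","B","C","D","E","E","E","F","F","G"),
  confidence = c(1, 0.6, 0.1, 0.15, 1, 0.3, 0.4, 0.5, 0.2, 1, 0.4, 0.9),
  freq       = c(2, 2, 2, 2, 1, 1, 3, 3, 3, 2, 2, 1),
  weight     = c(2, 2, 0, 0, 1, 3, 5, 5, 5, 0, 0, 4)
)

# ave() repeats the group sum on every row of the group
test$SWC <- ave(test$confidence * test$weight, test$variable, FUN = sum)

# tapply() gives one sum per variable; index by name to expand back to rows
swc_by_var <- tapply(test$confidence * test$weight, test$variable, sum)
test$SWC_tapply <- as.numeric(swc_by_var[as.character(test$variable)])
```

Because both results are assigned into `test`, the new column stays in the data frame after the code has run.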

With base R, `ave(test, test$variable, FUN = function(x) sum(x['confidence'] * x['weight']))` –


Works great, thank you very much! But after running your code, the SWC output is not "saved" in the data frame (if I run `test`, it isn't there). – Hoju


^ I think I have solved it: I just added `test <-` before your code. – Hoju
