2017-06-22

Subset a data frame and apply a function to calculate the frequency of each factor level

I have a data frame:

df<- data.frame(region= c("1", "1", "1","1","1","1","1","1", "2","2"),plot=c("1", "1", "1","2","2","2", "3","3","3","3"), interact=c("A_B", "C_D","C_D", "E_F","C_D","C_D", "D_E", "D_E","C_B","A_B")) 

I want to subset the data by plot. For each plot subset, I want to count the frequency of each unique interact type. The output should look like this:

df<- data.frame(region= c("1", "1", "1","1", "2","2","2"),
                plot=c("1", "1", "2","2", "3","3","3"),
                interact=c("A_B", "C_D", "E_F","C_D", "D_E", "C_B","A_B"),
                freq= c(1,2,1,2,2,1,1))

Then I want to apply the following calculation to each plot subset of the resulting df:

total <- sum(df$freq)    # Sum of `freq` within the plot subset (the total number of interactions)
prop <- df$freq / total  # Divide each interaction type's `freq` by that total (its proportion of all interactions)
prop2 <- prop^2          # Square each proportion
D <- sum(prop2)          # Sum the squared proportions for the plot subset
simp <- 1 / D            # Take the inverse to get Simpson's diversity
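
Written as a formula, the calculation described in the comments above is the inverse Simpson index. The symbols below are introduced here only for illustration: $n_i$ is the number of times interaction type $i$ occurs in a plot, and $N$ is the total number of interactions in that plot.

$$
p_i = \frac{n_i}{N}, \qquad D = \sum_i p_i^2, \qquad \text{simp} = \frac{1}{D}
$$

For plot 1, which has one A_B and two C_D interactions ($N = 3$):

$$
D = \left(\tfrac{1}{3}\right)^2 + \left(\tfrac{2}{3}\right)^2 = \tfrac{5}{9}, \qquad \text{simp} = \tfrac{9}{5} = 1.8
$$

The sum runs over every interaction type present in the plot, even when two types happen to share the same count.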

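The function I want to use is similar to the one explained on this page: http://rfunctions.blogspot.com.ng/2012/02/diversity-indices-simpsons-diversity.html. However, the version referenced there operates on a wide data set, whereas my data set is long.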

In the end, I would have a df with one value per plot:

result<- 
     Plot div 
      1  1.8 
      2  1.8 
      3  2.6 

Answer

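I used dplyr; however, my result for plot 3 is different from yours and I don't know why. Could you show your result for each step of the calculation, or check mine, and let me know where the error is?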

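Also, if you are interested in calculating diversity indices, you may want to get familiar with the vegan package, in particular its diversity() function.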

df<- data.frame(region= c("1", "1", "1","1","1","1","1","1", "2","2"), 
       plot=c("1", "1", "1","2","2","2", "3","3","3","3"), 
       interact=c("A_B", "C_D","C_D", "E_F","C_D","C_D", "D_E", "D_E","C_B","A_B")) 

library(dplyr) 

df1 <- df %>% group_by(region, plot, interact) %>% summarise(freq = n()) 
df2 <- df1 %>% group_by(plot) %>% mutate(sum=sum(freq), prop=freq/sum, prop2 = prop^2) 
df2 

# A tibble: 7 x 7
# Groups:   plot [3]
  region plot   interact  freq   sum      prop     prop2
  <fctr> <fctr> <fctr>   <int> <int>     <dbl>     <dbl>
1 1      1      A_B          1     3 0.3333333 0.1111111
2 1      1      C_D          2     3 0.6666667 0.4444444
3 1      2      C_D          2     3 0.6666667 0.4444444
4 1      2      E_F          1     3 0.3333333 0.1111111
5 1      3      D_E          2     4 0.5000000 0.2500000
6 2      3      A_B          1     4 0.2500000 0.0625000
7 2      3      C_B          1     4 0.2500000 0.0625000


df2 %>% group_by(plot) %>% summarise(D=sum(prop2), simp=1/D) 

# A tibble: 3 x 3
  plot           D     simp
  <fctr>     <dbl>    <dbl>
1 1      0.5555556 1.800000
2 2      0.5555556 1.800000
3 3      0.3750000 2.666667
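
As a cross-check of the per-plot values above, here is a minimal base R sketch that works directly on the long data, without dplyr. The helper name `inv_simpson` is made up for this illustration; it is not part of any package.

```r
# Same example data as in the question
df <- data.frame(
  region   = c("1", "1", "1", "1", "1", "1", "1", "1", "2", "2"),
  plot     = c("1", "1", "1", "2", "2", "2", "3", "3", "3", "3"),
  interact = c("A_B", "C_D", "C_D", "E_F", "C_D", "C_D", "D_E", "D_E", "C_B", "A_B")
)

# Hypothetical helper: inverse Simpson index of a vector of labels
inv_simpson <- function(x) {
  # Proportion of each label among all observations in x
  p <- table(as.character(x)) / length(x)
  # Inverse of the sum of squared proportions
  1 / sum(p^2)
}

# Apply the helper to the interact values of each plot
simp_by_plot <- tapply(df$interact, df$plot, inv_simpson)
simp_by_plot
```

For plot 3 the proportions are 2/4, 1/4 and 1/4, so D = 0.25 + 0.0625 + 0.0625 = 0.375 and 1/D = 2.666667, which agrees with the dplyr result.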

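Here is an approach that uses a function from the vegan package.

First, you need to use spread() to create a "matrix" with your interactions as separate columns.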

library(vegan) 
library(tidyr) 
library(dplyr) 

df5 <- df %>% group_by(plot, interact) %>% summarise(freq = n()) 
df6 <-spread(data=df5, key = interact, value = freq, fill=0) 
df6 

# A tibble: 3 x 6
# Groups:   plot [3]
  plot     A_B   C_B   C_D   D_E   E_F
* <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1          1     0     2     0     0
2 2          0     0     2     0     1
3 3          1     1     0     2     0

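Then you calculate the diversity, passing df6 without its first column (which is plot) as the data matrix. Finally, you can add the calculated diversity to df6 as a column.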

simp <-diversity(x=df6[,-1], index = "invsimpson") 
df6$simp <- simp 
df6 

# A tibble: 3 x 7
# Groups:   plot [3]
  plot     A_B   C_B   C_D   D_E   E_F     simp
* <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
1 1          1     0     2     0     0 1.800000
2 2          0     0     2     0     1 1.800000
3 3          1     1     0     2     0 2.666667
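
To see what the wide-format calculation amounts to, the sketch below builds the same plot-by-interaction count matrix with base R's table() and computes 1 / sum(p^2) for each row by hand. It does not call vegan; it is only an illustration of the inverse Simpson arithmetic on a wide matrix, and the object names `counts` and `props` are chosen here for the example.

```r
# Same example data as in the question
df <- data.frame(
  region   = c("1", "1", "1", "1", "1", "1", "1", "1", "2", "2"),
  plot     = c("1", "1", "1", "2", "2", "2", "3", "3", "3", "3"),
  interact = c("A_B", "C_D", "C_D", "E_F", "C_D", "C_D", "D_E", "D_E", "C_B", "A_B")
)

# Wide count matrix: one row per plot, one column per interaction type
counts <- table(plot = df$plot, interact = df$interact)

# Row-wise proportions (each row sums to 1)
props <- prop.table(counts, margin = 1)

# Inverse Simpson index per plot: 1 / sum of squared proportions
simp <- 1 / rowSums(props^2)
simp
```

Interaction types that do not occur in a plot have a proportion of 0 and so contribute nothing to the sum, which is why filling the missing cells with 0 is safe.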

Or even shorter, with do() and tidy() from broom:

df5 <- df %>% group_by(plot, interact) %>% summarise(freq = n()) 

library(broom) 

df5 %>% spread(key = interact, value = freq, fill=0) %>% 
    do(tidy(diversity(x=.[,-1], index = "invsimpson"))) 

Thank you! This solution works very well. I have edited my plot 3 result. I made a mistake in that calculation, and your answer is correct. – Danielle


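Also, I would like to use diversity() from vegan, but my understanding is that it needs the data in matrix format. Do you have an efficient way to apply diversity() to the subsets? Thanks again. – Danielle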


Sure, that is possible too :) I edited my post to add an approach that uses the 'diversity()' function. Take a look. – MikolajM