Count the occurrences of a variable within a data frame and calculate its proportional occurrence

I am relatively new to R, so apologies in advance for any ineptitude.

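I am working with several (very large) observational datasets covering many sites in several countries. I need to calculate the proportion of sites that recorded a particular species in week x, out of the total number of sites that submitted observation data in week x (essentially presence/absence data). I have one dataset that gives the details of each individual species observation, and another that gives the total number of observations for each week. So I need some function that counts the number of records of the species in a given week and then divides it by the total number of sites that recorded observations of any species in that same week. Observations are recorded by week (1-53) and year (1995-2011).

An example of species.data (listed as csv for easy pasting):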

SITE_ID, SPECIES, WEEKNO, YEAR 
1289, Attenb., 1, 1995 
1538, Attenb., 1, 1995 
1894, Attenb., 2, 1995 
1286, Attenb., 4, 1995 
1238, Attenb., 7, 1995 
1892, Attenb., 7, 1995

And an example of total.obs.data:

YEAR, WEEKNO, TOTALOBS
1995, 1, 100 
1995, 2, 780 
1995, 3, 100 
1995, 4, 189 
1995, 5, 382 
1995, 6, 100 
1995, 7, 899 
1995, 8, 129

(So what I am after here is to be able to say that the proportion in week 1 of 1995 is 2/100, and then go on to build a GLM or GAM.)

Source

2012-07-01 user1494636

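Your problem is not hard. You can do this easily with a combination of reshape and some subsetting. But please provide a reproducible sample dataset. For example, where are the species in the second dataset? – ECII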

If it is a large dataset, the 'data.table' package may be your friend. –

As @TylerRinker commented, please define what you mean by a "very large" dataset. There are large, Large and LARGE datasets. – ECII

Let me give it a try, keeping in mind all the limitations of the question already mentioned in the comments above.

# Create the data frame with the total observations
# (runif() draws random values, so the numbers differ from run to run)
tot.obs <- data.frame(year = rep(1995, 10), weekno = 1:10, obs = floor(runif(n = 10, 80, 100)))
# Create the week-year key
tot.obs$week.year <- paste(tot.obs$weekno, tot.obs$year, sep = "-")

# Create the data frame of species observations
species.data <- data.frame(site = factor(floor(runif(n = 5, 2000, 3000))), week = c(1, 1, 2, 4, 7), year = rep(1995, 5), observ = rep(1, 5))
species.data$week.year <- paste(species.data$week, species.data$year, sep = "-")
species.data$total.obs <- NA

# Match the total observations from the tot.obs data frame to the species data frame.
# You can probably do it much faster, but here is a "quick and dirty" way

for (i in seq_len(nrow(species.data))) {
    species.data$total.obs[i] <- tot.obs$obs[tot.obs$week.year == species.data$week.year[i]]
}

# Calculate the proportion of the total observations that each site contributes
species.data$per.obs <- species.data$observ / species.data$total.obs

# For the presentation of the data, reshape is your friend
library(reshape)
species.data.melt <- melt(species.data, id.vars = c("site", "week.year"), measure.vars = "per.obs")

cast(species.data.melt, site ~ week.year, fun.aggregate = sum)


  site     1-1995     2-1995     4-1995     7-1995
1 2436 0.00000000 0.00000000 0.01010101 0.00000000
2 2501 0.00000000 0.01123596 0.00000000 0.00000000
3 2590 0.00000000 0.00000000 0.00000000 0.01123596
4 2608 0.01030928 0.00000000 0.00000000 0.00000000
5 2942 0.01030928 0.00000000 0.00000000 0.00000000
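As the comment in the code admits, the row-by-row loop is a quick and dirty lookup. A vectorised alternative is sketched here with base R's match(). It is only an illustration: the data frames are rebuilt with the fixed values from the question's samples instead of random numbers, so the results are reproducible, and the column names simply follow the code above.

```r
# Illustrative stand-ins for the two data frames, using the question's sample values
tot.obs <- data.frame(year = rep(1995, 8), weekno = 1:8,
                      obs = c(100, 780, 100, 189, 382, 100, 899, 129))
species.data <- data.frame(site = c(1289, 1538, 1894, 1286, 1238, 1892),
                           week = c(1, 1, 2, 4, 7, 7),
                           year = rep(1995, 6),
                           observ = rep(1, 6))

# Build the same week-year key in both data frames
tot.obs$week.year <- paste(tot.obs$weekno, tot.obs$year, sep = "-")
species.data$week.year <- paste(species.data$week, species.data$year, sep = "-")

# match() gives, for every species row, the index of its week in tot.obs,
# so the weekly totals can be looked up in one step instead of a loop
idx <- match(species.data$week.year, tot.obs$week.year)
species.data$total.obs <- tot.obs$obs[idx]

# Proportion of the weekly total contributed by each record
species.data$per.obs <- species.data$observ / species.data$total.obs
species.data
```

Weeks that are missing from tot.obs would yield NA here instead of an error, which may be easier to spot in a large dataset.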

Otherwise, if you are not interested in the observations per site, things are easier:

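# Sum the species records per week-year
# (casting on `variable` rather than `value` gives a single column even if observ is not always 1)
species.data.melt2 <- melt(species.data, id.vars = c("week.year"), measure.vars = "observ")
species.obs.total <- data.frame(cast(species.data.melt2, week.year ~ variable, fun.aggregate = sum))
colnames(species.obs.total)[2] <- "aggregated.total"
species.obs.total$total <- NA

# Look up the weekly total of observations for each week-year
for (i in seq_len(nrow(species.obs.total))) {
    species.obs.total$total[i] <- tot.obs$obs[tot.obs$week.year == species.obs.total$week.year[i]]
}

# Proportion of the weekly total in which the species was recorded
species.obs.total$perc <- species.obs.total$aggregated.total / species.obs.total$total
species.obs.total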


  week.year aggregated.total total       perc
1    1-1995                2    97 0.02061856
2    2-1995                1    89 0.01123596
3    4-1995                1    99 0.01010101
4    7-1995                1    89 0.01123596
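The same weekly proportion can also be sketched in base R alone, with aggregate() and merge(), using the column names and sample values from the question. This is a minimal sketch, not a definitive implementation: the output names N_SITES and PROP are made up for illustration, and it assumes a single species, as in the sample.

```r
# Sample data from the question
species.data <- data.frame(
  SITE_ID = c(1289, 1538, 1894, 1286, 1238, 1892),
  SPECIES = "Attenb.",
  WEEKNO  = c(1, 1, 2, 4, 7, 7),
  YEAR    = 1995
)
total.obs.data <- data.frame(
  YEAR     = 1995,
  WEEKNO   = c(1, 2, 3, 4, 5, 6, 7, 8),
  TOTALOBS = c(100, 780, 100, 189, 382, 100, 899, 129)
)

# Number of distinct sites recording the species in each year and week
counts <- aggregate(SITE_ID ~ SPECIES + YEAR + WEEKNO, data = species.data,
                    FUN = function(x) length(unique(x)))
names(counts)[names(counts) == "SITE_ID"] <- "N_SITES"   # hypothetical column name

# all.y = TRUE keeps the weeks in which the species was not recorded at all
prop <- merge(counts, total.obs.data, by = c("YEAR", "WEEKNO"), all.y = TRUE)
prop$N_SITES[is.na(prop$N_SITES)] <- 0

# Proportion for each week, e.g. 2/100 for week 1 of 1995
prop$PROP <- prop$N_SITES / prop$TOTALOBS
prop[order(prop$YEAR, prop$WEEKNO), ]
```

A table of this shape could then feed a binomial model, for example glm(cbind(N_SITES, TOTALOBS - N_SITES) ~ WEEKNO, family = binomial), although that model formula is only an assumption about the intended analysis.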

Source

2012-07-01 20:29:57 ECII

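The current data are too simple to support much complex testing. The xtabs function creates a matrix object that can be divided by the weekly totals: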

> xtblspec <- xtabs(~ SPECIES + SITE_ID + WEEKNO + YEAR, data = dat)
> xtblspec
, , WEEKNO = 1, YEAR = 1995

         SITE_ID
SPECIES   1238 1286 1289 1538 1892 1894
  Attenb.    0    0    1    1    0    0

, , WEEKNO = 2, YEAR = 1995

         SITE_ID
SPECIES   1238 1286 1289 1538 1892 1894
  Attenb.    0    0    0    0    0    1

, , WEEKNO = 4, YEAR = 1995

         SITE_ID
SPECIES   1238 1286 1289 1538 1892 1894
  Attenb.    0    1    0    0    0    0

, , WEEKNO = 7, YEAR = 1995

         SITE_ID
SPECIES   1238 1286 1289 1538 1892 1894
  Attenb.    1    0    0    0    1    0
#-------------

weekobs <- totobs[match(as.numeric(dimnames(xtblspec[1, , , ])$WEEKNO), totobs$WEEKNO),
                  "TOTALOBS"]
#[1] 100 780 189 899

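To get the matrix of species observations set up correctly, so that the matrix division works properly, you need to have WEEKNO as the first dimension, because R recycles the vector of weekly totals down the first dimension of the array: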

> xtblspec <- xtabs(~ WEEKNO + SPECIES + SITE_ID + YEAR, data = dat)
> xtblspec/weekobs
, , SITE_ID = 1238, YEAR = 1995

      SPECIES
WEEKNO     Attenb.
     1 0.000000000
     2 0.000000000
     4 0.000000000
     7 0.001112347

, , SITE_ID = 1286, YEAR = 1995

      SPECIES
WEEKNO     Attenb.
     1 0.000000000
     2 0.000000000
     4 0.005291005
     7 0.000000000

, , SITE_ID = 1289, YEAR = 1995

      SPECIES
WEEKNO     Attenb.
     1 0.010000000
     2 0.000000000
     4 0.000000000
     7 0.000000000

, , SITE_ID = 1538, YEAR = 1995

      SPECIES
WEEKNO     Attenb.
     1 0.010000000
     2 0.000000000
     4 0.000000000
     7 0.000000000

, , SITE_ID = 1892, YEAR = 1995

      SPECIES
WEEKNO     Attenb.
     1 0.000000000
     2 0.000000000
     4 0.000000000
     7 0.001112347

, , SITE_ID = 1894, YEAR = 1995

      SPECIES
WEEKNO     Attenb.
     1 0.000000000
     2 0.001282051
     4 0.000000000
     7 0.000000000
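If reordering the dimensions is inconvenient, base R's sweep() can divide along a named margin instead of relying on recycling down the first dimension. The sketch below assumes that dat and totobs are the question's two sample tables read into data frames; the answer does not show how they were created, so they are rebuilt here for illustration.

```r
# Assumed contents of `dat` and `totobs`: the sample data from the question
dat <- data.frame(SITE_ID = c(1289, 1538, 1894, 1286, 1238, 1892),
                  SPECIES = "Attenb.",
                  WEEKNO  = c(1, 1, 2, 4, 7, 7),
                  YEAR    = 1995)
totobs <- data.frame(YEAR = 1995, WEEKNO = 1:8,
                     TOTALOBS = c(100, 780, 100, 189, 382, 100, 899, 129))

# WEEKNO is the third dimension here, as in the first xtabs call above
xt <- xtabs(~ SPECIES + SITE_ID + WEEKNO + YEAR, data = dat)

# Weekly totals for just the weeks present in the table
weeks   <- as.numeric(dimnames(xt)$WEEKNO)
weekobs <- totobs$TOTALOBS[match(weeks, totobs$WEEKNO)]

# Divide every cell by the total of its own week (margin 3 = WEEKNO)
prop <- sweep(xt, MARGIN = 3, STATS = weekobs, FUN = "/")
prop["Attenb.", "1289", "1", "1995"]
```

With more than one year in the data, the totals depend on both week and year, so the division would need a week-by-year matrix of totals swept over both margins; that case is not covered by this sketch.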

Source

2012-07-01 22:11:58
