2012-07-01 53 views
0

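Counting the number of repeats of a variable within a data frame and calculating its proportional occurrence

I'm relatively new to R, so apologies in advance for any ineptitude.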

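I am working with several (very large) datasets of observations from multiple sites in multiple countries. I need to calculate the proportion of sites that recorded a particular species in week x, out of the total number of sites that submitted observations in week x (essentially presence/absence data). I have one dataset that gives the details of each individual species observation, and another that gives the total number of observations per week. So I need some function that counts the number of records of the species in a given week and then divides it by the total number of sites recording observations of any species in that same week. Observations are recorded by week (1-53) and year (1995-2011).

Example of species.data (listed as csv for ease of pasting):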

SITE_ID, SPECIES, WEEKNO, YEAR 
1289, Attenb., 1, 1995 
1538, Attenb., 1, 1995 
1894, Attenb., 2, 1995 
1286, Attenb., 4, 1995 
1238, Attenb., 7, 1995 
1892, Attenb., 7, 1995 

And an example of total.obs.data:

YEAR, WEEKNO, TOTALOBS, 
1995, 1, 100 
1995, 2, 780 
1995, 3, 100 
1995, 4, 189 
1995, 5, 382 
1995, 6, 100 
1995, 7, 899 
1995, 8, 129 

(So here I would hope to end up with something saying that the proportion in week 1 of 1995 was 2/100, and then be able to build a GLM or GAM.)

+0

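Your question is not difficult. You can do this easily with a combination of reshape and some subsetting. But please provide a reproducible sample dataset. For example, where are the species in the second dataset? – ECII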

+1

If it is a big dataset, the 'data.table' package may be your friend. –

+0

As @TylerRinker commented, please define what you mean by a "very large" dataset. There are big, large, and huge datasets. – ECII

Answers

0

Let me give it a try, keeping in mind all the limitations of the question that were already mentioned in the comments above.

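#Create the data frame with the total observations (random placeholder totals)
tot.obs<-data.frame(year=rep(1995,10), weekno=1:10, obs=floor(runif(n=10,80,100)))
#Create the variable week-year (the column is called weekno, so name it in full rather than rely on partial matching)
tot.obs$week.year<-paste(tot.obs$weekno,tot.obs$year,sep="-")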

#Create the data frame species observations 
species.data<-data.frame(site=factor(floor(runif(n=5,2000,3000))), week=c(1,1,2,4,7), year=rep(1995,5),observ=rep(1,5)) 
species.data$week.year<-paste(species.data$week,species.data$year,sep="-") 
species.data$total.obs<-NA 

#Match the total observations from the tot.obs data frame to the species data frame. You can probably do it much faster but here is a "quick and dirty way"

for (i in seq_len(nrow(species.data))){
    species.data$total.obs[i]<-tot.obs$obs[tot.obs$week.year==species.data$week.year[i]]
}

#Calculates the proportion of the total observations that each center contributes
species.data$per.obs<-species.data$observ/ species.data$total.obs

#For the presentation of the data, reshape is your friend 
library(reshape) 
species.data.melt<-melt(species.data,id.vars=c("site","week.year"), measure.vars="per.obs") 

cast(species.data.melt,site~week.year, fun.aggregate=sum) 


  site     1-1995     2-1995     4-1995     7-1995
1 2436 0.00000000 0.00000000 0.01010101 0.00000000
2 2501 0.00000000 0.01123596 0.00000000 0.00000000
3 2590 0.00000000 0.00000000 0.00000000 0.01123596
4 2608 0.01030928 0.00000000 0.00000000 0.00000000
5 2942 0.01030928 0.00000000 0.00000000 0.00000000

Otherwise, if you are not interested in the observations per center, things are easier:

species.data.melt2<-melt(species.data,id.vars=c("week.year"), measure.vars="observ") 
species.obs.total<-data.frame(cast(species.data.melt2,week.year~value, fun.aggregate=sum)) 
colnames(species.obs.total)[2]<-"aggregated.total" 
species.obs.total$total<-NA 

for (i in 1:dim(species.obs.total)[1]){ 
    species.obs.total$total[i]<-tot.obs$obs[tot.obs$week.year==species.obs.total$week.year[i]] 
} 

species.obs.total$perc<-species.obs.total$aggregated.total/ species.obs.total$total 
species.obs.total 


  week.year aggregated.total total       perc
1    1-1995                2    97 0.02061856
2    2-1995                1    89 0.01123596
3    4-1995                1    99 0.01010101
4    7-1995                1    89 0.01123596
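
The same per-week proportion can also be sketched in base R without an explicit loop, this time using the sample data from the question rather than random totals. This is only a minimal sketch under some assumptions: it handles one species at a time, it treats TOTALOBS as the denominator the asker wants, and the names target, one.species, site.counts, props, NSITES and PROP are illustrative and do not come from the original post.

```r
# Sample data copied from the question
species.data <- read.csv(text = "SITE_ID,SPECIES,WEEKNO,YEAR
1289,Attenb.,1,1995
1538,Attenb.,1,1995
1894,Attenb.,2,1995
1286,Attenb.,4,1995
1238,Attenb.,7,1995
1892,Attenb.,7,1995")

total.obs.data <- read.csv(text = "YEAR,WEEKNO,TOTALOBS
1995,1,100
1995,2,780
1995,3,100
1995,4,189
1995,5,382
1995,6,100
1995,7,899
1995,8,129")

# Assumption: work on a single species at a time
target <- "Attenb."
one.species <- species.data[species.data$SPECIES == target, ]

# Number of distinct sites that recorded the species in each week of each year
site.counts <- aggregate(SITE_ID ~ WEEKNO + YEAR, data = one.species,
                         FUN = function(x) length(unique(x)))
names(site.counts)[names(site.counts) == "SITE_ID"] <- "NSITES"

# Attach the counts to the weekly totals; all.x = TRUE keeps weeks in which
# the species was not recorded, and those weeks get a count of zero
props <- merge(total.obs.data, site.counts, by = c("YEAR", "WEEKNO"), all.x = TRUE)
props$NSITES[is.na(props$NSITES)] <- 0
props$PROP <- props$NSITES / props$TOTALOBS
props <- props[order(props$YEAR, props$WEEKNO), ]
props
```

For week 1 of 1995 this gives 2/100, the figure the question asks for. A table like this, with successes in NSITES and failures as TOTALOBS - NSITES, is the usual input shape for a binomial GLM via cbind(successes, failures).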
0

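The current data is too simple to support much complex testing. The xtabs function creates a matrix object that can be divided by the weekly totals: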

> xtblspec <- xtabs(~ SPECIES+ SITE_ID +WEEKNO + YEAR , data=dat)  
> xtblspec 
, , WEEKNO = 1, YEAR = 1995 

     SITE_ID 
SPECIES 1238 1286 1289 1538 1892 1894 
    Attenb. 0 0 1 1 0 0 

, , WEEKNO = 2, YEAR = 1995 

     SITE_ID 
SPECIES 1238 1286 1289 1538 1892 1894 
    Attenb. 0 0 0 0 0 1 

, , WEEKNO = 4, YEAR = 1995 

     SITE_ID 
SPECIES 1238 1286 1289 1538 1892 1894 
    Attenb. 0 1 0 0 0 0 

, , WEEKNO = 7, YEAR = 1995 

     SITE_ID 
SPECIES 1238 1286 1289 1538 1892 1894 
    Attenb. 1 0 0 0 1 0 
#------------- 

weekobs <- totobs[ match(as.numeric(dimnames(xtblspec[ 1, , ,])$WEEKNO) ,totobs$WEEKNO) , 
        "TOTALOBS"] 
#[1] 100 780 189 899 

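To get the matrix of species observations set up correctly, so that the matrix divisions work properly, you need to have WEEKNO as the first dimension: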

xtblspec <- xtabs(~ WEEKNO +SPECIES+ SITE_ID + YEAR , data=dat) 
> xtblspec/weekobs 
, , SITE_ID = 1238, YEAR = 1995 

     SPECIES 
WEEKNO  Attenb. 
    1 0.000000000 
    2 0.000000000 
    4 0.000000000 
    7 0.001112347 

, , SITE_ID = 1286, YEAR = 1995 

     SPECIES 
WEEKNO  Attenb. 
    1 0.000000000 
    2 0.000000000 
    4 0.005291005 
    7 0.000000000 

, , SITE_ID = 1289, YEAR = 1995 

     SPECIES 
WEEKNO  Attenb. 
    1 0.010000000 
    2 0.000000000 
    4 0.000000000 
    7 0.000000000 

, , SITE_ID = 1538, YEAR = 1995 

     SPECIES 
WEEKNO  Attenb. 
    1 0.010000000 
    2 0.000000000 
    4 0.000000000 
    7 0.000000000 

, , SITE_ID = 1892, YEAR = 1995 

     SPECIES 
WEEKNO  Attenb. 
    1 0.000000000 
    2 0.000000000 
    4 0.000000000 
    7 0.001112347 

, , SITE_ID = 1894, YEAR = 1995 

     SPECIES 
WEEKNO  Attenb. 
    1 0.000000000 
    2 0.001282051 
    4 0.000000000 
    7 0.000000000
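
The division above depends on R recycling weekobs along the first dimension, which is why WEEKNO has to come first. As a hedged alternative, sweep() lets you name the margin to divide along, so the dimension order no longer matters. The sketch below rebuilds dat and totobs from the question's sample data (the answer does not show how they were created, so that part is an assumption), and week.margin and props.by.week are illustrative names. Like the match() call above, it looks up totals by WEEKNO only, so it assumes a single year.

```r
# Assumed reconstruction of the answer's inputs from the question's sample data
dat <- read.csv(text = "SITE_ID,SPECIES,WEEKNO,YEAR
1289,Attenb.,1,1995
1538,Attenb.,1,1995
1894,Attenb.,2,1995
1286,Attenb.,4,1995
1238,Attenb.,7,1995
1892,Attenb.,7,1995")

totobs <- read.csv(text = "YEAR,WEEKNO,TOTALOBS
1995,1,100
1995,2,780
1995,3,100
1995,4,189
1995,5,382
1995,6,100
1995,7,899
1995,8,129")

# WEEKNO is deliberately NOT the first dimension here
xtblspec <- xtabs(~ SPECIES + SITE_ID + WEEKNO + YEAR, data = dat)

# Weekly totals lined up with the weeks that actually occur in the table
weeks <- as.numeric(dimnames(xtblspec)$WEEKNO)
weekobs <- totobs$TOTALOBS[match(weeks, totobs$WEEKNO)]

# Divide along the WEEKNO margin, wherever it sits in the array
week.margin <- which(names(dimnames(xtblspec)) == "WEEKNO")
props.by.week <- sweep(xtblspec, MARGIN = week.margin, STATS = weekobs, FUN = "/")
props.by.week
```

With several years of data, the totals would have to be matched on both YEAR and WEEKNO before dividing.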