2013-05-13

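Creating an empty matrix in R and filling it with values matched by date

I'm very new to R, and apologies in advance for the state of my post (I would normally use dput(), but I got a strange output and don't know how to upload the dataset; I'm really sorry).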

I have a dataset with seven columns (site, year, startdate, enddate, photodate, species, indiv), for example:

site year startdate enddate photodate species indiv 
M1_7 2012 19/07/2012 10/08/2012 20/07/2012 Sylvicapra grimmia 1 
M1_7 2012 19/07/2012 10/08/2012 23/07/2012 Crocuta crocuta 1 
M1_7 2012 19/07/2012 10/08/2012 23/07/2012 Potamochoerus larvatus 1 
M1_7 2012 19/07/2012 10/08/2012 25/07/2012 Hystrix cristata 1 
M1_7 2012 19/07/2012 10/08/2012 27/07/2012 Potamochoerus larvatus 1 
M1_7 2012 19/07/2012 10/08/2012 27/07/2012 Sylvicapra grimmia 1 
M1_7 2012 19/07/2012 10/08/2012 28/07/2012 Hippotragus equinus  1 
M1_7 2012 19/07/2012 10/08/2012 30/07/2012 Crocuta crocuta 1 
M1_7 2012 19/07/2012 10/08/2012 01/08/2012 Equus q. boehmi 1 
M1_7 2012 19/07/2012 10/08/2012 01/08/2012 Crocuta crocuta 1 
M1_7 2012 19/07/2012 10/08/2012 05/08/2012 Potamochoerus larvatus 1 
M1_7 2012 19/07/2012 10/08/2012 07/08/2012 Hippotragus equinus  1 
M1_9 2012 21/07/2012 11/08/2012 24/07/2012 Pedetes capensis 1 
M1_9 2012 21/07/2012 11/08/2012 24/07/2012 Crocuta crocuta 2 
M1_9 2012 21/07/2012 11/08/2012 24/07/2012 Pedetes capensis 1 
M1_9 2012 21/07/2012 11/08/2012 27/07/2012 Pedetes capensis 1 
M1_9 2012 21/07/2012 11/08/2012 01/08/2012 Alcelaphus b. lichtensteinii 1 
M1_9 2012 21/07/2012 11/08/2012 03/08/2012 Pedetes capensis 1 
M1_9 2012 21/07/2012 11/08/2012 04/08/2012 Crocuta crocuta 1 
M1_9 2012 21/07/2012 11/08/2012 06/08/2012 Pedetes capensis 1 
M1_9 2012 21/07/2012 11/08/2012 07/08/2012 Pedetes capensis 1 
M1_9 2012 21/07/2012 11/08/2012 08/08/2012 Pedetes capensis 1 
M1_11 2012 21/07/2012 11/08/2012 26/07/2012 Mellivora capensis 1 
M1_11 2012 21/07/2012 11/08/2012 03/08/2012 Sylvicapra grimmia 1 
M1_11 2012 21/07/2012 11/08/2012 07/08/2012 Hystrix cristata 1 
M1_11 2012 21/07/2012 11/08/2012 08/08/2012 Potamochoerus larvatus 1 

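I have been trying to write a loop that creates a 49-column matrix: column 1 holds the site, column 2 holds each date in the sequence between that site's startdate and enddate, and columns 3:49 correspond to the species names. In the cells under columns 3:49, I want the count data (indiv) for each species on each date.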

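So far I have only managed to create an empty matrix with the layout I want, but I haven't been able to fill it with the data. This is the code I used: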

mlele2012<- read.delim("C:\\multiple regression\\mlele 2012 empty matrix creation.txt") 
africa <- read.delim("C:\\species accumulation curves\\COMPLETE species list.txt") 
specieslistx<-unique(africa) 
specieslistx<-t(specieslistx) 

oldtemp<-NULL 
temp <- rep(0, length(specieslistx)) 

strptime(mlele2012$photodate, "%Y-%m-%d") 
strptime(mlele2012$startdate, "%d/%m/%Y") 
strptime(mlele2012$enddate, "%d/%m/%Y") 

#create empty dataframe with dimensions: no. of sites x no. of dates in each 

for(i in levels(mlele2012$site)) { ##for each site 

    sitetemp <- subset(mlele2012, site == i) ###subset of dataset , for the particular site i## 

    sitetemp$startdate<- as.Date(sitetemp$startdate, "%d/%m/%Y") 
    sitetemp$enddate<- as.Date(sitetemp$enddate, "%d/%m/%Y") 

    sitedatelist<-seq(as.Date(sitetemp$startdate[1]), as.Date(sitetemp$enddate[1]), "days") 

    empty<-matrix(0,length(sitedatelist),length(specieslistx)) 
    sitedatelist1<-as.character(sitedatelist) 
    row.names(empty)<-(sitedatelist1) 
    colnames(empty)<-specieslistx 

    addsitecol<-matrix(0,length(sitedatelist),1) 
    extendempty<-cbind(addsitecol,empty) 
    extendempty[,1]<-i 
    oldtemp<-rbind(oldtemp, extendempty) 
} 

write.csv(oldtemp, "Mlele 2012 dry empty.csv") 
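For comparison, here is a minimal sketch (not the code from the question) that builds the full site-by-day table and fills it in one pass instead of creating an empty matrix first. It assumes all three date columns use the dd/mm/yyyy format shown in the sample and that every photodate falls inside its site's startdate to enddate window. The data frame `dat` (six rows copied from the sample) and the helper `fill_site()` are hypothetical names; `tapply()` sums indiv per date and species, and `match()` places those sums into the zero matrix.

```r
# Hypothetical `dat`: six rows copied from the question's sample (sites M1_7 and M1_9).
dat <- data.frame(
  site      = c("M1_7", "M1_7", "M1_7", "M1_9", "M1_9", "M1_9"),
  startdate = c("19/07/2012", "19/07/2012", "19/07/2012",
                "21/07/2012", "21/07/2012", "21/07/2012"),
  enddate   = c("10/08/2012", "10/08/2012", "10/08/2012",
                "11/08/2012", "11/08/2012", "11/08/2012"),
  photodate = c("20/07/2012", "23/07/2012", "23/07/2012",
                "24/07/2012", "24/07/2012", "24/07/2012"),
  species   = c("Sylvicapra grimmia", "Crocuta crocuta", "Potamochoerus larvatus",
                "Pedetes capensis", "Crocuta crocuta", "Pedetes capensis"),
  indiv     = c(1, 1, 1, 1, 2, 1),
  stringsAsFactors = FALSE
)

# Assumes every date column uses the dd/mm/yyyy format shown in the sample.
for (col in c("startdate", "enddate", "photodate")) {
  dat[[col]] <- as.Date(dat[[col]], format = "%d/%m/%Y")
}

species_list <- sort(unique(dat$species))

fill_site <- function(s) {
  # Every day between this site's startdate and enddate, with all counts at 0.
  days <- seq(s$startdate[1], s$enddate[1], by = "day")
  m <- matrix(0, nrow = length(days), ncol = length(species_list),
              dimnames = list(NULL, species_list))
  # Total indiv per photodate x species; repeated records are summed.
  counts <- tapply(s$indiv, list(as.character(s$photodate), s$species), sum)
  counts[is.na(counts)] <- 0
  # Assumes every photodate lies inside the startdate..enddate window.
  rows <- match(rownames(counts), as.character(days))
  m[rows, colnames(counts)] <- counts
  data.frame(site = s$site[1], date = days, m, check.names = FALSE)
}

result <- do.call(rbind, lapply(split(dat, dat$site), fill_site))
rownames(result) <- NULL
head(result)
```

Splitting by site and using `seq(..., by = "day")` replaces the manual `rbind()` inside the loop. With the full species list read from the file in the question, `species_list` would hold those 47 names rather than only the species seen in `dat`.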

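I have also been trying to create a second matrix with the same format/dimensions but without the extra dates (i.e. only the dates in the photodate column, rather than the full sequence between startdate and enddate). I was hoping to eventually merge the two matrices somehow to get what I need. Unfortunately this code doesn't work, even though it doesn't seem to produce any errors. Here is the second part of my code: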

for(i in mlele2012$site) {  
    sitetemp <- subset(mlele2012, site == i) ###subset of dataset "allsites", for the particular site i## 
    for(j in sitetemp$photodate){ 
     datetemp <- subset(sitetemp, photodate == j) ###subset of dataset "africaa", for the particular date i# 
     uniquespperdate <- unique(datetemp$species)###unique species within each date (row) i# 
     temp <- rep(0, length(specieslistx)) #create a temporary vector of 0s with the same length as the species list### 

     for(a in uniquespperdate){ 
     sptemp <- subset(datetemp , species == a) ###subset of dataset "sitetemp", for the particular sp j## 
     countdata<-sum(sptemp$indiv) 
     index <- pmatch(a, names(temp)) ###match the unique species per date to the location on the species list### 
     #there is a problem here, it works when run as a single line but not within a loop 
     temp[index] <- countdata ###for the locations listed in "index", assign the count data to the temporary vector### 
     names(temp)<- specieslistx 
     } 
    }   
    oldtemp <- rbind(oldtemp, temp) ### bind the new temp file to the old temp file, i.e. update the list as the loop runs### 
} 
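In the second loop, the missing counts appear to come from the order of operations: `temp` is created with `rep(0, ...)` and only receives names at the end of the inner loop, so on the first pass for each photodate `names(temp)` is `NULL`, `pmatch()` returns `NA`, and `temp[NA] <- countdata` silently replaces nothing. A small sketch reproducing this with a hypothetical three-name list standing in for `specieslistx`, plus the fix of naming the vector when it is created (and using `match()`, which matches exactly rather than partially):

```r
# Hypothetical three-name species list, standing in for specieslistx.
species_list <- c("Crocuta crocuta", "Pedetes capensis", "Sylvicapra grimmia")

# As in the question's loop: temp starts out without names...
temp  <- rep(0, length(species_list))
index <- pmatch("Crocuta crocuta", names(temp))  # names(temp) is NULL, so index is NA
temp[index] <- 1                                 # an NA subscript replaces nothing
names(temp) <- species_list                      # ...and the names arrive too late

# Name the vector when it is created; match() does exact (not partial) matching.
temp_fixed <- setNames(rep(0, length(species_list)), species_list)
temp_fixed[match("Crocuta crocuta", names(temp_fixed))] <- 1
temp_fixed
```

Two other details may be worth checking in that loop: `for(i in mlele2012$site)` visits each site once per row rather than once per site (`unique(mlele2012$site)` avoids this), and because `rbind()` sits outside the photodate loop, only the vector for the last photodate of each site is kept.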

Any help would be much appreciated. Please let me know if any details would make the question clearer.
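For the second, photodate-only matrix, a possible sketch uses `xtabs()`, which sums indiv over repeated records, on a combined site/date key. It again assumes dd/mm/yyyy dates, and `dat`, `site_date` and `photo_only` are hypothetical names; `dat` holds six rows copied from the sample.

```r
# Hypothetical `dat`: six rows copied from the question's sample (sites M1_7 and M1_9).
dat <- data.frame(
  site      = c("M1_7", "M1_7", "M1_7", "M1_9", "M1_9", "M1_9"),
  photodate = c("20/07/2012", "23/07/2012", "23/07/2012",
                "24/07/2012", "24/07/2012", "24/07/2012"),
  species   = c("Sylvicapra grimmia", "Crocuta crocuta", "Potamochoerus larvatus",
                "Pedetes capensis", "Crocuta crocuta", "Pedetes capensis"),
  indiv     = c(1, 1, 1, 1, 2, 1),
  stringsAsFactors = FALSE
)
# Assumes photodate is dd/mm/yyyy, as in the sample shown above.
dat$photodate <- as.Date(dat$photodate, format = "%d/%m/%Y")

# One key per site/photodate pair; ISO dates keep each site's rows in date order.
dat$site_date <- paste(dat$site, dat$photodate, sep = "|")

# xtabs() sums indiv over repeated site/date/species records.
tab  <- xtabs(indiv ~ site_date + species, data = dat)
wide <- as.data.frame.matrix(tab)

# Split the key back into separate site and photodate columns.
parts <- do.call(rbind, strsplit(rownames(wide), "|", fixed = TRUE))
photo_only <- data.frame(site = parts[, 1], photodate = as.Date(parts[, 2]), wide,
                         row.names = NULL, check.names = FALSE)
photo_only
```

Each row is a site/photodate pair that actually has photos; dates without photos are simply absent, which is the difference from the full date-sequence matrix.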

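Maybe try 'dput(head(x))'? Also, the question would be more readable if all of your code were indented four spaces; part of the code is displayed outside the code block. – Frank 2013-05-13 20:42:36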

Agreed on the first point (use 'dput(.)'), but sometimes an inexperienced R user's strategy is not the most efficient approach. – 2013-05-13 21:17:13
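On the `dput()` suggestion: `dput()` prints R code (for a data frame, a `structure(list(...))` expression) that recreates the object, and that printed text is what gets pasted into the question. A small sketch with two rows from the sample in a hypothetical `dat`; the round trip through `parse()` and `eval()` shows the text rebuilds the data.

```r
# Hypothetical `dat`: two rows copied from the question's sample.
dat <- data.frame(
  site      = c("M1_7", "M1_7"),
  photodate = c("20/07/2012", "23/07/2012"),
  species   = c("Sylvicapra grimmia", "Crocuta crocuta"),
  indiv     = c(1, 1),
  stringsAsFactors = FALSE
)

# Prints R code that rebuilds the object; this text is what goes into the question.
dput(head(dat))

# Anyone reading the question can rebuild the data by evaluating that text.
txt  <- paste(capture.output(dput(head(dat))), collapse = "\n")
copy <- eval(parse(text = txt))
```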

Answers

I can get you most of the way there with your sample:

> ftable(xtabs(indiv~site+year+species, data=dat))
            species boehmi capensis cristata crocuta equinus grimmia larvatus lichtensteinii
site  year
M1_11 2012               0        1        1       0       0       1        1              0
M1_7  2012               1        0        1       3       2       2        3              0
M1_9  2012               0        7        0       3       0       0        0              1

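I did enter the data with genus and species as two separate columns, because you didn't provide the requested dput version of the data.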
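Because genus and species were entered as separate columns in this answer, species that share an epithet end up in one column: capensis combines Pedetes capensis (M1_9) and Mellivora capensis (M1_11). A sketch that keeps the full binomial name in a single species column, using three rows from the sample in a hypothetical `dat`:

```r
# Hypothetical `dat`: three rows copied from the sample, full binomial names in one column.
dat <- data.frame(
  site    = c("M1_9", "M1_9", "M1_11"),
  year    = c(2012, 2012, 2012),
  species = c("Pedetes capensis", "Crocuta crocuta", "Mellivora capensis"),
  indiv   = c(1, 2, 1),
  stringsAsFactors = FALSE
)

# Same cross-tabulation as above; each binomial now gets its own column.
tab <- xtabs(indiv ~ site + year + species, data = dat)
ftable(tab)
```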

It's a bit messy, but without initializing an empty matrix you can do the following.

If df is your initial data:

result = do.call("rbind",lapply(levels(df$site),function(x){ 
    do.call("rbind",lapply(levels(df$startdate),function(y){ 
     do.call("rbind",lapply(levels(df$enddate),function(z){ 
      foo <- rep(0,length(levels(df$species))) 
      names(foo) <- levels(df$species) 
      foo[df$species[df$site==x & df$startdate==y & df$enddate==z]] <- df$indiv[df$site==x & df$startdate==y & df$enddate==z] 
      c(x,y,z,foo) 
     })) 
    })) 
})) 

result should contain the matrix you are looking for (I hope).
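One caveat with this approach: `foo[...] <- df$indiv[...]` assigns by position, so when a species occurs on several rows of the same site/startdate/enddate group, later rows overwrite earlier ones instead of being added. With the full sample, M1_9 would get 1 for Pedetes capensis rather than 7. A sketch of the effect, and of summing with `tapply()` first, using the three M1_9 records dated 24/07/2012 (`foo`, `bar` and `sums` are illustrative names):

```r
species_levels <- c("Crocuta crocuta", "Pedetes capensis")
# The three M1_9 records from 24/07/2012; Pedetes capensis appears twice.
sp    <- factor(c("Pedetes capensis", "Crocuta crocuta", "Pedetes capensis"),
                levels = species_levels)
indiv <- c(1, 2, 1)

# Direct assignment, as in the answer: a factor index uses its integer codes,
# and when a code repeats, the last value assigned wins.
foo <- setNames(rep(0, length(species_levels)), species_levels)
foo[sp] <- indiv

# Summing per species first keeps every record.
bar  <- setNames(rep(0, length(species_levels)), species_levels)
sums <- tapply(indiv, sp, sum)
sums[is.na(sums)] <- 0   # species with no records come back as NA
bar[names(sums)] <- sums
```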