2012-10-18 27 views
2

R for loop over a series of plots: always getting the same plot

I would like to get a series of plots from the following dataset and loop:

> head(all5new[c(6,70,22:23)])#This is a snapshot of my dataset. There are more species, see below. 
    setID   fishery blackdog smoothdog 
11  1 TRAWL-PAND.BOR.  0   0 
12  1 TRAWL-PAND.BOR.  0   0 
13  1 TRAWL-REDFISH  0   0 
14  1 TRAWL-PAND.BOR.  0   0 
21 10 TRAWL-PAND.BOR.  0   0 
22 10 TRAWL-PAND.BOR.  0   0 

> elasmo #This is the list of the species for which I would like to have individual barplots 
[1] "blackdog"  "smoothdog" "spinydog"  "mako"   "porbeagle" 
[6] "blue"   "greenland" "portuguese" "greatwhite" "mackerelNS" 
[11] "dogfish"  "basking"  "thresher"  "deepseacat" "atlsharp"  
[16] "oceanicwt" "roughsagre" "dusky"  "sharkNS"  "sand"   
[21] "sandbar"  "smoothhammer" "tiger"  "wintersk"  "abyssalsk" 
[26] "arcticsk"  "barndoorsk" "roundsk"  "jensensk"  "littlesk"  
[31] "richardsk" "smoothsk"  "softsk"  "spinysk"  "thorny"  
[36] "whitesk"  "stingrays" "skateNS"  "manta"  "briersk"  
[41] "pelsting"  "roughsting" "raysNS"  "skateraysNS" "allSHARK"  
[46] "allSKATE"  "PELAGIC"  

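This is my for loop. The code works fine when I run it for a single species, but when I run the loop I always get the same barplot. I know it is probably just a quick fix, such as adding [[i]] somewhere in the code, but I have tried different ways without any success.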

for (i in elasmo) { 

    # CALCULATE THE CATCH PER UNIT OF EFFORT (KG/SET) FOR ALL SPECIES FOR EACH FISHERY 
    test<-ddply(all5new,.(fishery),summarize, sets=length(as.factor(setID)),LOGcpue=log((sum(i)/length(as.factor(setID))))) 

    #TAKE THE FIRST 10 FISHERY WITH THE HIGHEST LOGcpue 
    x<-test[order(-test$LOGcpue)[1:10],] 

    #REORDER THE FISHERY FACTOR ACCORDINGLY (FOR GGPLOT2, TO HAVE EACH LEVEL IN ORDER) 
    list<-x$fishery 
    x$fishery <- factor(x$fishery, levels =list) 

    #BAR PLOT 
    graph<-ggplot(x, aes(fishery,LOGcpue)) + geom_bar() + coord_flip() + 
    geom_text(aes(label=sets,hjust=0.5,vjust=-1),size=4,angle = 270) 

    #SAVE GRAPH IN NEW DIR 
    ggsave(graph,filename=paste("barplot",i,".png",sep="")) 
} 

Here is a subset of my dataset after melting: mydata

> data.melt<-melt(all5new, id.vars=c("tripID","setID","fishery"), measure.vars = c(22:23)) 
> head(data.melt);dim(data.melt) 
    tripID setID   fishery variable value 
1  1  1 TRAWL-PAND.BOR. blackdog  0 
2  1  1 TRAWL-PAND.BOR. blackdog  0 
3  1  1 TRAWL-REDFISH blackdog  0 
4  1  1 TRAWL-PAND.BOR. blackdog  0 
5  1 10 TRAWL-PAND.BOR. blackdog  0 
6  1 10 TRAWL-PAND.BOR. blackdog  0 
[1] 350100  5 
+3

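I think the 'i' in the first 'ddply' statement may not be getting evaluated the way you want it to... I would probably try creating an *explicit* function for 'ddply' and putting a 'browser()' call in it to see what is going on there... –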

+0

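Using 'ddply' inside a 'for' loop is insanely inefficient. And @BenBolker is right, 'i' will not be passed correctly. Give us a reproducible example and maybe we can help. I suggest you 'melt' the dataset so that 'species' is a column; then you can 'ddply' by '.(fishery, species)'. The plots can also go into a list, so you can "clean up" that part too. – Maiasaura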
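To make the point in the two comments above concrete, here is a minimal base R sketch of why the loop variable does not reach the data. Inside `for (i in elasmo)`, `i` is only the character string "blackdog", "smoothdog", and so on; the column itself has to be looked up by name. The data frame `toy` and its values are invented for illustration and merely stand in for `all5new`.

```r
# Toy data standing in for all5new (values are made up for illustration)
toy <- data.frame(
  setID     = c(1, 1, 2, 2, 3, 3),
  fishery   = c("A", "A", "A", "B", "B", "B"),
  blackdog  = c(2, 0, 4, 1, 1, 1),
  smoothdog = c(0, 0, 3, 5, 5, 2)
)
species <- c("blackdog", "smoothdog")

cpue_by_species <- list()
for (sp in species) {
  # sp is only the string "blackdog"; toy[[sp]] is the numeric column itself
  catch <- tapply(toy[[sp]], toy$fishery, sum)     # total catch per fishery
  sets  <- tapply(toy$setID, toy$fishery, length)  # number of rows per fishery
  cpue_by_species[[sp]] <- data.frame(
    fishery = names(catch),
    sets    = as.vector(sets),
    LOGcpue = log(as.vector(catch) / as.vector(sets))
  )
}

cpue_by_species$blackdog
cpue_by_species$smoothdog
```

Because the column is selected with `toy[[sp]]`, each pass through the loop summarises a different species, so the two resulting tables differ.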

Answers

1

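Below is the workflow I use for generating large numbers of plots, adapted to your dataset (or my interpretation of it). I think it is a nice illustration of the power of plyr. For your application, I don't think computation time really matters. What matters more for you is producing code that is easy to read, and I think plyr helps with that.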

#Load packages 
library(plyr) 
library(reshape) 
library(ggplot2) 

#Recreate your data set, with only two species 
setID <- rep(1:5, each=4, times=1) 
fishery <- gl(10, 2) 
blackdog <- sample(1:5, size=20, replace=TRUE) 
smoothdog <- sample(1:5, size=20, replace=TRUE) 
df <- data.frame(setID, fishery, blackdog, smoothdog) 

#Melt the data frame 
dfm <- melt(df, id.vars = c("setID", "fishery")) 

#Calculate LOGcpue for each fish at each fishery 

cpueDF <- ddply(dfm, c("fishery", "variable"), summarise, LOGcpue = log(sum(value)/length(value))) 

#Plot all the data in one (potentially huge) faceted plot. 
#(I often use huge plots like this for onscreen analysis 
# - obviously it can't be printed in practice, but you can get a visual overview of the data) 
#stat="identity" draws the precomputed LOGcpue values instead of counting rows 
ggplot(cpueDF, aes(x=fishery, y=LOGcpue)) + geom_bar(stat="identity") + coord_flip() + facet_wrap(~variable) 
ggsave("giant plot.pdf", height=30, width=30, units="in") 

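#Print each plot individually to screen, save it, and collect it in a list 
printGraph <- function(df) { 
    p <- ggplot(df, aes(x=fishery, y=LOGcpue)) + 
        geom_bar(stat="identity") + coord_flip() 
    print(p) 
    #Build the file name without a space before the extension, e.g. "blackdog.png" 
    fn <- paste(df$variable[1], ".png", sep="") 
    ggsave(fn, plot=p) 
    p #Return the plot so that dlply() collects it in the list 
} 
plotList <- dlply(cpueDF, .(variable), printGraph) 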

#Now pick out the top n fisheries for each fish 
cpueDFtopN <- ddply(cpueDF, .(variable), function(x) head(x[order(x$LOGcpue, decreasing=TRUE),], n=5)) 
ggplot(cpueDFtopN, aes(x=fishery, y=LOGcpue)) + geom_bar(stat="identity") + 
    coord_flip() + facet_wrap(~variable, scales="free") 


+0

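Thanks @DrewSteen. When I try cpueDF I get NAs for LOGcpue; any ideas? I have added a subset of my data. I need to calculate kg/set for each species in each fishery. Then I only want the 10 fisheries with the highest LOGcpue, in descending order. – GodinA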
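The thread does not show where the NAs come from, so the following is only a sketch of two generic things worth checking, under the assumption that the catch column contains missing values or all-zero groups. In base R, `sum()` returns NA as soon as one value is missing, and the log of a zero catch is -Inf rather than NA. The vectors below are made up for illustration.

```r
# Made-up catch values for one fishery/species group
value <- c(0, 2, NA, 4)

# One missing value makes the whole sum, and therefore the log, NA
na_result <- log(sum(value) / length(value))

# Dropping missing values from the sum; note that length(value) still counts the NA row
fixed <- log(sum(value, na.rm = TRUE) / length(value))

# A group with zero total catch gives -Inf, not NA
zero_catch <- log(sum(c(0, 0)) / 2)

is.na(na_result)
fixed
zero_catch
```

If the denominator should only count sets with a recorded catch, `sum(!is.na(value))` could be used in place of `length(value)`; which definition is right depends on how missing catches are meant to be treated.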

+0

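Sorry, that is because count() does not work the way I thought it did. I will fix it in a few hours. By the way, to share your data, type dput(mydata) and paste the output into the code window; that way I can paste it straight into my R console. –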

+0

Sorry, I have never used the dput command, and that is a lot of lines, i.e. 350100! Is that something people usually do? – GodinA
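For a dataset this large, a common approach is to `dput()` only a small slice rather than all 350100 rows. The sketch below assumes a small made-up data frame named `mydata` in place of the real one; the row limit of 20 is an arbitrary choice.

```r
# Small made-up stand-in for the real data
mydata <- data.frame(
  setID    = c(1, 1, 10),
  fishery  = c("TRAWL-PAND.BOR.", "TRAWL-REDFISH", "TRAWL-PAND.BOR."),
  blackdog = c(0, 0, 0)
)

# Keep only the first rows so the pasted text stays short
small <- head(mydata, 20)

# Prints R code that recreates the object; this is the text to paste into the question
dput(small)

# Check that the printed text really rebuilds the same data frame
txt <- paste(capture.output(dput(small)), collapse = "\n")
restored <- eval(parse(text = txt))
all.equal(small, restored)
```

Anyone reading the question can then assign the pasted output to a name in their own console and work with the same rows.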