Rasterizing polygons in R using snowfall and sfLapply

I want to rasterize a very large vector file to 25 m and have had some success with the 'cluster' package, adapting the approaches shown here and here, which worked very well for that particular data.

However, I now have an even larger vector file that needs rasterizing, and I have access to a cluster that uses snowfall. I am not used to the cluster functions and I just can't work out how to set up sfLapply. I keep getting the following type of error when sfLapply is called on the cluster:

Error in checkForRemoteErrors(val) : 
    one node produced an error: 'quote(96)' is not a function, character or symbol 
Calls: sfLapply ... clusterApply -> staticClusterApply -> checkForRemoteErrors 

My full code:

library(snowfall) 
library(rgeos) 
library(maptools) 
library(raster) 
library(sp) 

setwd("/home/dir/") 

# Initialise the cluster... 
hosts = as.character(read.table(Sys.getenv('PBS_NODEFILE'),header=FALSE)[,1]) # read the nodes to use 
sfSetMaxCPUs(length(hosts)) # make sure the maximum allowed number of CPUs matches the number of hosts 
sfInit(parallel=TRUE, type="SOCK", socketHosts=hosts, cpus=length(hosts), useRscript=TRUE) # initialise a socket cluster session with the named nodes 
sfLibrary(snowfall) 

# read in required data 

shp <- readShapePoly("my_data.shp") 
BNG <- "+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +datum=OSGB36 +units=m +no_defs" 
crs(shp) <- BNG 

### rasterize the uniques to 25m and write (GB and clipped) ### 
rw <- raster(res=c(25,25), xmn=0, xmx=600000, ymn=0, ymx=1000000, crs=BNG) 

# Number of polygons features in SPDF 
features <- 1:nrow(shp[,]) 

# Split features in n parts 
n <- 96 
parts <- split(features, cut(features, n)) 

rasFunction = function(X, shape, raster, nparts){ 
    ras = rasterize(shape[nparts[[X]],], raster, 'CODE') 
    return(ras) 
} 

# Export everything in the workspace onto the cluster... 
sfExportAll() 

# Distribute calculation across the cluster nodes... 
rDis = sfLapply(n, fun=rasFunction,X=n, shape=shp, raster=rw, nparts=parts) # equivalent of sapply 
rMerge <- do.call(merge, rDis) 

writeRaster(rMerge, filename="my_data_25m", format="GTiff", overwrite=TRUE) 

# Stop the cluster... 
sfStop() 

I have tried a few things, changing the function and the sfLapply call, but I just can't get this to run. Thanks.
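For reference, snowfall's sfLapply is called as sfLapply(x, fun, ...): the first argument is the vector of chunk indices to iterate over, and any extra named arguments are forwarded to fun. A minimal sketch of that calling convention is below; it is untested against this data, assumes the worker nodes also need the raster and sp packages loaded via sfLibrary, and is not a verified fix for the error above.

# hedged sketch of the usual snowfall calling convention, not a verified fix
# load the packages used inside the function on every worker node
sfLibrary(raster)
sfLibrary(sp)

# export only the objects the workers actually need, after they exist
sfExport("shp", "rw", "parts")

rasFunction <- function(i, shape, raster, nparts){
    # rasterize one chunk of the polygons on this worker
    rasterize(shape[nparts[[i]],], raster, 'CODE')
}

# iterate over the chunk indices 1:n; shape/raster/nparts are passed straight through to rasFunction
rDis <- sfLapply(1:n, rasFunction, shape=shp, raster=rw, nparts=parts)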


If you are looking for speed when rasterizing (large) vector data, take a look at `gdalUtils::gdal_rasterize`. It is usually much faster than `raster::rasterize`. – joberlin


OK, thanks, I will have a look at that too – Sam


I dropped rasFunction and changed rDis to `rDis = sfLapply(1:n, fun = function(x) rasterize(shp[parts[[x]],], rw, 'CODE'))`, but now I get Error in checkForRemoteErrors(val): 96 nodes produced errors; first error: 'data' must be of a vector type, was 'NULL'. Stumped. – Sam

Answers


OK, so I abandoned snowfall, looked at gdalUtils::gdal_rasterize instead, and found a lot of benefit in using it.

Context & issue (there is one drawback that someone may be able to answer): my vector data live inside an ESRI file geodatabase and need some pre-processing before rasterizing. No problem, rgdal::readOGR handles that fine. However, because gdal_rasterize needs a pathname to the vector data, I hit trouble here: I could not write my processed vector data back out, as they exceed the maximum file size for a shapefile outside a geodatabase, and gdal_rasterize will not accept objects, paths to .gdbs, or .Rdata/.rds files. How do I pass an object to gdal_rasterize?
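One possible workaround (hedged, I have not benchmarked it): a GeoPackage is a single file on disk without the 2 GB shapefile limit, and gdal_rasterize can read from one, so the processed object could perhaps be written out once with writeOGR and rasterized straight from that path. The file names, layer name, extent/resolution values, and the assumption that the GDAL build includes the GPKG driver are all illustrative here.

# hedged sketch: write the processed polygons to a GeoPackage and rasterize from its path
library(rgdal)
library(gdalUtils)

writeOGR(shape, dsn="processed.gpkg", layer="mydata_processed", driver="GPKG")

gdal_rasterize(src_datasource="processed.gpkg", dst_filename="mydata_25m.tif",
               l="mydata_processed", a="code",
               te=c(0, 0, 600000, 1000000), tr=c(25, 25),  # extent and resolution are needed when creating a new raster
               output_Raster=TRUE)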

So I wrote the vector data out as shapefiles, split into as many parts as there are processors.

Originally raster::rasterize was used because I could simply pass the vector object held in memory to rasterize without any write issues (although I would have liked to write it out), rasterizing the data to 25 m. This took a very long time, even in parallel.

The solution: gdal_rasterize in parallel.

# gdal_rasterize in parallel 
require(gdalUtils) 
require(rgdal) 
require(rgeos) 
require(cluster) 
require(parallel) 
require(raster) 

# read in vector data 
shape <- readOGR("./mygdb.gdb", layer="mydata",stringsAsFactors=F) 

## do all the vector processing etC## 

# split vector data into n parts, the same as number of processors (minus 1) 
npar <- detectCores() - 1 
features <- 1:nrow(shape[,]) 
parts <- split(features, cut(features, npar)) 

# write the vector parts out 
for(n in 1:npar){ 
    writeOGR(shape[parts[[n]],], ".\\parts", paste0("mydata_p",n), driver="ESRI Shapefile") 
} 

# set up and write a blank raster for gdal_rasterize for EACH vector segment created above 
r <- raster(res=c(25,25), xmn=234000, xmx=261000, ymn=229000, ymx=256000, crs=projection(shape))  
for(n in 1:npar){ 
    writeRaster(r, filename=paste0(".\\gdal_p",n,".tif"), format="GTiff", overwrite=TRUE) 
} 

# set up cluster and pass required packages and objects to cluster 
cl <- makeCluster(npar) 
clusterEvalQ(cl, sapply(c('raster', 'gdalUtils',"rgdal"), require, char=TRUE)) 
clusterExport(cl, list("r","npar")) 

# parallel apply the gdal_rasterize function against the vector parts that were written, 
# same number as processors, against the pre-prepared rasters 
parLapply(cl = cl, X = 1:npar, fun = function(x) gdal_rasterize(src_datasource=paste0(".\\parts\\mydata_p",x,".shp"), 
    dst_filename=paste0(".\\gdal_p",x,".tif"), b=1, a="code", verbose=F, output_Raster=T)) # index the destination with x (the worker's part), not n 

# There are now n rasters representing the n segments of the original vector file 
# read in the rasters as a list, merge and write to a new tif. 
s <- lapply(X=1:npar, function(x) raster(paste0(".\\gdal_p",x,".tif"))) # read each part raster back in (index with x, not n) 
s$filename <- "myras_final.tif" 
do.call(merge,s) 
stopCluster(cl) 

The timing for the whole job with this code (60% on vector reading/writing/processing and splitting, 40% on producing the blank rasters and rasterizing) was roughly 9 times faster than parallel raster::rasterize.

Note: I originally tried this by splitting the vector into n parts but creating only 1 blank raster. All the cluster nodes then wrote to that same blank raster simultaneously, which corrupted it and made it unusable in R/Arc/anything (even though the function threw no errors). The approach above is more stable, but n blank rasters have to be made instead of 1, which adds processing time, plus merging the n rasters afterwards is extra processing.

Caveat: the parallel raster::rasterize runs did not have writeRaster inside the rasterize function but as a separate line, which will have added processing time to the original runs because of storage to the temp directory and so on.
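If re-running the raster::rasterize version, one way to avoid that separate write step is that rasterize() takes a filename argument which it forwards to writeRaster, so each worker can write its tile straight to its own GeoTIFF. This is a sketch only, reusing the objects from the question's snowfall code (shp, rw, parts) and an illustrative output name.

# hedged sketch: let rasterize() write each tile directly instead of using a separate writeRaster call
ras <- rasterize(shp[parts[[x]],], rw, 'CODE',
                 filename=paste0("ras_p", x, ".tif"), format="GTiff", overwrite=TRUE)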

EDIT: Why is the frequency table of the raster from gdal_rasterize not the same as from raster::rasterize? I mean, with 100 million cells I expect a little difference, but for some codes it was a few thousand cells different. I thought they both rasterized through the centroid?
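One way to narrow down where the two rasters disagree (a sketch; the file names are hypothetical stand-ins for the gdal_rasterize and raster::rasterize outputs on the same grid) is to compare their frequency tables code by code. As far as I know, gdal_rasterize only burns a cell whose centre falls inside the polygon unless the ALL_TOUCHED (-at) option is set, so edge cells are one plausible source of the difference, but I have not verified that this explains it.

# hedged sketch: compare per-code cell counts between the two rasters
library(raster)

r_gdal <- raster("myras_final.tif")    # merged gdal_rasterize result
r_rast <- raster("my_data_25m.tif")    # hypothetical raster::rasterize result on the same grid

f_gdal <- as.data.frame(freq(r_gdal))  # columns: value, count
f_rast <- as.data.frame(freq(r_rast))
cmp <- merge(f_gdal, f_rast, by="value", all=TRUE, suffixes=c("_gdal", "_rasterize"))
cmp$diff <- cmp$count_gdal - cmp$count_rasterize
cmp[order(-abs(cmp$diff)), ][1:10, ]   # codes with the largest cell-count differences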


Because I can't do formatting in a comment:

library(maptools) 
library(rgdal)   # spTransform methods for Spatial* objects live in rgdal 
shp <- readShapePoly("my_data.shp")   # note: the source CRS must be defined before spTransform will work 
BNG <- "+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +datum=OSGB36 +units=m +no_defs" 

shp.2 <- spTransform(shp, CRS(BNG)) 
#Continue as before 

Overwriting a projection ≠ reprojecting the data.
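To make the distinction concrete, a minimal sketch with a hypothetical point (the coordinates and objects here are illustrative, not taken from the question's data):

# hedged sketch: assigning a CRS only relabels coordinates, spTransform() recalculates them
library(sp)
library(rgdal)

lonlat <- SpatialPoints(cbind(-2, 52), proj4string=CRS("+proj=longlat +datum=WGS84"))

# reprojecting: the coordinates become BNG eastings/northings in metres
bng_pt <- spTransform(lonlat, CRS(BNG))
coordinates(bng_pt)

# by contrast, crs(shp) <- BNG in the question only labels the existing numbers as BNG,
# it does not change them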


No, sorry, I meant that all ideas are welcome as to why the script might be going wrong! I know the difference between an undefined projection and reprojecting :) – Sam