2016-03-01 41 views
0

我已經使用ddply和gDistance(package:rgeos)來鏈接兩組多邊形。但是,對於我的大型數據集,該過程非常緩慢,正如許多博客中所解釋的(例如,Why is plyr so slow?)。如何在data.table中調用gDistance?

這些博客表明,使用data.table會更快,但我不能解決如何使它適用於我的情況。特別是,如何在data.table內爲我的棲息地清除數據的一部分應用gDistance?

下面我會更詳細地解釋我正在做什麼,並附上一小部分測試代碼(這裏是:https://www.dropbox.com/sh/9onaeltb81qrd7h/AAAdED0KS6n2EzP74F3o81Lxa?dl=0)。

我有兩個包含多邊形的形狀文件(其中一個爲生境清理補丁,其他屬性爲清理棲息地而受到懲罰),我想在第一個層(棲息地清理)中標識多邊形中最接近每個多邊形的多邊形第二層(受到懲罰的屬性),因爲兩者並不總是完全重疊。另外,還有一個額外的限制;匹配的棲息地清除多邊形不能比環境犯罪大5歲以上。

我已經成功地使用了以下ddply代碼 - 關於如何使用data.table做到這一點,我認爲它會更快... ...?

謝謝!

# # # # # # # # # # # # # # # # # # # Load [R] GIS packages. # # # # # # # # # # # # # # # # # # 
library(rgeos) 
library(raster) 
library(rgdal) 
library(plyr) 

# # # # # # # # # # # # # # # # # # # ENTER shapefile information HERE # # # # # # # # # # # # # # # # # # # # # # 


# What is the name of the punishments shapefile? 
punishments <- "punishments_stack_overflow" 


# What is the name of the PUNISHMENTs directory? 
myDrctry <- "E:/Esri_arcGIS_datasets/SM_data/IBAMA_embargo/final_embargo_list/near_chopped/stack_overflow" # !CHANGE ME - where the shapefiles are stored 


# What is the name of the hab_cl shapefile? 
hab_cl_shp <- "RO_SimU_deforestation_Amazonia_SIRGAS_near" 


# What is the name of the hab_cl data directory? 
my_hab_cl_Drctry <- "E:/Esri_arcGIS_datasets/SM_data/PRODES/Deforestation_per_SimU/near_analysis" #! CHANGE ME 




# # # # # # # # # # # # # # # # # # # # Load the shapefiles # # # # # # # # # # # # # # # # # # # # # # 


# Read in the embargo shapefile 
punishments_need_near <- readOGR(dsn=myDrctry, layer=punishments) 


# Identify the attributes to keep 
myattributes <- c("numero_tad", "data_tad", "CD_BIOMA") 


# Subset the full dataset extracting only the desired attributes 
[email protected] <- [email protected][,myattributes] 


# Load the deforestation data 
hab_cl_patches_near <- readOGR(dsn=my_hab_cl_Drctry, layer=hab_cl_shp) 
proj4string(hab_cl_patches_near)    # check the projection (which is SIRGAS 2000 UTM) 
#[email protected] <- [email protected][,c("year","LAPIG_ID")] # manipulate the columns to match oter dataframes 
#names([email protected]) <- c("ano", "PRODES_ID") 
head(hab_cl_patches_near)             # check that it worked 


# # # # # # # # # # # # # # # # # # # # # # # Run the loop # # # # # # # # # # # # # # # # # # # # # # # 


# Use ddply to calculate nearest distance for each embargo ("numero_tad") 
tmp <- ddply([email protected], .(numero_tad), function(x) { # numero_tad is a unique identifier per punishment 
    ID <- x$numero_tad[1] 
    tmp.punishments <- punishments_need_near[[email protected]$numero_tad == ID,]      
    tmp.patches <- hab_cl_patches_near[([email protected]$ano +5) >= [email protected][,"ano_new"] &  # match the punishments with habitat clearance in last 5 years (ano_new = yr of punishment, ano = yr of habitat clearance) 
            [email protected]$ano <= [email protected][,"ano_new"],]   # and not after the punishment itself 
    obj <- gDistance(tmp.punishments, tmp.patches, byid=TRUE)              # calculate the distance between each punishment and patch of habitat clearance 
    df <- data.frame(numero_tad = ID, PRODES_ID = tmp.patches$PRODES_ID[which.min(obj)], dist = min(obj)) # link punishment with the nearest suitable patch of habitat clearance 
}, .progress='text') # progress bar 

head(tmp) 
+0

聽起來像是使用'.BY'的機會。有關這方面的一些示例,請參閱https://github.com/Rdatatable/data.table/issues/1363 – MichaelChirico

回答

0

您的多邊形存儲在Spatial Polygons Data Frame類對象中。 data.table只適用於data.table類型的對象。您需要將由data.table創建的數據子集轉換爲空間多邊形數據框。

gDistance需要空間對象,但它正在接收數據表對象。

要解決此問題,請將與您的多邊形相對應的列轉換回聚合函數中的多邊形。

你的Dropbox的鏈接被打破,但這裏的(而不是多邊形)使用空間點的例子。我假設同樣的方法適用。

library(sp) 
library(rgeos) 

punishments <- data.frame(numero_tad = sample(1:3, 10, replace=TRUE), ano_new = sample(1:10, 10, replace=TRUE), x = sample(1:100, 10), y = sample(1:100, 10)) 
patches <- data.frame(PRODES_ID = 1:10, ano = sample(1:10, 10, replace=TRUE), x = sample(1:100, 10), y = sample(1:100, 10)) 
head(punishments) 
head(patches) 

coordinates(punishments) <- ~x+y # convert to Spatial object 
coordinates(patches) <- ~x+y # convert to Spatial object 

class(punishments) # "SpatialPointsDataFrame" 
head(punishments) 

link <- function(SD) { 
    coordinates(SD) <- ~x+y # convert from data.table to Spatial object 
    # you'll need to do something like to above to convert back to a polygon 
    tmp.patches <- patches[(patches$ano + 5) >= SD$ano_new & patches$ano < SD$ano_new, ] 
    obj <- gDistance(SD, tmp.patches, byid=TRUE) 
    df <- list(PRODES_ID = tmp.patches$PRODES_ID[which(obj == min(obj), arr.ind=TRUE)[1]], dist = min(obj)) 
    return(df) 
} 

dt <- as.data.table(punishments) 
class(dt) 
head(dt) 
setkey(dt, numero_tad) 

tmp <- dt[,link(.SD), by=numero_tad] 

希望幫助!