如何在data.table中調用gDistance？

我已經使用ddply和gDistance（package：rgeos）來鏈接兩組多邊形。但是，對於我的大型數據集，該過程非常緩慢，正如許多博客中所解釋的（例如，Why is plyr so slow?）。如何在data.table中調用gDistance？

這些博客表明，使用data.table會更快，但我不能解決如何使它適用於我的情況。特別是，如何在data.table內爲我的棲息地清除數據的一部分應用gDistance？

下面我會更詳細地解釋我正在做什麼，並附上一小部分測試代碼（這裏是：https://www.dropbox.com/sh/9onaeltb81qrd7h/AAAdED0KS6n2EzP74F3o81Lxa?dl=0）。

我有兩個包含多邊形的形狀文件（其中一個爲生境清理補丁，其他屬性爲清理棲息地而受到懲罰），我想在第一個層（棲息地清理）中標識多邊形中最接近每個多邊形的多邊形第二層（受到懲罰的屬性），因爲兩者並不總是完全重疊。另外，還有一個額外的限制;匹配的棲息地清除多邊形不能比環境犯罪大5歲以上。

我已經成功地使用了以下ddply代碼 - 關於如何使用data.table做到這一點，我認爲它會更快... ...？

謝謝！

# # # # # # # # # # # # # # # # # # # Load [R] GIS packages. # # # # # # # # # # # # # # # # # # 
library(rgeos) 
library(raster) 
library(rgdal) 
library(plyr) 

# # # # # # # # # # # # # # # # # # # ENTER shapefile information HERE # # # # # # # # # # # # # # # # # # # # # # 


# What is the name of the punishments shapefile? 
punishments <- "punishments_stack_overflow" 


# What is the name of the PUNISHMENTs directory? 
myDrctry <- "E:/Esri_arcGIS_datasets/SM_data/IBAMA_embargo/final_embargo_list/near_chopped/stack_overflow" # !CHANGE ME - where the shapefiles are stored 


# What is the name of the hab_cl shapefile? 
hab_cl_shp <- "RO_SimU_deforestation_Amazonia_SIRGAS_near" 


# What is the name of the hab_cl data directory? 
my_hab_cl_Drctry <- "E:/Esri_arcGIS_datasets/SM_data/PRODES/Deforestation_per_SimU/near_analysis" #! CHANGE ME 




# # # # # # # # # # # # # # # # # # # # Load the shapefiles # # # # # # # # # # # # # # # # # # # # # # 


# Read in the embargo shapefile 
punishments_need_near <- readOGR(dsn=myDrctry, layer=punishments) 


# Identify the attributes to keep 
myattributes <- c("numero_tad", "data_tad", "CD_BIOMA") 


# Subset the full dataset extracting only the desired attributes 
[email protected] <- [email protected][,myattributes] 


# Load the deforestation data 
hab_cl_patches_near <- readOGR(dsn=my_hab_cl_Drctry, layer=hab_cl_shp) 
proj4string(hab_cl_patches_near)    # check the projection (which is SIRGAS 2000 UTM) 
#[email protected] <- [email protected][,c("year","LAPIG_ID")] # manipulate the columns to match oter dataframes 
#names([email protected]) <- c("ano", "PRODES_ID") 
head(hab_cl_patches_near)             # check that it worked 


# # # # # # # # # # # # # # # # # # # # # # # Run the loop # # # # # # # # # # # # # # # # # # # # # # # 


# Use ddply to calculate nearest distance for each embargo ("numero_tad") 
tmp <- ddply([email protected], .(numero_tad), function(x) { # numero_tad is a unique identifier per punishment 
    ID <- x$numero_tad[1] 
    tmp.punishments <- punishments_need_near[[email protected]$numero_tad == ID,]      
    tmp.patches <- hab_cl_patches_near[([email protected]$ano +5) >= [email protected][,"ano_new"] &  # match the punishments with habitat clearance in last 5 years (ano_new = yr of punishment, ano = yr of habitat clearance) 
            [email protected]$ano <= [email protected][,"ano_new"],]   # and not after the punishment itself 
    obj <- gDistance(tmp.punishments, tmp.patches, byid=TRUE)              # calculate the distance between each punishment and patch of habitat clearance 
    df <- data.frame(numero_tad = ID, PRODES_ID = tmp.patches$PRODES_ID[which.min(obj)], dist = min(obj)) # link punishment with the nearest suitable patch of habitat clearance 
}, .progress='text') # progress bar 

head(tmp)

來源

2016-03-01 user3679593

聽起來像是使用'.BY'的機會。有關這方面的一些示例，請參閱https://github.com/Rdatatable/data.table/issues/1363 – MichaelChirico

您的多邊形存儲在Spatial Polygons Data Frame類對象中。 data.table只適用於data.table類型的對象。您需要將由data.table創建的數據子集轉換爲空間多邊形數據框。

gDistance需要空間對象，但它正在接收數據表對象。

要解決此問題，請將與您的多邊形相對應的列轉換回聚合函數中的多邊形。

你的Dropbox的鏈接被打破，但這裏的（而不是多邊形）使用空間點的例子。我假設同樣的方法適用。

library(sp) 
library(rgeos) 

punishments <- data.frame(numero_tad = sample(1:3, 10, replace=TRUE), ano_new = sample(1:10, 10, replace=TRUE), x = sample(1:100, 10), y = sample(1:100, 10)) 
patches <- data.frame(PRODES_ID = 1:10, ano = sample(1:10, 10, replace=TRUE), x = sample(1:100, 10), y = sample(1:100, 10)) 
head(punishments) 
head(patches) 

coordinates(punishments) <- ~x+y # convert to Spatial object 
coordinates(patches) <- ~x+y # convert to Spatial object 

class(punishments) # "SpatialPointsDataFrame" 
head(punishments) 

link <- function(SD) { 
    coordinates(SD) <- ~x+y # convert from data.table to Spatial object 
    # you'll need to do something like to above to convert back to a polygon 
    tmp.patches <- patches[(patches$ano + 5) >= SD$ano_new & patches$ano < SD$ano_new, ] 
    obj <- gDistance(SD, tmp.patches, byid=TRUE) 
    df <- list(PRODES_ID = tmp.patches$PRODES_ID[which(obj == min(obj), arr.ind=TRUE)[1]], dist = min(obj)) 
    return(df) 
} 

dt <- as.data.table(punishments) 
class(dt) 
head(dt) 
setkey(dt, numero_tad) 

tmp <- dt[,link(.SD), by=numero_tad]

希望幫助！

來源

2016-04-10 00:27:39

如何在data.table中調用gDistance？

回答

相關問題