2017-03-01 52 views
3

Detecting outer rows

I have a dataset containing the positions of objects:

library(dplyr)   # %>% and mutate()
library(ggplot2) # ggplot() and geom_point()

so <- data.frame(x = rep(1:5, each = 5), y = rep(1:5, 5))
so1 <- so %>% mutate(x = x + 5, y = y + 2)
so2 <- rbind(so, so1) %>% mutate(x = x + 13, y = y + 7)
so3 <- so2 %>% mutate(x = x + 10)
ggplot(rbind(so, so1, so2, so3), aes(x = x, y = y)) + geom_point()

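What I would like to know is whether there is a method in R that can detect the objects located in the outer rows of the dataset, because I have to exclude those objects from the analysis. I want to exclude the objects marked in red in the picture.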

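So far I have used `min`, `max` and `ifelse`, but this is tedious, and I cannot come up with something that generalizes to datasets with a different layout and different x and y. Is there a package that does this? And/or is it possible to solve this kind of problem at all?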

Answers

4

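Maybe you could use a "spatial" approach? If you view your data as a spatial object, your problem becomes one of removing the boundaries of your patches...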

This can be done quite directly with the raster package: find the `boundaries`, then `mask` your data with them.

library(dplyr) 
library(raster) 

# Your reproducible example 
myDF = rbind(so,so1,so2,so3) 
myDF$z = 1 # there may actually be more 'z' variables 

# Rasterize your data 
r = rasterFromXYZ(myDF) # if there are more vars, this will be a RasterBrick 
par(mfrow=c(2,2)) 
plot(r, main='Original data') 

# Here I artificially add 1 row above and below and 1 column left and right.
# This is a trick needed to make sure to also remove the cells that are 
# located at the border of your raster with `boundaries` in the next step. 
newextent = extent(r) + c(-res(r)[1], res(r)[1], -res(r)[2], res(r)[2]) 
r = extend(r, newextent) 
plot(r, main='Artificially extended') 
plot(rasterToPoints(r, spatial=TRUE), add=TRUE, col='blue', pch=20, cex=0.3)

# Get the cells to remove, i.e. the boundaries 
bounds = boundaries(r[[1]], asNA=TRUE) # [[1]]: in case r is a RasterBrick
plot(bounds, main='Cells to remove (where 1)')
plot(rasterToPoints(bounds, spatial=TRUE), add=TRUE, col='red', pch=20, cex=0.3)

# Then mask your data (i.e. subset to remove boundaries) 
subr = mask(r, bounds, maskvalue=1) 
plot(subr, main='Resulting data') 
plot(rasterToPoints(subr, spatial=TRUE), add=TRUE, col='blue', pch=20, cex=0.3)

# This is your new data (rasterToPoints skips NA cells, so the artificially
# added rows and columns and the masked boundaries do not appear in it)
myDF2 = rasterToPoints(subr)

Does this help?
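
For intuition, the same "remove the boundary of each patch" idea can be sketched in base R without any package. This is only a minimal sketch, not what raster does internally: it assumes the points lie on a regular grid with integer-valued coordinates, and `is_outer` and its `step` argument are hypothetical names introduced here for illustration. A point is flagged as outer when at least one of its 8 neighbours is absent from the data.

```r
# Hypothetical helper (not part of any package): TRUE for points that
# miss at least one of their 8 grid neighbours.
# Assumes integer-valued coordinates on a regular grid with spacing `step`.
is_outer <- function(x, y, step = 1) {
  key <- paste(x, y)                       # one text key per observed point
  offsets <- expand.grid(dx = c(-1, 0, 1), dy = c(-1, 0, 1))
  offsets <- offsets[!(offsets$dx == 0 & offsets$dy == 0), ]
  outer <- rep(FALSE, length(x))
  for (i in seq_len(nrow(offsets))) {
    neighbour <- paste(x + offsets$dx[i] * step, y + offsets$dy[i] * step)
    outer <- outer | !(neighbour %in% key) # neighbour missing => outer point
  }
  outer
}

# A single 5 x 5 block: only the inner 3 x 3 points are kept
block <- data.frame(x = rep(1:5, each = 5), y = rep(1:5, 5))
inner <- block[!is_outer(block$x, block$y), ]
```

Because this only filters rows, any extra columns in the data frame are kept as they are. For coordinates that are not integer-valued, comparing text keys is fragile, and the raster approach above is the safer choice.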

+0

I guess you can't have any other variables in the data frame you want to rasterize? Only x and y? – Mateusz1981

+0

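Actually you can have many variables :-) `rasterFromXYZ` will then produce a `RasterBrick` (i.e. a multi-layer raster, with one layer per variable in your data.frame). The only difference is that you need to do `bounds = boundaries(r[[1]], asNA=T)`, because `boundaries` does not work on a `RasterBrick` (so you pick the first layer to work on, which means all your variables must have the same spatial distribution). I edited the answer accordingly: it now works in any case (1 or more variables). – ztl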
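
The multi-variable case described in this comment could look roughly like the sketch below. It assumes the raster package is installed; the data frame `df` and its columns `a` and `b` are made up for illustration, and `extend(r, c(1, 1))` (add one row and one column on each side) is used here as a shorter stand-in for the extent arithmetic in the answer.

```r
library(raster)

# Made-up example data: a 5 x 5 grid with two value columns
df <- data.frame(x = rep(1:5, each = 5), y = rep(1:5, 5))
df$a <- df$x + df$y
df$b <- df$x * df$y

r <- rasterFromXYZ(df)                      # two value columns: multi-layer raster
r <- extend(r, c(1, 1))                     # pad with 1 row and 1 column of NA per side
bounds <- boundaries(r[[1]], asNA = TRUE)   # boundaries computed on the first layer only
subr <- mask(r, bounds, maskvalue = 1)      # boundary cells become NA in every layer
inner <- as.data.frame(rasterToPoints(subr))  # columns x, y, a, b for the inner cells
```

Since the boundaries come from the first layer alone, this only makes sense when every variable is observed at the same set of positions, as the comment points out.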