2015-08-17 78 views
1

我有一個跨越整個地球的數據點的CSV(the US Geological Service's earthquake feed),我只想篩選美國境內的地震。如何使用KML多邊形過濾地理座標的數據框?

KML文件我已經從美國人口普查局拉:

https://www.census.gov/geo/maps-data/data/kml/kml_nation.html

在R,該rgdal庫可以加載KML文件:

library(rgdal) 
kml = readOGR("kmls/cb_2014_us_nation_20m.kml", 'cb_2014_us_nation_20m') 

如何使用dplyr/plyr /等。僅針對落在由KML指定的邊界內的行來過濾data.frame(地理數據的列爲latitudelongitude)?


編輯,回答後:

這是我從@hrbrmstr's answer用來得到一個快速的可視化:

library(sp) 
library(rgdal) 
# download earthquakes 
url <- "http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_week.csv" 
fil <- "all_week.csv" 
if (!file.exists(fil)) download.file(url, fil) 
quakes <- read.csv("all_week.csv", stringsAsFactors=FALSE) 
# create spatial object 
sp::coordinates(quakes) <- ~longitude+latitude 

# download nation KML 
url <- "http://www2.census.gov/geo/tiger/GENZ2014/kml/cb_2014_us_nation_20m.zip" 
fil <- "uskml.zip" 
if (!file.exists(fil)) download.file(url, fil) 
unzip(fil, exdir="uskml") 
# read KML file 
us <- rgdal::readOGR("./uskml/cb_2014_us_nation_20m.kml", "cb_2014_us_nation_20m") 
sp::proj4string(quakes) <- sp::proj4string(us) 

length(quakes) 
# 1514 

usquakes = quakes[us,] 
length(usquakes) 
# 1260 

### visualize 
plot(us) 
# plot all quakes 
points(quakes$longitude, quakes$latitude) 
# plot just US 
points(usquakes$longitude, usquakes$latitude) 

產生的圖像:

US Geological Services detected earthquakes for one week; US quakes plotted in red

感謝@ hrbrmstr!

+0

美國本土或美國所有? – hrbrmstr

+0

@hrbrmstr:我會解決這個問題,但我猜你在問,因爲在美國不連續的情況下,多邊形數據結構是不同的?讓我們假裝我只關心大陸,儘管我現在使用的文件是全國人口普查的界限:https://www.census.gov/geo/maps-data/data/kml/kml_nation.html –

回答

4

這會做一個CPL方式:

library(sp) 
library(maptools) 
library(rgeos) # not entirely necessary 
library(rgdal) # not entirely necessary 

url <- "http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_week.csv" 
fil <- "all_week.csv" 
if (!file.exists(fil)) download.file(url, fil) 

quakes <- read.csv("all_week.csv", stringsAsFactors=FALSE) 
coordinates(quakes) <- ~longitude+latitude 

url <- "http://www2.census.gov/geo/tiger/GENZ2014/shp/cb_2014_us_nation_20m.zip" 
fil <- "us.zip" 
if (!file.exists(fil)) download.file(url, fil) 
unzip(fil, exdir="us") 
us <- readShapePoly("us/cb_2014_us_nation_20m.shp") 

# alternatively 
# us <- rgdal::readOGR("us/cbcb_2014_us_nation_20m.shp", "cb_2014_us_nation_20m") 

# TRUE if in 
in_us_rgeos <- rgeos::gIntersects(quakes, us, byid=TRUE) 

# <NA> if in 
in_us_over <- quakes %over% us 

gIntersects需要更長的時間。 rgdalrgeos可以是一個在某些系統上工作的熊。因人而異

使用美國KML需要(大部分)rgdal

# you wanted KML shapefile tho 
url <- "http://www2.census.gov/geo/tiger/GENZ2014/kml/cb_2014_us_nation_20m.zip" 
fil <- "uskml.zip" 
if (!file.exists(fil)) download.file(url, fil) 
unzip(fil, exdir="uskml") 

us <- rgdal::readOGR("uskml/cb_2014_us_nation_20m.kml", "cb_2014_us_nation_20m") 
proj4string(quakes) <- proj4string(us) 
rgeos::gIntersects(quakes, us, byid=TRUE) 
head(quakes %over% us) 
+0

謝謝!...很快的問題......爲什麼'over'似乎比'gIntersects'快得多? –

+1

'rgeos'是Java代碼的C++端口(可能意味着它攜帶了一些Java代碼包),但'gIntersects'也更靈活(它可以處理所有'Spatial ...'類型)。對於特殊情況,'over'(對於'SpatialPolygons'中的'SpatialPoints'的情況)將調用_super_高效的C代碼(函數是'R_point_in_polygon_sp')。 – hrbrmstr