2017-09-01 117 views

I have been running the excellent code from the following post and have run into a problem geocoding with R and Google Maps:

https://www.shanelynn.ie/massive-geocoding-with-r-and-google-maps/

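It works like a dream, but... it randomly stops partway through the process and throws an error. This happens at different points when using the same dataset. I took one of the addresses that threw an error and ran it through the code manually, and it worked fine. I think a server or timeout issue may be causing this. Has anyone else used this code and had a similar problem? Did you find a solution?

The error always looks like...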

contacting http://maps.googleapis.com/maps/api/geocode/json?address=NICHOLS,%20ACT,%202613,%20AUSTRALIA&sensor=false...Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=NICHOLS,%20ACT,%202613,%20AUSTRALIA&sensor=false 
Error in geo_reply$status : $ operator is invalid for atomic vectors 
In addition: Warning messages: 
1: In readLines(connect, warn = FALSE) : 
    cannot open URL 'http://maps.googleapis.com/maps/api/geocode/json?address=NICHOLS,%20ACT,%202613,%20AUSTRALIA&sensor=false': HTTP status was '500 Internal Server Error' 
2: In geocode(address, output = "all", messaging = TRUE, override_limit = TRUE) : 
geocoding failed for "NICHOLS, ACT, 2613, AUSTRALIA". 
if accompanied by 500 Internal Server Error with using dsk, try google. 

My addresses are in a data table (about 2000 records) like this...

| MAIL_STATE | MAIL_SUBBURB | MAIL_POSTCODE |
| ---------- | ------------ | ------------- |
| ACT | NICHOLLS | 2613 |

The addresses are created using the code below...

addresses <- paste0(data$MAIL_SUBURB, ", ", data$MAIL_STATE, ", ", data$MAIL_POSTCODE, ", AUSTRALIA")

The full code, which makes use of addresses, is below...

#define a function that will process googles server responses for us.
getGeoDetails <- function(address){
    #use the geocode function to query google servers
    geo_reply = geocode(address, output='all', messaging=TRUE, override_limit=TRUE)
    #now extract the bits that we need from the returned list
    answer <- data.frame(lat=NA, long=NA, accuracy=NA, formatted_address=NA, address_type=NA, status=NA)
    answer$status <- geo_reply$status

    #if we are over the query limit - want to pause for 24 hours
    while(geo_reply$status == "OVER_QUERY_LIMIT"){
        print("OVER QUERY LIMIT - Pausing for 24 hours at:")
        time <- Sys.time()
        print(as.character(time))
        Sys.sleep(60*60*24)
        geo_reply = geocode(address, output='all', messaging=TRUE, override_limit=TRUE)
        answer$status <- geo_reply$status
    }

    #return Na's if we didn't get a match:
    if (geo_reply$status != "OK"){
        return(answer)
    }
    #else, extract what we need from the Google server reply into a dataframe:
    answer$lat <- geo_reply$results[[1]]$geometry$location$lat
    answer$long <- geo_reply$results[[1]]$geometry$location$lng
    if (length(geo_reply$results[[1]]$types) > 0){
        answer$accuracy <- geo_reply$results[[1]]$types[[1]]
    }
    answer$address_type <- paste(geo_reply$results[[1]]$types, collapse=',')
    answer$formatted_address <- geo_reply$results[[1]]$formatted_address

    return(answer)
}

#initialise a dataframe to hold the results
geocoded <- data.frame()
# find out where to start in the address list (if the script was interrupted before):
startindex <- 1
#if a temp file exists - load it up and count the rows!
tempfilename <- paste0(infile, '_temp_geocoded.rds')
if (file.exists(tempfilename)){
    print("Found temp file - resuming from index:")
    geocoded <- readRDS(tempfilename)
    startindex <- nrow(geocoded)
    print(startindex)
}

# Start the geocoding process - address by address. geocode() function takes care of query speed limit.
for (ii in seq(startindex, length(addresses))){
    print(paste("Working on index", ii, "of", length(addresses)))
    #query the google geocoder - this will pause here if we are over the limit.
    result = getGeoDetails(addresses[ii])
    print(result$status)
    result$index <- ii
    #append the answer to the results file.
    geocoded <- rbind(geocoded, result)
    #save temporary results as we are going along
    saveRDS(geocoded, tempfilename)
}

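This has nothing to do with the code. I just tried http://maps.googleapis.com/maps/api/geocode/json?address=NICHOLS,%20ACT,%202613,%20AUSTRALIA&sensor=false and it worked. I suspect a limit on the Google servers (a limited number of calls per second/minute) –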
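If the per-second limit suspected above is the cause, one option is to space the requests out. The following is only a minimal sketch: `throttle` and `min_gap` are hypothetical names, not part of ggmap, and the 0.2 second gap is an assumed value rather than a documented Google limit.

```r
# Sketch: wrap any function so that successive calls are at least
# `min_gap` seconds apart. `throttle` and `min_gap` are illustrative names.
throttle <- function(fn, min_gap = 0.2) {
  last_call <- NULL
  function(...) {
    if (!is.null(last_call)) {
      elapsed <- as.numeric(difftime(Sys.time(), last_call, units = "secs"))
      # Sleep only for the part of the gap that has not yet passed
      if (elapsed < min_gap) Sys.sleep(min_gap - elapsed)
    }
    last_call <<- Sys.time()
    fn(...)
  }
}
```

Assuming ggmap is loaded, something like `slow_geocode <- throttle(geocode, min_gap = 0.2)` could then be called in place of `geocode()` inside the loop.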


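@EricLecoutre, thanks. As I said, this code works great... right up until the point where it fails! There is no pattern to the failures. It's random. Is there a way to build a throttle into the code to slow down the number of requests per minute, or is this more likely a network issue that delays receiving the results? –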
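The error in the question (`$ operator is invalid for atomic vectors`) suggests that a failed request hands back something that is not a list, so `geo_reply$status` breaks. For transient failures such as the HTTP 500 in the log, a retry with a growing pause may help. This is a hedged sketch, not the original author's method: `geocode_with_retry`, `geocode_fn`, `max_tries`, and `pause` are hypothetical names, and `"REQUEST_FAILED"` is a made-up label, not a Google status code.

```r
# Sketch: retry a geocoding call, pausing longer after each failure.
# `geocode_fn` is a hypothetical argument standing in for a call such as
# function(a) ggmap::geocode(a, output = "all", override_limit = TRUE).
geocode_with_retry <- function(address, geocode_fn, max_tries = 5, pause = 1) {
  for (attempt in seq_len(max_tries)) {
    # Turn an R error into NULL so that it is handled like any other failure
    reply <- tryCatch(geocode_fn(address), error = function(e) NULL)
    # A usable reply is a list with a status field; a failed request
    # may come back as NA (an atomic vector) or raise an error.
    if (is.list(reply) && !is.null(reply$status)) {
      return(reply)
    }
    if (attempt < max_tries) {
      Sys.sleep(pause * attempt)
    }
  }
  # Give up, but still return a list so reply$status is valid downstream.
  # "REQUEST_FAILED" is an invented label, not a status returned by Google.
  list(status = "REQUEST_FAILED", results = list())
}
```

Inside `getGeoDetails`, the two `geocode(...)` calls could be routed through this helper; because the fallback status is not `"OK"`, the existing `if (geo_reply$status != "OK")` branch would return the row of NAs instead of stopping the loop.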

Answers

Personally, I like this version.

# Geocoding a csv column of "addresses" in R 

#load ggmap 
library(ggmap) 

# Select the file from the file chooser 
fileToLoad <- file.choose(new = TRUE) 

# Read in the CSV data and store it in a variable 
origAddress <- read.csv(fileToLoad, stringsAsFactors = FALSE) 

# Initialize the data frame 
geocoded <- data.frame(stringsAsFactors = FALSE) 

# Loop through the addresses to get the latitude and longitude of each address and add it to the 
# origAddress data frame in new columns lat and lon 
for(i in 1:nrow(origAddress)) 
{ 
    # Print("Working...") 
    result <- geocode(origAddress$addresses[i], output = "latlona", source = "google") 
    origAddress$lon[i] <- as.numeric(result[1]) 
    origAddress$lat[i] <- as.numeric(result[2]) 
    origAddress$geoAddress[i] <- as.character(result[3]) 
} 
# Write a CSV file containing origAddress to the working directory 
write.csv(origAddress, "geocoded.csv", row.names=FALSE) 



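I like the simplicity of your approach. That said, the data I have to work with is very rough and contains many incorrect addresses. One of them forced the loop to fail, as follows... Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=CRAIGIE,%20ACT,%202632,%20AUSTRALIA&sensor=false Error in `[.data.frame`(result, 3) : undefined columns selected In addition: Warning message: geocode failed with status ZERO_RESULTS, location = "CRAIGIE, ACT, 2632, AUSTRALIA" I need to figure out a way to handle these errors to make this workable. –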


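Well, I'm not an R expert by any means, but I think you will have to handle the errors somehow and clean up the dataset. Maybe this will help, at least for setting up some try...catch blocks: https://www.r-bloggers.com/error-handling-in-r/ – ryguy72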
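The try...catch idea above might look roughly like the sketch below, which records NA for an address that fails instead of stopping the loop. It rests on some assumptions: `geocode_rows`, `geocode_fn`, and `address_col` are hypothetical names, and a successful `output = "latlona"` result is taken to be a data frame with lon, lat, and address in that column order, as the answer's code implies.

```r
# Sketch: geocode every row, recording NA instead of stopping on a failure.
# `geocode_fn` is a hypothetical stand-in for a call such as
# function(a) ggmap::geocode(a, output = "latlona", source = "google").
geocode_rows <- function(df, geocode_fn, address_col = "addresses") {
  df$lon <- NA_real_
  df$lat <- NA_real_
  df$geoAddress <- NA_character_
  for (i in seq_len(nrow(df))) {
    # Convert an R error (for example a failed HTTP request) into NULL
    result <- tryCatch(geocode_fn(df[[address_col]][i]),
                       error = function(e) NULL)
    # Accept only a data frame with three or more columns and a non-missing
    # first value; a ZERO_RESULTS lookup may return fewer columns or NA.
    ok <- is.data.frame(result) && ncol(result) >= 3 && !is.na(result[[1]][1])
    if (ok) {
      df$lon[i] <- as.numeric(result[[1]][1])
      df$lat[i] <- as.numeric(result[[2]][1])
      df$geoAddress[i] <- as.character(result[[3]][1])
    }
  }
  df
}
```

Leaving failures as NA rather than a sentinel number keeps them easy to find afterwards, for example with `origAddress[is.na(origAddress$lon), ]`, so the bad addresses can be cleaned and re-run.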


Seems to work fine... for(i in 1:nrow(origAddress)) { # Print("Working...") result <- geocode(origAddress$addresses[i], output = "latlona", source = "google") if (is.na(result$lon)) { origAddress$lon[i] <- 99 origAddress$lat[i] <- 99 origAddress$geoAddress[i] <- "error" } else { result <- geocode(origAddress$addresses[i], output = "latlona", source = "google") origAddress$lon[i] <- as.numeric(result[1]) origAddress$lat[i] <- as.numeric(result[2]) origAddress$geoAddress[i] <- as.character(result[3]) } } –