Geocoding problem with R and Google Maps. I have been running the excellent code from the following post:
https://www.shanelynn.ie/massive-geocoding-with-r-and-google-maps/
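It works like a dream, but it randomly stops partway through and throws an error. This happens at different points with the same dataset. I took one of the addresses that threw an error and ran it through the code manually, and it worked fine. I think a server or timeout problem could be causing this. Has anyone else used this code and run into a similar problem? Did you find a solution?
The error always looks like this: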
contacting http://maps.googleapis.com/maps/api/geocode/json?address=NICHOLS,%20ACT,%202613,%20AUSTRALIA&sensor=false...
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=NICHOLS,%20ACT,%202613,%20AUSTRALIA&sensor=false
Error in geo_reply$status : $ operator is invalid for atomic vectors
In addition: Warning messages:
1: In readLines(connect, warn = FALSE) :
cannot open URL 'http://maps.googleapis.com/maps/api/geocode/json?address=NICHOLS,%20ACT,%202613,%20AUSTRALIA&sensor=false': HTTP status was '500 Internal Server Error'
2: In geocode(address, output = "all", messaging = TRUE, override_limit = TRUE) :
geocoding failed for "NICHOLS, ACT, 2613, AUSTRALIA".
if accompanied by 500 Internal Server Error with using dsk, try google.
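A note on the first error line: `$ operator is invalid for atomic vectors` is what R reports when `$` is applied to something that is not a list. The sketch below reproduces that message offline, on the assumption that the failed request left `geo_reply` holding a bare atomic value such as `NA` (the exact return value of `geocode()` on failure is not shown in the output above). `reply_status` is a hypothetical helper name, not part of ggmap.

```r
# Assumption: after the HTTP 500, geo_reply held an atomic value such as NA
# instead of the usual nested list.
geo_reply <- NA

# `$` is only defined for lists and environments, so this raises the same
# error as in the post. tryCatch() captures the message instead of stopping.
msg <- tryCatch(geo_reply$status, error = function(e) conditionMessage(e))
print(msg)

# Hypothetical guard: read the status only when the reply really is a list,
# and fall back to NA otherwise.
reply_status <- function(reply) {
  if (is.list(reply) && !is.null(reply$status)) reply$status else NA_character_
}

print(reply_status(geo_reply))            # failed reply
print(reply_status(list(status = "OK")))  # well-formed reply
```

If that assumption holds, the script stops because of the unguarded `geo_reply$status` access rather than because of the address itself, which would fit the observation that the same address works when run manually.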
My addresses are in a data table (about 2,000 records) like this:
| MAIL_STATE | MAIL_SUBURB | MAIL_POSTCODE |
| ---------- | ----------- | ------------- |
| ACT        | NICHOLLS    | 2613          |
The addresses are built with the code below:
addresses = paste0(data$MAIL_SUBURB, ", ", data$MAIL_STATE, ", ", data$MAIL_POSTCODE, ", AUSTRALIA")
The full code, which uses addresses, is below:
#define a function that will process Google's server responses for us.
getGeoDetails <- function(address){
  #use the geocode function to query Google's servers
  geo_reply = geocode(address, output='all', messaging=TRUE, override_limit=TRUE)
  #now extract the bits that we need from the returned list
  answer <- data.frame(lat=NA, long=NA, accuracy=NA, formatted_address=NA, address_type=NA, status=NA)
  answer$status <- geo_reply$status
  #if we are over the query limit - want to pause for 24 hours
  while(geo_reply$status == "OVER_QUERY_LIMIT"){
    print("OVER QUERY LIMIT - Pausing for 24 hours at:")
    time <- Sys.time()
    print(as.character(time))
    Sys.sleep(60*60*24)
    geo_reply = geocode(address, output='all', messaging=TRUE, override_limit=TRUE)
    answer$status <- geo_reply$status
  }
  #return NAs if we didn't get a match:
  if (geo_reply$status != "OK"){
    return(answer)
  }
  #else, extract what we need from the Google server reply into a dataframe:
  answer$lat <- geo_reply$results[[1]]$geometry$location$lat
  answer$long <- geo_reply$results[[1]]$geometry$location$lng
  if (length(geo_reply$results[[1]]$types) > 0){
    answer$accuracy <- geo_reply$results[[1]]$types[[1]]
  }
  answer$address_type <- paste(geo_reply$results[[1]]$types, collapse=',')
  answer$formatted_address <- geo_reply$results[[1]]$formatted_address
  return(answer)
}
#initialise a dataframe to hold the results
geocoded <- data.frame()
# find out where to start in the address list (if the script was interrupted before):
startindex <- 1
#if a temp file exists - load it up and count the rows!
tempfilename <- paste0(infile, '_temp_geocoded.rds')
if (file.exists(tempfilename)){
  print("Found temp file - resuming from index:")
  geocoded <- readRDS(tempfilename)
  startindex <- nrow(geocoded)
  print(startindex)
}
# Start the geocoding process - address by address. geocode() function takes care of query speed limit.
for (ii in seq(startindex, length(addresses))){
  print(paste("Working on index", ii, "of", length(addresses)))
  #query the google geocoder - this will pause here if we are over the limit.
  result = getGeoDetails(addresses[ii])
  print(result$status)
  result$index <- ii
  #append the answer to the results file.
  geocoded <- rbind(geocoded, result)
  #save temporary results as we are going along
  saveRDS(geocoded, tempfilename)
}
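Since the failures are intermittent, one possible workaround is to retry a failed request a few times before giving up. The following is only a sketch under stated assumptions: `geocode_with_retry`, its parameters, and the `"REQUEST_FAILED"` status string are hypothetical names introduced here, and the geocoding function is passed in as `geocode_fn` so that the example runs without ggmap or a network connection.

```r
# Hypothetical helper: call a geocoding function and retry when the reply is
# unusable. `geocode_fn` stands in for a call to ggmap's geocode().
geocode_with_retry <- function(address, geocode_fn, max_tries = 5, wait_secs = 2) {
  for (attempt in seq_len(max_tries)) {
    # Treat an R error like any other failed reply.
    reply <- tryCatch(geocode_fn(address), error = function(e) NA)
    # A usable reply is a list with a status field; anything else is a failure.
    if (is.list(reply) && !is.null(reply$status)) {
      return(reply)
    }
    if (attempt < max_tries) {
      message("Attempt ", attempt, " failed for '", address, "', retrying")
      Sys.sleep(wait_secs * attempt)  # wait a little longer after each failure
    }
  }
  # Give up, but still return a list so that callers can read $status.
  # "REQUEST_FAILED" is a made-up marker, not a Google status code.
  list(status = "REQUEST_FAILED", results = list())
}

# Offline demonstration with a stub that fails twice and then succeeds.
demo_calls <- 0
flaky_geocoder <- function(address) {
  demo_calls <<- demo_calls + 1
  if (demo_calls < 3) NA else list(status = "OK", results = list())
}
demo_reply <- geocode_with_retry("NICHOLLS, ACT, 2613, AUSTRALIA",
                                 flaky_geocoder, wait_secs = 0)
print(demo_reply$status)
```

Inside `getGeoDetails`, the two `geocode(...)` calls could then be replaced by something like `geocode_with_retry(address, function(a) geocode(a, output='all', messaging=TRUE, override_limit=TRUE))`. Because the fallback is a list whose status is not `"OK"`, the existing `if (geo_reply$status != "OK")` branch would return the row of NAs instead of stopping the loop.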
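This is unrelated to the code. I just tried http://maps.googleapis.com/maps/api/geocode/json?address=NICHOLS,%20ACT,%202613,%20AUSTRALIA&sensor=false and it works. I suspect a limit on Google's servers (a limited number of calls per second/minute).
@EricLecoutre, thanks. As I said, this code works great... right up to the point where it fails! There is no pattern to the failures; they are random. Is there a way to build a throttle into the code to slow down the number of requests per minute, or is it more likely a network problem that delays receiving the results?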
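On the throttle question, a minimal sketch of one way to space out requests is shown below. `throttle` and `min_interval` are hypothetical names, not part of ggmap, and the right interval depends on whatever limit Google actually applies, which is not established in this thread. The wrapper remembers when it was last called and sleeps only for the remaining part of the interval.

```r
# Hypothetical helper: wrap a function so that consecutive calls start at
# least `min_interval` seconds apart.
throttle <- function(fn, min_interval = 1) {
  last_call <- NULL  # time of the previous call, kept in the closure
  function(...) {
    if (!is.null(last_call)) {
      elapsed <- as.numeric(difftime(Sys.time(), last_call, units = "secs"))
      if (elapsed < min_interval) {
        Sys.sleep(min_interval - elapsed)
      }
    }
    last_call <<- Sys.time()
    fn(...)
  }
}

# Offline demonstration: three calls spaced at least 0.2 seconds apart.
slow_identity <- throttle(function(x) x, min_interval = 0.2)
demo_start <- Sys.time()
demo_out <- vapply(1:3, slow_identity, integer(1))
demo_elapsed <- as.numeric(difftime(Sys.time(), demo_start, units = "secs"))
print(demo_out)
```

In the loop above this could be used as `throttledGeoDetails <- throttle(getGeoDetails, min_interval = 1)`, calling `throttledGeoDetails(addresses[ii])` in place of `getGeoDetails(addresses[ii])`. A throttle only helps if rate limiting is the cause; if the 500 responses are transient server errors, retrying failed requests is probably the more relevant measure.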