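rvest | Webscraping data into long format

While scraping the web I ran into the following problem, for which I think there may be a better solution.
I have data like this: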
dat <- data.frame(query = c("Washington, USA", "Frankfurt, Germany"))
query
1 Washington, USA
2 Frankfurt, Germany
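I want to query, e.g., the Google Maps API and get back the formatted address(es). A single query can return several addresses. The result should look like this: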
query formatted_address
1 Washington, USA Washington, DC, USA
2 Washington, USA Washington, UT, USA
3 Washington, USA Washington, VA 22747, USA
4 Washington, USA Washington, IA 52353, USA
5 Washington, USA Washington, GA 30673, USA
6 Washington, USA Washington, PA 15301, USA
7 Frankfurt, Germany Frankfurt, Germany
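The target layout is "long" format: each query is repeated once per address it returned. A minimal base-R sketch of that reshaping, assuming the per-query results are already collected in a named list (`results` is a hypothetical stand-in, abbreviated to the first two Washington hits from the table above):

```r
# Hypothetical per-query results standing in for what the API returned
results <- list(
  "Washington, USA"    = c("Washington, DC, USA", "Washington, UT, USA"),
  "Frankfurt, Germany" = "Frankfurt, Germany"
)

# Long format: repeat each query name once per address it produced,
# then flatten the addresses in the same order
long <- data.frame(
  query             = rep(names(results), lengths(results)),
  formatted_address = unlist(results, use.names = FALSE),
  stringsAsFactors  = FALSE
)
print(long)
```

Because `rep()` and `unlist()` walk the list in the same order, the two columns stay aligned without any explicit join.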
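What I currently do is this:
require(RCurl)    # curlEscape() to URL-encode the query
require(rvest)
require(magrittr) # %>%
require(xml2)     # read_xml(), xml_find_all(), xml_text()

# Build a Geocoding API request URL for one free-text address.
# Note: Google's Geocoding API now generally requires a key= parameter as well.
build_url <- function(x, base_url = "https://maps.googleapis.com/maps/api/geocode/xml?address="){
  paste0(base_url, RCurl::curlEscape(x))
}

# One data.frame per query: the query repeated once per returned address
# (rep() also avoids an error when a query returns no addresses)
l <- lapply(dat$query, function(q){
  formatted_address <- q %>% build_url %>% read_xml %>%
    xml_find_all(".//formatted_address") %>% xml_text
  data.frame(query = rep(q, length(formatted_address)), formatted_address)
})
do.call(rbind, l) # This can be done via data.table::rbindlist as well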
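The parsing step can be tried offline against a literal XML string, which also makes the zero-results case easy to test. This is a sketch assuming the response has the shape `<GeocodeResponse>/<status>` plus one `<result>/<formatted_address>` per match; `sample_xml` is hand-written and trimmed, not a captured API response:

```r
library(xml2)

# Hand-written, trimmed-down response; assumed to mirror the shape of the
# Geocoding API's XML output
sample_xml <- '<GeocodeResponse>
  <status>OK</status>
  <result><formatted_address>Washington, DC, USA</formatted_address></result>
  <result><formatted_address>Washington, UT, USA</formatted_address></result>
</GeocodeResponse>'

doc <- read_xml(sample_xml)  # a string containing "<" is parsed as literal XML

# Check the status first, then pull every formatted_address node as text
status <- xml_text(xml_find_first(doc, "/GeocodeResponse/status"))
addresses <- if (identical(status, "OK")) {
  xml_text(xml_find_all(doc, "//result/formatted_address"))
} else {
  character(0)
}

# rep() keeps the columns the same length, so an empty result gives 0 rows
one_query <- data.frame(query = rep("Washington, USA", length(addresses)),
                        formatted_address = addresses,
                        stringsAsFactors = FALSE)
print(one_query)
```

With a plain `query = "Washington, USA"` instead of `rep()`, `data.frame()` would stop with a "differing number of rows" error whenever `addresses` is empty.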
Is there a better solution? Perhaps something more in data.table or dplyr style?
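One data.table-style option is a grouped `j` that returns a vector: data.table stacks the per-group results and recycles the grouping column, which yields the long format directly. A sketch, where `fake_geocode()` is a hypothetical offline stub standing in for the `build_url`/`read_xml`/`xml_text` pipeline:

```r
library(data.table)

# Hypothetical stub replacing the network lookup; returns all addresses for q
fake_geocode <- function(q) {
  lookup <- list(
    "Washington, USA"    = c("Washington, DC, USA", "Washington, UT, USA"),
    "Frankfurt, Germany" = "Frankfurt, Germany"
  )
  lookup[[q]]
}

dt <- data.table(query = c("Washington, USA", "Frankfurt, Germany"))

# Within each group, `query` is the current (length-1) key; j returns a
# vector of addresses and data.table repeats the key alongside each one
long_dt <- dt[, .(formatted_address = fake_geocode(query)), by = query]
print(long_dt)
```

This replaces both the `lapply` and the `rbind` step; swapping the stub for the real lookup function is the only change needed.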
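Please include the `library`/`require` calls so your code is reproducible – jangorecki
Sure. I just added the `require` statements after the data.frame creation – Rentrop
Apart from `stringsAsFactors = FALSE`, you've optimized this perfectly IMO. I'd suggest adding a `sleep` in the `lapply` and making sure you limit the number of calls to 2,500 or fewer, IIRC ([usage limits](https://developers.google.com/maps/documentation/business/articles/usage_limits) info). – hrbrmstr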
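The sleep-and-cap suggestion could be wrapped in a small helper. This is a sketch, not an official client: `throttled_lapply()` is a hypothetical name, the default `max_calls = 2500` only mirrors the figure quoted in the comment (check Google's current usage limits), and `Sys.sleep()` is R's sleep function:

```r
# Hypothetical helper: apply f to each element of x, pausing `delay` seconds
# between calls and refusing to start if x would exceed max_calls requests
throttled_lapply <- function(x, f, delay = 0.2, max_calls = 2500) {
  if (length(x) > max_calls) {
    stop("refusing to make ", length(x), " calls; limit is ", max_calls)
  }
  out <- vector("list", length(x))
  for (i in seq_along(x)) {
    if (i > 1) Sys.sleep(delay)   # no pause before the first request
    out[i] <- list(f(x[[i]]))     # list() wrapper keeps NULL results in place
  }
  out
}

# Usage with a trivial function in place of the geocoding call
res <- throttled_lapply(c("a", "b", "c"), toupper, delay = 0.01)
unlist(res)  # "A" "B" "C"
```

Checking the count up front means a too-large batch fails before any request is sent, rather than partway through.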