2016-04-04 263 views
1

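R: How do I append the results of a (for) loop to a data frame?

I am trying to build a data frame of Brazilian addresses by querying a web service with postal codes. I can retrieve a single result and store it in a data frame, but when I search for several postal codes (for example, from a vector), my data frame keeps only the last element. Can anyone help me?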

See the code below:

############### 
library(httr) 
library(RCurl) 
library(XML) 
library(dplyr) 
############### 

# ZIPs I want to search for: 
vectorzip <- c("71938360", "70673052", "71020510") 
j <- length(vectorzip) 

# loop: 
for(i in 1:j) { 

# Save the URL of the xml file in a variable: 
xml.url <- getURL(paste("http://cep.republicavirtual.com.br/web_cep.php?cep=",vectorzip[i], sep = ""), encoding = "ISO-8859-1") 
xml.url 

# Use the xmlTreeParse-function to parse xml file directly from the web: 
xmlfile <- xmlTreeParse(xml.url) 
xmlfile 
# the xml file is now saved as an object you can easily work with in R: 
class(xmlfile) 

# Use the xmlRoot-function to access the top node: 
xmltop = xmlRoot(xmlfile) 

# have a look at the XML-code of the first subnodes: 
print(xmltop) 

# To extract the XML-values from the document, use xmlSApply: 
zips <- xmlSApply(xmlfile, function(x) xmlSApply(x, xmlValue)) 
zips 
# Finally, get the data in a data-frame and have a look at the first rows and columns: 
zips <- NULL 
zips <- rbind(zips_df, data.frame(t(zips),row.names=NULL)) 

View(zips_df)} 
+1

What is the zips <- NULL line for, and where is zips_df defined? –

+0

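Growing an object with rbind is generally not a good idea. A better approach is to define an empty data frame of a fixed size (which allocates the necessary memory up front) and then fill in the rows. – RHertel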
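A minimal sketch of the preallocation approach this comment describes. The `fetch_address()` helper and the `cep`/`uf` column names are hypothetical stand-ins for the web-service call and its fields, so the example runs without network access.

```r
# Hypothetical stand-in for the web-service lookup: returns a one-row
# data frame for a given postal code (no network access needed).
fetch_address <- function(zip) {
  data.frame(cep = zip, uf = "DF", stringsAsFactors = FALSE)
}

vectorzip <- c("71938360", "70673052", "71020510")
n <- length(vectorzip)

# Preallocate a data frame with one row per postal code.
zips_df <- data.frame(
  cep = character(n),
  uf  = character(n),
  stringsAsFactors = FALSE
)

# Fill in row i on each iteration instead of growing the object.
for (i in seq_len(n)) {
  zips_df[i, ] <- fetch_address(vectorzip[i])
}

print(zips_df)
```

This only works when the number of rows and the column layout are known before the loop starts, which is the case here because there is one row per postal code.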

Answers

0

You want to:

a) define zips_df
b) define zips_df outside the loop
c) not set zips_df to NULL inside the loop :)

############### 
library(httr) 
library(RCurl) 
library(XML) 
library(dplyr) 
############### 

# ZIPs I want to search for: 
vectorzip <- c("71938360", "70673052", "71020510") 
j <- length(vectorzip) 
zips_df <- data.frame() 

i<-1 
# loop: 
for(i in 1:j) { 

    # Save the URL of the xml file in a variable: 
    xml.url <- getURL(paste("http://cep.republicavirtual.com.br/web_cep.php?cep=",vectorzip[i], sep = ""), encoding = "ISO-8859-1") 
    xml.url 

    # Use the xmlTreeParse-function to parse xml file directly from the web: 
    xmlfile <- xmlTreeParse(xml.url) 
    xmlfile 
    # the xml file is now saved as an object you can easily work with in R: 
    class(xmlfile) 

    # Use the xmlRoot-function to access the top node: 
    xmltop = xmlRoot(xmlfile) 

    # have a look at the XML-code of the first subnodes: 
    print(xmltop) 

    # To extract the XML-values from the document, use xmlSApply: 
    zips <- xmlSApply(xmlfile, function(x) xmlSApply(x, xmlValue)) 
    zips 
    # Finally, get the data in a data-frame and have a look at the first rows and columns: 

    zips_df <- rbind(zips_df, data.frame(t(zips),row.names=NULL)) 
} 

    View(zips_df) 

You get this:

> zips_df 
    resultado.text  resultado_txt.text uf.text cidade.text   bairro.text tipo_logradouro.text logradouro.text 
1    1 sucesso - cep completo  DF Taguatinga Sul (Ãguas Claras)     Rua    09 
2    1 sucesso - cep completo  DF Cruzeiro  Setor Sudoeste    Quadra  300 Bloco O 
3    1 sucesso - cep completo  DF  Guará   Guará I    Quadra QI 11 Conjunto U 
+0

Thank you very much, Serban! –

0

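Please try to provide a minimal working example. Your example contains many lines of code that have nothing to do with your actual problem. Had you tried to remove that unnecessary code, you would probably have noticed that the zips <- NULL line erases the zip information before it is saved. Second, you reference a zips_df object, but it is never created in your code.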
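As an illustration of what such a minimal example could look like, here is a sketch that strips the question's loop down to its accumulation logic. The one-column data frame is a hypothetical stand-in for the parsed XML, and `zips_df` is created up front only so that the snippet runs.

```r
vectorzip <- c("71938360", "70673052", "71020510")
zips_df <- data.frame()

for (i in seq_along(vectorzip)) {
  # Hypothetical stand-in for the parsed web-service response.
  zips <- data.frame(cep = vectorzip[i], stringsAsFactors = FALSE)

  zips <- NULL                  # erases the result that was just parsed
  zips <- rbind(zips_df, zips)  # result goes to zips, not to zips_df
}

# zips_df is never assigned inside the loop, so it stays empty.
print(nrow(zips_df))
```

With the network and XML code removed, the two problem lines are easy to spot.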

To answer your question:

  • Add a line that creates zips_df as an empty data frame object before the loop starts:

    vectorzip <- c("71938360", "70673052", "71020510") 
    j <- length(vectorzip) 
    zips_df <- data.frame() 
    
  • Remove the line where you erase the zips object (zips <- NULL)

  • Change the line that grows the zips_df data.frame so that the complete data is saved to the data.frame object rather than to the temporary "zips" variable:

    zips_df <- rbind(zips_df, data.frame(t(zips),row.names=NULL)) 
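
Put together, the three steps above can be sketched as follows. The `parse_zip()` helper and its `cep`/`uf` fields are hypothetical stand-ins for the `getURL()` and `xmlSApply()` calls, so the loop can be run offline.

```r
# Hypothetical stand-in for the parsed web-service response:
# a named character vector, one element per field.
parse_zip <- function(zip) {
  c(cep = zip, uf = "DF")
}

vectorzip <- c("71938360", "70673052", "71020510")
zips_df <- data.frame()  # created once, before the loop

for (i in seq_along(vectorzip)) {
  # Temporary per-iteration result; it is never reset to NULL.
  zips <- parse_zip(vectorzip[i])

  # Append one row and assign the result back to zips_df.
  zips_df <- rbind(
    zips_df,
    data.frame(t(zips), row.names = NULL, stringsAsFactors = FALSE)
  )
}

print(zips_df)
```

The accumulator `zips_df` lives outside the loop and is the target of the `rbind()` assignment, while `zips` only holds the current iteration's values.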
    

I suggest removing the "View" line and inspecting the data.frame with print:

print(zips_df) 
resultado.text  resultado_txt.text uf.text cidade.text    bairro.text tipo_logradouro.text logradouro.text 
1    1 sucesso - cep completo  DF Taguatinga Sul (Ã\u0081guas Claras)     Rua    09 
2    1 sucesso - cep completo  DF Cruzeiro   Setor Sudoeste    Quadra  300 Bloco O 
3    1 sucesso - cep completo  DF  Guará     Guará I    Quadra QI 11 Conjunto U 
+0

Thank you very much, Andre. I appreciate your advice and your answer! –
