Parsing XML in R - returning a data frame object

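I have successfully read the Example 1 XML into R as a data frame object, but I am having trouble with Example 2. Does anyone have suggestions for R code that converts the data in mtcars.xml into a data frame?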

Example 1)

library(XML) 
# Save the URL of the xml file in a variable 

xml.url <- "http://www.w3schools.com/xml/plant_catalog.xml" 

# Use the xmlTreeParse function to parse the XML file directly from the web 

xmlfile <- xmlTreeParse(xml.url) 

# Use the xmlRoot-function to access the top node 
xmltop = xmlRoot(xmlfile) 
# have a look at the XML-code of the first subnodes: 
print(xmltop[1:2]) 


# To extract the XML-values from the document, use xmlSApply: 

plantcat <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))

Example 2)

library(XML) 
# Parse the mtcars.xml file that ships with the XML package 

doc <- xmlTreeParse(system.file("exampleData", "mtcars.xml", package="XML")) 


xmlfile <- xmlTreeParse(doc) 

# Use the xmlRoot-function to access the top node 
xmltop = xmlRoot(xmlfile) 
# have a look at the XML-code of the first subnodes: 
print(xmltop[1:2]) 


# To extract the XML-values from the document, use xmlSApply: 

mtcarscat <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))


2016-01-24 user5831311

For the first one, xmlToDataFrame("http://www.w3schools.com/xml/plant_catalog.xml") does it in one go. – alistaire
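As a minimal sketch of what that comment suggests, the snippet below runs xmlToDataFrame on a small inline document instead of the URL, so it works offline. The inline XML and the names `sample_xml` and `plants_df` are assumptions made up for illustration; only the flat record/field layout is meant to mirror plant_catalog.xml.

```r
library(XML)

# Hypothetical inline sample with the same flat layout as plant_catalog.xml:
# one element per record, one child element per field
sample_xml <- paste0(
  "<CATALOG>",
  "<PLANT><COMMON>Bloodroot</COMMON><ZONE>4</ZONE><PRICE>$2.44</PRICE></PLANT>",
  "<PLANT><COMMON>Columbine</COMMON><ZONE>3</ZONE><PRICE>$9.37</PRICE></PLANT>",
  "</CATALOG>"
)

# Parse the string (asText = TRUE) and convert each child of the root to a row
doc <- xmlParse(sample_xml, asText = TRUE)
plants_df <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

print(plants_df)
```

This shortcut fits documents where every record is a set of simple child elements. It is not a natural fit for mtcars.xml, where each record stores all of its values in a single text node.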

Try xpathSApply:

library(XML) 

path <- system.file("exampleData", "mtcars.xml", package="XML") 
doc <- xmlTreeParse(path, useInternalNodes = TRUE) 
root <- xmlRoot(doc) 

read.table(text = xpathSApply(root, "//record", xmlValue), 
      col.names = xpathSApply(root, "//variable", xmlValue))

which gives:

mpg cyl disp hp drat wt qsec vs am gear carb 
1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 
2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 
3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 
4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 
... etc ...
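A possible variation, not part of the original answer: if each record carries an id attribute, it can be used for the row names. The sketch below assumes a layout like that of mtcars.xml (variable elements holding the column names, record elements holding whitespace-separated values plus an id attribute) and uses a small inline document so that the assumption is visible. The name `cars_df` and the three-column sample are hypothetical.

```r
library(XML)

# Hypothetical inline sample; the layout is an assumption modelled on mtcars.xml
sample_xml <- paste0(
  "<dataset>",
  "<variables>",
  "<variable>mpg</variable><variable>cyl</variable><variable>disp</variable>",
  "</variables>",
  "<record id=\"Mazda RX4\">21.0 6 160.0</record>",
  "<record id=\"Mazda RX4 Wag\">21.0 6 160.0</record>",
  "<record id=\"Datsun 710\">22.8 4 108.0</record>",
  "</dataset>"
)

doc <- xmlParse(sample_xml, asText = TRUE)

# Each record's text becomes one input line for read.table();
# column names come from the variable nodes, row names from the id attributes
cars_df <- read.table(
  text      = xpathSApply(doc, "//record", xmlValue),
  col.names = xpathSApply(doc, "//variable", xmlValue),
  row.names = xpathSApply(doc, "//record", xmlGetAttr, "id")
)

print(cars_df)
```

Because read.table() does the splitting, the columns come back as numeric rather than as character strings.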


2016-01-24 11:33:45

Here is one approach with xml2:

library(xml2) 
library(purrr) 
library(dplyr) 

catalog_url <- "http://www.w3schools.com/xml/plant_catalog.xml" 
doc <- read_xml(catalog_url) 

# get all the "records" 
plants <- xml_find_all(doc, ".//PLANT") 

# get all the field names 
kids <- xml_name(xml_children(plants[1])) 

# make a data frame 
# - iterate over each record 
# - in each record grab each field 
# - turn each row into a data frame 
# - bind all the data frames together 

map_df(plants, function(plant) { 
    # build one named list per record; map_df() binds these into rows 
    as.list(setNames(map_chr(kids, function(kid) { 
    xml_text(xml_find_first(plant, sprintf(".//%s", kid))) 
    }), kids)) 
}) 

## Source: local data frame [36 x 6] 
## 
##     COMMON    BOTANICAL ZONE  LIGHT PRICE AVAILABILITY 
##     (chr)     (chr) (chr)  (chr) (chr)  (chr) 
## 1   Bloodroot Sanguinaria canadensis  4 Mostly Shady $2.44  031599 
## 2   Columbine Aquilegia canadensis  3 Mostly Shady $9.37  030699 
## 3  Marsh Marigold  Caltha palustris  4 Mostly Sunny $6.81  051799 
## 4    Cowslip  Caltha palustris  4 Mostly Shady $9.90  030699 
## 5 Dutchman's-Breeches Dicentra cucullaria  3 Mostly Shady $6.44  012099 
## 6   Ginger, Wild  Asarum canadense  3 Mostly Shady $9.03  041899 
## 7    Hepatica  Hepatica americana  4 Mostly Shady $4.45  012699 
## 8   Liverleaf  Hepatica americana  4 Mostly Shady $3.99  010299 
## 9 Jack-In-The-Pulpit Arisaema triphyllum  4 Mostly Shady $3.23  020199 
## 10   Mayapple Podophyllum peltatum  3 Mostly Shady $2.98  060599 
## ..     ...     ... ...   ... ...   ...

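It could be made a bit more robust by collecting every possible child name (some "records" may have more or fewer children), but it is good enough for this example. Doing it this way (fetching each element's value by name) ensures the values come back in the correct order (the order of the elements is not guaranteed).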
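A rough sketch of that more robust variant, under stated assumptions: the inline XML below is made up so that one record lacks a field and lists its children in a different order, and the names `sample_xml`, `rows`, and `plant_df` are hypothetical. It uses only xml2 and base R rather than purrr and dplyr.

```r
library(xml2)

# Hypothetical sample: the second record has no LIGHT field and a different
# child order than the first
sample_xml <- paste0(
  "<CATALOG>",
  "<PLANT><COMMON>Bloodroot</COMMON><ZONE>4</ZONE><LIGHT>Mostly Shady</LIGHT></PLANT>",
  "<PLANT><ZONE>3</ZONE><COMMON>Columbine</COMMON></PLANT>",
  "</CATALOG>"
)

doc <- read_xml(sample_xml)
plants <- xml_find_all(doc, ".//PLANT")

# Collect field names across all records, not just the first one
kids <- unique(xml_name(xml_children(plants)))

# Build one single-row data frame per record, looking each field up by name;
# a field that is absent from a record comes back as NA
rows <- lapply(seq_along(plants), function(i) {
  plant <- plants[[i]]
  vals <- vapply(kids, function(kid) {
    xml_text(xml_find_first(plant, paste0("./", kid)))
  }, character(1))
  as.data.frame(as.list(vals), stringsAsFactors = FALSE)
})

plant_df <- do.call(rbind, rows)
print(plant_df)
```

Looking fields up by name keeps the columns aligned even when records list their children in different orders.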


2016-01-24 13:32:17 hrbrmstr
