Storing specific XML node values with R's xmlEventParse

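I have a large XML file that I need to parse with xmlEventParse in R. Unfortunately, the examples online are more complex than I need. I just want to flag a matching node tag so that I can store the text of the matched node (not its attributes), with the text for each node kept in a separate list. See the comments in the code below: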

library(XML) 
z <- xmlEventParse(
    "my.xml", 
    handlers = list(
     startDocument = function() 
     { 
       cat("Starting document\n") 
     }, 
     startElement = function(name,attr) 
     { 
       if (name == "myNodeToMatch1"){ 
        cat("FLAG Matched element 1\n") 
       } 
       if (name == "myNodeToMatch2"){ 
        cat("FLAG Matched element 2\n") 
       } 
     }, 
     text   = function(text) { 
       if (# Matched element 1 ....) 
        # Store text in element 1 list 
       if (# Matched element 2 ....) 
        # Store text in element 2 list 
     }, 
     endDocument  = function() 
     { 
       cat("ending document\n") 
     } 
    ), 
    addContext = FALSE, 
    useTagName = FALSE, 
    ignoreBlanks = TRUE, 
    trim = TRUE) 
z$ ... # show lists ??

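My question is: how do I implement this flag in R (in a professional way :)? Plus: what is the best way to check for N arbitrary nodes to match (if name == "myNodeToMatchN" ...) without writing out a separate case for every node?

my.xml could be just a naive XML file such as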

<A> 
    <myNodeToMatch1>Text in NodeToMatch1</myNodeToMatch1> 
    <B> 
    <myNodeToMatch2>Text in NodeToMatch2</myNodeToMatch2> 
    ... 
    </B> 
</A>


2011-09-24 Veronica

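It would be nice to have a "my.xml" handy to try things out on.

I'll use fileName from example(xmlEventParse) as a reproducible example. It has record tags, each with an id attribute and text that we'd like to extract. Rather than use handlers, I'll go after the branches argument. A branch is like a handler, but it has access to the whole node rather than just the element event. The idea is to write a closure that has a place to keep the data we accumulate, plus a function to process each branch of the XML document that we are interested in. So let's start by defining the closure, which for our purposes is a function that returns a list of functions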

ourBranches <- function() {

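We need a place to store the results we accumulate. We choose an environment so that insertion takes constant time (not a list, which we would have to append to and which would be memory inefficient)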

store <- new.env()

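The event parser expects a list of functions to invoke when a matching tag is discovered. We're interested in the record tag. The function we write will receive a node of the XML document. We want to extract the id attribute, which we'll use as the key under which the (text) value of the node is stored. We add these to our store.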

record <- function(x, ...) { 
     key <- xmlAttrs(x)[["id"]] 
     value <- xmlValue(x) 
     store[[key]] <- value 
    }

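Once the document is processed, we'd like a convenient way to retrieve our results, so we add a function for our own purposes, independent of the nodes in the document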

getStore <- function() as.list(store)

and then finish the closure by returning a list of functions

list(record=record, getStore=getStore) 
}

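A tricky concept here is that the environment in which a function is defined is part of the function, so each time we call ourBranches() we get a list of functions and a new environment store to keep our results. To use it, invoke xmlEventParse on our file with an empty set of event handlers, and then access our accumulated store.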

> branches <- ourBranches() 
> xmlEventParse(fileName, list(), branches=branches) 
list() 
> head(branches$getStore(), 2) 
$`Hornet Sportabout` 
[1] "18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 " 

$`Toyota Corolla` 
[1] "33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 "
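
The point that every call to ourBranches() gets its own store can be checked without any XML at all. The following is a minimal sketch; makeStore and put are hypothetical names used only for illustration, standing in for ourBranches and record.

```r
# Hypothetical stand-in for ourBranches(): same closure pattern, no XML needed.
makeStore <- function() {
    store <- new.env()                        # private to this call of makeStore()
    put <- function(key, value) store[[key]] <- value
    getStore <- function() as.list(store)
    list(put = put, getStore = getStore)
}

first <- makeStore()
second <- makeStore()

first$put("a", 1)

# Each closure sees only the environment created by its own makeStore() call.
length(first$getStore())
length(second$getStore())
```

Because the two lists of functions were created by separate calls, writing through first leaves second's store empty.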


2011-09-25 18:00:17

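For anyone else who wants to try to learn from M. Morgan's answer, here is the complete code

library(XML)

fileName <- system.file("exampleData", "mtcars.xml", package = "XML")

ourBranches <- function() {
    # environment used as a key/value store for the accumulated results
    store <- new.env()
    # called with the full node for every <record> element
    record <- function(x, ...) {
        key <- xmlAttrs(x)[["id"]]
        value <- xmlValue(x)
        store[[key]] <- value
    }
    getStore <- function() as.list(store)
    list(record=record, getStore=getStore)
}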

branches <- ourBranches() 
xmlEventParse(fileName, list(), branches=branches) 
head(branches$getStore(), 2)
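
The same branches idea might be extended to N arbitrary tag names by generating one branch function per name instead of writing each by hand. This is a minimal sketch under stated assumptions: makeBranches is a hypothetical helper, not part of the XML package, and the sample document is the naive XML from the question, written to a temporary file.

```r
library(XML)

# Hypothetical helper: one branch function per tag name, all sharing one store.
makeBranches <- function(tags) {
    store <- new.env()
    for (tag in tags) store[[tag]] <- character()
    collector <- function(tag) {
        force(tag)                            # fix the tag name for this closure
        function(x, ...) {
            # append the node text to the vector kept for this tag
            store[[tag]] <- c(store[[tag]], xmlValue(x))
        }
    }
    branches <- lapply(tags, collector)
    names(branches) <- tags
    list(branches = branches,
         getStore = function() mget(tags, envir = store))
}

# The naive XML from the question, written to a temporary file.
xmlFile <- tempfile(fileext = ".xml")
writeLines(c(
    "<A>",
    "  <myNodeToMatch1>Text in NodeToMatch1</myNodeToMatch1>",
    "  <B>",
    "    <myNodeToMatch2>Text in NodeToMatch2</myNodeToMatch2>",
    "  </B>",
    "</A>"), xmlFile)

m <- makeBranches(c("myNodeToMatch1", "myNodeToMatch2"))
invisible(xmlEventParse(xmlFile, handlers = list(), branches = m$branches))
str(m$getStore())
```

Appending with c() is simple but copies the vector on every match, so for a very large number of matches a counter-keyed environment, as in the store above, would likely scale better.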


2014-07-11 15:21:20 userJT

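The branches approach does not preserve the order of events. In other words, the order of the 'record' entries returned by branches$getStore() differs from their order in the original XML file. The handler approach, on the other hand, can preserve the order. Here is the code:

library(XML)

fileName <- system.file("exampleData", "mtcars.xml", package="XML")
records <- list()          # text of each <record>, keyed by its id attribute
variable <- character()    # text of each <variable>
tag.open <- character()    # stack of currently open tags, innermost first
tagName <- ""              # most recently opened tag, with its XML attributes attached
nvar <- 0
xmlEventParse(fileName, list(startElement = function (name, attrs) {
    tagName <<- name
    tag.open <<- c(name, tag.open)
    if (length(attrs)) {
        attributes(tagName) <<- as.list(attrs)
    }
}, text = function (x) {
    if (nchar(x) > 0) {
        if (tagName == "record") {
            record <- list()
            record[[attributes(tagName)$id]] <- x
            records <<- c(records, record)
        } else {
            if (tagName == 'variable') {
                v <- x
                variable <<- c(variable, v)
                nvar <<- nvar + 1
            }
        }
    }
}, endElement = function (name) {
    if (name == 'record') {
        # show the path of open tags, innermost first
        print(paste(tag.open, collapse='>'))
    }
    tag.open <<- tag.open[-1]
}))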

head(records,2) 
$`Mazda RX4`
[1] "21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4" 

$`Mazda RX4 Wag` 
[1] "21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4" 

variable 
[1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear" "carb"
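
If the branches approach is otherwise preferred, document order could probably be recovered by keying the store on a running counter rather than on the id. The following is a minimal sketch under that assumption; orderedBranches is a hypothetical variant of ourBranches, and the zero-padded key format is an arbitrary choice.

```r
library(XML)

# Hypothetical variant of ourBranches() that remembers insertion order.
orderedBranches <- function() {
    store <- new.env()
    n <- 0L
    record <- function(x, ...) {
        n <<- n + 1L
        # zero-padded counter as key, so sorting the keys restores document order
        store[[sprintf("%09d", n)]] <- list(id = xmlAttrs(x)[["id"]],
                                            value = xmlValue(x))
    }
    getStore <- function() {
        items <- mget(sort(ls(store)), envir = store)
        values <- lapply(items, function(item) item$value)
        names(values) <- vapply(items, function(item) item$id,
                                character(1), USE.NAMES = FALSE)
        values
    }
    list(record = record, getStore = getStore)
}

fileName <- system.file("exampleData", "mtcars.xml", package = "XML")
ordered <- orderedBranches()
invisible(xmlEventParse(fileName, list(),
                        branches = list(record = ordered$record)))
names(ordered$getStore())[1:2]
```

Sorting the keys costs extra time when the results are retrieved, but insertion into the environment stays cheap during parsing.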

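Another benefit of using handlers is that the hierarchical structure can be captured. In other words, it is also possible to save the ancestors of a node. One key to this approach is the use of global variables, which are assigned with "<<-" rather than "<-".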
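
The handler state does not have to live in the global environment: "<<-" also reaches variables of an enclosing function, so the flag can be kept inside a closure. The following is a minimal sketch under stated assumptions. makeHandlers is a hypothetical helper, the sample file is the naive XML from the question, and matched tags are assumed not to be nested inside one another.

```r
library(XML)

# Hypothetical helper: handlers that collect the text of the given tags,
# keeping all parser state inside the closure.
makeHandlers <- function(tags) {
    found <- sapply(tags, function(tag) character(), simplify = FALSE)
    current <- NULL      # the flag: name of the matched element we are inside
    buffer <- ""         # text seen so far inside that element
    handlers <- list(
        startElement = function(name, attrs, ...) {
            if (name %in% tags) {
                current <<- name
                buffer <<- ""
            }
        },
        text = function(x, ...) {
            # text may arrive in several pieces, so accumulate it
            if (!is.null(current)) buffer <<- paste0(buffer, x)
        },
        endElement = function(name, ...) {
            if (!is.null(current) && name == current) {
                found[[name]] <<- c(found[[name]], buffer)
                current <<- NULL
            }
        })
    list(handlers = handlers, getFound = function() found)
}

xmlFile <- tempfile(fileext = ".xml")
writeLines(c(
    "<A>",
    "  <myNodeToMatch1>Text in NodeToMatch1</myNodeToMatch1>",
    "  <B>",
    "    <myNodeToMatch2>Text in NodeToMatch2</myNodeToMatch2>",
    "  </B>",
    "</A>"), xmlFile)

h <- makeHandlers(c("myNodeToMatch1", "myNodeToMatch2"))
invisible(xmlEventParse(xmlFile, handlers = h$handlers,
                        addContext = FALSE, ignoreBlanks = TRUE, trim = TRUE))
str(h$getFound())
```

Testing name %in% tags replaces the chain of if statements, so supporting another node only means adding its name to the vector passed to makeHandlers.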


2016-11-03 10:34:39
