2011-04-07 40 views
3

Selecting specific XML nodes in R?

I am using the XML package in R to parse an XML file with the following structure.

<document id="Something" origId="Text"> 
    <sentence id="Something" origId="thisorig" text="Blah Blah."> 
    <special id="id.s0.i0" origId="1" e1="en1" e2="en2" type="" directed="True"/> 
    </sentence> 
    <sentence id="Something" origId="thisorig" text="Blah Blah."> 
     </sentence> 
</document> 

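I want to collect the nodes that contain a <special> tag in one variable, and the nodes that do not contain a <special> tag in another variable.

Is this possible with R? Any pointers or answers would be very helpful.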

+1

http://stackoverflow.com/questions/1395528/scraping-html-tables-into-r-data-frames-using-the-xml-package – Chase 2011-04-07 21:01:14

+0

@Chase: No, that is not what I am looking for, because the problem still remains. – 2011-04-08 04:35:28

Answers

4

I added a few extra cases to test the exceptions:

<document id="Something" origId="Text"> 
    <sentence id="Something" origId="thisorig" text="Blah Blah."> 
    <special id="id.s0.i0" origId="1" e1="en1" e2="en2" type="" directed="True"/> 
    </sentence> 
    <sentence id="Else" origId="thatorig" text="Blu Blu."> 
     <special id="id.s0.i1" origId="1" e1="en1" e2="en2" type="" directed="True"/> 
    </sentence> 
    <sentence id="Something" origId="thisorig" text="Blah Blah."> 
     <notso id = "hallo" /> 
     </sentence> 
    <sentence id="Something no sentence" origId="thisOther" text="Blah Blah."> 
     </sentence> 
</document> 

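library(XML)
# Parse into an internal tree so that XPath queries can be run against it
doc = xmlInternalTreeParse("sentence.xml")
# <sentence> nodes with a <special> child: match the child, then step up to its parent
hasSentence = xpathApply(doc, "//sentence/special/..")
# <sentence> nodes without any <special> child
xpathApply(doc, "/document/sentence[not(child::special)]")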
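
To check which sentences land in each group, the same two queries can be run against an inline copy of the test document. This is only a minimal sketch: the names `xml_text`, `with_special` and `without_special` are made up for illustration, and it assumes `xmlParse(..., asText = TRUE)` in place of reading `sentence.xml` from disk. The predicate `sentence[special]` is used here as an alternative way of writing "a sentence that has a special child".

```r
library(XML)

# Inline copy of the test document so the example runs without a file
xml_text <- '<document id="Something" origId="Text">
  <sentence id="Something" origId="thisorig" text="Blah Blah.">
    <special id="id.s0.i0" origId="1" e1="en1" e2="en2" type="" directed="True"/>
  </sentence>
  <sentence id="Else" origId="thatorig" text="Blu Blu.">
    <special id="id.s0.i1" origId="1" e1="en1" e2="en2" type="" directed="True"/>
  </sentence>
  <sentence id="Something" origId="thisorig" text="Blah Blah.">
    <notso id="hallo"/>
  </sentence>
  <sentence id="Something no sentence" origId="thisOther" text="Blah Blah."/>
</document>'

doc <- xmlParse(xml_text, asText = TRUE)

# ids of the <sentence> nodes that have a <special> child
with_special <- xpathSApply(doc, "/document/sentence[special]",
                            xmlGetAttr, "id")

# ids of the <sentence> nodes that have no <special> child
without_special <- xpathSApply(doc, "/document/sentence[not(special)]",
                               xmlGetAttr, "id")

print(with_special)
print(without_special)
```

Pulling out an attribute with `xmlGetAttr` is just a convenient way to see what matched; dropping that argument and using `xpathApply` gives back the nodes themselves, one list per variable.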
+0

Many thanks, Dieter! – 2011-04-08 16:04:07

1

Parse the XML tree, then use XPath to specify the location of the nodes.

library(XML)
# useInternalNodes = TRUE is required for XPath queries on the parsed tree
doc <- xmlTreeParse("test.xml", useInternalNodes = TRUE)
special_nodes <- getNodeSet(doc, "/document//special")
+0

How can I get the non-special nodes with this approach? – 2015-01-17 19:24:00
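
One possible way to do that with `getNodeSet` is to put a negation in the XPath expression. The sketch below is not from the answer above. It uses a small made-up document (`xml_text`) and hypothetical variable names, and it shows two readings of "non-special", since the comment does not say which is meant: sentences with no <special> child, or every element that is not itself a <special>.

```r
library(XML)

# Hypothetical miniature document, for illustration only
xml_text <- '<document>
  <sentence id="a"><special id="s1"/></sentence>
  <sentence id="b"><notso id="n1"/></sentence>
  <sentence id="c"/>
</document>'

# xmlParse builds an internal tree, like xmlTreeParse(useInternalNodes = TRUE)
doc <- xmlParse(xml_text, asText = TRUE)

# Reading 1: <sentence> elements that have no <special> child
plain_sentences <- getNodeSet(doc, "/document/sentence[not(special)]")

# Reading 2: every element below <document> that is not itself a <special>
non_special <- getNodeSet(doc, "/document//*[not(self::special)]")

# Inspect what each query matched
sentence_ids <- sapply(plain_sentences, xmlGetAttr, "id")
element_names <- sapply(non_special, xmlName)
print(sentence_ids)
print(element_names)
```

The first query is the counterpart of the `sentence[not(child::special)]` expression in the other answer. The second is broader and also returns the sentences that do contain a <special>, because only the <special> elements themselves are excluded.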