2012-12-28

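C++ Arabica (via Xerces-c): getNodeValue() method does not return the actual value

I am using Arabica, wrapping Xerces-c, to parse XML. The code sample below returns the correct names when I use the .getNodeName() method, but not the correct values when I use the .getNodeValue() method: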

bool readXML(bfs::path xmlfullfile) 
{ 
    // first check to see if the file exists 
    if (!bfs::is_regular_file(xmlfullfile)) return false; 

    Arabica::SAX2DOM::Parser<std::string> domParser; 
    Arabica::SAX::CatchErrorHandler<std::string> eh; 
    Arabica::DOM::Document<std::string> xmlDoc; 
    Arabica::SAX::InputSource<std::string> is; 

    domParser.setErrorHandler(eh); 
    is.setSystemId(xmlfullfile.string()); 
    domParser.parse(is); 

    if (!eh.errorsReported())
    {
        xmlDoc = domParser.getDocument();
        xmlDoc.normalize();

        // every <object> element in the document
        Arabica::DOM::NodeList<std::string> objects = xmlDoc.getElementsByTagName("object");
        for (size_t t = 0; t < objects.getLength(); t++)
        {
            Arabica::DOM::Node<std::string> object = objects.item(t);
            // direct children of <object>, including whitespace-only text nodes
            Arabica::DOM::NodeList<std::string> values = object.getChildNodes();
            for (size_t u = 0; u < values.getLength(); u++)
            {
                values.item(u).normalize();
                std::string name = values.item(u).getNodeName();
                std::string val = values.item(u).getNodeValue();
                std::cout << "Node streaming = \"" << values.item(u) << "\", meaning that name = \"" << name << "\" and value = \"" << val << "\"" << std::endl;
            }
        }
        return true;
    } else {
        std::cerr << eh.errors() << std::endl;
        eh.reset();
        return false;
    }
} 

The sample XML I am trying to parse is:

<annotation> 
    <filename>1a.jpg</filename> 
    <folder>Sample</folder> 
    <source> 
     <database>Some database</database> 
     <annotation>Annotator</annotation> 
     <image>Some source</image> 
    </source> 
    <size> 
     <width>3264</width> 
     <height>1840</height> 
     <depth>0</depth> 
    </size> 
    <segmented>0</segmented> 
    <object> 
     <name>somename</name> 
     <pose>Unspecified</pose> 
     <truncated>0</truncated> 
     <difficult>0</difficult> 
     <occluded>0</occluded> 
     <bndbox> 
      <xmin>48</xmin> 
      <ymin>671</ymin> 
      <xmax>3213</xmax> 
      <ymax>1616</ymax> 
     </bndbox> 
    </object> 
</annotation> 

The output looks like this:

Node streaming = " 
       ", meaning that name = "#text" and value = " 
       " 
Node streaming = "<name>somename</name>", meaning that name = "name" and value = "" 
Node streaming = " 
       ", meaning that name = "#text" and value = " 
       " 
Node streaming = "<pose>Unspecified</pose>", meaning that name = "pose" and value = ""
Node streaming = " 
       ", meaning that name = "#text" and value = " 
       " 
Node streaming = "<truncated>0</truncated>", meaning that name = "truncated" and value = ""
Node streaming = " 
       ", meaning that name = "#text" and value = " 
       " 
Node streaming = "<difficult>0</difficult>", meaning that name = "difficult" and value = ""
Node streaming = " 
       ", meaning that name = "#text" and value = " 
       " 
Node streaming = "<occluded>0</occluded>", meaning that name = "occluded" and value = ""
Node streaming = " 
       ", meaning that name = "#text" and value = " 
       " 
Node streaming = "<bndbox> 
         <xmin>48</xmin> 
         <ymin>671</ymin> 
         <xmax>3213</xmax> 
         <ymax>1616</ymax> 
       </bndbox>", meaning that name = "bndbox" and value = "" 
Node streaming = " 
     ", meaning that name = "#text" and value = " 
     " 

I am not sure what I am doing wrong. Since getNodeName() returns the correct name (when it is not #text, of course), I am surprised that getNodeValue() returns nothing.

Answers

0

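I found a solution after comparing my code with code written for some other XML libraries. Apparently, the value of an element node is not its text content: the text is held in a child text node, so for a simple leaf element you have to go through its first child to reach the text value. I am not sure this is the best way to do it, but here is the approach in case someone else's code has the same problem: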

for (size_t u = 0; u < values.getLength(); u++) 
{ 
    string name = values.item(u).getNodeName(); 
    if (name == "#text") continue; 
    string val = values.item(u).getFirstChild().getNodeValue(); 
    cout << "Node streaming = \"" << values.item(u) << "\", meaning that name = \"" << name << "\" and value = \"" << val << "\"" << endl; 
} 

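Note: production code should take into account that not all nodes are simple leaf nodes (an empty element has no first child, and an element such as bndbox has element children rather than text). So my code is only half of the solution.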
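One way to cover the non-leaf case is to gather the text of every text node below an element instead of looking only at the first child. The sketch below shows that traversal on a tiny stand-in type: `MiniNode` and `collectText` are hypothetical names made up for illustration and are not part of Arabica or Xerces-c. The same recursion would have to be mapped onto the real node class's child and value accessors.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM node; this is NOT Arabica's API.
struct MiniNode {
    enum Kind { Element, Text };
    Kind kind;
    std::string name;                // tag name, or "#text" for text nodes
    std::string value;               // character data for text nodes, empty for elements
    std::vector<MiniNode> children;  // empty for text nodes and for empty elements
};

// Concatenate the character data of all text nodes at or below `node`,
// in document order. An element without text descendants yields "".
std::string collectText(const MiniNode& node) {
    if (node.kind == MiniNode::Text) {
        return node.value;
    }
    std::string out;
    for (const MiniNode& child : node.children) {
        out += collectText(child);
    }
    return out;
}
```

With this shape, an empty element simply produces an empty string rather than requiring a first child, and a nested element such as bndbox produces the concatenated text of its descendants.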

1

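You are also counting whitespace-only text nodes. Adding a DTD that does not allow text nodes at that position might help. A non-validating parser must report all whitespace nodes and is not allowed to make assumptions about what is ignorable and what is not. Bottom line: if you want to get rid of whitespace text nodes, you will have to program that yourself in your DOM program.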
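As a rough illustration of doing that filtering by hand, the sketch below drops children whose name is "#text" and whose value consists only of whitespace. `ChildInfo`, `isWhitespaceOnly` and `dropWhitespaceText` are hypothetical names introduced here; the struct merely stands in for the name/value pair that the loop in the question reads from each child node, and is not an Arabica type.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical name/value pair copied out of a DOM child node (not an Arabica type).
struct ChildInfo {
    std::string name;   // e.g. "#text", "name", "pose"
    std::string value;  // node value as reported by the parser
};

// True when every character is whitespace; an empty string also counts.
bool isWhitespaceOnly(const std::string& s) {
    return std::all_of(s.begin(), s.end(),
                       [](unsigned char c) { return std::isspace(c) != 0; });
}

// Return the children with whitespace-only text nodes removed, order preserved.
std::vector<ChildInfo> dropWhitespaceText(const std::vector<ChildInfo>& children) {
    std::vector<ChildInfo> kept;
    for (const ChildInfo& c : children) {
        if (c.name == "#text" && isWhitespaceOnly(c.value)) {
            continue;  // indentation or line break between elements
        }
        kept.push_back(c);
    }
    return kept;
}
```

Checking the value rather than skipping every "#text" node keeps text nodes that carry real content, which matters for elements with mixed content.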