2012-09-01

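XML document traverser in Java

Everyone knows we can traverse an entire XML document with the NodeIterator from DocumentTraversal. My application needs to do some extra work during the traversal, so I decided to write my own XML traverser with the help of Java's Stack<>.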
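For reference, here is a minimal sketch of the built-in NodeIterator approach mentioned above. It assumes the JDK's default DOM parser, whose Document objects implement DocumentTraversal (other DOM implementations may not, in which case the cast fails). The class NodeIteratorDemo and its helper methods are hypothetical names. The sketch lists elements and non-blank text nodes in document order.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;

final class NodeIteratorDemo {

    // Parses an XML string, wrapping checked exceptions in an unchecked one.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse XML", ex);
        }
    }

    // Lists elements and non-blank text nodes in document order.
    static List<String> walk(String xml) {
        Document doc = parse(xml);
        // Assumption: the DOM implementation supports the Traversal module,
        // as the JDK's default one does; otherwise this cast fails.
        NodeIterator it = ((DocumentTraversal) doc).createNodeIterator(
                doc.getDocumentElement(),
                NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_TEXT,
                null,   // no custom NodeFilter
                true);  // expand entity references
        List<String> out = new ArrayList<>();
        for (Node n = it.nextNode(); n != null; n = it.nextNode()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                out.add("element " + n.getNodeName());
            } else {
                // Only text nodes reach this branch because of SHOW_TEXT.
                String text = n.getNodeValue().trim();
                if (!text.isEmpty()) {
                    out.add("text " + text);
                }
            }
        }
        it.detach();
        return out;
    }

    public static void main(String[] args) {
        walk("<item><paragraph><emphasis>border</emphasis>, a limited choice.</paragraph></item>")
                .forEach(System.out::println);
    }
}
```

The iterator already handles arbitrary nesting depth, so any extra per-node work could be done inside the loop instead of in a hand-written traverser.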

Here is my code (I'm not good at coding, so the code and logic may look messy).

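import java.io.ByteArrayInputStream;
import java.util.Stack;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathExpressionException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class test 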
{ 
    private static Stack<Node> gStack = new Stack<Node>(); 

    public static void main(String[] args) throws XPathExpressionException 
    { 
     String str = 
      "<section>" 
       + "<paragraph>This example combines regular wysiwyg editing of a document with very controlled editing of semantic rich content. The main content can be " 
       + "edited like you would in a normal word processor. Though the difference is that the content remains schema valid XML because Xopus will not allow you to perform actions " 
       + "on the document that would render it invalid.</paragraph>" 
       + "<paragraph>The table is an example of controlled style. The style of the table is controlled by three attributes:</paragraph>" 
       + "<unorderedlist>" 
       + "<item><paragraph><emphasis>alternaterowcolor</emphasis>, do all rows have the same color, or should the background color alternate?</paragraph></item>" 
       + "<item><paragraph><emphasis>border</emphasis>, a limited choice of border styles.</paragraph></item>" 
       + "<item><paragraph><emphasis>color</emphasis>, a limited choice of colors.</paragraph></item>" 
       + "</unorderedlist>" 
       + "<paragraph>You have quite some freedom to style the table, but you can't break the predefined style.</paragraph>" 
       + "</section>"; 

     Document domDoc = null; 
     try 
     { 
      DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance(); 
      DocumentBuilder docBuilder = docFactory.newDocumentBuilder(); 
      ByteArrayInputStream bis = new ByteArrayInputStream(str.getBytes()); 
      domDoc = docBuilder.parse(bis); 
     } 
     catch (Exception e) 
     { 
      e.printStackTrace(); 
     } 

     Element root = null; 
     NodeList list = domDoc.getChildNodes(); 
     for (int i = 0; i < list.getLength(); i++) 
     { 
      if (list.item(i) instanceof Element) 
      { 
       root = (Element) list.item(i); 
       break; 
      } 
     } 

     NodeList nlist = root.getChildNodes(); 

     System.out.println("root = " + root.getNodeName() + " child count = " + nlist.getLength()); 
     domTraversor(root); 
    } 

    private static void domTraversor(Node node) 
    { 
     if (node.getNodeName().equals("#text")) 
     { 
      System.out.println("textElem = " + node.getTextContent()); 
      if (node.getNextSibling() != null) 
      { 
       gStack.push(node.getNextSibling()); 
       domTraversor(node.getNextSibling()); 
      } 
      else 
      { 
       if (node.getParentNode().getNextSibling() != null) 
        domTraversor(node.getParentNode().getNextSibling()); 
      } 
     } 
     else 
     { 
      if (node.getChildNodes().getLength() > 1) 
      { 
       gStack.push(node); 
       Node n = node.getFirstChild(); 
       if (n.getNodeName().equals("#text")) 
       { 
        System.out.println("textElem = " + n.getTextContent()); 
        if (n.getNextSibling() != null) 
        { 
         gStack.push(n.getNextSibling()); 
         domTraversor(n.getNextSibling()); 
        } 
       } 
       else 
       { 
        gStack.push(n); 
        domTraversor(n); 
       } 
      } 
      else if (node.getChildNodes().getLength() == 1) 
      { 
       Node fnode = node.getFirstChild(); 
       if (fnode.getChildNodes().getLength() > 1) 
       { 
        gStack.push(fnode); 
        domTraversor(fnode); 
       } 
       else 
       { 
        if (!fnode.getNodeName().equals("#text")) 
        { 
         gStack.push(fnode); 
         domTraversor(fnode); 
        } 
        else 
        { 
         System.out.println("textElem = " + fnode.getTextContent()); 
         if (fnode.getNodeName().equals("#text")) 
         { 
          if (node.getNextSibling() != null) 
          { 
           gStack.push(node.getNextSibling()); 
           domTraversor(node.getNextSibling()); 
          } 
          else 
          { 
           if (!gStack.empty()) 
           { 
            Node sibPn = gStack.pop(); 
            if (sibPn.getNextSibling() == null) 
            { 
             sibPn = gStack.pop(); 
            } 
            domTraversor(sibPn.getNextSibling()); 
           } 
          } 
         } 
         else 
         { 
          if (fnode.getNextSibling() != null) 
          { 
           domTraversor(fnode.getNextSibling()); 
          } 
          else 
          { 
           if (!gStack.empty()) 
           { 
            Node sibPn = gStack.pop().getNextSibling(); 
            domTraversor(sibPn); 
           } 
          } 
         } 
        } 
       } 
      } 
     } 
    } 
} 

It works fine with some XML documents, but not with documents containing markup like this:

<unorderedlist> 
    <item> 
     <paragraph> 
      <emphasis>alternaterowcolor</emphasis> 
      , do all rows have the same color, or should the background 
      color 
      alternate? 
     </paragraph> 
    </item> 
    <item> 
     <paragraph> 
      <emphasis>border</emphasis> 
      , a limited choice of border styles. 
     </paragraph> 
    </item> 
    <item> 
     <paragraph> 
      <emphasis>color</emphasis> 
      , a limited choice of colors. 
     </paragraph> 
    </item> 
</unorderedlist> 

This is the scenario: if any element has more than three nested children, my code stops and does not continue.

Does anyone have a better implementation? Please suggest one.


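I'm not sure what you intend to do. An XML document is a tree structure, so to traverse it you need a tree-traversal mechanism, which is basically a recursive function like traverse(node) { for each child in node { process(child); traverse(child); } }. Your linear traversal will not walk the whole tree. –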
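The recursive scheme described in this comment can be sketched as follows. This is a minimal illustration, not the commenter's code: RecursiveWalk and its methods parse, process, traverse, and walk are hypothetical names, and whitespace-only text nodes (the indentation in pretty-printed XML such as the unorderedlist sample) are skipped. Because each call simply loops over all children of a node, the nesting depth does not matter.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class RecursiveWalk {

    // Parses an XML string, wrapping checked exceptions in an unchecked one.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse XML", ex);
        }
    }

    // Records one node with its depth; blank text nodes are ignored.
    static void process(Node node, int depth, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.add(depth + " element " + node.getNodeName());
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                out.add(depth + " text " + text);
            }
        }
    }

    // traverse(node) { for each child: process(child); traverse(child); }
    static void traverse(Node node, int depth, List<String> out) {
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            process(child, depth, out);
            traverse(child, depth + 1, out);
        }
    }

    static List<String> walk(String xml) {
        Element root = parse(xml).getDocumentElement();
        List<String> out = new ArrayList<>();
        process(root, 0, out);   // the root itself
        traverse(root, 1, out);  // then everything below it
        return out;
    }

    public static void main(String[] args) {
        walk("<a><b><c><d><e>deep</e></d></c></b><f/></a>").forEach(System.out::println);
    }
}
```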


Why not parse it with JAXB: http://wiki.processing.org/w/XML_parsing_with_JAXB? – Vikdor


@d-live You got it: I need to traverse the whole document linearly and collect some metadata, so I can't use any library. – Sark
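If recursion is to be avoided, the same pre-order walk can be done linearly with one explicit stack, which is closer to what the question attempts with Stack<Node>. This is a minimal sketch that assumes only the JDK's built-in DOM API. It uses ArrayDeque, the usual replacement for the legacy Stack class, and StackWalk and its methods are hypothetical names. The key point is to push a node's children in reverse order, so that the first child is popped, and therefore visited, next. No sibling or parent bookkeeping is needed.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class StackWalk {

    // Parses an XML string, wrapping checked exceptions in an unchecked one.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse XML", ex);
        }
    }

    // Iterative pre-order walk: no recursion, one explicit stack.
    static List<String> walk(String xml) {
        Document doc = parse(xml);
        List<String> out = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(doc.getDocumentElement());
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            // "Process" step: this is where metadata would be collected.
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                out.add("element " + node.getNodeName());
            } else if (node.getNodeType() == Node.TEXT_NODE) {
                String text = node.getNodeValue().trim();
                if (!text.isEmpty()) {
                    out.add("text " + text);
                }
            }
            // Push children last-to-first so the first child is popped next.
            for (Node c = node.getLastChild(); c != null; c = c.getPreviousSibling()) {
                stack.push(c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        walk("<a><b><c><d><e>deep</e></d></c></b><f>x</f></a>").forEach(System.out::println);
    }
}
```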

Answer


Try it this way:

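Element e; 
NodeList n; 
Document doc = StudyParser.XMLfromString(xmlString); 
// getDocumentElement() always returns the root element, whereas
// getFirstChild() may return a comment or processing instruction.
String starttag = doc.getDocumentElement().getNodeName(); 
Log.e("start", starttag); 
n = doc.getElementsByTagName(starttag); 
for (int i = 0; i < n.getLength(); i++) { 
    e = (Element) n.item(i); 
    NodeList np = e.getElementsByTagName("item"); 
    for (int j = 0; j < np.getLength(); j++) { 
        // Use the j-th <item>, not the outer element again.
        Element item = (Element) np.item(j); 
        try { 
            String para = StudyParser.getValue(item, "paragraph"); 
            Log.e("paravalue", para); 
            String emp = StudyParser.getValue(item, "emphasis"); 
            Log.e("empval", emp); 
        } catch (Exception ex) {   // renamed: "e" is already the Element variable
            ex.printStackTrace(); 
        } 
    } 
} 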

StudyParser class:

import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class StudyParser {

    public StudyParser() {
    }

    // Parses an XML string into a DOM Document; returns null on failure.
    public static final Document XMLfromString(String xml) {
        Document doc = null;
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        try {
            DocumentBuilder db = dbf.newDocumentBuilder();
            InputSource is = new InputSource();
            is.setCharacterStream(new StringReader(xml));
            doc = db.parse(is);
        } catch (ParserConfigurationException e) {
            System.out.println("XML parse error: " + e.getMessage());
            return null;
        } catch (SAXException e) {
            System.out.println("Wrong XML file structure: " + e.getMessage());
            return null;
        } catch (IOException e) {
            System.out.println("I/O exception: " + e.getMessage());
            return null;
        }
        return doc;
    }

    // Sends an HTTP POST to the given URL and returns the response body.
    public static String getXMLstring(String url) {
        String line;
        try {
            DefaultHttpClient httpClient = new DefaultHttpClient();
            HttpPost httpPost = new HttpPost(url);
            HttpResponse httpResponse = httpClient.execute(httpPost);
            HttpEntity httpEntity = httpResponse.getEntity();
            line = EntityUtils.toString(httpEntity);
        } catch (IOException e) {
            // Also covers UnsupportedEncodingException and MalformedURLException.
            line = "<results status=\"error\"><msg>Can't connect to server</msg></results>";
        }
        return line;
    }

    // Reads an entire InputStream into a String (platform default charset).
    public static String getXML(InputStream is) throws IOException {
        BufferedInputStream bis = new BufferedInputStream(is);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int result = bis.read();
        while (result != -1) {
            buf.write((byte) result);
            result = bis.read();
        }
        return buf.toString();
    }

    // Returns the value of the first direct text child of elem, or "" if none.
    public static final String getElementValue(Node elem) {
        if (elem != null && elem.hasChildNodes()) {
            for (Node kid = elem.getFirstChild(); kid != null; kid = kid.getNextSibling()) {
                if (kid.getNodeType() == Node.TEXT_NODE) {
                    return kid.getNodeValue();
                }
            }
        }
        return "";
    }

    // Reads the root's "Categories" attribute as an int; -1 if missing or invalid.
    public static int numResults(Document doc) {
        Node results = doc.getDocumentElement();
        try {
            return Integer.parseInt(results.getAttributes().getNamedItem("Categories").getNodeValue());
        } catch (Exception e) {
            return -1;
        }
    }

    // Returns the direct text of the first <str> element found under item.
    public static String getValue(Element item, String str) {
        NodeList n = item.getElementsByTagName(str);
        return StudyParser.getElementValue(n.item(0));
    }
}

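For dynamic XML, here is just a simple demo. I assumed the same XML, but this version does not rely on getElementsByTagName for specific child tags. There are many more properties you can look up and try.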

doc = StudyParser.XMLfromString(xml); 
String starttag = doc.getDocumentElement().getNodeName(); 
Log.e("start", starttag); 
n = doc.getElementsByTagName(starttag); 
for (int i = 0; i < n.getLength(); i++) { 
    e = (Element) n.item(i); 
    try { 
        // getTextContent() returns the text of e and all of its descendants.
        Log.e("1234", "" + e.getTextContent()); 
    } catch (Exception ex) {   // "e" is already the Element variable
        ex.printStackTrace(); 
    } 
} 
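For mixed content such as <paragraph><emphasis>border</emphasis>, a limited choice of border styles.</paragraph>, the difference between getTextContent() (used above) and StudyParser.getElementValue() (from the answer) matters. Below is a minimal sketch that compares the two. TextContentDemo is a hypothetical class, and firstTextChild is a hypothetical copy of the loop in getElementValue.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class TextContentDemo {

    // Parses an XML string, wrapping checked exceptions in an unchecked one.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse XML", ex);
        }
    }

    // Same loop as StudyParser.getElementValue: first direct text child only.
    static String firstTextChild(Node elem) {
        for (Node kid = elem.getFirstChild(); kid != null; kid = kid.getNextSibling()) {
            if (kid.getNodeType() == Node.TEXT_NODE) {
                return kid.getNodeValue();
            }
        }
        return "";
    }

    public static void main(String[] args) {
        Element p = parse("<paragraph><emphasis>border</emphasis>, a limited choice of border styles.</paragraph>")
                .getDocumentElement();
        // getTextContent() concatenates the text of all descendants.
        System.out.println(p.getTextContent());  // border, a limited choice of border styles.
        // Only the text directly under <paragraph>; the <emphasis> text is skipped.
        System.out.println(firstTextChild(p));   // , a limited choice of border styles.
    }
}
```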

I've printed the values to the log; check the log output as well. – Khan


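Thanks for your solution. As I said in one of my comments, the XML is dynamic and may not come with an XSD or DTD, so I don't know the structure of the file. In that case I can't use the getElementsByTagName("") function. – Sark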


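Whether or not the XML changes every time, with dynamic XML you first have to read the tag name, then try this approach. Just use the concept: as I did on the first line, get the name of the start tag, and parse on that basis. – Khan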
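When the element names are not known in advance, the DOM's special tag name "*" matches every element, and Document.getElementsByTagName returns these elements in document order. Below is a minimal sketch that assumes the metadata of interest is how often each tag occurs. TagCensus and countTags are hypothetical names.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class TagCensus {

    // Parses an XML string, wrapping checked exceptions in an unchecked one.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse XML", ex);
        }
    }

    // Counts each element name without knowing any names up front.
    static Map<String, Integer> countTags(String xml) {
        // "*" matches every element; the list is in document order and includes the root.
        NodeList all = parse(xml).getElementsByTagName("*");
        Map<String, Integer> counts = new LinkedHashMap<>();  // keeps first-seen order
        for (int i = 0; i < all.getLength(); i++) {
            counts.merge(all.item(i).getNodeName(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countTags(
                "<unorderedlist><item><paragraph><emphasis>border</emphasis>, x</paragraph></item>"
              + "<item><paragraph><emphasis>color</emphasis>, y</paragraph></item></unorderedlist>"));
        // {unorderedlist=1, item=2, paragraph=2, emphasis=2}
    }
}
```

This approach visits elements only. Text, needed for mixed content, still requires walking child nodes as in a tree traversal.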