2013-08-06 72 views
2

Modifying the tags of an XML file

I am looking for the best way to dynamically modify the tags of a very large XML file.

Consider the following input XML:

Input

<?xml version="1.0" encoding="UTF-8"?> 
<rootTag> 
    <dictionary> 
     <name>field1</name> 
     <address>field2</address> 
     <gender>field3</gender> 
     . 
     . 
     <postcode>field30</postcode> 
    </dictionary> 
    <records> 
     <record> 
     <field id="field1">John</field> 
     <field id="field2">Svalbard</field> 
     <field id="field3">M</field> 
     . 
     . 
     <field id="field30">12345</field> 
     </record> 
     . 
     . 
     <record> 
     . 
     . 
     </record> 
    </records> 
</rootTag> 

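The XML file above contains a dictionary and a large block of record nodes whose tags are linked to the dictionary.

I want to replace the tag of each field inside every record node with the corresponding element name from the dictionary. The output should therefore look like this:

Output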

<?xml version="1.0" encoding="UTF-8"?> 
<rootTag> 
    <records> 
     <record> 
     <name>John</name> 
     <address>Svalbard</address> 
     <gender>M</gender> 
     . 
     . 
     <postcode>12345</postcode> 
     </record> 
     . 
     . 
     <record> 
     . 
     . 
     </record> 
    </records> 
</rootTag> 

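Bearing in mind that there is a very large number of <record> nodes, what is the best way to implement this transformation in Java?

Note that I only want to change the tags, not the attributes.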

+1

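If you said you wanted to build a bridge across a wide river, I would expect any competent engineer to ask you: how wide? The answer depends on it: a solution for 100 MB may differ from a solution for 10 GB. –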

+0

Fair point. I am talking about multiple XML files of roughly 200 MB each. – zeiger

+0

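200 MB can probably be processed in memory, but it is close to the limit, so if there is any chance the files will grow you may want to consider streaming techniques. –

Answers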

1

I agree with @PeterJaloveczki that XSLT is probably the way to go. The following should do the job:

<?xml version="1.0" encoding="UTF-8"?> 
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions"> 
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/> 

    <xsl:template match="node()|@*"> 
     <xsl:copy> 
      <xsl:apply-templates select="node() | @*" /> 
     </xsl:copy> 
    </xsl:template> 

    <xsl:template match="dictionary" /> 

    <xsl:template match="field"> 
     <xsl:variable name="id" select="@id" /> 
     <xsl:variable name="tagName" select="/rootTag/dictionary/node()[. = $id]/name()" /> 

     <xsl:element name="{if ($tagName != '') then $tagName else 'field'}"> 
      <xsl:apply-templates select="node() | @*[name() != 'id']" /> 
     </xsl:element> 
    </xsl:template> 

</xsl:stylesheet> 

It is simplified in places, because the example XML is simplified too, but in principle it should work.
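The stylesheet above declares version 2.0 and uses an XPath 2.0 `if` expression, so it needs an XSLT 2.0 processor. As a rough sketch of how the same idea could be run from Java with only the JDK's built-in XSLT 1.0 transformer, the class below embeds a 1.0 variant that looks the tag name up through an `xsl:key`. The class name `XsltFieldRenamer`, the key name `dict` and the variable `tag` are made up for this illustration, and the stylesheet is an adaptation, not the one posted above. Note that this still builds the whole document in memory, which matters at 200 MB.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: runs an XSLT 1.0 adaptation of the stylesheet with the JDK transformer.
final class XsltFieldRenamer {

    // The xsl:key indexes dictionary entries by their text ("field1" -> <name>),
    // so every <field> lookup is a key lookup instead of a scan of the dictionary.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:key name='dict' match='dictionary/*' use='.'/>"
      // Identity template: copy everything that no other template handles.
      + "<xsl:template match='node()|@*'>"
      + "<xsl:copy><xsl:apply-templates select='node()|@*'/></xsl:copy>"
      + "</xsl:template>"
      // Drop the dictionary from the output.
      + "<xsl:template match='dictionary'/>"
      + "<xsl:template match='field'>"
      + "<xsl:variable name='tag' select=\"name(key('dict', @id))\"/>"
      + "<xsl:choose>"
      // Known id: emit an element named after the dictionary entry, without the id attribute.
      + "<xsl:when test=\"$tag != ''\">"
      + "<xsl:element name='{$tag}'>"
      + "<xsl:apply-templates select=\"@*[name() != 'id'] | node()\"/>"
      + "</xsl:element>"
      + "</xsl:when>"
      // Unknown id: keep the <field> element unchanged.
      + "<xsl:otherwise>"
      + "<xsl:copy><xsl:apply-templates select='node()|@*'/></xsl:copy>"
      + "</xsl:otherwise>"
      + "</xsl:choose>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(
            "<rootTag><dictionary><name>field1</name></dictionary>"
          + "<records><record><field id=\"field1\">John</field></record></records></rootTag>"));
    }
}
```

For real files, `StreamSource` and `StreamResult` also accept a `java.io.File`, so the strings here are only for demonstration.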

+0

Thanks for the code example. – zeiger

0

XSLT is probably your best option.

0

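I would probably use a SAX XML parser, which ensures that you never load the whole DOM tree at once.

In short: populate a dictionary first, then, as each tag is parsed, replace its name with the name the dictionary holds for it.

An example of SAX parsing in Java: http://docs.oracle.com/javase/tutorial/jaxp/sax/parsing.html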
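The two steps described above could be sketched roughly as follows. This is only one possible arrangement, under a few assumptions: the dictionary appears before the records (as in the question), the document uses no namespaces, and writing the result is delegated to the JDK's identity `Transformer` fed through an `XMLFilterImpl`. The names `SaxFieldRenamer`, `RenameFilter` and `tagById` are invented for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical helper: streams the document and renames <field> elements on the fly.
final class SaxFieldRenamer {

    /** Swallows <dictionary>, remembers its entries, and renames mapped <field> elements. */
    static final class RenameFilter extends XMLFilterImpl {
        private final Map<String, String> tagById = new HashMap<>();   // "field1" -> "name"
        private final Deque<String> openFields = new ArrayDeque<>();   // names emitted for open fields
        private final StringBuilder entryText = new StringBuilder();   // text of the current entry
        private boolean inDictionary;

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            if (inDictionary) {                 // start of a dictionary entry such as <name>
                entryText.setLength(0);
                return;
            }
            if ("dictionary".equals(local)) {   // do not forward the dictionary itself
                inDictionary = true;
                return;
            }
            if ("field".equals(local)) {
                String tag = tagById.get(atts.getValue("", "id"));
                openFields.push(tag != null ? tag : local);
                if (tag != null) {              // known id: emit the new name, drop the id attribute
                    AttributesImpl kept = new AttributesImpl(atts);
                    kept.removeAttribute(kept.getIndex("", "id"));
                    super.startElement("", tag, tag, kept);
                    return;
                }
            }
            super.startElement(uri, local, qName, atts);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            if (inDictionary) {
                if ("dictionary".equals(local)) {
                    inDictionary = false;
                } else {                        // end of an entry: its text is the field id
                    tagById.put(entryText.toString().trim(), local);
                }
                return;
            }
            if ("field".equals(local)) {
                String tag = openFields.pop();
                if (!tag.equals(local)) {       // close with the same name that was opened
                    super.endElement("", tag, tag);
                    return;
                }
            }
            super.endElement(uri, local, qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (inDictionary) {
                entryText.append(ch, start, length);
            } else {
                super.characters(ch, start, length);
            }
        }
    }

    static void rename(InputSource in, Writer out) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);    // so that local names are reported
            RenameFilter filter = new RenameFilter();
            filter.setParent(factory.newSAXParser().getXMLReader());
            // The identity transformer serializes the filtered SAX events.
            TransformerFactory.newInstance().newTransformer()
                .transform(new SAXSource(filter, in), new StreamResult(out));
        } catch (Exception e) {
            throw new IllegalStateException("SAX rename failed", e);
        }
    }

    static String rename(String xml) {
        StringWriter out = new StringWriter();
        rename(new InputSource(new StringReader(xml)), out);
        return out.toString();
    }
}
```

For a 200 MB file one would pass an `InputSource` over a file stream and a buffered file writer instead of strings; only the dictionary is held in memory.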

0

One option is StAX: it performs well, processes the XML as a stream without loading the whole document into memory, and is convenient to use.
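A minimal sketch of what a StAX version might look like, using the cursor API (`XMLStreamReader` and `XMLStreamWriter`). It assumes the dictionary precedes the records and that the document has no namespaces, and it only copies elements, attributes and text (comments and processing instructions are dropped for brevity). The class name `StaxFieldRenamer` and its method names are hypothetical.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper: copies the document event by event, renaming <field> elements.
final class StaxFieldRenamer {

    static void rename(Reader in, Writer out) throws XMLStreamException {
        Map<String, String> tagById = new HashMap<>();   // "field1" -> "name"
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        boolean inDictionary = false;

        writer.writeStartDocument();
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = reader.getLocalName();
                if (inDictionary) {
                    // Dictionary entry: getElementText() reads the id and consumes the end tag.
                    tagById.put(reader.getElementText().trim(), name);
                } else if (name.equals("dictionary")) {
                    inDictionary = true;                  // skip the dictionary in the output
                } else {
                    String id = name.equals("field") ? reader.getAttributeValue(null, "id") : null;
                    String renamed = (id == null) ? null : tagById.get(id);
                    writer.writeStartElement(renamed != null ? renamed : name);
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        String attr = reader.getAttributeLocalName(i);
                        if (renamed != null && attr.equals("id")) {
                            continue;                     // the id is now encoded in the tag name
                        }
                        writer.writeAttribute(attr, reader.getAttributeValue(i));
                    }
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                if (inDictionary) {
                    inDictionary = false;                 // only </dictionary> can reach this branch
                } else {
                    writer.writeEndElement();             // closes whichever element is open
                }
            } else if (event == XMLStreamConstants.CHARACTERS && !inDictionary) {
                writer.writeCharacters(reader.getText());
            }
        }
        writer.writeEndDocument();
        writer.flush();
        reader.close();
    }

    static String rename(String xml) {
        try {
            StringWriter out = new StringWriter();
            rename(new StringReader(xml), out);
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("StAX rename failed", e);
        }
    }
}
```

Because `writeEndElement()` closes whatever element is currently open, no bookkeeping is needed to remember which name a renamed field was opened with. For large files the `Reader` and `Writer` would wrap file streams, and memory use stays proportional to the dictionary, not to the number of records.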

0

Why not parse the XML manually?

import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.junit.Assert;
import org.junit.Test;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ReplaceTextInXmlTest
{
    @Test
    public void test() throws Exception {
        final String inputXml =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
            "<rootTag>\n" +
            " <dictionary>\n" +
            "  <name>field1</name>\n" +
            "  <address>field2</address>\n" +
            "  <gender>field3</gender>\n" +
            " </dictionary>\n" +
            " <records>\n" +
            "  <record>\n" +
            "   <field id=\"field1\">John</field>\n" +
            "   <field id=\"field2\">Svalbard</field>\n" +
            "   <field id=\"field3\">M</field>\n" +
            "  </record>\n" +
            "  <record>\n" +
            "   <field id=\"field1\">Fritz</field>\n" +
            "   <field id=\"field2\">Hamburg</field>\n" +
            "   <field id=\"field3\">M</field>\n" +
            "  </record>\n" +
            " </records>\n" +
            "</rootTag>";

        // Step 1: parse only the small <dictionary> block with DOM
        // and build the mapping from field number to tag name.
        final Map<Integer, String> mapping = new HashMap<>();
        final int start = inputXml.indexOf("<dictionary>");
        final int end = inputXml.indexOf("</dictionary>", start) + "</dictionary>".length();
        final DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        final Document dom;
        try (ByteArrayInputStream is = new ByteArrayInputStream(
                inputXml.substring(start, end).getBytes(StandardCharsets.UTF_8))) {
            dom = db.parse(is);
        }
        final NodeList nodes = dom.getDocumentElement().getChildNodes();
        for (int i = 0, z = nodes.getLength(); i < z; ++i) {
            final Node node = nodes.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                // The text content is "field<N>"; keep only the number N as the key.
                final String value = node.getTextContent();
                mapping.put(Integer.parseInt(value.substring("field".length())), node.getNodeName());
            }
        }

        // Step 2: rewrite the document line by line. This relies on every
        // <field> element sitting alone on its own line, as in the sample.
        final Pattern fieldPattern =
            Pattern.compile("^(\\s*<)field id=\"field([0-9]+)\"(>[^<]*</)field(>\\s*)$");
        final StringBuilder outputXml = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new StringReader(inputXml))) {
            String line;
            while ((line = reader.readLine()) != null) {
                final Matcher match = fieldPattern.matcher(line);
                final String tagName = match.matches()
                    ? mapping.get(Integer.parseInt(match.group(2)))
                    : null;
                if (tagName != null) {
                    outputXml.append(match.group(1)).append(tagName)
                             .append(match.group(3)).append(tagName)
                             .append(match.group(4));
                } else {
                    // Not a field line, or an id without a dictionary entry: copy unchanged.
                    outputXml.append(line);
                }
                outputXml.append('\n');
            }
        }

        // Note: this approach leaves the <dictionary> block in the output.
        final String expectedXml =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
            "<rootTag>\n" +
            " <dictionary>\n" +
            "  <name>field1</name>\n" +
            "  <address>field2</address>\n" +
            "  <gender>field3</gender>\n" +
            " </dictionary>\n" +
            " <records>\n" +
            "  <record>\n" +
            "   <name>John</name>\n" +
            "   <address>Svalbard</address>\n" +
            "   <gender>M</gender>\n" +
            "  </record>\n" +
            "  <record>\n" +
            "   <name>Fritz</name>\n" +
            "   <address>Hamburg</address>\n" +
            "   <gender>M</gender>\n" +
            "  </record>\n" +
            " </records>\n" +
            "</rootTag>\n";
        Assert.assertEquals(expectedXml, outputXml.toString());
    }
}