How can I identify XML in a text file that contains many XML documents mixed with other text, using Java?

I want to read the entire text file, then extract the whole second XML document and save it to my local drive, based on the search input "Midnight Rain".

Contents of the text file:

<?xml version="1.0"?> 
<catalog> 
    <book id="bk101"> 
     <author>Gambardella, Matthew</author> 
     <title>XML Developer's Guide</title> 
     <genre>Computer</genre> 
     <price>44.95</price> 
     <publish_date>2000-10-01</publish_date> 
     <description>An in-depth look at creating applications 
     with XML.</description> 
    </book> 
</catalog> 
controllercmds.statusupdate 
ExtnClientExternalSrcProcess="9" 
<catalog> 
    <book id="bk102"> 
     <author>Ralls, Kim</author> 
     <title>Midnight Rain</title> 
     <genre>Fantasy</genre> 
     <price>5.95</price> 
     <publish_date>2000-12-16</publish_date> 
     <description>A former architect battles corporate zombies, 
     an evil sorceress, and her own childhood to become queen 
     of the world.</description> 
    </book> 
</catalog>

My output should be:

<catalog> 
    <book id="bk102"> 
     <author>Ralls, Kim</author> 
     <title>Midnight Rain</title> 
     <genre>Fantasy</genre> 
     <price>5.95</price> 
     <publish_date>2000-12-16</publish_date> 
     <description>A former architect battles corporate zombies, 
     an evil sorceress, and her own childhood to become queen 
     of the world.</description> 
    </book> 
</catalog>

Is this feasible? Can anyone help me?

Source

2015-10-14 Java_Novice

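The question is unclear. In which programming language are you reading the XML content? – Vishal

Please paste the Java code along with what you have already tried. – Vishal

I tried using a BufferedReader: BufferedReader br = new BufferedReader(new InputStreamReader(sftp.get(file.getFilename()))); try { while ((line = br.readLine()) != null) { if (line.contains("<?xml version")) { sb.append(line); } } } System.out.println(sb); But identifying and appending the data takes too long. –

I think you should mention the programming language you are using so that people can answer with code. For now, a regular expression is the only solution I can think of, and for that you have to know which root tag your code should look for. From the sample above, <catalog> appears to be the root tag. I will try to come back with a code solution within a few hours.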

The following code works on JDK 6 and should work on later versions as well.

String xml = "<?xml version=\"1.0\"?>" + 
"<catalog>" + 
"<book id=\"bk101\">" + 
    "<author>Gambardella, Matthew</author>" + 
    "<title>XML Developer's Guide</title>" + 
    "<genre>Computer</genre>" + 
    "<price>44.95</price>" + 
    "<publish_date>2000-10-01</publish_date>" + 
    "<description>An in-depth look at creating applications" + 
    "with XML.</description>" + 
"</book>" + 
"</catalog>" + 
"controllercmds.statusupdate" + 
"ExtnClientExternalSrcProcess=\"9\"" + 
"<catalog>" + 
"<book id=\"bk102\">" + 
    "<author>Ralls, Kim</author>" + 
    "<title>Midnight Rain</title>" + 
    "<genre>Fantasy</genre>" + 
    "<price>5.95</price>" + 
    "<publish_date>2000-12-16</publish_date>" + 
    "<description>A former architect battles corporate zombies," + 
    "an evil sorceress, and her own childhood to become queen " + 
    "of the world.</description>" + 
"</book>" + 
"</catalog>"; 

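// Reluctant match from each <catalog> to the nearest </catalog>.
// '<' and '>' need no escaping in a Java regex.
String regex = "(<catalog>.*?</catalog>)";

// DOTALL lets '.' match line breaks, which matters once the input comes from a real file.
java.util.regex.Pattern pattern = java.util.regex.Pattern.compile(regex, java.util.regex.Pattern.DOTALL);
java.util.regex.Matcher matcher = pattern.matcher(xml);

while (matcher.find()) {
    // group(1) is the whole <catalog>...</catalog> fragment
    System.out.println("Groups: " + matcher.group(1));
}

System.out.println("DONE");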

Output

Groups: <catalog><book id="bk101"><author>Gambardella, Matthew</author><title>XML Developer's Guide</title><genre>Computer</genre><price>44.95</price><publish_date>2000-10-01</publish_date><description>An in-depth look at creating applicationswith XML.</description></book></catalog> 
Groups: <catalog><book id="bk102"><author>Ralls, Kim</author><title>Midnight Rain</title><genre>Fantasy</genre><price>5.95</price><publish_date>2000-12-16</publish_date><description>A former architect battles corporate zombies,an evil sorceress, and her own childhood to become queen of the world.</description></book></catalog> 
DONE

See your code running online here
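The question asks for only the fragment that matches a search input, so the regex approach above could be narrowed along these lines. This is a minimal sketch, not the answerer's code: the class name `CatalogSearch` and the method `findFragments` are hypothetical, and it assumes Java 7 or later and that `<catalog>` and `</catalog>` appear only as fragment boundaries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
class CatalogSearch {

    // Reluctant match from each <catalog> to the nearest </catalog>.
    // DOTALL lets '.' span line breaks inside a fragment.
    private static final Pattern CATALOG =
            Pattern.compile("<catalog>.*?</catalog>", Pattern.DOTALL);

    // Returns every <catalog> fragment whose text contains searchTerm.
    static List<String> findFragments(String text, String searchTerm) {
        List<String> hits = new ArrayList<>();
        Matcher matcher = CATALOG.matcher(text);
        while (matcher.find()) {
            String fragment = matcher.group();
            if (fragment.contains(searchTerm)) {
                hits.add(fragment);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the sample file in the question.
        String text = "<?xml version=\"1.0\"?>\n"
                + "<catalog><book id=\"bk101\"><title>XML Developer's Guide</title></book></catalog>\n"
                + "controllercmds.statusupdate\n"
                + "ExtnClientExternalSrcProcess=\"9\"\n"
                + "<catalog>\n<book id=\"bk102\"><title>Midnight Rain</title></book>\n</catalog>\n";
        for (String fragment : findFragments(text, "Midnight Rain")) {
            System.out.println(fragment);
        }
    }
}
```

The plain `contains` check matches the search term anywhere in the fragment, including tag names and attribute values; restricting it to the `<title>` element would need a parser or a tighter pattern.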

Source

2015-10-14 04:25:24

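Thanks for your reply, Mubashar. I am trying to implement this in Java. –

Thank you very much for your time, Mubashar, but I also need to search for the XML in a text file of around 18-20 MB. Here I see that you pass the XML in directly as a string. Is that possible for a file of that size? –

With this solution, you can read the file and pass its contents to this function. However, 20 MB is large, so you could split it into chunks. You must make sure, though, that each chunk contains at least one complete XML group. –

In the general case this will be difficult. However, if you know the input satisfies certain specific constraints, it may be easier. For example, if you know each XML fragment starts with <catalog> and ends with </catalog>, and you are highly confident that these two strings never appear anywhere else, then extracting the fragments with a regular expression should not be too hard. So I think the answer depends heavily on what you know about the constraints, and on how much risk you are willing to accept that a start or end tag shows up "accidentally" (or maliciously) in an unexpected place.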
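One way to avoid both loading the whole file and splitting it into chunks by hand is to scan it line by line and buffer only the fragment currently being read. The sketch below is one possible shape for that, not the commenter's method: the class `CatalogFileScanner`, the method `findFirst`, and the three command-line arguments are hypothetical, and it assumes Java 8 or later, a UTF-8 file, that `<catalog>` opens a fragment and `</catalog>` closes it, and that the search term does not straddle a line break.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Iterator;
import java.util.stream.Stream;

// Hypothetical helper; names and argument order are illustrative only.
class CatalogFileScanner {

    // Walks the lines once, keeping only the current <catalog> block in memory.
    // Returns the first block that contains searchTerm, or null if there is none.
    static String findFirst(Iterator<String> lines, String searchTerm) {
        StringBuilder block = new StringBuilder();
        boolean inside = false;
        while (lines.hasNext()) {
            String line = lines.next();
            if (!inside && line.contains("<catalog>")) {
                inside = true;
                block.setLength(0); // start a fresh block
            }
            if (inside) {
                block.append(line).append('\n');
                if (line.contains("</catalog>")) {
                    inside = false;
                    if (block.indexOf(searchTerm) >= 0) {
                        return block.toString();
                    }
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: CatalogFileScanner <input file> <search term> <output file>");
            return;
        }
        // Files.lines reads lazily, so the 18-20 MB input is never held in memory at once.
        try (Stream<String> lines = Files.lines(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            String fragment = findFirst(lines.iterator(), args[1]);
            if (fragment == null) {
                System.err.println("No <catalog> block contains: " + args[1]);
            } else {
                Files.write(Paths.get(args[2]), fragment.getBytes(StandardCharsets.UTF_8));
            }
        }
    }
}
```

Because whole lines are copied, any text sharing a line with `<catalog>` or `</catalog>` ends up in the saved block. Stopping at the first hit also means the rest of the file is never read.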
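To reduce the risk described above, each candidate fragment could be handed to a real XML parser before it is accepted, so that a stray or mismatched tag causes a rejection instead of a silently wrong result. The following is only a sketch under assumptions: `FragmentValidator` and `isWellFormedCatalog` are hypothetical names, and the `disallow-doctype-decl` feature is assumed to be supported by the parser in use, which is the case for the parser bundled with the standard JDK.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper; the class and method names are illustrative only.
class FragmentValidator {

    // True only if the fragment parses as well-formed XML with <catalog> as its root element.
    static boolean isWellFormedCatalog(String fragment) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Assumption: the parser supports this feature. It rejects DOCTYPE declarations,
            // so a hostile fragment cannot pull in external entities.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler ignores warnings and rethrows fatal errors,
            // which keeps parser messages off stderr.
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(new InputSource(new StringReader(fragment)));
            return "catalog".equals(doc.getDocumentElement().getNodeName());
        } catch (ParserConfigurationException | SAXException | IOException e) {
            return false; // not well-formed, or the parser could not be configured
        }
    }

    public static void main(String[] args) {
        String good = "<catalog><book id=\"bk102\"><title>Midnight Rain</title></book></catalog>";
        String bad = "<catalog><book id=\"bk102\"><title>Midnight Rain</title></catalog>";
        System.out.println(isWellFormedCatalog(good));
        System.out.println(isWellFormedCatalog(bad));
    }
}
```

This check confirms well-formedness only. It cannot tell whether a well-formed `<catalog>` element was meant as a fragment or appeared inside the surrounding text by coincidence.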

Source

2015-10-14 07:48:16
