2015-10-14 43 views
0

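How do I identify XML in a text file that contains many XML documents as well as other text, using XML nodes in Java?

I want to read an entire text file and, based on a search input, extract the whole second XML document and save it to my local drive.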

Search input: Midnight Rain

Contents of the text file:

<?xml version="1.0"?> 
<catalog> 
    <book id="bk101"> 
     <author>Gambardella, Matthew</author> 
     <title>XML Developer's Guide</title> 
     <genre>Computer</genre> 
     <price>44.95</price> 
     <publish_date>2000-10-01</publish_date> 
     <description>An in-depth look at creating applications 
     with XML.</description> 
    </book> 
</catalog> 
controllercmds.statusupdate 
ExtnClientExternalSrcProcess="9" 
<catalog> 
    <book id="bk102"> 
     <author>Ralls, Kim</author> 
     <title>Midnight Rain</title> 
     <genre>Fantasy</genre> 
     <price>5.95</price> 
     <publish_date>2000-12-16</publish_date> 
     <description>A former architect battles corporate zombies, 
     an evil sorceress, and her own childhood to become queen 
     of the world.</description> 
    </book> 
</catalog>

My output should be:

<catalog> 
    <book id="bk102"> 
     <author>Ralls, Kim</author> 
     <title>Midnight Rain</title> 
     <genre>Fantasy</genre> 
     <price>5.95</price> 
     <publish_date>2000-12-16</publish_date> 
     <description>A former architect battles corporate zombies, 
     an evil sorceress, and her own childhood to become queen 
     of the world.</description> 
    </book> 
</catalog> 

Is this feasible? Can someone help me?

+0

The question is unclear. In which programming language are you reading the XML content? – Vishal

+0

Please paste the Java code along with what you have already tried. – Vishal

+0

I tried with a BufferedReader: BufferedReader br = new BufferedReader(new InputStreamReader(sftp.get(file.getFilename()))); try { while ((line = br.readLine()) != null) { if (line.contains("<?xml version")) { sb.append(line); } } System.out.println(sb); But identifying and appending the data takes too long. –

Answers

0

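I think you should mention the programming language you are using, so that people can give you a solution in code. For now, the only solution I can think of is a regular expression, and for that you have to know which root tag your code should look for. In the sample above I can see that <catalog> is the root tag. I will try to come back with a code solution within a few hours.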

The code below works on JDK 6 and should work on later versions as well.

String xml = "<?xml version=\"1.0\"?>" + 
"<catalog>" + 
"<book id=\"bk101\">" + 
    "<author>Gambardella, Matthew</author>" + 
    "<title>XML Developer's Guide</title>" + 
    "<genre>Computer</genre>" + 
    "<price>44.95</price>" + 
    "<publish_date>2000-10-01</publish_date>" + 
    "<description>An in-depth look at creating applications" + 
    "with XML.</description>" + 
"</book>" + 
"</catalog>" + 
"controllercmds.statusupdate" + 
"ExtnClientExternalSrcProcess=\"9\"" + 
"<catalog>" + 
"<book id=\"bk102\">" + 
    "<author>Ralls, Kim</author>" + 
    "<title>Midnight Rain</title>" + 
    "<genre>Fantasy</genre>" + 
    "<price>5.95</price>" + 
    "<publish_date>2000-12-16</publish_date>" + 
    "<description>A former architect battles corporate zombies," + 
    "an evil sorceress, and her own childhood to become queen " + 
    "of the world.</description>" + 
"</book>" + 
"</catalog>"; 

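// Non-greedy match from each <catalog> to the nearest </catalog>.
// '<' and '>' are not regex metacharacters, so they need no escaping.
String regex = "(<catalog>.*?</catalog>)";

// DOTALL lets '.' match line breaks, which a real multi-line file will contain.
java.util.regex.Pattern pattern =
    java.util.regex.Pattern.compile(regex, java.util.regex.Pattern.DOTALL);
java.util.regex.Matcher matcher = pattern.matcher(xml);

while (matcher.find()) {
    // group(1) is the captured <catalog>...</catalog> block
    System.out.println("Groups: " + matcher.group(1));
}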

System.out.println("DONE"); 

Output

Groups: <catalog><book id="bk101"><author>Gambardella, Matthew</author><title>XML Developer's Guide</title><genre>Computer</genre><price>44.95</price><publish_date>2000-10-01</publish_date><description>An in-depth look at creating applicationswith XML.</description></book></catalog> 
Groups: <catalog><book id="bk102"><author>Ralls, Kim</author><title>Midnight Rain</title><genre>Fantasy</genre><price>5.95</price><publish_date>2000-12-16</publish_date><description>A former architect battles corporate zombies,an evil sorceress, and her own childhood to become queen of the world.</description></book></catalog> 
DONE 

See your code running online here
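
The loop above prints every <catalog> block. To return only the block that matches a search input such as "Midnight Rain", one option is to test each match with String.contains. This is a minimal sketch: the class name CatalogSearch, the method findBlock and the shortened sample text are illustrative and not part of the original answer, and it assumes the search text appears verbatim inside the wanted block.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CatalogSearch {
    // DOTALL so that '.' also matches the line breaks inside a block.
    private static final Pattern CATALOG =
            Pattern.compile("<catalog>.*?</catalog>", Pattern.DOTALL);

    /** Returns the first <catalog> block containing searchText, or null if none does. */
    static String findBlock(String text, String searchText) {
        Matcher matcher = CATALOG.matcher(text);
        while (matcher.find()) {
            String block = matcher.group();
            if (block.contains(searchText)) {
                return block;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the file contents shown in the question.
        String text = "<?xml version=\"1.0\"?>\n"
                + "<catalog>\n<book id=\"bk101\"><title>XML Developer's Guide</title></book>\n</catalog>\n"
                + "controllercmds.statusupdate\n"
                + "<catalog>\n<book id=\"bk102\"><title>Midnight Rain</title></book>\n</catalog>\n";
        System.out.println(findBlock(text, "Midnight Rain"));
    }
}
```

A plain substring test like this does not care which element holds the text, so a description that happens to mention the search term would also match.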

+0

Thanks for your reply, Mubashar. I am trying to implement this in Java. –

+0

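Thank you very much for your time, Mubashar, but I also need to search for the XML in a text file of around 18-20 MB. Here I see you are passing the XML in directly as a string. Is that possible for a file of that size? –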

+0

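With this solution, you can read the file and pass its contents to this function. However, 20 MB is large, so you could split the file into chunks. You must then make sure that each chunk contains at least one complete XML group. –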
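
An alternative to fixed-size chunks is to read line by line and buffer only the lines between <catalog> and </catalog>, so that every buffered piece is a complete group. The following is a sketch under stated assumptions, not a tested solution for the asker's file: it assumes the opening and closing tags each sit on their own line, as in the sample, and the names CatalogStreamSearch and findBlock are illustrative. The Reader could wrap whatever stream is already available, for example the InputStreamReader from the question's comment.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class CatalogStreamSearch {
    /**
     * Returns the first <catalog>...</catalog> block that contains searchText,
     * or null if no block matches. Only one block is held in memory at a time.
     */
    static String findBlock(Reader source, String searchText) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        StringBuilder block = null; // non-null only while inside a <catalog> block
        String line;
        while ((line = reader.readLine()) != null) {
            if (block == null && line.contains("<catalog>")) {
                block = new StringBuilder();
            }
            if (block != null) {
                block.append(line).append('\n');
                if (line.contains("</catalog>")) {
                    String candidate = block.toString();
                    if (candidate.contains(searchText)) {
                        return candidate;
                    }
                    block = null; // not the wanted block; discard it and keep scanning
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Shortened stand-in for the file contents shown in the question.
        String sample = "<?xml version=\"1.0\"?>\n<catalog>\n<title>XML Developer's Guide</title>\n</catalog>\n"
                + "controllercmds.statusupdate\n<catalog>\n<title>Midnight Rain</title>\n</catalog>\n";
        System.out.print(findBlock(new StringReader(sample), "Midnight Rain"));
    }
}
```

The returned string could then be saved to the local drive, for example with java.nio.file.Files.write on Java 7 or later. Note that non-XML text sharing a line with a tag would be kept as part of the block.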

0

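In the general case this will be difficult. However, if you know that the input conforms to certain specific constraints, it may be easier. For example, if you know that the XML fragments will start with <catalog> and end with </catalog>, and you are highly confident that those two strings will not appear anywhere else, then extracting the XML fragments with a regular expression should not be too difficult. So I think the answer depends heavily on how well you know the constraints, and on how much risk you are prepared to accept that start/end tags turn up "accidentally" (or maliciously) in unexpected places.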
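
One way to reduce that risk is to hand each extracted candidate to a real XML parser and keep it only if it parses. The sketch below uses the JDK's built-in DocumentBuilder; the class name FragmentCheck and the method isWellFormed are illustrative. It only checks well-formedness, so a stray tag pair that happens to enclose well-formed XML would still pass.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class FragmentCheck {
    /** Returns true if the candidate parses as a well-formed XML document. */
    static boolean isWellFormed(String candidate) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(candidate)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the text
        } catch (Exception e) {
            throw new IllegalStateException("XML parser unavailable or I/O failure", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<catalog><book id=\"bk102\"/></catalog>"));
        System.out.println(isWellFormed("<catalog><book></catalog>"));
    }
}
```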
