2013-10-24

Parsing and modifying an HTML file with Java

I have to parse a given HTML file, modify its content, and save the modified version.
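Whatever does the parsing, you can save the result without touching the original by writing it to a different path. Below is a minimal sketch that uses only `java.nio.file` (Java 11+). The class and method names and the `input.html`/`output.html` paths are hypothetical. The edit passed in is a placeholder string replacement that stands in for your real parsing and modification logic.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.UnaryOperator;

class HtmlFileRewriter {
    /**
     * Reads source, applies edit to its contents, and writes the result to target.
     * The source file itself is never written to.
     */
    static void rewriteToCopy(Path source, Path target, UnaryOperator<String> edit) throws IOException {
        // Refuse to overwrite the original document
        if (source.toAbsolutePath().normalize().equals(target.toAbsolutePath().normalize())) {
            throw new IllegalArgumentException("target must differ from source");
        }
        String html = Files.readString(source, StandardCharsets.UTF_8);
        Files.writeString(target, edit.apply(html), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default file names; pass your own paths as arguments
        Path in = Path.of(args.length > 0 ? args[0] : "input.html");
        Path out = Path.of(args.length > 1 ? args[1] : "output.html");
        // Placeholder edit: replace with real parsing/modification logic
        rewriteToCopy(in, out, html -> html.replace("<p>", "<p class=\"edited\">"));
    }
}
```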

My HTML input:

<div> 
<div class="post-text"><p>@MarcoS had an excellent solution using a NodeTraversor to make a list of nodes to change at <a href="https://stackoverflow.com/a/6594828/1861357">https://stackoverflow.com/a/6594828/1861357</a> and I only very slightly modified his method which replaces a node (a set of tags) with the data in the node plus whatever information you would like to add.</p> 

<p>To store a String in memory I used a static <code>StringBuilder</code> to save the HTML in memory. </p> 

<p>First we read in the HTML file (that is manually specified, this can be changed), then we make a series of checks to change whatever nodes with any data that we want.</p> 

<p>The one problem that I didn't fix in the solution by MarcoS was that it split each individual word, instead of looking at a line. However I just used '-' for multiple words, because otherwise it places the string directly after that word.</p> 

<p>So a full implementation: </p> 
</div> 
<div> 
<div class="post-text" itemprop="description"> 

     <p>Recently I was recommended to use JSoup to parse and modify HTML documents. </p> 

<p>However what if I have a HTML document that I want to modify (to send, store somewhere else, etc.), how might I go about doing that without changing the original document? </p> 
</div> 

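My problem is that I need to find the text "@MarcoS had an excellent solution using a NodeTraversor to make a list of nodes to change at https://stackoverflow.com/a/6594828/1861357 and I only" in the HTML and put a div tag (or some other tag) around it, without including its parent tag or the whole paragraph. The text I am searching for has HTML tags in between (here, the <a> link).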

I want output like this:

<div class="post-text"><p><div id="myDiv">@MarcoS had an excellent solution using a NodeTraversor to make a list of nodes to change at <a href="https://stackoverflow.com/a/6594828/1861357">https://stackoverflow.com/a/6594828/1861357</a> and I only</div>......</div> 
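Below is a minimal regex-only sketch of the wrapping step. The class and method names (`VisibleTextWrapper`, `wrapVisibleText`) are hypothetical. The sketch makes several assumptions: the search text is plain visible text with no entities such as `&amp;`, no attribute value contains `>`, and the matched region contains only complete elements. It turns each character of the search text into a literal, allows any number of tags between characters, and lets each space match any run of whitespace. It wraps the match in a `<span>` rather than the `<div>` shown in the desired output, because a `<div>` is not permitted inside a `<p>`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VisibleTextWrapper {
    // Zero or more complete tags, e.g. <a href="..."> or </a>; assumes no '>' inside attribute values
    private static final String TAGS = "(?:<[^>]*>)*";

    /**
     * Wraps the first occurrence of the visible text in openTag/closeTag,
     * allowing tags between any two of its characters and any whitespace run
     * wherever the search text has whitespace. Returns html unchanged if not found.
     */
    static String wrapVisibleText(String html, String text, String openTag, String closeTag) {
        StringBuilder regex = new StringBuilder();
        boolean prevSpace = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean space = Character.isWhitespace(c);
            if (space && prevSpace) {
                continue; // one \s+ already covers this whitespace run
            }
            if (regex.length() > 0) {
                regex.append(TAGS); // tags may sit between any two characters
            }
            regex.append(space ? "\\s+" : Pattern.quote(String.valueOf(c)));
            prevSpace = space;
        }
        if (regex.length() == 0) {
            return html; // empty search text: nothing to wrap
        }
        Matcher m = Pattern.compile(regex.toString()).matcher(html);
        if (!m.find()) {
            return html;
        }
        return html.substring(0, m.start()) + openTag + m.group() + closeTag + html.substring(m.end());
    }
}
```

Because the sketch works on raw text, it can cut across an element boundary. For example, wrapping `bc` in `<p>ab<i>c</i>d</p>` produces `<p>a<span>b<i>c</span></i>d</p>`, which is not well-formed, so check the match before trusting the result. It also does not match whitespace that appears on both sides of a tag, such as `at <a ...> https`.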

Is a regular expression the only solution, or can an HTML parser do this?
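A regex is not strictly required. The minimal sketch below uses no regex at all; the names `VisibleTextLocator` and `locate` are hypothetical. It walks the HTML once, skips everything from `<` to the next `>`, and records the HTML offset of every visible character. A plain `indexOf` on the visible text can then be mapped back to start and end offsets in the original HTML. It assumes there is no `>` inside attribute values, comments or scripts, and the search text must match whitespace and entities exactly as written in the source. A real HTML parser such as jsoup separates tags from text more robustly. This sketch only illustrates the idea.

```java
import java.util.ArrayList;
import java.util.List;

class VisibleTextLocator {
    /**
     * Returns {start, end} offsets into html covering the first occurrence of text
     * among the visible characters (tags skipped), or null if it is absent.
     */
    static int[] locate(String html, String text) {
        if (text.isEmpty()) {
            return null;
        }
        StringBuilder visible = new StringBuilder();
        List<Integer> sourceIndex = new ArrayList<>(); // sourceIndex.get(k) = html offset of visible char k
        int i = 0;
        while (i < html.length()) {
            if (html.charAt(i) == '<') {
                int close = html.indexOf('>', i);
                if (close < 0) {
                    break; // unterminated tag: ignore the rest
                }
                i = close + 1; // skip the whole tag
            } else {
                visible.append(html.charAt(i));
                sourceIndex.add(i);
                i++;
            }
        }
        int k = visible.indexOf(text);
        if (k < 0) {
            return null;
        }
        int start = sourceIndex.get(k);
        int end = sourceIndex.get(k + text.length() - 1) + 1; // just past the last matched char
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        String html = "<p>Read <a href=\"x\">this link</a> first.</p>";
        int[] span = locate(html, "Read this link first");
        if (span == null) {
            System.out.println("not found");
            return;
        }
        System.out.println(html.substring(0, span[0]) + "<span>"
                + html.substring(span[0], span[1]) + "</span>" + html.substring(span[1]));
    }
}
```

The `main` method prints `<p><span>Read <a href="x">this link</a> first</span>.</p>`. As with any offset-based approach, the end offset falls right after the last visible character, so if the search text ends exactly where an element ends, that element's closing tag is left outside the wrapper.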

Answer


You can try a regular expression if you don't want to use an XML parser:

String xmlStr = "some_xml";
// Remove whitespace-only runs between a '>' and the next '<', then trim both ends
xmlStr = xmlStr.replaceAll(">\\s+<", "><").trim();
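The self-contained version below shows what that one-liner does and does not do; the class and method names are hypothetical. It only removes whitespace that sits entirely between two tags. It does not locate or wrap the searched text, and it can also remove a visible space between inline elements, as the second call shows.

```java
class TagWhitespace {
    // Same expression as the answer: drop whitespace-only gaps between tags, then trim
    static String collapse(String xmlStr) {
        return xmlStr.replaceAll(">\\s+<", "><").trim();
    }

    public static void main(String[] args) {
        // Indentation and newlines between tags are removed
        System.out.println(collapse("<div>\n  <p>Hello <b>world</b></p>\n</div>\n"));
        // prints <div><p>Hello <b>world</b></p></div>

        // A meaningful space between two inline elements is removed too
        System.out.println(collapse("<b>a</b> <i>b</i>"));
        // prints <b>a</b><i>b</i>
    }
}
```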