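Java XPath API extracts only selected text
I'm using the Java XPath API to extract content from an XHTML file. I'm walking through the HTML and trying to extract the content of one specific div. The div contains text and a few img tags. Strangely, when I use XPath it ignores all the HTML tags and extracts only the text content. Here is a snippet of the HTML.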
<html>
<body>
<div class="content">
<div class="content_wrapper">
<table border="0" cellspacing="0" cellpadding="0" class="test_class">
<tr>
<td>
<p>
Reading and looking at images or movies is one thing. Experiencing it in 3D the other. If you like to figure out more about what Showcase is, I would really encourage you to
download Showcase Viewer and have a look at the demo files also available on this site. Interact with the models and see how real it looks.
</p>
<p style="text-align: center;">
<img src="/testsource/fckdata/208123/image/showcarswatch.jpg" alt="" />
<img src="/testsource/fckdata/208123/image/engineswatch.jpg" alt="" />
<img src="/th.gen/?:760x0:/userdata/fckdata/208123/image/toasterswatch.jpg" alt="" />
<img src="/testsource/fckdata/208123/image/smartphoneswatch.jpg" alt="" />
</p>
<p>
<br />
Showcase Viewer is actually a full Showcase install, except data processing and creation tools. This means that you can look at any data created with a regular Showcase you
just can´t add any information. But you may join a collaboration session hosed by a Showcase Professional user. Here is where you can get it:<br />
</p>
<p>
<strong>Operating System</strong><br />
• Microsoft® Windows® XP Professional (SP 2 or higher)<br />
• Windows XP Professional x64 Edition (Autodesk® Showcase® software runs as a 32-bit application on 64-bit operating system)<br />
• Microsoft Windows Vista® 32-bit or 64-bit, including Business, Enterprise or Ultimate (SP 1)
</p>
</td>
</tr>
</table>
</div>
</div>
</body>
</html>
Now, here is the code I'm using. I need to do some cleanup before using XPath.
Here is the output.
Reading and looking at images or movies is one thing. Experiencing it in 3D the other. If you like to figure out more about what Showcase is, I would really encourage you to
download Showcase Viewer and have a look at the demo files also available on this site. Interact with the models and see how real it looks.
Showcase Viewer is actually a full Showcase install, except data processing and creation tools. This means that you can look at any data created with a regular Showcase you
just can´t add any information. But you may join a collaboration session hosed by a Showcase Professional user. Here is where you can get it
Operating System
• Microsoft® Windows® XP Professional (SP 2 or higher)<br />
• Windows XP Professional x64 Edition (Autodesk® Showcase® software runs as a 32-bit application on 64-bit operating system)<br />
• Microsoft Windows Vista® 32-bit or 64-bit, including Business, Enterprise or Ultimate (SP 1)
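What I need is the complete content of the content_wrapper div, markup included.
Any pointers would be appreciated.
- Thanks
EDIT in response to 揚堡's solution
Sample code.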
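For readers hitting the same symptom, here is a minimal sketch of one likely cause, assuming the expression was evaluated to a string (or the result was read through a text accessor). The string-value of an element node is the concatenated text of its descendants, so the tags disappear. Serializing the selected node keeps them. The class name `MarkupDemo`, the miniature `XHTML` sample, and `CONTENT_PATH` are hypothetical and only illustrate the idea; they are not the code from the question.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class MarkupDemo {
    // Hypothetical miniature of the page in the question, for illustration only.
    static final String XHTML =
        "<html><body><div class=\"content_wrapper\">"
        + "<p>Some text</p><p><img src=\"a.jpg\" alt=\"\" /></p>"
        + "</div></body></html>";

    // Assumed expression; the real contentPath is not shown in the question.
    static final String CONTENT_PATH = "//div[@class='content_wrapper']";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // String-value of the selected node: descendant text only, no tags.
    static String asText(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (String) xpath.evaluate(CONTENT_PATH, doc, XPathConstants.STRING);
    }

    // Serializes the selected node itself, so child elements survive.
    static String asMarkup(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node div = (Node) xpath.evaluate(CONTENT_PATH, doc, XPathConstants.NODE);
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(div), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XHTML);
        System.out.println(asText(doc));   // text without markup
        System.out.println(asMarkup(doc)); // the div with its p and img children
    }
}
```

The identity `Transformer` is used here only because it ships with the JDK; any DOM serializer would serve the same purpose.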
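import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// doc (the parsed document) and contentPath (the XPath expression) are defined elsewhere.
XPathFactory factory = XPathFactory.newInstance();
XPath xpathCompiled = factory.newXPath();
XPathExpression expr = xpathCompiled.compile(contentPath);
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); i++) {
    Node n = nodes.item(i);
    traverseNodes(n);
}

// Recursively prints the name, value, and type of every descendant of n.
public static void traverseNodes(Node n) {
    NodeList children = n.getChildNodes();
    if (children != null) {
        for (int i = 0; i < children.getLength(); i++) {
            Node childNode = children.item(i);
            System.out.println("node name = " + childNode.getNodeName());
            // getNodeValue() is null for element nodes; text nodes carry the text.
            System.out.println("node value = " + childNode.getNodeValue());
            System.out.println("node type = " + childNode.getNodeType());
            traverseNodes(childNode);
        }
    }
}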
This is not about the XPath expression but about the DOM methods applied to the XPath result. Retagged. – 2011-04-28 00:00:04
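To illustrate the point about DOM methods, here is a small sketch of walking a result node and telling element nodes from text nodes by node type. It assumes the goal is to see both the text and the img elements. The class `NodeWalkDemo`, its `describe` helper, and the sample markup are hypothetical names introduced for this example only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NodeWalkDemo {
    // Visits n and its descendants, recording one entry per element
    // and one per non-blank text node.
    static void walk(Node n, List<String> out) {
        if (n.getNodeType() == Node.ELEMENT_NODE) {
            Element e = (Element) n;
            // getAttribute returns "" when the attribute is absent.
            String src = e.getAttribute("src");
            out.add("element " + e.getNodeName() + (src.isEmpty() ? "" : " src=" + src));
        } else if (n.getNodeType() == Node.TEXT_NODE) {
            // Only text nodes have a non-null node value here.
            String text = n.getNodeValue().trim();
            if (!text.isEmpty()) {
                out.add("text " + text);
            }
        }
        NodeList children = n.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), out);
        }
    }

    // Parses the given XML string and describes its whole tree.
    static List<String> describe(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> out = new ArrayList<>();
        walk(doc.getDocumentElement(), out);
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample markup, not the page from the question.
        for (String line : describe("<div><p>Hello</p><img src=\"a.jpg\" alt=\"\" /></div>")) {
            System.out.println(line);
        }
    }
}
```

In a real program the starting node would be each item of the NODESET result rather than the document element.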