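How do I work around Groovy's XmlSlurper refusing to parse HTML because of DOCTYPE and DTD restrictions?

I'm trying to duplicate an element in an HTML coverage report so that the coverage totals appear at the top of the report as well as at the bottom.

The HTML starts like this, and I believe it is well-formed: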

<?xml version="1.0" encoding="UTF-8"?> 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> 
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"> 
    <head> 
    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8" /> 
    <link rel="stylesheet" href=".resources/report.css" type="text/css" /> 
    <link rel="shortcut icon" href=".resources/report.gif" type="image/gif" /> 
    <title>Unified coverage</title> 
    <script type="text/javascript" src=".resources/sort.js"></script> 
    </head> 
    <body onload="initialSort(['breadcrumb', 'coveragetable'])">

Groovy's XmlSlurper complains as follows:

doc = new XmlSlurper(/* false, false, false */).parse("index.html") 
[Fatal Error] index.html:1:48: DOCTYPE is disallowed when the feature "http://apache.org/xml/features/disallow-doctype-decl" set to true. 
DOCTYPE is disallowed when the feature "http://apache.org/xml/features/disallow-doctype-decl" set to true.

Allowing the DOCTYPE declaration:

doc = new XmlSlurper(false, false, true).parse("index.html") 
[Fatal Error] index.html:1:148: External DTD: Failed to read external DTD 'xhtml1-strict.dtd', because 'http' access is not allowed due to restriction set by the accessExternalDTD property. 
External DTD: Failed to read external DTD 'xhtml1-strict.dtd', because 'http' access is not allowed due to restriction set by the accessExternalDTD property. 

doc = new XmlSlurper(false, true, true).parse("index.html") 
[Fatal Error] index.html:1:148: External DTD: Failed to read external DTD 'xhtml1-strict.dtd', because 'http' access is not allowed due to restriction set by the accessExternalDTD property. 
External DTD: Failed to read external DTD 'xhtml1-strict.dtd', because 'http' access is not allowed due to restriction set by the accessExternalDTD property. 


doc = new XmlSlurper(true, true, true).parse("index.html") 
External DTD: Failed to read external DTD 'xhtml1-strict.dtd', because 'http' access is not allowed due to restriction set by the accessExternalDTD property. 

doc = new XmlSlurper(true, false, true).parse("index.html") 
External DTD: Failed to read external DTD 'xhtml1-strict.dtd', because 'http' access is not allowed due to restriction set by the accessExternalDTD property.

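So I think I've covered all the constructor options. There must be a way to make this work without resorting to regular expressions and risking the wrath of Tony the Pony.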


2015-06-10 android.weasel

Tsk。

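def parser = new XmlSlurper()
// Allow the <!DOCTYPE ...> declaration instead of rejecting it outright
parser.setFeature("http://apache.org/xml/features/disallow-doctype-decl", false)
// Do not try to fetch the external DTD named in the DOCTYPE
parser.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)
def doc = parser.parse("index.html")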
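
For anyone who wants to try this without the real report, here is a minimal self-contained sketch. The inline XHTML string and the `total` paragraph are hypothetical stand-ins for index.html, and the import assumes Groovy 3 or later. The boolean constructor arguments tried in the question can lift the DOCTYPE ban, but the parser still attempts to fetch the external DTD, which is why the second feature is also switched off here.

```groovy
import groovy.xml.XmlSlurper  // Groovy 3+; on Groovy 2.x drop this line (groovy.util.XmlSlurper is a default import)

// Hypothetical stand-in for the coverage report's index.html
def xhtml = '''<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head><title>Unified coverage</title></head>
  <body><p id="total">42%</p></body>
</html>'''

def parser = new XmlSlurper()
// Accept the DOCTYPE declaration rather than failing on it
parser.setFeature("http://apache.org/xml/features/disallow-doctype-decl", false)
// Skip loading the external DTD, so no http access is attempted
parser.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false)

def doc = parser.parseText(xhtml)
def title = doc.head.title.text()
def total = doc.body.p.find { it.@id == 'total' }.text()
println "${title}: ${total}"
```

Because the DTD is never loaded, entities it defines (such as `&nbsp;`) would not be resolved; the sample above avoids them.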


2015-06-10 10:42:12

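Even if your HTML also happens to be well-formed XML, the more general solution for parsing HTML is to use a real HTML parser. I have used the TagSoup parser in the past, and it handles real-world HTML well.

TagSoup provides a parser that implements the org.xml.sax.XMLReader interface, so it can be passed to XmlSlurper's constructor. For example: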

@Grab('org.ccil.cowan.tagsoup:tagsoup:1.2.1') 

import org.ccil.cowan.tagsoup.Parser 

def doc = new XmlSlurper(new Parser()).parse("index.html")


2015-06-10 15:00:38 ataylor
