HTTPBuilder - How do I get the HTML content of a web page?

I need to extract the HTML of a web page. I am using HTTPBuilder in Groovy and make the following GET request:

import groovyx.net.http.ContentType
import groovyx.net.http.HTTPBuilder
import groovyx.net.http.Method

def http = new HTTPBuilder('http://www.google.com/search')
http.request(Method.GET) {
    requestContentType = ContentType.HTML
    response.success = { resp, reader ->
        println "resp: " + resp
        println "READER: " + reader
    }
    response.failure = { resp, reader ->
        println "Failure"
    }
}

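The response I get does not contain the same HTML that I see when I view the source of www.google.com/search. In fact, it is not HTML at all, and it does not contain the same information I can see in the page's HTML source. I have tried setting different headers (for example, headers.Accept = 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', headers.Accept = 'text/html', setting the user agent, etc.), but the result is the same. How can I get the HTML of www.google.com/search (or any web page) using HTTPBuilder?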


2011-07-25 NachoAsking

Why use HTTPBuilder? You could instead use

def url = "http://www.google.com/".toURL()
println url.text

to extract the web page's content.
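If the plain url.text call gets a different page than the browser does, the map-taking overload of getText in the Groovy GDK lets you pass request headers and timeouts. The following is only a minimal sketch: fetchHtml is a hypothetical helper name, and the timeout values, header values, and UTF-8 charset are assumptions, not something stated in this thread.

```groovy
// fetchHtml is a hypothetical helper; timeouts, headers and charset are assumed values.
String fetchHtml(URL url, String charset = 'UTF-8') {
    // URL.getText(Map, String) is a Groovy GDK method; the map keys below
    // configure the underlying URLConnection before the body is read.
    url.getText([
        connectTimeout   : 5000,   // milliseconds
        readTimeout      : 10000,  // milliseconds
        requestProperties: ['User-Agent': 'Mozilla/5.0', 'Accept': 'text/html']
    ], charset)
}

// Usage (performs a real network request):
// println fetchHtml('http://www.google.com/'.toURL())
```

This returns the response body as an unparsed string, so whatever markup the server sends is what you get back.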


2011-08-22 08:11:02

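This happens because HTTPBuilder automatically parses the response according to its content type. To get the raw HTML, try reading the text from the response entity: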

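// Assumes `http` is an HTTPBuilder instance, `url` is the page URL,
// and TEXT is imported from groovyx.net.http.ContentType.
def htmlResult = http.get(uri: url, contentType: TEXT) { resp ->
    return resp.getEntity().getContent().getText()
}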
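For completeness, here is a self-contained sketch of the same idea using the request(GET, TEXT) form, where HTTPBuilder hands the success handler a Reader over the unparsed body instead of a parsed document. The helper name fetchRawHtml, the http-builder version in @Grab, and the User-Agent value are assumptions for illustration only.

```groovy
@Grab('org.codehaus.groovy.modules.http-builder:http-builder:0.7.1')
import groovyx.net.http.HTTPBuilder
import static groovyx.net.http.ContentType.TEXT
import static groovyx.net.http.Method.GET

// fetchRawHtml is a hypothetical helper name, not part of HTTPBuilder.
String fetchRawHtml(String pageUrl) {
    def http = new HTTPBuilder(pageUrl)
    // Requesting TEXT makes HTTPBuilder skip its HTML parser and
    // pass the body to the success handler as a Reader.
    http.request(GET, TEXT) { req ->
        headers.'User-Agent' = 'Mozilla/5.0'   // assumed value
        response.success = { resp, reader -> reader.text }
        response.failure = { resp -> throw new IOException("HTTP ${resp.statusLine}") }
    }
}

// Usage (performs a real network request):
// println fetchRawHtml('http://www.google.com/search')
```

With ContentType.HTML, by contrast, the reader argument is a parsed document rather than markup, which would explain why printing it in the question does not look like the page source.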


2013-01-16 05:07:38
