2017-01-31 75 views

IllegalAccessException trying to access StringHashMap - Groovy

I've started writing some scripts in Groovy. I wrote this script, which basically parses an HTML page and does some processing on the data.

Right now I'm using HTTPBuilder to perform the HTTP request. Whenever I try to execute such a request, I get this error:

Caught: java.lang.IllegalAccessError: tried to access class groovyx.net.http.StringHashMap from class groovyx.net.http.HTTPBuilder 
java.lang.IllegalAccessError: tried to access class groovyx.net.http.StringHashMap from class groovyx.net.http.HTTPBuilder 
    at groovyx.net.http.HTTPBuilder.<init>(HTTPBuilder.java:177) 
    at groovyx.net.http.HTTPBuilder.<init>(HTTPBuilder.java:218) 
    at Main$_main_closure1.doCall(Main.groovy:30) 
    at Main.main(Main.groovy:24) 
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:143) 

Here is the code of the main class:

// Grab HTTPBuilder component from maven repository
@Grab(group='org.codehaus.groovy.modules.http-builder', 
     module='http-builder', version='0.5.2') 
// import of HttpBuilder related stuff 
import groovyx.net.http.* 
import parsers.Parser 
import parsers.WuantoParser 
import parsers.Row 

class Main { 

    static mapOfParsers = [:]

    static void main(args) {
        List<Row> results = new ArrayList<>()

        // Initiating the parsers for the ebay-keywords websites
        println "Initiating Parsers..."
        initiateParsers()

        println "Parsing Websites..."
        mapOfParsers.each { key, parser ->
            switch (key) {
                case Constants.Parsers.WUANTO_PARSER:
                    println "Parsing Url: $Constants.Url.WUANTO_ROOT_CAT_URL"
                    println "Retrieving Html Content..."

                    def http = new HTTPBuilder(Constants.Url.WUANTO_ROOT_CAT_URL)
                    def html = http.get([:])

                    println "Parsing Html Content..."

                    results.addAll(((Parser) parser).parseHtml(html))
                    break
            }
        }

        results.each {
            println it
        }
    }

    static void initiateParsers() {
        mapOfParsers.put(Constants.Parsers.WUANTO_PARSER, new WuantoParser())
    }

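    static void writeToFile(List<Row> rows) {
        // Build the File explicitly and write all rows through one writer;
        // calling file.write once per row would overwrite the previous row
        File file = new File("output.txt")

        file.withWriter { writer ->
            rows.each { writer.writeLine it.toString() }
        }
    }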

} 

Answer

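So let's take a look. I tried running the code from your snippet, but version 0.5.2 of the http-builder dependency is outdated and is not available in the repositories my Groovy scripts point to, so I replaced it with the newer version 0.7.1.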

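Also, the value of the html variable returned from http.get in your code is actually in a parsed format, i.e. it is not text but a Groovy object. This is because HTTPBuilder parses HTML by default, and if you want plain text you have to ask for it explicitly (and even then it returns a reader rather than a string).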
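The reader-versus-string part can be tried without any network access. The following is only a minimal sketch: it does not use HTTPBuilder at all, the body string is a made-up stand-in for a response, and a StringReader plays the role of the reader you get back when asking for plain text.

```groovy
// Made-up stand-in for a response body; HTTPBuilder is not involved here.
def body = "<html><head><title>Links</title></head><body>0</body></html>"

// A Reader, playing the role of what a plain-text request hands back.
Reader reader = new StringReader(body)

// Groovy adds a `text` property to Reader that reads it fully into a String.
String text = reader.text

println text.getClass().name   // java.lang.String
println text.take(20)          // first 20 characters of the body
```

Once the content is a String, it can be passed to any parser that expects raw HTML text.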

Below is a slightly adjusted and rewritten version of your code that demonstrates the idea:

// Grab HTTPBuilder component from maven repository
@Grab('org.codehaus.groovy.modules.http-builder:http-builder:0.7.1') 

import groovyx.net.http.* 
import groovy.xml.XmlUtil 
import static groovyx.net.http.ContentType.* 

class MyParser {
    def parseHtml(html) {
        [html]
    }
}


def mapOfParsers = [:] 
mapOfParsers["WUANTO_PARSER"] = new MyParser() 

result = [] 
mapOfParsers.each { key, parser ->
    switch (key) {
        case "WUANTO_PARSER":
            // just a sample url which returns some html data
            def url = "https://httpbin.org/links/10/0"

            def http = new HTTPBuilder(url)
            def html = http.get([:])

            // the object returned from http.get is of type
            // http://docs.groovy-lang.org/latest/html/api/groovy/util/slurpersupport/NodeChild.html
            // this is a parsed format which is navigable in groovy
            println "extracting HEAD.TITLE text: " + html.HEAD.TITLE.text()

            println "class of returned object ${html.getClass().name}"
            println "First 100 characters parsed and formatted:\n ${XmlUtil.serialize(html).take(100)}"

            // forcing the returned type to be plain text
            def reader = http.get(contentType : TEXT)

            // what is returned now is a reader, we can get the text in groovy
            // via reader.text
            def text = reader.text
            println "Now we are getting text, 100 first characters plain text:\n ${text.take(100)}"

            result.addAll parser.parseHtml(text)
            break
    }
}

result.each { 
    println "result length ${it.length()}" 
} 

Running the above prints:

extracting HEAD.TITLE text: Links 
class of returned object groovy.util.slurpersupport.NodeChild 
First 100 characters parsed and formatted: 
<?xml version="1.0" encoding="UTF-8"?><HTML> 
    <HEAD> 
    <TITLE>Links</TITLE> 
    </HEAD> 
    <BODY>0 < 
Now we are getting text, 100 first characters plain text: 
<html><head><title>Links</title></head><body>0 <a href='/links/10/1'>1</a> <a href='/links/10/2'>2</ 
result length 313 

(with a couple of warnings from XmlUtil.serialize omitted).

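None of this explains why you are getting the exception you are getting, but perhaps the above can get you unblocked and past the problem.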
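If you want to dig into the exception itself, one thing that may be worth checking is where the classes involved are loaded from. On the JVM, package-level access between two classes requires that both were defined by the same class loader, so two copies of a library on the classpath can produce this kind of error. Whether that is what happens in your setup is only a guess on my part. The helper below is hypothetical (describeOrigin is my own name, not part of HTTPBuilder) and is shown here on JDK classes and the script itself so that it runs without any dependency:

```groovy
// Hypothetical helper, not part of HTTPBuilder: reports which class loader
// and which jar or directory a class was loaded from.
String describeOrigin(Class<?> type) {
    def loader = type.classLoader   // null means the bootstrap class loader
    def source = type.protectionDomain?.codeSource?.location
    "${type.name}: loader=${loader ?: 'bootstrap'}, source=${source ?: 'unknown'}"
}

println describeOrigin(String)
println describeOrigin(ArrayList)
println describeOrigin(this.getClass())
```

In your project you could pass HTTPBuilder to the same helper and see whether the reported source is the jar you expect.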