2016-08-17 85 views

Can I find HTML tags using the AsyncHttpResponseHandler or AsyncHttpClient classes?

I am writing a web crawler on Android. My code is:

public void parseHttp() {
    AsyncHttpClient client = new AsyncHttpClient();
    String url = "http://stackoverflow.com/questions/38959381/unable-to-scrape-data-from-internet-using-android-intents";

    // Run the callbacks on the main (UI) thread.
    client.get(url, new AsyncHttpResponseHandler(Looper.getMainLooper()) {
        @Override
        public void onSuccess(int statusCode, Header[] headers, byte[] responseBody) {
            // Decode the raw response bytes into the page's HTML.
            String body = new String(responseBody);
            System.out.println(body);

            // Skip any attributes on the tag, then capture the content
            // up to the nearest closing </h1> (non-greedy).
            Pattern p = Pattern.compile("<h1[^>]*>(.*?)</h1>", Pattern.DOTALL);
            Matcher m = p.matcher(body);
            Log.d("tag", "success");
            if (m.find()) {
                String match = m.group(1);
                Log.d("tag", match);
            }
        }

        @Override
        public void onFailure(int statusCode, Header[] headers, byte[] responseBody, Throwable error) {
            Log.d("tag", "failure");
        }
    });
}
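
The regex step can also be tried on its own, outside the Android callback. The following is a minimal sketch in plain Java, not part of the original code: the class name H1Regex and the method findH1 are hypothetical, and the pattern (non-greedy, tolerant of attributes on the tag) is only one possible choice. Regexes stay fragile on real-world HTML, so treat this as an illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
public class H1Regex {
    // Match "<h1", optional attributes, ">", then capture lazily up to "</h1>".
    // DOTALL lets the content span several lines.
    private static final Pattern H1 = Pattern.compile(
            "<h1\\b[^>]*>(.*?)</h1>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed inner content of every h1 element found in the string.
    public static List<String> findH1(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = H1.matcher(html);
        while (m.find()) {
            result.add(m.group(1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<html><body><h1 class=\"a\">First</h1>"
                + "<p>x</p><h1>Second</h1></body></html>";
        System.out.println(findH1(html)); // prints [First, Second]
    }
}
```

Inside onSuccess, the same helper could be called as findH1(new String(responseBody)).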

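The handler in parseHttp() uses a regex to find the h1 tag in the response string. With the Jsoup library I can find the tag in the usual way, like this: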

try {
    // requestString holds the address of the page to fetch.
    String url = requestString;
    Document doc = Jsoup.connect(url).timeout(20 * 1000).userAgent("Chrome").get();
    // Select every h1 element and take its combined text.
    Elements links = doc.select("h1");
    responseMessage = links.text();
} catch (IOException e) {
    responseMessage = e.getMessage();
}

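Can I do the same thing as with Jsoup in code that uses the AsyncHttpResponseHandler class? In particular, I want the equivalent of the lines Elements links = doc.select("h1"); responseMessage = links.text(); Any help or direction would be appreciated.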

Answer


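Jsoup can also parse a document from a string, instead of loading it directly over HTTP(S). Pass it the body string you already build in onSuccess: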

Document doc = Jsoup.parseBodyFragment(body); 
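
Putting this together with the selector from the question, a minimal sketch might look like the following. It assumes the Jsoup library is on the classpath; the class name H1FromString and the method extractH1Text are hypothetical names used only for illustration. It uses Jsoup.parse, which is meant for a full HTML page, rather than the parseBodyFragment call above; either accepts a plain string.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

// Hypothetical helper, for illustration only.
public class H1FromString {
    // Parses already-downloaded HTML and returns the combined text of all h1 elements.
    public static String extractH1Text(String html) {
        Document doc = Jsoup.parse(html);        // no network access, parses the string
        Elements headings = doc.select("h1");    // same selector as in the question
        return headings.text();
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Hello</h1><p>text</p></body></html>";
        System.out.println(extractH1Text(html)); // prints Hello
    }
}
```

In the handler from the question, this could be called from onSuccess as extractH1Text(new String(responseBody)), in place of the regex.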

Thanks, dear. It works. – waqas
