Is using HtmlCleaner and Jsoup together a good idea?

I want to get some data from a website. When the app first starts, it downloads the site's HTML file and cleans it. Is using HtmlCleaner and Jsoup together a good idea?

private class cleanHtml extends AsyncTask<Void, Void, Void> {

    @Override
    protected Void doInBackground(Void... arg0) {
        try {
            HtmlCleaner cleaner = new HtmlCleaner();
            String url = "https://www.easistent.com/urniki/263/razredi/16515";
            // Download the page and repair it into a well-formed node tree.
            TagNode node = cleaner.clean(new URL(url));
            CleanerProperties props = cleaner.getProperties();
            String fileName = Environment.getExternalStorageDirectory().getPath() + "/Android/data/com.whizzapps.stpsurniki/cleaned.html";
            // Write the cleaned tree to a file as pretty-printed XML.
            new PrettyXmlSerializer(props).writeToFile(node, fileName, "utf-8");
            Log.i("TAG", "AsyncTask done!");
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }
}

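I know I can use XPath with HtmlCleaner to parse the HTML, but I have no knowledge of XPath at all. I'm pretty sure it would be easier to parse the file with Jsoup after cleaning it. Is that OK?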


2013-09-27 Guy

It shouldn't be a problem; all you need is valid HTML. You can use this:

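String html = getHtml(); // your own method that returns the HTML as a string
Document doc = Jsoup.parse(html);
Elements elms = doc.select("cssSelector"); // select elements with a CSS selector
Elements elms1 = doc.getElementsByClass("class"); // select elements by class name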
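
To make the snippet above a bit more concrete, here is a minimal, self-contained sketch of the same Jsoup calls. The class name `JsoupSketch`, the helpers `textsOf` and `loadCleaned`, the sample markup, and the `lesson` class are all made up for illustration; the real timetable page will have its own structure, so the selector would need adjusting.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class JsoupSketch {

    // Hypothetical helper: parses an HTML string and returns the text of
    // every element matched by the given CSS selector.
    static List<String> textsOf(String html, String cssSelector) {
        Document doc = Jsoup.parse(html);
        Elements matches = doc.select(cssSelector);
        List<String> texts = new ArrayList<>();
        for (Element match : matches) {
            texts.add(match.text());
        }
        return texts;
    }

    // Hypothetical helper: loads a file written earlier (for example the
    // cleaned.html from the question) instead of an in-memory string.
    static Document loadCleaned(String path) throws IOException {
        return Jsoup.parse(new File(path), "utf-8");
    }

    public static void main(String[] args) {
        // Made-up sample markup; the class name "lesson" is an assumption.
        String html = "<ul>"
                + "<li class=\"lesson\">Math</li>"
                + "<li class=\"lesson\">Physics</li>"
                + "<li>Break</li>"
                + "</ul>";
        // Prints the text of the two <li> elements that carry class "lesson".
        System.out.println(textsOf(html, "li.lesson"));
    }
}
```

On Android, `loadCleaned` would have to run off the main thread, for example inside the same `doInBackground` that writes the file.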


2013-09-28 22:42:24
