I want to fetch some data from a website. When the app first launches, it downloads the site's HTML file and cleans it. Is it a good idea to use HtmlCleaner and Jsoup together?
    private class cleanHtml extends AsyncTask<Void, Void, Void> {
        @Override
        protected Void doInBackground(Void... arg0) {
            try {
                // Download the page and let HtmlCleaner repair it into a well-formed tree.
                HtmlCleaner cleaner = new HtmlCleaner();
                String url = "https://www.easistent.com/urniki/263/razredi/16515";
                TagNode node = cleaner.clean(new URL(url));
                CleanerProperties props = cleaner.getProperties();
                // Serialize the cleaned tree as pretty-printed XML on external storage.
                String fileName = Environment.getExternalStorageDirectory().getPath() + "/Android/data/com.whizzapps.stpsurniki/cleaned.html";
                new PrettyXmlSerializer(props).writeToFile(node, fileName, "utf-8");
                Log.i("TAG", "AsyncTask done!");
            } catch (MalformedURLException e) {
                // The URL string could not be parsed.
                e.printStackTrace();
            } catch (IOException e) {
                // The download or the file write failed.
                e.printStackTrace();
            }
            return null;
        }
    }
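I know I can parse the HTML with HtmlCleaner using XPath, but I have no knowledge of XPath at all. I'm fairly sure it would be easier to parse the file with Jsoup once it has been cleaned. Is that a reasonable approach?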
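For comparison, here is roughly what the XPath route could look like. This is only a sketch under assumptions: it uses the JDK's `javax.xml.xpath` on the serialized XML rather than HtmlCleaner's own XPath support, and the class name `CleanedHtmlXPath`, the helper `selectText`, the inline markup, and the `predmet` class attribute are all made up for illustration. The real timetable page will be structured differently.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CleanedHtmlXPath {

    // Hypothetical helper: returns the trimmed text of every node matched by the XPath expression.
    static List<String> selectText(String xml, String expression) {
        try {
            // Parse the (already well-formed) markup into a DOM document.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Assumed stand-in for the contents of cleaned.html, not the real page.
        String cleaned = "<html><body><table>"
                + "<tr><td class=\"ura\">1</td><td class=\"predmet\">MAT</td></tr>"
                + "<tr><td class=\"ura\">2</td><td class=\"predmet\">SLO</td></tr>"
                + "</table></body></html>";
        // Select every <td> whose class attribute is exactly "predmet".
        System.out.println(selectText(cleaned, "//td[@class='predmet']")); // prints [MAT, SLO]
    }
}
```

The expression `//td[@class='predmet']` reads as "any `td` element, anywhere in the document, whose `class` attribute equals `predmet`", which is about as much XPath as a simple table scrape tends to need.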