Stripping HTML tags from source code

HTML = EntityUtils.toString(response.getEntity()); 
ResponseHandler<String> responseHandler = new BasicResponseHandler(); 
String ResponseBody = httpclient.execute(httppost, responseHandler); 
table = ResponseBody.substring(ResponseBody.indexOf("<table border=\"1\" cellpadding=\"0\" width=\"100%\" cellspacing=\"0\">")); 
table = table.substring(0, table.indexOf("</table>")); 

String htmlString = table; 
String noHTMLString = htmlString.replaceAll("\\<.*?\\>", ""); 
noHTMLString = noHTMLString.replaceAll("\r", "<br/>"); 
noHTMLString = noHTMLString.replaceAll("\n", " "); 
noHTMLString = noHTMLString.replaceAll("\'", "&#39;"); 
noHTMLString = noHTMLString.replaceAll("\"", "&quot;"); 

TextView WORK = (TextView) findViewById(R.id.HTML); 
WORK.setText(htmlString);

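I'm using a regular expression to strip the HTML tags. This is my code. It looks correct to me, but the table (the substring, tags included) is what gets displayed rather than the extracted text. Does anyone know why?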


2013-03-19 user2187017

It is not possible to parse HTML with regular expressions. Use an HTML parsing library instead. – DwB 2013-03-19 15:02:34
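
To illustrate the comment above, here is a minimal sketch of extracting text with a real HTML parser instead of a regex. It assumes a desktop JDK, where the `javax.swing.text.html` parser is available; that package is not part of Android, so on a device a third-party parsing library would be needed instead. The class name `HtmlTextExtractor`, the method `extractText`, and the sample table are made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, not from the original post.
class HtmlTextExtractor {

    /** Returns the text chunks found in the given HTML, in document order. */
    static List<String> extractText(String html) {
        final List<String> chunks = new ArrayList<>();

        // The parser calls handleText once for each run of text between tags.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                chunks.add(new String(data));
            }
        };

        try {
            // The last argument tells the parser to ignore any charset
            // declared inside the markup itself.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new UncheckedIOException(e);
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Sample input assumed for illustration only.
        String table = "<table border=\"1\"><tr><td>Alice</td><td>42</td></tr></table>";
        System.out.println(String.join(" ", extractText(table)));
    }
}
```

Collecting the chunks in a list rather than one string keeps the cell boundaries, which makes it easier to rebuild rows and columns later.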

Possible duplicate of "RegEx match open tags except XHTML self-contained tags" (http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). You must read the answer there. It is brilliant (and one of the most helpful answers on SO). – Simon 2013-03-19 15:05:29

Thanks, everyone. I ended up having to change the whole process and put the data into a two-dimensional array. – user2187017 2013-03-31 15:29:56

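You have to pass the new String object to the TextView. replaceAll returns a new String and leaves htmlString unchanged, so htmlString still contains the tags. Change this: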

WORK.setText(htmlString);

to this:

WORK.setText(noHTMLString);


2013-03-19 14:57:48 Shade
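
The point of the answer above can be reproduced outside Android. The following is a small sketch, not the asker's actual code: the class name `StripTagsDemo`, the helper `stripTags`, and the sample row are assumptions for illustration. It also swaps the question's pattern `\\<.*?\\>` for `<[^>]*>`, which additionally matches tags that span several lines. As the comments on the question note, a regex like this is only a rough tool and will mishandle real-world HTML such as a `>` inside an attribute value.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not from the original post.
class StripTagsDemo {

    // Matches '<', then any run of characters other than '>', then '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /** Returns a copy of html with every tag removed; html itself is untouched. */
    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String htmlString = "<tr><td>Alice</td><td>42</td></tr>";
        String noHTMLString = stripTags(htmlString);

        // Strings are immutable: the original still contains the markup.
        System.out.println(htmlString);   // prints <tr><td>Alice</td><td>42</td></tr>
        // Only the returned value has the tags removed.
        System.out.println(noHTMLString); // prints Alice42
    }
}
```

This is why passing `htmlString` to `setText` shows the raw table: the stripped text only exists in the variable that received the return value.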
