Character sets in Jsoup

After executing the following code:

Document doc = new Document(language); 

File input = new File("filePath" + "filename.html"); 
PrintWriter writer = new PrintWriter(input, "UTF-8"); 

String contentType = "<%@ page contentType=\"text/html; charset=UTF-8\" %>"; 
doc.appendText(contentType); 

writer.write(doc.toString()); 
writer.flush(); 
writer.close();

I get the following line of text in the output HTML file:

&lt;%@ page contentType=&quot;text/html; charset=UTF-8&quot; %&gt;

instead of

<%@ page contentType="text/html; charset=UTF-8" %>

What could be the problem?

Source

2015-04-24 Dan

It's not very clear what you want the code to actually do; maybe you could include the rest of the code? – JonasCz

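These are escape characters, used to stop the browser from treating the text as HTML markup. This is not a problem. If you open the page in a browser, it will render correctly.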

Source

2015-04-24 17:40:06 Aswin
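
To illustrate the escaping described in the answer above, here is a minimal sketch of what an HTML serializer does to text content. The `escapeHtml` helper is hypothetical and written only for this illustration; it is not jsoup's implementation, and the exact set of characters jsoup escapes may differ between versions.

```java
public class EscapeSketch {
    // Hypothetical helper for illustration only (not jsoup's own code):
    // replaces the characters that have special meaning in HTML with entities.
    static String escapeHtml(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String directive = "<%@ page contentType=\"text/html; charset=UTF-8\" %>";
        // Prints the escaped form, matching the line reported in the question.
        System.out.println(escapeHtml(directive));
    }
}
```

A browser reverses this when rendering, so the escaped text is displayed as the original characters. A JSP engine, however, reads the file as source and will not treat the escaped line as a directive.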

A few problems here:

Document doc = new Document(language);

Don't do this. Use Jsoup.parse(...) instead.

<%@ page contentType="text/html; charset=UTF-8" %>

This is not HTML and may not be parsed correctly.

Now, to your question. You should use something like

Document document = Jsoup.parse(new ByteArrayInputStream(myHtmlString.getBytes(StandardCharsets.UTF_8)), "UTF-8", BaseUrl);
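
The charset name passed alongside a byte stream tells the parser how to decode those bytes, so it should name the encoding the bytes were actually written in. The sketch below uses only the JDK to show what happens when the two disagree; the `roundTrip` helper and the sample string are assumptions made for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchSketch {
    // Encodes text as UTF-8 bytes, then decodes those bytes with the given
    // charset, mimicking a parser that is told which charset to assume.
    static String roundTrip(String text, Charset decodeWith) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, decodeWith);
    }

    public static void main(String[] args) {
        String sample = "caf\u00e9"; // "café", written with an escape to avoid source-encoding issues

        // Matching charsets: the text survives unchanged.
        System.out.println(roundTrip(sample, StandardCharsets.UTF_8).equals(sample));

        // Mismatched charsets: the two UTF-8 bytes of the accented letter
        // are decoded as two separate ISO-8859-1 characters.
        System.out.println(roundTrip(sample, StandardCharsets.ISO_8859_1).length());
    }
}
```

Pure ASCII text decodes identically under both charsets, which is why a mismatch can go unnoticed until the input contains accented or non-Latin characters.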

See this, this and this; you may also need outputSettings.

Source

2015-04-24 17:51:02 JonasCz
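
Since a JSP directive is not HTML, one possible approach is to keep it out of the parsed document and write it to the output verbatim, followed by the serialized HTML. The sketch below is a JDK-only illustration under that assumption. The class and method names are hypothetical, and the `html` argument stands in for whatever the document serializes to (for example the result of `doc.outerHtml()` in jsoup).

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

public class JspPageWriterSketch {
    // Writes the directive exactly as given (no escaping), then a newline,
    // then the already-serialized HTML.
    static void writePage(Writer out, String directive, String html) throws IOException {
        out.write(directive);
        out.write("\n");
        out.write(html);
        out.flush();
    }

    // Convenience wrapper that renders into a String for inspection.
    static String render(String directive, String html) {
        StringWriter buffer = new StringWriter();
        try {
            writePage(buffer, directive, html);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        String directive = "<%@ page contentType=\"text/html; charset=UTF-8\" %>";
        // Stand-in for the serialized document, e.g. doc.outerHtml() in jsoup.
        String html = "<html><head></head><body><p>Hello</p></body></html>";
        System.out.print(render(directive, html));
    }
}
```

When writing to a file, the same `writePage` method can be given a writer opened with an explicit charset, such as `Files.newBufferedWriter(path, StandardCharsets.UTF_8)`, so that the bytes on disk match the charset declared in the directive.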
