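Docx4j: charset encoding of HTML output
I want to convert a DOCX document to HTML, but I can't get the encoding to work properly. The OutputStream contains an XML header declaring that the content is UTF-8 encoded, but instead of language-specific characters (such as ąśćźż) I get garbage. Here is my converter code: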
    public class DocumentProcessor extends DocumentProcessorInterface {

        private WordprocessingMLPackage load;
        private HTMLSettings htmlSettings;
        private Http.MultipartFormData.FilePart filePart;

        public DocumentProcessor(Http.MultipartFormData.FilePart filePart) {
            super(filePart);
            this.filePart = filePart;
        }

        private void prepare() {
            try {
                load = Docx4J.load(filePart.getFile());
                htmlSettings = Docx4J.createHTMLSettings();
                htmlSettings.setImageHandler(new DataUrlImageHandler());
                htmlSettings.setWmlPackage(load);
            } catch (Docx4JException e) {
                e.printStackTrace();
            }
        }

        @Override
        public String getHTML() {
            prepare();
            OutputStream outputStream = new ByteArrayOutputStream();
            Logger.info("Converting");
            try {
                Docx4J.toHTML(htmlSettings, outputStream, Docx4J.FLAG_EXPORT_PREFER_XSL);
            } catch (Docx4JException e) {
                e.printStackTrace();
            }
            Logger.info("Converted");
            return outputStream.toString();
        }
    }
The output looks like this: http://imgur.com/0sTTIe6
I have already checked the database encoding itself. What am I missing?
Instead of outputStream.toString(), specify the encoding? – JasonPlutext 2014-10-09 19:37:27
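To illustrate the comment's suggestion: the no-argument `ByteArrayOutputStream.toString()` decodes the buffered bytes with the JVM's default charset, so UTF-8 bytes can come back garbled if that default is something else. The sketch below uses only the JDK and does not call docx4j at all; the class name `EncodingDemo` and both helper methods are made up for the demonstration, and ISO-8859-1 merely stands in for whatever non-UTF-8 default the server might have.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    // Decode the buffered bytes explicitly as UTF-8,
    // independent of the JVM's default charset.
    public static String decodeUtf8(ByteArrayOutputStream out) {
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Simulates toString() on a JVM whose default charset is ISO-8859-1
    // (an assumption chosen for illustration only).
    public static String decodeLatin1(ByteArrayOutputStream out) {
        return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "ąśćźż" written with Unicode escapes so the source file's own
        // encoding cannot affect the demonstration.
        String polish = "\u0105\u015b\u0107\u017a\u017c";

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] utf8 = polish.getBytes(StandardCharsets.UTF_8);
        out.write(utf8, 0, utf8.length);

        System.out.println(decodeUtf8(out).equals(polish));   // true
        System.out.println(decodeLatin1(out).equals(polish)); // false
    }
}
```

Applied to `getHTML()` above, that would mean declaring the variable as `ByteArrayOutputStream` rather than `OutputStream` and returning `new String(outputStream.toByteArray(), StandardCharsets.UTF_8)`, assuming the converter really does emit UTF-8 as the XML header claims.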