2013-02-21 57 views
1

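docx4j cannot read files saved by POI: who is wrong?

After manipulating some text in a docx file with POI and XPath, I save it, pass the ByteArrayOutputStream to a new ByteArrayInputStream, and feed that to docx4j: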

wordMLPackage = WordprocessingMLPackage.load(
    bis 
); 

With 3 out of 4 of my templates, this throws an exception:

org.docx4j.openpackaging.exceptions.InvalidFormatException: Unexpected package (docx4j supports docx/docxm and pptx only 
    at org.docx4j.openpackaging.contenttype.ContentTypeManager.createPackage(ContentTypeManager.java:834) 

The docx4j code in question looks like this:

/* Return a package of the appropriate type. Used when loading an existing 
* Package, with an already populated [Content_Types].xml. When 
* creating a new Package, start with the new WordprocessingMLPackage constructor. */ 
public OpcPackage createPackage() throws InvalidFormatException { 

    /* 
    * How do we know what type of Package this is? 
    * 
    * In principle, either: 
    * 
    * 1. We were told its file extension or mime type in the 
    * constructor/method parameters, or 
    * 
    * 2. Because [Content_Types].xml contains an override for PartName 
    * /document.xml of content type 
    * application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml 
    * 
    * The latter approach is more reliable, so .. 
    * 
    */ 
    OpcPackage p; 

    if (getPartNameOverridenByContentType(ContentTypes.WORDPROCESSINGML_DOCUMENT) != null 
      || getPartNameOverridenByContentType(ContentTypes.WORDPROCESSINGML_DOCUMENT_MACROENABLED) != null 
      || getPartNameOverridenByContentType(ContentTypes.WORDPROCESSINGML_TEMPLATE) != null 
      || getPartNameOverridenByContentType(ContentTypes.WORDPROCESSINGML_TEMPLATE_MACROENABLED) != null) { 
     log.info("Detected WordProcessingML package "); 
     p = new WordprocessingMLPackage(this); 
     return p; 
    } else if (getPartNameOverridenByContentType(ContentTypes.PRESENTATIONML_MAIN) != null 
      || getPartNameOverridenByContentType(ContentTypes.PRESENTATIONML_TEMPLATE) != null 
      || getPartNameOverridenByContentType(ContentTypes.PRESENTATIONML_SLIDESHOW) != null) { 
     log.info("Detected PresentationMLPackage package "); 
     p = new PresentationMLPackage(this); 
     return p; 
    } else if (getPartNameOverridenByContentType(ContentTypes.SPREADSHEETML_WORKBOOK) != null 
      || getPartNameOverridenByContentType(ContentTypes.SPREADSHEETML_WORKBOOK_MACROENABLED) != null 
      || getPartNameOverridenByContentType(ContentTypes.SPREADSHEETML_TEMPLATE) != null 
      || getPartNameOverridenByContentType(ContentTypes.SPREADSHEETML_TEMPLATE_MACROENABLED) != null) { 
     // "xlam", "xlsb" ? 
     log.info("Detected SpreadhseetMLPackage package "); 
     p = new SpreadsheetMLPackage(this); 
     return p; 

    } else if (getPartNameOverridenByContentType(ContentTypes.DRAWINGML_DIAGRAM_LAYOUT) != null) { 
     log.info("Detected Glox file "); 
     p = new GloxPackage(this); 
     return p; 
    } else { 
     throw new InvalidFormatException("Unexpected package (docx4j supports docx/docxm and pptx only"); 
     //return new Package(this); 
    } 
} 

It seems to be failing to match a specific content type Override. My starting docx template has a [Content_Types].xml that contains:

<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types"> 
    <Override PartName="/_rels/.rels"  ContentType="application/vnd.openxmlformats-package.relationships+xml" /> 
    <Override PartName="/word/fontTable.xml"  ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.fontTable+xml" /> 
    <Override PartName="/word/_rels/document.xml.rels"  ContentType="application/vnd.openxmlformats-package.relationships+xml" /> 
    <Override PartName="/word/media/image1.wmf"   ContentType="image/x-wmf" /> 
    <Override PartName="/word/comments.xml"   ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.comments+xml" /> 
    <Override PartName="/word/numbering.xml"  ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.numbering+xml" /> 
    <Override PartName="/word/footer1.xml"  ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.footer+xml" /> 
    <Override PartName="/word/document.xml"   ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml" /> 
    <Override PartName="/word/styles.xml"  ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml" /> 
    <Override PartName="/docProps/app.xml"  ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml" /> 
    <Override PartName="/docProps/core.xml"   ContentType="application/vnd.openxmlformats-package.core-properties+xml" /> 
</Types> 

After processing with POI, [Content_Types].xml looks like this:

<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types"> 
    <Default Extension="xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/> 
    <Override PartName="/_rels/.rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/> 
    <Override PartName="/docProps/app.xml" ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml"/> 
    <Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml"/> 
    <Override PartName="/word/_rels/document.xml.rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/> 
    <Override PartName="/word/comments.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.comments+xml"/> 
    <Override PartName="/word/fontTable.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.fontTable+xml"/> 
    <Override PartName="/word/footer1.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.footer+xml"/> 
    <Override PartName="/word/media/image1.wmf" ContentType="image/x-wmf"/> 
    <Override PartName="/word/numbering.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.numbering+xml"/> 
    <Override PartName="/word/styles.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml"/> 
</Types> 

Note that the Override with PartName="/word/document.xml" is missing!

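Is it acceptable for a Word file to have no content type Override for /word/document.xml? It opens in LibreOffice without complaint. Is docx4j relying on a content type Override that may legitimately be absent, or is POI writing the content type Override tags incorrectly for some of my files (3 out of 4)?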
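One way to see how a given package declares its main document type is to read [Content_Types].xml straight out of the zip. The following is only a sketch using the JDK's zip and DOM APIs; the class `ContentTypesInspector` and its method names are hypothetical and are not part of POI or docx4j.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: reports which [Content_Types].xml entries declare the main document type. */
final class ContentTypesInspector {
    static final String CT_NS =
        "http://schemas.openxmlformats.org/package/2006/content-types";
    static final String MAIN_DOC =
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml";

    /** Returns entries such as "Override /word/document.xml" or "Default xml". */
    static List<String> mainDocumentDeclarations(byte[] docxBytes) throws Exception {
        byte[] xml = readEntry(docxBytes, "[Content_Types].xml");
        if (xml == null) {
            throw new IOException("No [Content_Types].xml in package");
        }
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        List<String> found = new ArrayList<>();
        collect(doc, "Override", "PartName", found);
        collect(doc, "Default", "Extension", found);
        return found;
    }

    // Adds "<element> <key>" for every element whose ContentType is the main document type.
    private static void collect(Document doc, String element, String keyAttr, List<String> out) {
        NodeList nodes = doc.getElementsByTagNameNS(CT_NS, element);
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            if (MAIN_DOC.equals(e.getAttribute("ContentType"))) {
                out.add(element + " " + e.getAttribute(keyAttr));
            }
        }
    }

    // Reads one named entry from an in-memory zip; returns null if it is absent.
    private static byte[] readEntry(byte[] zipBytes, String name) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (!name.equals(entry.getName())) {
                    continue;
                }
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = zin.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                return buf.toByteArray();
            }
        }
        return null;
    }
}
```

Passing the bytes from the ByteArrayOutputStream to `mainDocumentDeclarations` would show whether the type arrives through an Override (as in the original template) or only through a Default for the xml extension (as in the POI output above).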

+1

I think this is a bug in docx4j: POI is setting a Default with the correct type, which docx4j appears to be ignoring. – Gagravarr 2013-02-21 16:51:22

+0

I agree. I've opened issue #46 on the GitHub project with some code ideas to fix it. I'd still like to know what the spec says about the Override tag. – chugadie 2013-02-21 18:13:57

Answers

2

Disclosure: I lead the docx4j project.

What POI is doing seems to be legal according to the spec, but it is less than ideal.

Per ECMA-376 Part 2, "Getting the Content Type of a Part", docx4j ought to find the docx's content type when it is specified the way POI does it.
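As I understand that section, the lookup tries an Override for the part name first and falls back to a Default for the part's extension. A minimal sketch of that order follows. It is not docx4j's or POI's actual code: `ContentTypeLookup` and its methods are hypothetical names, and it assumes part names and extensions are compared case-insensitively.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of an Override-then-Default content type lookup. */
final class ContentTypeLookup {
    private final Map<String, String> overrides = new HashMap<>(); // part name -> content type
    private final Map<String, String> defaults = new HashMap<>();  // extension -> content type

    void addOverride(String partName, String contentType) {
        overrides.put(partName.toLowerCase(Locale.ROOT), contentType);
    }

    void addDefault(String extension, String contentType) {
        defaults.put(extension.toLowerCase(Locale.ROOT), contentType);
    }

    /** Resolves a part's content type: Override first, then Default by extension. */
    Optional<String> contentTypeOf(String partName) {
        String key = partName.toLowerCase(Locale.ROOT);

        // 1. An Override for this exact part name wins.
        String type = overrides.get(key);
        if (type != null) {
            return Optional.of(type);
        }

        // 2. Otherwise use the Default for the extension of the last path segment.
        int dot = key.lastIndexOf('.');
        int slash = key.lastIndexOf('/');
        if (dot > slash) {
            type = defaults.get(key.substring(dot + 1));
        }

        // 3. Neither matched: the part has no declared content type.
        return Optional.ofNullable(type);
    }
}
```

With the POI-written [Content_Types].xml from the question (a Default for xml carrying the main document type and no Override for /word/document.xml), `contentTypeOf("/word/document.xml")` falls through to the Default and still yields the main document type, while /word/styles.xml keeps the type from its own Override.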

In Part 1, the WordprocessingML chapter says, in its "Package Structure" section:

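First, the content types for relationship parts and the main document part (the only required part) must be defined (physically located at /[Content_Types].xml in the package):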

<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
    <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
    <Override PartName="/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>

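My reading is that you must define the content type of the main document part (which POI does), with a hint that you should simply use an Override to do so.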

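It doesn't make much sense to use the .xml Default for something that matches a single part (or possibly 2 or 3 parts) when most of your parts are .xml and need an Override to specify something different. I wonder why POI does it this way: it differs both from what the spec suggests and from what Word emits.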

That said, https://github.com/plutext/docx4j/commit/1c1190fc3a2fc6e191c825a0e30fde2654cc997c should fix this.
