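After a lot of struggle I arrived at the following solution.
I created a function that converts the byte array of a DOCX document to HTML, as follows: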
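// Namespaces used by the code in this answer:
// using System; using System.Collections.Generic; using System.IO; using System.Linq;
// using System.Xml.Linq; using System.Drawing.Imaging;
// using DocumentFormat.OpenXml; using DocumentFormat.OpenXml.Packaging; using OpenXmlPowerTools;

public string ConvertToHtml(byte[] fileInfo, string fileName = "Default.docx")
{
    // Only .docx is supported; compare the extension case-insensitively.
    if (string.IsNullOrEmpty(fileName) ||
        !string.Equals(Path.GetExtension(fileName), ".docx", StringComparison.OrdinalIgnoreCase))
        return "Unsupported format";

    string htmlText = string.Empty;
    try
    {
        htmlText = ParseDOCX(fileInfo, fileName);
    }
    catch (OpenXmlPackageException e)
    {
        if (e.ToString().Contains("Invalid Hyperlink"))
        {
            // Copy the bytes into an expandable stream so the repaired package can be
            // written back, then parse the repaired bytes instead of the original ones.
            // FixUri is a helper defined elsewhere that maps a broken URI to a valid one.
            using (MemoryStream fs = new MemoryStream())
            {
                fs.Write(fileInfo, 0, fileInfo.Length);
                UriFixer.FixInvalidUri(fs, brokenUri => FixUri(brokenUri));
                htmlText = ParseDOCX(fs.ToArray(), fileName);
            }
        }
    }
    return htmlText;
}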
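One detail that matters for the retry path in the catch block: a MemoryStream constructed directly over a byte array has a fixed capacity, so a helper that rewrites the package in place and needs more room cannot grow it. Below is a minimal sketch using only the base class library that shows the difference between wrapping the array and copying it into an expandable stream. The class and method names are made up for illustration and are not part of the Open XML SDK or PowerTools.

```csharp
using System;
using System.IO;

public static class StreamCapacityDemo
{
    // Hypothetical helper: returns true if the stream accepts one more byte at its end.
    public static bool CanGrow(MemoryStream stream)
    {
        try
        {
            stream.Seek(0, SeekOrigin.End);
            stream.WriteByte(0);
            return true;
        }
        catch (NotSupportedException)
        {
            // Thrown when the stream wraps a fixed-size byte array.
            return false;
        }
    }

    // Copies the bytes into a stream created without a backing array, which is expandable.
    public static MemoryStream ToExpandableStream(byte[] bytes)
    {
        var stream = new MemoryStream();
        stream.Write(bytes, 0, bytes.Length);
        stream.Position = 0;
        return stream;
    }
}
```

For example, `StreamCapacityDemo.CanGrow(new MemoryStream(fileInfo))` reports that the wrapped array cannot grow, while the stream returned by `ToExpandableStream(fileInfo)` can.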
Here ParseDOCX does all of the conversion. The code of ParseDOCX:
private string ParseDOCX(byte[] fileInfo, string fileName)
{
    try
    {
        // Copy the bytes into an expandable stream so the package can be opened read/write.
        using (MemoryStream memoryStream = new MemoryStream())
        {
            memoryStream.Write(fileInfo, 0, fileInfo.Length);
            using (WordprocessingDocument wDoc = WordprocessingDocument.Open(memoryStream, true))
            {
                int imageCounter = 0;
                // Use the document title as the page title, falling back to the file name.
                var pageTitle = fileName;
                var part = wDoc.CoreFilePropertiesPart;
                if (part != null)
                    pageTitle = (string)part.GetXDocument().Descendants(DC.title).FirstOrDefault() ?? fileName;
                WmlToHtmlConverterSettings settings = new WmlToHtmlConverterSettings()
                {
                    AdditionalCss = "body { margin: 1cm auto; max-width: 20cm; padding: 0; }",
                    PageTitle = pageTitle,
                    FabricateCssClasses = true,
                    CssClassPrefix = "pt-",
                    RestrictToSupportedLanguages = false,
                    RestrictToSupportedNumberingFormats = false,
                    // Embed every image in the HTML as a base64 data URI.
                    ImageHandler = imageInfo =>
                    {
                        ++imageCounter;
                        string extension = imageInfo.ContentType.Split('/')[1].ToLower();
                        ImageFormat imageFormat = null;
                        if (extension == "png") imageFormat = ImageFormat.Png;
                        else if (extension == "gif") imageFormat = ImageFormat.Gif;
                        else if (extension == "bmp") imageFormat = ImageFormat.Bmp;
                        else if (extension == "jpeg") imageFormat = ImageFormat.Jpeg;
                        else if (extension == "tiff")
                        {
                            // TIFF images are re-encoded as GIF.
                            extension = "gif";
                            imageFormat = ImageFormat.Gif;
                        }
                        else if (extension == "x-wmf")
                        {
                            extension = "wmf";
                            imageFormat = ImageFormat.Wmf;
                        }

                        // Unknown image type: skip the image.
                        if (imageFormat == null)
                            return null;

                        string base64 = null;
                        try
                        {
                            using (MemoryStream ms = new MemoryStream())
                            {
                                imageInfo.Bitmap.Save(ms, imageFormat);
                                var ba = ms.ToArray();
                                base64 = System.Convert.ToBase64String(ba);
                            }
                        }
                        catch (System.Runtime.InteropServices.ExternalException)
                        { return null; }

                        // Look up the MIME type from the bitmap's raw format.
                        ImageFormat format = imageInfo.Bitmap.RawFormat;
                        ImageCodecInfo codec = ImageCodecInfo.GetImageDecoders().First(c => c.FormatID == format.Guid);
                        string mimeType = codec.MimeType;

                        string imageSource = string.Format("data:{0};base64,{1}", mimeType, base64);

                        XElement img = new XElement(Xhtml.img,
                            new XAttribute(NoNamespace.src, imageSource),
                            imageInfo.ImgStyleAttribute,
                            imageInfo.AltText != null ?
                                new XAttribute(NoNamespace.alt, imageInfo.AltText) : null);
                        return img;
                    }
                };
                XElement htmlElement = WmlToHtmlConverter.ConvertToHtml(wDoc, settings);
                // Prepend an HTML5 doctype and serialize without extra whitespace.
                var html = new XDocument(new XDocumentType("html", null, null, null), htmlElement);
                var htmlString = html.ToString(SaveOptions.DisableFormatting);
                return htmlString;
            }
        }
    }
    catch (OpenXmlPackageException)
    {
        // Rethrow so that ConvertToHtml can see the "Invalid Hyperlink" error and retry;
        // the general handler below would otherwise swallow it.
        throw;
    }
    catch (Exception)
    {
        return "File contains corrupt data";
    }
}
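The core of the ImageHandler above is turning image bytes into a data URI and wrapping it in an img element. The following is a minimal, dependency-free sketch of just that step. It does not use System.Drawing or PowerTools; the class name, method names, and the hard-coded XHTML namespace are assumptions made for illustration (the original code uses the Xhtml.img and NoNamespace helpers from PowerTools instead).

```csharp
using System;
using System.Xml.Linq;

public static class DataUriDemo
{
    // Builds "data:<mime>;base64,<payload>" from raw image bytes.
    public static string ToDataUri(byte[] imageBytes, string mimeType)
    {
        return string.Format("data:{0};base64,{1}", mimeType, Convert.ToBase64String(imageBytes));
    }

    // Wraps the data URI in an XHTML img element; the alt attribute is omitted when altText is null.
    public static XElement ToImgElement(byte[] imageBytes, string mimeType, string altText)
    {
        XNamespace xhtml = "http://www.w3.org/1999/xhtml";
        return new XElement(xhtml + "img",
            new XAttribute("src", ToDataUri(imageBytes, mimeType)),
            altText != null ? new XAttribute("alt", altText) : null);
    }
}
```

Passing null content to the XElement constructor is simply ignored, which is why the conditional alt attribute works in both this sketch and the handler above.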
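So far everything looked nice and easy, but then I realized that the header and footer parts are simply skipped, so I had to convert them somehow. I tried to use the GetStream() method of HeaderPart, but of course an exception was thrown, because the Header tree is not the same as the Document's.
Then I decided to extract the Header and Footer into new documents (I struggled with this for a long time) using OpenXML's WordprocessingDocument headerDoc = WordprocessingDocument.Create(headerStream, Document), but unfortunately converting that document failed as well, because this only creates a plain docx document without any settings, styles, web settings, and so on. It took a lot of time to figure that out.
So I finally decided to create the new document with Cathal's DocX library, and that finally worked. The code is as follows: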
public string ConvertHeaderToHtml(HeaderPart header)
{
    using (MemoryStream headerStream = new MemoryStream())
    {
        // Create an empty document with Cathal's DocX library
        var newDocument = Novacode.DocX.Create(headerStream);
        newDocument.Save();

        using (WordprocessingDocument headerDoc = WordprocessingDocument.Open(headerStream, true))
        {
            var headerParagraphs = new List<OpenXmlElement>(header.Header.Elements());
            var mainPart = headerDoc.MainDocumentPart;

            // Cloning the elements is necessary: the originals still belong to the
            // header tree, and appending them directly throws an exception.
            mainPart.Document.Body.Append(headerParagraphs.Select(h => (OpenXmlElement)h.Clone()).ToList());

            // Copy the header's relationships over as the document's relationships
            foreach (IdPartPair parts in header.Parts)
            {
                // The second parameter of AddPart is very important: if it is not set, the
                // relationship ID changes and the document's pictures etc. won't show.
                mainPart.AddPart(parts.OpenXmlPart, parts.RelationshipId);
            }

            headerDoc.MainDocumentPart.Document.Save();
            headerDoc.Save();
            // No explicit Close() needed: the using block disposes the package.
        }
        return ConvertToHtml(headerStream.ToArray());
    }
}
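So that is it for the header. I pass in the HeaderPart and take its Header and Elements. Extracting the relationships (needed if the header contains images) and importing them into the document itself is very important; once that is done, the document is ready for conversion.
Use the same steps to generate HTML from the footer.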
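For reference, here is a sketch of what the footer version might look like. It mirrors ConvertHeaderToHtml and assumes the same libraries (Open XML SDK and the Novacode DocX package). The class name FooterConversion and the convertToHtml parameter are hypothetical: the delegate stands in for the ConvertToHtml(byte[]) method shown earlier so that the sketch does not depend on the surrounding class.

```csharp
using System;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;

public static class FooterConversion
{
    // convertToHtml is a stand-in for the ConvertToHtml(byte[]) method from the answer.
    public static string ConvertFooterToHtml(FooterPart footer, Func<byte[], string> convertToHtml)
    {
        using (MemoryStream footerStream = new MemoryStream())
        {
            // Start from an empty document created by DocX, as for the header.
            var newDocument = Novacode.DocX.Create(footerStream);
            newDocument.Save();

            using (WordprocessingDocument footerDoc = WordprocessingDocument.Open(footerStream, true))
            {
                var mainPart = footerDoc.MainDocumentPart;

                // Clone each element, because the originals still belong to the footer tree.
                mainPart.Document.Body.Append(
                    footer.Footer.Elements().Select(e => (OpenXmlElement)e.Clone()).ToList());

                // Keep the original relationship IDs so images in the footer still resolve.
                foreach (IdPartPair pair in footer.Parts)
                    mainPart.AddPart(pair.OpenXmlPart, pair.RelationshipId);

                mainPart.Document.Save();
            }
            return convertToHtml(footerStream.ToArray());
        }
    }
}
```

It could be called once for each part in wDoc.MainDocumentPart.FooterParts, passing a lambda such as bytes => ConvertToHtml(bytes).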
I hope this helps someone with their work.
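Hi, there is no direct way in OpenXML (i.e. in OpenXML PowerTools) to get the header and footer as HTML. Instead you have to read the header and footer content as text and then apply the header styles to that text yourself. See: https://github.com/OfficeDev/Open-Xml-PowerTools/issues/66#issuecomment-326629828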