Display Unicode characters when converting HTML to PDF

I am using the iTextSharp DLL to convert HTML to PDF.

The HTML contains some Unicode characters, such as α and β. When I convert the HTML to PDF, those Unicode characters do not appear in the PDF.

My function:

Document doc = new Document(PageSize.LETTER); 

using (FileStream fs = new FileStream(Path.Combine("Test.pdf"), FileMode.Create, FileAccess.Write, FileShare.Read)) 
{ 
    PdfWriter.GetInstance(doc, fs); 

    doc.Open(); 
    doc.NewPage(); 

    string arialuniTff = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), 
             "ARIALUNI.TTF"); 

    BaseFont bf = BaseFont.CreateFont(arialuniTff, BaseFont.IDENTITY_H, BaseFont.EMBEDDED); 

    Font fontNormal = new Font(bf, 12, Font.NORMAL); 

    List<IElement> list = HTMLWorker.ParseToList(new StringReader(stringBuilder.ToString()), 
               new StyleSheet()); 
    Paragraph p = new Paragraph {Font = fontNormal}; 

    foreach (var element in list) 
    { 
     p.Add(element); 
     doc.Add(p); 
    } 

    doc.Close(); 
}

2012-04-26 NIlesh Lanke

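There are a couple of things you need to take care of when dealing with Unicode characters in iTextSharp. The first you have already done: getting a font that supports your characters. The second is that you need to actually register the font with iTextSharp so that it knows about it.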

//Path to our font 
string arialuniTff = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "ARIALUNI.TTF"); 
//Register the font with iTextSharp 
iTextSharp.text.FontFactory.Register(arialuniTff);

Now that we have a font, we need to create a StyleSheet object that tells iTextSharp when and how to use it.

//Create a new stylesheet 
iTextSharp.text.html.simpleparser.StyleSheet ST = new iTextSharp.text.html.simpleparser.StyleSheet(); 
//Set the default body font to our registered font's internal name 
ST.LoadTagStyle(HtmlTags.BODY, HtmlTags.FACE, "Arial Unicode MS");

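The one non-HTML part you also need to do is set a special encoding parameter. This encoding is specific to iTextSharp, and in your case you want it to be Identity-H. If you don't set it, it defaults to Cp1252 (WINANSI).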

//Set the default encoding to support Unicode characters 
ST.LoadTagStyle(HtmlTags.BODY, HtmlTags.ENCODING, BaseFont.IDENTITY_H);
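To see why the Cp1252 default matters, here is a minimal C# sketch that lists which characters in a string fall outside Windows-1252. It is not part of iTextSharp and does not call it; the class `WinAnsiCheck` is hypothetical. It hard-codes the Cp1252 repertoire (ASCII, U+00A0 to U+00FF, and the 27 extra characters mapped into bytes 0x80 to 0x9F). Anything it reports, such as α and β, cannot be drawn through the default WINANSI encoding, which is why Identity-H is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of iTextSharp: reports characters that the
// default Cp1252 (WINANSI) encoding cannot represent.
public static class WinAnsiCheck
{
    // The 27 printable characters Windows-1252 assigns to bytes 0x80-0x9F.
    private static readonly HashSet<char> Cp1252Extras = new HashSet<char>
    {
        '\u20AC', '\u201A', '\u0192', '\u201E', '\u2026', '\u2020', '\u2021',
        '\u02C6', '\u2030', '\u0160', '\u2039', '\u0152', '\u017D',
        '\u2018', '\u2019', '\u201C', '\u201D', '\u2022', '\u2013', '\u2014',
        '\u02DC', '\u2122', '\u0161', '\u203A', '\u0153', '\u017E', '\u0178'
    };

    // ASCII, Latin-1 (U+00A0-U+00FF), or one of the extras above.
    public static bool IsWinAnsi(char c) =>
        c <= '\u007F' || (c >= '\u00A0' && c <= '\u00FF') || Cp1252Extras.Contains(c);

    // Distinct UTF-16 code units outside Cp1252, in order of first appearance.
    // Characters beyond the BMP show up as their surrogate halves.
    public static List<char> Unsupported(string text) =>
        text.Where(ch => !IsWinAnsi(ch)).Distinct().ToList();

    public static void Main()
    {
        foreach (char ch in Unsupported("This is a test: α,β"))
            Console.WriteLine($"U+{(int)ch:X4} {ch}");
    }
}
```

For the sample HTML text in this thread it should report U+03B1 and U+03B2; whether the glyphs themselves print correctly depends on the console's code page.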

Finally, we need to pass our stylesheet to the ParseToList method:

//Parse our HTML using the stylesheet created above 
List<IElement> list = HTMLWorker.ParseToList(new StringReader(stringBuilder.ToString()), ST);

Putting it all together, from open to close you'd have:

doc.Open(); 

//Sample HTML 
StringBuilder stringBuilder = new StringBuilder(); 
stringBuilder.Append(@"<p>This is a test: <strong>α,β</strong></p>"); 

//Path to our font 
string arialuniTff = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "ARIALUNI.TTF"); 
//Register the font with iTextSharp 
iTextSharp.text.FontFactory.Register(arialuniTff); 

//Create a new stylesheet 
iTextSharp.text.html.simpleparser.StyleSheet ST = new iTextSharp.text.html.simpleparser.StyleSheet(); 
//Set the default body font to our registered font's internal name 
ST.LoadTagStyle(HtmlTags.BODY, HtmlTags.FACE, "Arial Unicode MS"); 
//Set the default encoding to support Unicode characters 
ST.LoadTagStyle(HtmlTags.BODY, HtmlTags.ENCODING, BaseFont.IDENTITY_H); 

//Parse our HTML using the stylesheet created above 
List<IElement> list = HTMLWorker.ParseToList(new StringReader(stringBuilder.ToString()), ST); 

//Loop through each element, don't bother wrapping in P tags 
foreach (var element in list) { 
    doc.Add(element); 
} 

doc.Close();

EDIT

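The HTML you showed in your comment specifies an override font. iTextSharp does not search the system for fonts, and its HTML parser does not use font fallback techniques. Any font specified in HTML/CSS must be registered manually.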

string lucidaTff = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "l_10646.ttf"); 
iTextSharp.text.FontFactory.Register(lucidaTff);
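Because the parser will not go looking for fonts on its own, it can help to know which font names the HTML actually asks for before registering files. The following is a rough sketch of a hypothetical helper (`HtmlFontNames.Referenced` is not an iTextSharp API) that uses regular expressions, not a real HTML/CSS parser, to collect names from `face="..."` attributes and `font-family:` declarations while skipping generic CSS families. Mapping each name to a font file (for example Lucida Sans Unicode to l_10646.ttf) and calling `FontFactory.Register` is assumed to be done by you; the sketch does not attempt it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of iTextSharp: lists font names referenced
// in HTML so each one can be registered manually.
public static class HtmlFontNames
{
    // face="..." or face='...' attributes, e.g. on <font> tags.
    private static readonly Regex FaceAttr =
        new Regex(@"\bface\s*=\s*[""']([^""']+)[""']", RegexOptions.IgnoreCase);

    // font-family values, stopping at ';', '}' or a double quote.
    // Rough: a style='font-family: "X"' attribute is not handled.
    private static readonly Regex FontFamilyDecl =
        new Regex(@"font-family\s*:\s*([^;}""]+)", RegexOptions.IgnoreCase);

    // Generic CSS families are not font files, so they are skipped.
    private static readonly HashSet<string> GenericFamilies = new HashSet<string>(
        new[] { "serif", "sans-serif", "monospace", "cursive", "fantasy" },
        StringComparer.OrdinalIgnoreCase);

    public static List<string> Referenced(string html) =>
        FaceAttr.Matches(html).Cast<Match>()
            .Concat(FontFamilyDecl.Matches(html).Cast<Match>())
            .SelectMany(m => m.Groups[1].Value.Split(','))   // font lists are comma-separated
            .Select(n => n.Trim().Trim('\'', '"').Trim())     // drop whitespace and quotes
            .Where(n => n.Length > 0 && !GenericFamilies.Contains(n))
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .ToList();

    public static void Main()
    {
        string html = "<p style=\"font-family: 'Lucida Sans Unicode', sans-serif\">α,β</p>" +
                      "<font face=\"Arial Unicode MS\">γ</font>";
        foreach (string name in Referenced(html))
            Console.WriteLine(name);
    }
}
```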

2012-04-26 15:26:09

If the HTML content is like α,β, the above function doesn't work. – 2012-04-27 06:37:50

You can also use the newer XMLWorkerHelper (from the itextsharp.xmlworker library), but you need to override the default FontFactory implementation.

using System;
using System.IO;
using System.Text;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;

void GeneratePdfFromHtml()
{
    const string outputFilename = @".\Files\report.pdf";
    const string inputFilename = @".\Files\report.html";

    using (var input = new FileStream(inputFilename, FileMode.Open))
    using (var output = new FileStream(outputFilename, FileMode.Create))
    {
        CreatePdf(input, output);
    }
}

void CreatePdf(Stream htmlInput, Stream pdfOutput)
{
    using (var document = new Document(PageSize.A4, 30, 30, 30, 30))
    {
        var writer = PdfWriter.GetInstance(document, pdfOutput);
        var worker = XMLWorkerHelper.GetInstance();

        document.Open();
        // No separate CSS stream; read the HTML as UTF-8 and use our font provider
        worker.ParseXHtml(writer, document, htmlInput, null, Encoding.UTF8, new UnicodeFontFactory());

        document.Close();
    }
}

public class UnicodeFontFactory : FontFactoryImp
{
    private static readonly string FontPath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts),
        "arialuni.ttf");

    private readonly BaseFont _baseFont;

    public UnicodeFontFactory()
    {
        // Embed Arial Unicode MS with Identity-H so non-Latin characters can be drawn
        _baseFont = BaseFont.CreateFont(FontPath, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
    }

    // Ignores the requested font name and always returns Arial Unicode MS
    public override Font GetFont(string fontname, string encoding, bool embedded, float size, int style, BaseColor color,
        bool cached)
    {
        return new Font(_baseFont, size, style, color);
    }
}

2013-01-09 12:16:26

Thanks, but in my output the letters are separated from each other; it displays like what I tried. – 2015-02-23 08:08:12

I tried this, but Chinese words still don't render. – 2015-09-24 10:39:43

After upgrading to 5.5.5 and using the Microsoft YaHei font, it now works. – 2015-09-28 02:16:18

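Here are the steps for converting HTML to PDF:

1. Create an HTMLWorker.
2. Register a Unicode font and assign it.
3. Create a stylesheet that displays Unicode characters, with the encoding set to Identity-H.
4. Assign the stylesheet to the HTML parser.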

See the link below for more details:

Display Unicode characters in converting Html to Pdf

Hindi, Turkish, and special characters are also displayed correctly when converting HTML to PDF with this approach. See the demo image below.

2015-12-23 17:57:05

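[Links to external resources are encouraged, but please add context around the link so your fellow users will have some idea what it is and why it's there. Always quote the most relevant part of an important link, in case the target site is unreachable or goes permanently offline.](http://stackoverflow.com/help/how-to-answer) – 2016-05-20 15:41:37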
