Converting an Arabic page with xhtml2pdf.pisa in Python

I am trying to convert HTML to PDF with the pisa utility. Please check the code below. I get an error that I cannot figure out.

Traceback (most recent call last): 
    File "dewa.py", line 27, in <module> 
    html = html.encode(enc, 'replace') 
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd9 in position 203: ordinal not in range(128)

Here is the code:

from cStringIO import StringIO 
from grab import Grab 
from grab.tools.lxml_tools import drop_node, render_html 
from grab.tools.text import remove_bom 
from lxml import etree 
import grab.error 
import inspect 
import lxml 
import os 
import sys 
import xhtml2pdf.pisa as pisa 

enc = 'utf-8' 
filePath = '~/Desktop/dewa' 
############################## 

g = Grab() 
g.go('http://www.dewa.gov.ae/arabic/aboutus/dewahistory.aspx') 

html = g.response.body 

html = html.replace('bgcolor="EDF389"', 'bgcolor="#EDF389"') 


''' clear page ''' 
html = html.encode(enc, 'replace') 

print html 

f = file(filePath + '.html' , 'wb') 
f.write(html) 
f.flush() 
f.close() 

''' Save PDF ''' 
pdfresult = StringIO() 
pdf = pisa.pisaDocument(StringIO(html), pdfresult, encoding = enc) 
f = file(filePath + '.pdf', 'wb') 
f.write(pdfresult.getvalue()) 
f.flush() 
f.close() 
pdfresult.close()

2012-12-10 ArunaFromLK

Googling **'ascii' codec can't decode byte** together with "Stack Overflow" returns 12K+ results. You might want to start there... – dda

If you check the type of the object returned by this line:

html = g.response.body

you will see that it is not a unicode object:

print type(html) 
... 
<type 'str'>

So when you get to this line:

html = html.encode(enc, 'replace')

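you are trying to re-encode a string that is already encoded. In Python 2, calling encode on a byte str first decodes it implicitly with the ascii codec, and that implicit decode is what fails on the non-ASCII byte 0xd9.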
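The failure can be reproduced without touching the network. The following is only a sketch: `raw` is a hypothetical stand-in for `g.response.body`, and `ascii_decode_error` is a helper name made up for illustration. It performs explicitly the ASCII decode that Python 2 runs implicitly inside `str.encode`, and it is written so that it runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch: reproduce the decoding failure offline.
# `raw` is hypothetical sample data standing in for g.response.body:
# the UTF-8 bytes of a short Arabic word.
raw = u'\u0645\u0631\u062d\u0628\u0627'.encode('utf-8')


def ascii_decode_error(data):
    """Return the error message from decoding `data` as ASCII, or None.

    This is the step Python 2 performs implicitly when .encode()
    is called on a byte string.
    """
    try:
        data.decode('ascii')
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


message = ascii_decode_error(raw)
print(message)
```

Pure ASCII input passes through the same helper without an error, which is why this kind of bug tends to stay hidden until the first non-English page is fetched.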

To fix this, change your code to look like this:

# decode the downloaded data 
html = g.response.body.decode(enc) 

# html is now a unicode object 
html = html.replace('bgcolor="EDF389"', 'bgcolor="#EDF389"') 

print html 

# encode as utf-8 before writing to file (no need for 'replace') 
html = html.encode(enc)
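The same decode, edit, encode sequence can be wrapped in a small function and checked offline. This is a sketch under stated assumptions: `fix_bgcolor` and `sample` are hypothetical names and data, not part of Grab or xhtml2pdf, and the page is assumed to really be UTF-8. It runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch: decode bytes -> edit as text -> encode back to bytes.
# `fix_bgcolor` is a hypothetical helper, not a library function.


def fix_bgcolor(raw_bytes, enc='utf-8'):
    """Repair the bgcolor attribute in a downloaded page.

    Takes the raw bytes as downloaded and returns bytes in the same
    encoding, ready to be written to a file opened in 'wb' mode.
    """
    text = raw_bytes.decode(enc)  # bytes -> unicode text
    text = text.replace(u'bgcolor="EDF389"', u'bgcolor="#EDF389"')
    return text.encode(enc)  # unicode text -> bytes


# Hypothetical sample standing in for g.response.body.
sample = u'<td bgcolor="EDF389">\u0645\u0631\u062d\u0628\u0627</td>'.encode('utf-8')
fixed = fix_bgcolor(sample)
```

Keeping the text as unicode between the decode and the final encode means the Arabic characters are never passed through the ascii codec.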

2012-12-10 18:32:45 ekhumoro

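Dear ekhumoro, thank you for your answer. After fixing the script as you suggested, the saved PDF/HTML files are unreadable. Please check the generated files. – ArunaFromLK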

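The code I gave is correct and deals with the encoding problem. My guess is that you have a separate problem with fonts. Do you see lots of black rectangles in the PDF file? If so, [this question](http://stackoverflow.com/q/4047095/984421) might help. – ekhumoro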

Dear ekhumoro, thanks again. Now I can see the Arabic text in the PDF, but all of the text is in reverse order. Any clue? – ArunaFromLK
