python
  • unicode
  • python-docx
  • 2017-08-20 138 views 1 likes 
    1

    有沒有辦法將unicode字符寫入docx文件?我試過python-docx,但是,它給了我TypeError將Unicode寫入.docx文件

    回溯

    Traceback (most recent call last):

    File "< ipython-input-1-ba89c735995d >", line 1, in runfile('H:/Python/Practice/new/download.py', wdir='H:/Python/Practice/new')

    File "C:\Program Files\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile execfile(filename, namespace)

    File "C:\Program Files\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile exec(compile(f.read(), filename, 'exec'), namespace)

    File "H:/Python/Practice/new/download.py", line 37, in document.add_paragraph(story.encode("utf-8"))

    File "C:\Program Files\Anaconda3\lib\site-packages\docx\document.py", line 63, in add_paragraph return self._body.add_paragraph(text, style)

    File "C:\Program Files\Anaconda3\lib\site-packages\docx\blkcntnr.py", line 36, in add_paragraph paragraph.add_run(text)

    File "C:\Program Files\Anaconda3\lib\site-packages\docx\text\paragraph.py", line 37, in add_run run.text = text

    File "C:\Program Files\Anaconda3\lib\site-packages\docx\text\run.py", line 163, in text self._r.text = text

    File "C:\Program Files\Anaconda3\lib\site-packages\docx\oxml\text\run.py", line 104, in text _RunContentAppender.append_to_run_from_text(self, text)

    File "C:\Program Files\Anaconda3\lib\site-packages\docx\oxml\text\run.py", line 134, in append_to_run_from_text appender.add_text(text)

    File "C:\Program Files\Anaconda3\lib\site-packages\docx\oxml\text\run.py", line 142, in add_text self.add_char(char)

    File "C:\Program Files\Anaconda3\lib\site-packages\docx\oxml\text\run.py", line 156, in add_char elif char in '\r\n':

    TypeError: 'in < string >' requires string as left operand, not int

    我試圖刮網站,並寫文本到MS Word文件。文本以當地語言(孟加拉語)。當我在控制檯上打印story時,它會完美地打印整個文本。

    代碼

    import requests 
    from docx import Document 
    from bs4 import BeautifulSoup 
    
    url = "some url" 
    response = requests.get(url) 
    soup = BeautifulSoup(response.text, "lxml") 
    
    story = soup.find("div", {"dir": "ltr"}).get_text().replace("<br />", "\n", len(response.text)) 
    
    title = "title" 
    document = Document() 
    document.add_heading(title, 0) 
    document.add_paragraph(story.encode("utf-8")) 
    document.save(title + ".docx") 
    
    +0

    我已添加完整追蹤。順便說一句,它從「URL」中獲得完美的數據。當我在控制檯上打印它時,它工作正常。 –

    +0

    你試過*不*調用'.encode('utf-8')'? – kennytm

    +0

    如果我不使用'encode(「utf-8」)'它給了我一個由一些塊組成的文檔。 –

    回答

    2

    在Python 3,str英格斯(應該)自動支持Unicode,所以你不需要做什麼特別喜歡的編碼到UTF-8。

    document.add_paragraph(story) 
    #     ^just do this directly, no need to call `.encode('utf-8')` 
    

    雖然你可以寫Unicode字符到文檔中,通過文字處理器的默認文本渲染引擎可能不支持它。您可能需要specify a font以確保字符不會變成□□□□。

    run = document.add_paragraph(story).add_run() 
    font = run.font 
    font.name = 'Vrinda' 
    
    相關問題