Reading web page source with special characters in Python


I am reading the source of a web page and then parsing values out of it. While doing so I am running into a problem with special characters.

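In my Python controller file I am using # -*- coding: utf-8 -*-. But the web page source I am reading uses charset=iso-8859-1.

So when I read the page content without specifying any encoding, it throws this error: UnicodeDecodeError: 'utf8' codec can't decode byte 0xfc in position 133: invalid start byte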

When I use string.decode("iso-8859-1").encode("utf-8"), the page content is parsed without any errors. But the value is displayed as 'F\u00fcnke' instead of 'Fünke'.

Please let me know how I can solve this issue. I would greatly appreciate any suggestions.

Source

2013-08-18 Pradeeshnarayan

Try printing u"F\u00fcnke" –

Python **2** or **3**? – Torxed

Python 2.7. I also tried unicode() and it shows the same. – Pradeeshnarayan
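To illustrate the point the comments are circling around: 'F\u00fcnke' and 'Fünke' can be the same five-character string, shown once in escaped form and once as text. The sketch below is only an assumption about where the escaped form might come from (for example a JSON dump or a repr of a container); the original post does not show how the value was displayed. The helper name `show_forms` is hypothetical.

```python
from __future__ import print_function, unicode_literals
import json

# \u00fc is an escape for the single character "ü", so this is "Fünke".
name = "F\u00fcnke"

def show_forms(text):
    """Hypothetical helper: return the escaped and unescaped JSON forms."""
    return {
        "length": len(text),
        # Default ensure_ascii=True escapes every non-ASCII character.
        "escaped": json.dumps(text),
        # ensure_ascii=False keeps the character itself.
        "unescaped": json.dumps(text, ensure_ascii=False),
    }

forms = show_forms(name)
print(forms["length"])   # number of characters, not bytes
print(forms["escaped"])  # ASCII-only form containing \u00fc
# print(name) would show the ü itself on a console whose encoding supports it.
```

If the escaped form only appears when a list, dict, or JSON dump is displayed, the data itself is probably fine and only the way it is displayed needs to change.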

Encoding is definitely a PITA in Python3 (and in 2 as well in some cases). Try checking these links, they might help you:

Python - Encoding string - Swedish Letters
Python3 - ascii/utf-8/iso-8859-1 can't decode byte 0xe5 (Swedish characters)

http://docs.python.org/2/library/codecs.html

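It would also be nice to see the code behind "So when I read the page content without specifying any encoding". My best guess is that your console does not use utf-8 (for example, on Windows). Your # -*- coding: utf-8 -*- only tells Python what kind of characters to expect in the source code, not in the actual data the code is going to parse or analyze. For example, I write: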

# -*- coding: iso-8859-1 -*-
import time
# Här skriver jag ut tiden (Translation: Here, I print out the time)
print(time.strftime('%H:%M:%S'))
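As a rough sketch of the same idea applied to data rather than source code: bytes read from the page have to be decoded with the page's own charset, whatever the coding line of the script says. The byte string below is a made-up stand-in for the real page content, and `decode_page` is a hypothetical helper, not something from the original post.

```python
from __future__ import print_function, unicode_literals

# Stand-in for bytes fetched from the page; 0xfc is "ü" in ISO-8859-1.
raw = b"<td>F\xfcnke</td>"

def decode_page(data, charset="iso-8859-1"):
    """Hypothetical helper: turn raw page bytes into a text string."""
    return data.decode(charset)

# Decoding with the wrong codec reproduces the error from the question.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

text = decode_page(raw)            # text (unicode) string, safe to parse
utf8_bytes = text.encode("utf-8")  # encode only when writing bytes out

print(utf8_ok)
print(len(text), len(utf8_bytes))  # ü is 1 character but 2 bytes in UTF-8
```

One design note, offered tentatively: in Python 2, calling .encode("utf-8") straight after .decode(...) turns the text back into a byte string, so it may be easier to keep the decoded value around while parsing and encode only at the point of output.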

Source

2013-08-18 21:38:03 Torxed

And a downvote without giving a reason, really constructive (if I'm wrong, at least point it out ffs) -.- – Torxed
