Open an HTML file to read its source and search for specific tags

I can't open an HTML file in read mode to search its source for matching strings.

fo = open("htm1.html", "r"); 
str = fo.read(10); 
print("Read String is : ", str); 
fo.close();

This doesn't work. I get the following error:

Traceback (most recent call last): 
    File "C:/Python33/myProjects/malCode.py", line 34, in <module> 
    str = fo.read(10); 
    File "C:\Python33\lib\encodings\cp1252.py", line 23, in decode 
    return codecs.charmap_decode(input,self.errors,decoding_table)[0] 
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 6040: character maps to <undefined>


2014-02-15 faiz

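If you don't specify an encoding, open() will use locale.getpreferredencoding(), which in your case is cp1252. That is a problem because the HTML file uses a different, not yet identified encoding: byte 0x90 has no character assigned in cp1252, so decoding fails. If you happen to know which encoding was used, pass it explicitly. Failing that, you can try some common encodings such as 'utf-8' or 'latin-1'. For example: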

with open("htm1.html", "r", encoding="utf-8") as fo: 
    sample = fo.read(10)

(Minor syntax note: the semicolons at the ends of the lines are not needed.)
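If the encoding is unknown, one possible approach is to try a few candidates in order and then search the decoded text for the tag. The sketch below is only an illustration under assumptions: the helper names read_text_with_fallback and contains_tag are hypothetical, the candidate list is a guess rather than something taken from the question, and the tag check is a plain regular-expression search, not a real HTML parse.

```python
import re


def read_text_with_fallback(path, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes cleanly.

    Hypothetical helper. Note that latin-1 maps every byte to a character,
    so it never raises; keep it last, and expect mis-decoded characters
    if the file is really in some other encoding.
    """
    for enc in encodings:
        try:
            with open(path, "r", encoding=enc) as fo:
                return fo.read(), enc
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    raise ValueError("none of the candidate encodings could decode " + path)


def contains_tag(text, tag):
    """Return True if an opening <tag ...> appears in text (case-insensitive)."""
    pattern = r"<" + re.escape(tag) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


# Example use, assuming htm1.html from the question exists:
#     source, used = read_text_with_fallback("htm1.html")
#     print(used, contains_tag(source, "script"))
```

For anything beyond a quick check, a real parser such as the standard library's html.parser is likely a better fit than a regular expression.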


2014-02-15 08:16:23 bernie
