Python 2.6 and 3.2 urlopen routine problems on Windows

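Previously, in Python 2.6, I used urllib.urlopen extensively to capture web page content and then post-processed the data I received. Now those routines, as well as the new ones I am trying to write for Python 3.2, are running into what seems to be a Windows-only problem (possibly even a Windows 7-only one).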

Using the following code with Python 3.2.2 (64-bit) on Windows 7...

import urllib.request 

fp = urllib.request.urlopen(URL_string_that_I_use) 

string = fp.read() 
fp.close() 
print(string.decode("utf8"))

I get the following:

Traceback (most recent call last):
  File "TATest.py", line 5, in <module>
    string = fp.read()
  File "d:\python32\lib\http\client.py", line 489, in read
    return self._read_chunked(amt)
  File "d:\python32\lib\http\client.py", line 553, in _read_chunked
    self._safe_read(2)  # toss the CRLF at the end of the chunk
  File "d:\python32\lib\http\client.py", line 592, in _safe_read
    raise IncompleteRead(b''.join(s), amt)
http.client.IncompleteRead: IncompleteRead(0 bytes read, 2 more expected)
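The traceback suggests the server closed the connection before sending the CRLF that ends a chunk, so `read()` raises instead of returning the body. A minimal sketch of a more tolerant read, assuming a possibly truncated page is acceptable: `http.client.IncompleteRead` keeps the bytes the failing call did receive in its `partial` attribute. Reading in fixed-size pieces is a choice made here so that data from earlier successful reads is kept no matter how much the failing call reports. The helper name `read_tolerant` and the 8192-byte piece size are hypothetical.

```python
import http.client


def read_tolerant(response, chunk_size=8192):
    """Read a response body piece by piece; if the transfer is cut short,
    return what was received instead of raising (hypothetical helper)."""
    pieces = []
    while True:
        try:
            piece = response.read(chunk_size)
        except http.client.IncompleteRead as e:
            # e.partial holds whatever this last read() call managed to get.
            pieces.append(e.partial)
            break
        if not piece:  # b'' signals the end of the body
            break
        pieces.append(piece)
    return b"".join(pieces)
```

With a live response this would be used as `data = read_tolerant(urllib.request.urlopen(URL_string_that_I_use))`, followed by the usual decode. Note that the helper swallows the error silently, so the caller cannot tell a complete page from a truncated one.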

Using the following code instead...

import urllib.request 

fp = urllib.request.urlopen(URL_string_that_I_use) 
for Line in fp: 
    print(Line.decode("utf8").rstrip('\n')) 
fp.close()

I get a good part of the page's content, but then the rest of the capture is thwarted by...

Traceback (most recent call last):
  File "TATest.py", line 9, in <module>
    for Line in fp:
  File "d:\python32\lib\http\client.py", line 489, in read
    return self._read_chunked(amt)
  File "d:\python32\lib\http\client.py", line 545, in _read_chunked
    self._safe_read(2) # toss the CRLF at the end of the chunk
  File "d:\python32\lib\http\client.py", line 592, in _safe_read
    raise IncompleteRead(b''.join(s), amt)
http.client.IncompleteRead: IncompleteRead(0 bytes read, 2 more expected)

Trying to read a different page fails with...

Traceback (most recent call last):
  File "TATest.py", line 11, in <module>
    print(Line.decode("utf8").rstrip('\n'))
  File "d:\python32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\x92' in position 21: character maps to <undefined>
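This last traceback fails in `encode`, not in `read` or `decode`: the bytes were decoded, and it is `print` that cannot write U+0092 to an output stream using cp1252 (note `cp1252.py` in the traceback). A minimal sketch for inspecting such text, assuming lossy output is acceptable, encodes to the stream's own encoding with `errors="replace"` before printing. The helper name `safe_print` and the UTF-8 fallback for streams that report no encoding are hypothetical.

```python
import sys


def safe_print(text, stream=None):
    """Print text, replacing characters the stream's encoding cannot
    represent instead of raising UnicodeEncodeError (hypothetical helper)."""
    if stream is None:
        stream = sys.stdout
    encoding = getattr(stream, "encoding", None) or "utf-8"
    # Round-trip through the target encoding; unencodable characters become '?'.
    printable = text.encode(encoding, errors="replace").decode(encoding)
    print(printable, file=stream)
```

In the loop above, `safe_print(Line.decode("utf8").rstrip('\n'))` would replace the `print` call. The `?` substitution only affects what is displayed; the decoded string itself is unchanged.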

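I believe this is a Windows problem, but could Python handle whatever is causing it more robustly? We don't run into these problems when trying similar code (the 2.6 version) on Linux. Is there a way around this? I have also posted this to the gmane.comp.python.devel newsgroup.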


2011-11-15 Thom Ives

It looks like the page you are reading is encoded as cp1252.

import urllib.request 

fp = urllib.request.urlopen(URL_string_that_I_use) 

string = fp.read() 
fp.close() 
print(string.decode("cp1252"))

This should work.
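To see why the charset matters, here is a small check on a made-up byte string: byte 0x92 is the right single quotation mark (U+2019) in cp1252, but on its own it is not valid UTF-8.

```python
# Made-up sample bytes: 0x92 is the curly apostrophe in cp1252.
raw = b"It\x92s here"

text = raw.decode("cp1252")
# ascii() escapes non-ASCII characters, so this print works on any console.
print(ascii(text))

try:
    raw.decode("utf-8")
except UnicodeDecodeError as e:
    # On its own, 0x92 cannot start a UTF-8 sequence.
    print("not valid UTF-8:", e.reason)
```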

There are many ways to specify a document's character set, but using the HTTP header should be enough for most web pages:

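import urllib.request

fp = urllib.request.urlopen(URL_string_that_I_use)

# get_content_charset() returns None when the header names no charset,
# so fall back to UTF-8 in that case.
charset = fp.info().get_content_charset() or "utf-8"
string = fp.read().decode(charset)
fp.close()
print(string)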
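The header logic can be checked without a network connection. In Python 3, `fp.info()` returns an `http.client.HTTPMessage`, which is a subclass of `email.message.Message`, so a plain `Message` can stand in for it. A minimal sketch, assuming UTF-8 as the default and `errors="replace"` as an acceptable policy for stray bytes; the helper name `decode_body` is hypothetical.

```python
from email.message import Message


def decode_body(body, headers, default="utf-8"):
    """Decode an HTTP body using the charset from its Content-Type header,
    falling back to `default` (hypothetical helper)."""
    # Returns `default` when the header has no charset parameter.
    charset = headers.get_content_charset(default)
    try:
        return body.decode(charset, errors="replace")
    except LookupError:
        # The header named a charset Python has no codec for.
        return body.decode(default, errors="replace")


# Offline check: Message stands in for the object fp.info() returns.
h = Message()
h["Content-Type"] = "text/html; charset=windows-1252"
print(ascii(decode_body(b"It\x92s", h)))

h_plain = Message()
h_plain["Content-Type"] = "text/html"  # no charset parameter
print(ascii(decode_body(b"caf\xc3\xa9", h_plain)))
```

With a live response the call would be `decode_body(fp.read(), fp.info())`.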


2014-06-30 10:56:25

Thanks, Seth. I hadn't looked at this for a while and only now realized you had answered. I'm sure this will be valuable in the future.

@ThomIves You're welcome. If the solution works for you, please mark it as accepted.
