Python urllib open issue

I want to fetch data from http://book.libertorrent.com/, but so far I have failed because some extra data (headers) shows up in the response. My code is very simple:

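import urllib  # Python 2; in Python 3 urlopen lives in urllib.request

response = urllib.urlopen('http://book.libertorrent.com/login.php')
with open('someFile.html', 'w') as f:  # the file is closed automatically
    f.write(response.read())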

read() returns:

Date: Fri, 09 Nov 2012 07:36:54 GMT 
Content-Type: text/html; charset=utf-8 
Transfer-Encoding: chunked 
Connection: close 
Cache-Control: no-cache, pre-check=0, post-check=0 
Expires: 0 
Pragma: no-cache 
Set-Cookie: bb_test=973132321; path=/; domain=book.libertorrent.com 
Content-Language: ru 

1ec0 
...Html... 
0

And response.info() is empty.

Is there any way to correct the response?

2012-11-10 maravan

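What does response.getcode() say after response.read()? On my Mac, response.read() returns the HTML and .getcode() returns 200, which is OK (success).

Your approach normally works; when I tried it against this site, I ran into the same problem...

Me too. Interestingly, it works in Python 3. – poke

Let's try this: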

$ echo -ne "GET /index.php HTTP/1.1\r\nHost: book.libertorrent.com\r\n\r\n" | nc book.libertorrent.com 80 | head -n 10 
HTTP/1.1 200 OK 
WWW 
Date: Sat, 10 Nov 2012 17:41:57 GMT 
Content-Type: text/html; charset=utf-8 
Transfer-Encoding: chunked 
Connection: keep-alive 
Content-Language: ru 

1f57 
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"><html dir="ltr">

See the "WWW" on the second line? That is not a valid HTTP header, and I guess that is what throws off the response parser here.
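To make the point about the stray "WWW" line concrete, here is a minimal sketch that flags header lines lacking the `Name: value` form. The function name `find_invalid_header_lines` and the regular expression are illustrative assumptions, not part of urllib or any library, and the check is deliberately simplistic (it would also flag legacy folded continuation lines).

```python
import re

# A header field name is a run of "token" characters followed by a colon.
_HEADER_LINE = re.compile(r"^[!#$%&'*+\-.^_`|~0-9A-Za-z]+:")


def find_invalid_header_lines(head):
    """Return header lines that are not of the form 'Name: value'.

    `head` is the response text up to (not including) the blank line,
    with the status line first. Hypothetical helper for illustration.
    """
    lines = head.replace("\r\n", "\n").split("\n")
    # lines[0] is the status line, so only the remaining lines are checked.
    return [
        line for line in lines[1:]
        if line.strip() and not _HEADER_LINE.match(line)
    ]


# Header block copied from the nc output above.
sample = (
    "HTTP/1.1 200 OK\r\n"
    "WWW\r\n"
    "Date: Sat, 10 Nov 2012 17:41:57 GMT\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "Transfer-Encoding: chunked\r\n"
    "Connection: keep-alive\r\n"
    "Content-Language: ru"
)
print(find_invalid_header_lines(sample))
```

Only the bare "WWW" line is reported for this sample; every other line has a field name followed by a colon.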

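By the way, Python 2 and Python 3 behave differently here:

Python 2 seems to interpret everything after this invalid header as content.
Python 3 ignores all headers and continues reading the content after the double newline. Since the headers are ignored, so is the transfer encoding, and therefore the content length (the chunk-size lines such as 1f57) is interpreted as part of the body.

So in the end the problem is that the server sends an invalid response, which should be fixed on the server side.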
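Until the server is fixed, the leftover chunk framing (a hexadecimal size line such as 1f57 before the HTML and a final 0) could be stripped on the client. The following is only a minimal sketch, assuming the whole chunked body is already in memory as bytes (for example from response.read() in Python 3); `dechunk` is a hypothetical helper name, and any trailers after the last chunk are ignored.

```python
def dechunk(raw):
    """Decode a chunked HTTP body (bytes) into the plain payload.

    Hypothetical helper for illustration; not part of urllib.
    """
    body = b""
    pos = 0
    while True:
        eol = raw.find(b"\r\n", pos)
        if eol == -1:
            raise ValueError("missing chunk-size line")
        # The chunk size is hexadecimal; drop an optional ';extension' part.
        size = int(raw[pos:eol].split(b";")[0].strip(), 16)
        if size == 0:
            break  # last chunk reached; trailers (if any) are ignored
        start = eol + 2
        chunk = raw[start:start + size]
        if len(chunk) != size:
            raise ValueError("truncated chunk")
        body += chunk
        pos = start + size + 2  # skip the chunk data and its trailing CRLF
    return body


# Two small chunks followed by the terminating zero-size chunk.
example = b"5\r\nhello\r\n7\r\n, world\r\n0\r\n\r\n"
print(dechunk(example))
```

This is a workaround for the symptom only; the invalid header line itself still has to be removed on the server.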

2012-11-10 17:55:23 mata

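I tried this code on Python 3 and the result is better than with Python 2: the header information is present in info(). The response now looks like this: 1ec0 ... Html ... 0 – maravan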
