Is it possible to peek at the data in a urllib2 response?

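I need to detect the character encoding of an HTTP response. To do this, I look at the headers, and if the encoding isn't set in the Content-Type header, I have to peek at the response body and look for a "<meta http-equiv='content-type'>" tag. I'd like to be able to write a function that looks and works like this: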

response = urllib2.urlopen("http://www.example.com/") 
encoding = detect_html_encoding(response) 
... 
page_text = response.read()

However, if I call response.read() inside my detect_html_encoding method, then a subsequent response.read() after the call to my function will fail, because the body has already been consumed.

Is there a simple way to peek at the response and/or rewind it after reading?

2009-08-20 John

def detectit(response): 
    # try headers &c, then, worst case...: 
    content = response.read() 
    response.read = lambda: content 
    # now detect based on content

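The trick, of course, is to make sure that response.read() will return the same thing again if it's needed. That's why we assign the lambda only when needed, i.e., when we've already had to extract the content: it ensures the same content can be extracted again (and again, and again... ;-)).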
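A minimal offline sketch of the trick above, assuming Python 3. `FakeResponse` is a hypothetical stand-in for a real `urlopen()` response (its `read()` drains the body once), so the example runs without a network. The `charset=` regex is an illustrative assumption, not a full HTML parser.

```python
import re


class FakeResponse:
    """Hypothetical stand-in for a urlopen() response: read() drains the body once."""

    def __init__(self, body):
        self._body = body

    def read(self):
        data, self._body = self._body, b""
        return data


def detectit(response):
    # Worst case: consume the body to look for a charset declaration.
    content = response.read()
    # Shadow read() on this instance so later callers get the same bytes again.
    response.read = lambda: content
    # Assumed, simplistic pattern for charset=... in the markup.
    match = re.search(rb'charset=["\']?([\w-]+)', content, re.IGNORECASE)
    return match.group(1).decode("ascii") if match else None


response = FakeResponse(b'<meta http-equiv="content-type" content="text/html; charset=utf-8">hi')
encoding = detectit(response)
page_text = response.read()
print(encoding)                   # utf-8
print(page_text.endswith(b"hi"))  # True
```

Note that the replacement lambda takes no arguments, so a caller that uses read(n) on the patched response would get a TypeError.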

2009-08-21 02:05:26

If the encoding is in the HTTP headers (rather than in the document itself), you can detect it with response.info().
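A sketch assuming Python 3, where urlopen(...).info() returns an http.client.HTTPMessage, a subclass of email.message.Message. Its get_content_charset() returns the lower-cased charset parameter of Content-Type, or None. A hand-built Message stands in for the real headers so this runs offline. The helper name charset_from_headers is hypothetical. Under Python 2's urllib2, the rough equivalent is response.info().getparam('charset').

```python
from email.message import Message


def charset_from_headers(headers):
    """Return the charset declared in the Content-Type header, or None."""
    return headers.get_content_charset()


headers = Message()
headers["Content-Type"] = "text/html; charset=ISO-8859-1"
print(charset_from_headers(headers))  # iso-8859-1

no_charset = Message()
no_charset["Content-Type"] = "text/html"
print(charset_from_headers(no_charset))  # None
```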

If you have to parse the HTML, save the response data first:

page_text = response.read() 
encoding = detect_html_encoding(response, page_text)
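One possible shape for the two-argument detect_html_encoding above, assuming Python 3 (so page_text is bytes). It checks the header charset first, then a `<meta>` declaration in the saved body, and otherwise returns None. StubResponse is a hypothetical stand-in exposing only info(). The META_CHARSET regex is an assumption that covers the common http-equiv and HTML5 `<meta charset>` forms, not every valid HTML.

```python
import re
from email.message import Message

# Assumed pattern: charset=... anywhere inside a <meta ...> tag.
META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([\w-]+)', re.IGNORECASE)


def detect_html_encoding(response, page_text):
    """Header charset first, then a <meta> declaration in the saved body, else None."""
    charset = response.info().get_content_charset()
    if charset:
        return charset
    match = META_CHARSET.search(page_text)
    return match.group(1).decode("ascii").lower() if match else None


class StubResponse:
    """Hypothetical stand-in exposing only info(), like a urlopen() result."""

    def __init__(self, content_type):
        self._headers = Message()
        self._headers["Content-Type"] = content_type

    def info(self):
        return self._headers


page_text = b'<html><head><meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS"></head></html>'
print(detect_html_encoding(StubResponse("text/html"), page_text))                 # shift_jis
print(detect_html_encoding(StubResponse("text/html; charset=UTF-8"), page_text))  # utf-8
```

Because the caller keeps page_text, the response body is read exactly once and nothing needs to be rewound.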

2009-08-20 20:30:44 orip

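It can be (1) in the headers, (2) in the document, or (3) absent, in which case I have to use chardet to detect it from the characters in the document. I could obviously extract the text up front, but what I'm specifically after is a way to avoid that kind of approach. – John 2009-08-20 20:41:36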
