python-requests: get the head of the response content without consuming it all

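Using python-requests and python-magic, I would like to test the MIME type of a web resource without fetching all of its content (especially if the resource happens to be, for example, an ogg file or a PDF file). Depending on the result, I might decide to fetch the whole thing. However, calling the text method after testing the MIME type only returns what has not been consumed yet. How can I test the MIME type without consuming the response content?

Below is my current code.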

import requests 
import magic 


r = requests.get("http://www.december.com/html/demo/hello.html", prefetch=False) 
mime = magic.from_buffer(r.iter_content(256).next(), mime=True) 

if mime == "text/html": 
    print(r.text) # I'd like r.text to give me the entire response content

Thanks!

Source

2012-11-02 user1415785

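Note: at the time this question was asked, the correct way to fetch only the headers and stream the body was prefetch=False. That option has since been renamed to stream and the boolean value was inverted, so you now need stream=True.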
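For current versions of requests, the same peek can be sketched with stream=True. This is only an illustration, not part of the original answer: peek_body, read_rest and fetch_if_html are hypothetical helper names, Python 3 is assumed, and the two helpers rely on nothing more than the response having an iter_content() method.

```python
def peek_body(response, peek_size=256):
    """Return the first chunk of a streamed response body (b"" if it is empty)."""
    return next(response.iter_content(peek_size), b"")


def read_rest(response, peek, chunk_size=10 * 1024):
    """Return the full body: the peeked bytes plus everything not yet read."""
    return peek + b"".join(response.iter_content(chunk_size))


def fetch_if_html(url):
    """Return the decoded page if its first bytes look like HTML, else None."""
    # Imported lazily so the two helpers above stay dependency-free.
    import magic      # python-magic
    import requests

    # stream=True defers the body download until iter_content() is called.
    with requests.get(url, stream=True) as r:
        peek = peek_body(r)
        if magic.from_buffer(peek, mime=True) != "text/html":
            return None
        body = read_rest(r, peek)
        # Falling back to UTF-8 is an assumption made for this sketch.
        return body.decode(r.encoding or "utf-8", errors="replace")
```

Calling fetch_if_html("http://www.december.com/html/demo/hello.html") needs network access and both packages installed. Leaving the with block releases the connection even when the body is never read.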

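The original answer follows.

Once you use iter_content(), you have to keep using it; .text indirectly uses the same interface under the hood (via .content).

In other words, by using iter_content() at all, you have to do by hand the work that .text does: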

import magic
import requests
from requests.compat import chardet

# prefetch=False is the 2012 spelling; current requests expects stream=True
r = requests.get("http://www.december.com/html/demo/hello.html", prefetch=False)
peek = next(r.iter_content(256))  # first chunk of the body
mime = magic.from_buffer(peek, mime=True)

if mime == "text/html":
    # re-attach the peeked bytes to the rest of the body
    contents = peek + b''.join(r.iter_content(10 * 1024))
    encoding = r.encoding
    if encoding is None:
        # detect encoding
        encoding = chardet.detect(contents)['encoding']
    try:
        textcontent = str(contents, encoding, errors='replace')
    except (LookupError, TypeError):
        textcontent = str(contents, errors='replace')
    print(textcontent)

This assumes you are using Python 3.

The alternative is to make two requests:

r = requests.get("http://www.december.com/html/demo/hello.html", prefetch=False)
mime = magic.from_buffer(next(r.iter_content(256)), mime=True)

if mime == "text/html":
    # second, ordinary request that downloads the whole body
    print(requests.get("http://www.december.com/html/demo/hello.html").text)

Python 2 version:

r = requests.get("http://www.december.com/html/demo/hello.html", prefetch=False)
peek = r.iter_content(256).next()
mime = magic.from_buffer(peek, mime=True)

if mime == "text/html":
    contents = peek + ''.join(r.iter_content(10 * 1024))
    encoding = r.encoding
    if encoding is None:
        # detect encoding
        encoding = chardet.detect(contents)['encoding']
    try:
        textcontent = unicode(contents, encoding, errors='replace')
    except (LookupError, TypeError):
        textcontent = unicode(contents, errors='replace')
    print(textcontent)
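When the resource is large (the ogg or PDF case from the question), joining everything into one bytes object may not be desirable. One possible variation, sketched here with the hypothetical helper names with_peek and save_stream, puts the peeked bytes back in front of the chunks that have not been read yet, so the body can be written out piece by piece:

```python
import itertools


def with_peek(peek, remaining_chunks):
    """Iterate over the already-read peek first, then the unread chunks."""
    head = [peek] if peek else []
    return itertools.chain(head, remaining_chunks)


def save_stream(chunks, fileobj):
    """Write every chunk to a binary file object; return the byte count."""
    total = 0
    for chunk in chunks:
        fileobj.write(chunk)
        total += len(chunk)
    return total
```

With a streamed response r and its peeked first chunk peek, this could be used as save_stream(with_peek(peek, r.iter_content(10 * 1024)), f), where f is a file opened in binary write mode.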

Source

2012-11-02 15:14:05

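Thanks, I'll try this! – user1415785

Hey, I can't get the first solution to work: after replacing the references to "self" with "r", I get the error message "RuntimeError: The content for this response was already consumed". Any idea? Thanks! – user1415785

@user1415785: Sorry, my mistake; replace 'self.content' with 'contents'. This is a more or less direct translation of the '.text' source. –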

If the 'content-type' header is sufficient, you can issue an HTTP 'HEAD' request instead of a 'GET' to receive only the HTTP headers.

import requests

url = 'http://www.december.com/html/demo/hello.html'
response = requests.head(url)  # headers only, no body is transferred
print(response.headers['content-type'])

Source

2012-11-02 16:03:08

Thanks. That would indeed be easier, but I would like to use python-magic as a second opinion in case the declared content type is wrong. – user1415785
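The "second opinion" idea can be sketched as a comparison between the declared Content-Type header and the type sniffed from the first bytes. second_opinion is a hypothetical helper, and the sniff argument is assumed to be any callable that maps bytes to a MIME string, for example a small wrapper around magic.from_buffer(..., mime=True):

```python
def second_opinion(content_type_header, peek, sniff):
    """Return (declared, sniffed, agree) for a response's first bytes.

    declared is the header value without parameters such as charset,
    lower-cased; it is "" when the header is missing.
    """
    declared = (content_type_header or "").split(";", 1)[0].strip().lower()
    sniffed = sniff(peek)
    return declared, sniffed, declared == sniffed
```

For a response r obtained with stream=True, the headers are already available before the body is read, so a single GET is enough: second_opinion(r.headers.get("content-type"), peek, lambda b: magic.from_buffer(b, mime=True)).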
