2014-02-15 66 views
0

我打算使用http-conduit通過HTTP/HTTPS獲取大量數據。爲了有效地做到這一點,我想使用Accept-Encoding: deflate,gzip頭允許服務器(如果支持)以壓縮方式傳輸數據。http-conduit:使用gzip傳輸數據

但是,我想從中獲取的某些服務器似乎錯誤地響應了Content-Encoding: gzip標頭,而未返回gzip數據。

所以我需要處理這些案件:

  • 服務器不支持壓縮 - >返回平原響應主體
  • 服務器返回gzip壓縮/癟內容 - >返回解壓縮響應主體
  • 服務器(在響應頭文件中返回gzip內容,但gzip解碼失敗 - >返回明文響應主體

在第三種情況下,它可以(在此特定情況下)安全地假定,即明文,未壓縮的數據看起來並不像gzip的數據,所以響應頭說,這是gzip壓縮& &未gzip的失敗是完全等價於的數據不被壓縮

我該如何使用http-conduit來做到這一點?

注意:此問題故意不顯示研究工作,因爲它已立即通過Q & A風格的方式回答。

回答

0

爲了使這個回答更加簡潔,我們將使用代碼的一些我以前的帖子&概念:

  • simpleHttpWithManagerhere
  • 容錯的gzip /緊縮從here
解碼

爲了避免冗餘,我將首先解釋基本步驟,然後提供一個完整的示例。

首先,我們將處理髮送標題。請注意,如果http-types包含,hAcceptEncoding未預定義。除此之外,這是一項微不足道的任務。

發送請求後,我們需要檢查是否有Content-Encoding。如果沒有,我們將假設未壓縮的明文,否則我們需要檢查它是否是gzipdeflate。在這種情況下,哪一個確切無關緊要,因爲[zlib]支持通過報頭進行自動檢測。

對於這個簡單的例子,我們只是假設,如果服務器返回一個Content-Encoding值既不是不存在的,也不gzip也不deflate,響應不被壓縮。由於我們不允許(通過Accept-Encoding)諸如sdch之類的其他按鈕,服務器將通過這種方式違反HTTP標準。

如果我們檢測到壓縮編碼,我們會嘗試解壓並返回它。如果失敗或者根本沒有壓縮數據,我們會返回普通的響應主體。

這裏的例子:

{-# LANGUAGE OverloadedStrings #-} 
import Network.HTTP.Conduit 
import Network.Connection 
import Codec.Compression.Zlib.Internal 
import Data.Maybe 
import Data.Either 
import Network.HTTP.Types 
import Data.ByteString.Char8 (ByteString) 
import qualified Data.ByteString.Lazy.Char8 as LB 

myurl :: String 
myurl = "http://stackoverflow.com" 

hAcceptEncoding :: HeaderName 
hAcceptEncoding = "Accept-Encoding" 

-- | The Accept-Encoding HTTP header value for allowing gzip or deflated responses 
gzipDeflateEncoding :: ByteString 
gzipDeflateEncoding = "gzip,deflate" 

-- HTTP header list that allows gzipped/deflated response 
compressionEnabledHeaders :: RequestHeaders 
compressionEnabledHeaders = [(hAcceptEncoding, gzipDeflateEncoding)] 

-- | Give an encoding string and a HTTP response object, 
-- Checks if the Content-Encoding header value of the response object 
-- is equal to the given encoding. Returns false if no ContentEncoding 
-- header exists in the given response, or if the value does not match 
-- the encoding parameter. 
hasResponseEncoding :: ByteString -> Response b -> Bool 
hasResponseEncoding encoding response = 
    let responseEncoding = lookup hContentEncoding headers 
     headers = responseHeaders response 
    in maybe False (== encoding) responseEncoding 

-- | Convert the custom error format from zlib to a Either 
decompressStreamToEither :: DecompressStream -> Either String LB.ByteString 
decompressStreamToEither (StreamError _ errmsg) = Left errmsg 
decompressStreamToEither [email protected](StreamChunk _ _) = Right $ fromDecompressStream stream 
decompressStreamToEither StreamEnd = Right $ "" 

-- | Decompress with explicit error handling 
safeDecompress :: LB.ByteString -> Either String LB.ByteString 
safeDecompress bstr = decompressStreamToEither $ decompressWithErrors gzipOrZlibFormat defaultDecompressParams bstr 

-- | Decompress gzip, if it fails, return uncompressed String 
decompressIfPossible :: LB.ByteString -> LB.ByteString 
decompressIfPossible bstr = 
    let conv (Left a) = bstr 
     conv (Right a) = a 
    in (conv . safeDecompress) bstr 

-- | Tolerantly decompress response body. As some HTTP servers set the header incorrectly, 
-- just return the plain response text if the compression fails 
decompressResponseBody :: Response LB.ByteString -> LB.ByteString 
decompressResponseBody res 
    | hasResponseEncoding "gzip" res = decompressIfPossible $ responseBody res 
    | hasResponseEncoding "deflate" res = decompressIfPossible $ responseBody res 
    | otherwise = responseBody res 

-- | Download like with simpleHttp, but using an existing manager for the task 
-- and automatically requesting & handling gzipped data 
simpleHttpWithAutoGzip :: Manager -> String -> IO LB.ByteString 
simpleHttpWithAutoGzip manager url = do req <- parseUrl url 
             let req' = req {requestHeaders = compressionEnabledHeaders} 
             fmap decompressResponseBody $ httpLbs req' manager 

-- Example usage 
main :: IO() 
main = do manager <- newManager conduitManagerSettings -- Create a simple manager 
      content <- simpleHttpWithAutoGzip manager "http://stackoverflow.com" 
      -- Print the uncompressed content 
      print $ content