2012-05-29

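How to easily decode an HTTP chunked-encoded string when making a raw HTTP request?

I want to make HTTP requests without depending on cURL or allow_url_fopen = 1. Instead, I open a socket connection and send a raw HTTP request (the request uses HTTP/1.1):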

/** 
* Make HTTP GET request 
* 
* @param string the URL 
* @param int  will be filled with HTTP response status code 
* @param string will be filled with HTTP response header 
* @return string HTTP response body 
*/ 
function http_get_request($url, &$http_code = '', &$res_head = '') 
{ 
    $scheme = $host = $user = $pass = $query = ''; 
    $path = '/'; 
    $port = substr($url, 0, 5) == 'https' ? 443 : 80; 

    // Overwrite the defaults above with the URL's components (including port, if present) 
    extract(parse_url($url)); 

    // The fragment (#...) is client-side only and is never sent to the server 
    $path .= $query ? "?$query" : ''; 

    $head = "GET $path HTTP/1.1\r\n" 
          . "Host: $host\r\n" 
          // Only send credentials when the URL actually contains a user name 
          . ($user !== '' ? "Authorization: Basic " . base64_encode("$user:$pass") . "\r\n" : '') 
          . "Connection: close\r\n\r\n"; 

    $fp = fsockopen($scheme == 'https' ? "ssl://$host" : $host, $port) or 
        die('Cannot connect!'); 

    fputs($fp, $head); 
    $res = ''; // initialize the buffer before appending to it 
    while (!feof($fp)) { 
        $res .= fgets($fp, 4096); 
    } 
    fclose($fp); 

    // Header and body are separated by the first empty line 
    list($res_head, $res_body) = explode("\r\n\r\n", $res, 2); 
    list(, $http_code,) = explode(' ', $res_head, 3); 

    return $res_body; 
} 

The function works fine, but the response body is often returned as a chunked-encoded string. For example (from Wikipedia):

25 
This is the data in the first chunk 

1C 
and this is the second one 

3 
con 
8 
sequence 
0 
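To make the wire format explicit, here is a hypothetical helper, encode_chunked() (not part of the question's code), that rebuilds the example above. Each chunk is "<size in hex>\r\n<data>\r\n", and a zero-size chunk ends the body. The sketch assumes no chunk extensions or trailers.

```php
<?php
// Hypothetical helper: builds a chunked body from a list of payloads,
// only to illustrate the format. Empty payloads are skipped, because a
// zero-size chunk would terminate the body early.
function encode_chunked(array $chunks)
{
    $out = '';
    foreach ($chunks as $data) {
        if ($data === '') {
            continue;
        }
        // Size line (hex byte count), then the payload, then CRLF
        $out .= strtoupper(dechex(strlen($data))) . "\r\n" . $data . "\r\n";
    }
    return $out . "0\r\n\r\n"; // last-chunk plus the final empty line
}

// The declared size counts every payload byte, including the CRLF that
// belongs to the first two chunks' data (37 = 0x25 and 28 = 0x1C bytes).
$body = encode_chunked(array(
    "This is the data in the first chunk\r\n",
    "and this is the second one\r\n",
    "con",
    "sequence",
));
echo json_encode($body), "\n";
```

This explains the blank lines in the example above: they are the chunk-terminating CRLF that follows data which itself ends in CRLF.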

I don't want to use http_chunked_decode() because it depends on PECL, and I want highly portable code.

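How can I easily decode an HTTP chunked-encoded string so that my function returns the original HTML? I also need to make sure that the length of the decoded string matches the Content-Length: header.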

Any help would be appreciated. Thanks.


[How to handle chunked encoding request properly?](http://stackoverflow.com/questions/3289574/how-to-handle-chunked-encoding-request-properly) –


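[The possible duplicate](http://stackoverflow.com/q/3289574/1396314) is somewhat similar to my problem, but the accepted answer is *too bloated*. I'm working on a simpler solution based on the code there. I hope this question won't be closed :) – flowfree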


The code in that answer isn't that big; it's just well commented :) –

Answers


Since the function returns the HTTP response header, check whether 'Transfer-Encoding' is 'chunked' and, if it is, decode the chunked-encoded body. In pseudocode:

CALL parse_http_header 
IF 'Transfer-Encoding' IS 'chunked' 
    CALL decode_chunked 
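The "IS 'chunked'" test can be sketched directly on the raw $res_head string. This is a hypothetical helper, is_chunked_response(), not part of the answer's code. It accounts for two details: header names are case-insensitive, and Transfer-Encoding may list several codings (for example "gzip, chunked"), where "chunked" has to be the last one. For simplicity it assumes a single Transfer-Encoding line.

```php
<?php
// Hypothetical helper: does the raw response head announce a chunked body?
// Assumes at most one Transfer-Encoding header line.
function is_chunked_response($raw_head)
{
    $lines = explode("\r\n", $raw_head);
    array_shift($lines); // skip the status line, e.g. "HTTP/1.1 200 OK"
    foreach ($lines as $line) {
        $parts = explode(':', $line, 2);
        // Header names are case-insensitive, so compare with strcasecmp()
        if (count($parts) !== 2 || strcasecmp(trim($parts[0]), 'Transfer-Encoding') !== 0) {
            continue;
        }
        // Codings are comma-separated; "chunked" must be the final one
        $codings = array_map('trim', explode(',', strtolower($parts[1])));
        return end($codings) === 'chunked';
    }
    return false;
}

$head = "HTTP/1.1 200 OK\r\ntransfer-encoding: Chunked\r\nContent-Type: text/html";
var_dump(is_chunked_response($head)); // bool(true)
```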

Parsing the HTTP response header:

Below is a function that parses the HTTP response header into an associative array.

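function parse_http_header($str) 
{ 
    $lines = explode("\r\n", $str); 
    $head = array(array_shift($lines)); // status line, e.g. "HTTP/1.1 200 OK" 
    foreach ($lines as $line) { 
        // Skip lines without a colon (e.g. empty lines) to avoid an undefined index 
        if (strpos($line, ':') === false) { 
            continue; 
        } 
        list($key, $val) = explode(':', $line, 2); 
        $key = trim($key); 
        if (strcasecmp($key, 'Set-Cookie') == 0) { 
            // A response can carry several Set-Cookie headers: collect them all 
            $head['Set-Cookie'][] = trim($val); 
        } else { 
            $head[$key] = trim($val); 
        } 
    } 
    return $head; 
} 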

The function returns an array like this:

Array 
(
    [0] => HTTP/1.1 200 OK 
    [Expires] => Tue, 31 Mar 1981 05:00:00 GMT 
    [Content-Type] => text/html; charset=utf-8 
    [Transfer-Encoding] => chunked 
    [Set-Cookie] => Array 
     (
      [0] => k=10.34; path=/; expires=Sat, 09-Jun-12 01:58:23 GMT; domain=.example.com 
      [1] => guest_id=v1%3A13; domain=.example.com; path=/; expires=Mon, 02-Jun-2014 13:58:23 GMT 
     ) 
    [Content-Length] => 43560 
) 

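Notice how the Set-Cookie headers are parsed into an array. You'll need to parse the cookies later to associate each URL with the cookies that need to be sent to it.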
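As a starting point for that later step, here is a simplified, hypothetical parse_set_cookie() that splits one entry of $head['Set-Cookie'] into the cookie pair and its attributes. It does not handle quoted values, and it does not validate attributes or do domain/path matching.

```php
<?php
// Hypothetical helper: split one Set-Cookie value into name, value and
// attributes. Attributes without "=" (e.g. HttpOnly) are stored as true.
function parse_set_cookie($value)
{
    $parts = array_map('trim', explode(';', $value));
    // The first part is always the name=value pair
    list($name, $val) = array_pad(explode('=', array_shift($parts), 2), 2, '');
    $cookie = array('name' => $name, 'value' => $val, 'attributes' => array());
    foreach ($parts as $part) {
        if ($part === '') {
            continue;
        }
        list($k, $v) = array_pad(explode('=', $part, 2), 2, true);
        // Attribute names are case-insensitive ("Path", "path", ...)
        $cookie['attributes'][strtolower($k)] = $v;
    }
    return $cookie;
}

$c = parse_set_cookie('k=10.34; path=/; expires=Sat, 09-Jun-12 01:58:23 GMT; domain=.example.com');
// $c['attributes']['domain'] and ['path'] are what you would match URLs against
```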


Decoding the chunked-encoded string

The following function takes a chunked-encoded string as its argument and returns the decoded string.

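function decode_chunked($str) { 
    for ($res = ''; !empty($str); $str = trim($str)) { 
        $pos = strpos($str, "\r\n"); 
        if ($pos === false) { 
            break; // no chunk-size line left: stop instead of chewing through garbage 
        } 
        // The size is hex; drop any chunk extension (";name=value") first, 
        // because hexdec() ignores non-hex characters instead of stopping at them 
        $len = hexdec(trim(explode(';', substr($str, 0, $pos), 2)[0])); 
        if ($len == 0) { 
            break; // last-chunk: the body is complete (trailer headers are ignored) 
        } 
        $res .= substr($str, $pos + 2, $len); 
        $str = substr($str, $pos + 2 + $len); 
    } 
    return $res; 
} 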

// Given the string in the question, the function above returns: 
// 
// This is the data in the first chunk 
// and this is the second one 
// consequence 
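A body that is not actually chunked, such as a plain JSON response, contains no valid size line, so a lenient decoder can end up returning an empty string. The stricter sketch below (hypothetical name decode_chunked_strict()) returns null in that case, so the caller can fall back to the raw body with `$body = decode_chunked_strict($res_body); if ($body === null) { $body = $res_body; }`. On the Content-Length question: RFC 7230 §3.3.3 says that when both Transfer-Encoding and Content-Length are present, Transfer-Encoding overrides Content-Length. Comparing the body length against Content-Length therefore only makes sense for non-chunked responses.

```php
<?php
// Stricter sketch: returns the decoded body, or null when the input is not
// well-formed chunked data. Chunk extensions are skipped; trailers are ignored.
function decode_chunked_strict($str)
{
    $res = '';
    $offset = 0;
    while (true) {
        $eol = strpos($str, "\r\n", $offset);
        if ($eol === false) {
            return null; // no complete chunk-size line: not chunked, or truncated
        }
        // chunk-size line: hex digits, optionally followed by ";extension"
        $size_line = explode(';', substr($str, $offset, $eol - $offset), 2);
        $hex = trim($size_line[0]);
        // At most 7 hex digits, so hexdec() returns an int even on 32-bit PHP
        if (!preg_match('/^[0-9a-fA-F]{1,7}$/', $hex)) {
            return null;
        }
        $len = hexdec($hex);
        $offset = $eol + 2;
        if ($len == 0) {
            return $res; // last-chunk reached
        }
        // The declared data must be followed by CRLF
        if (substr($str, $offset + $len, 2) !== "\r\n") {
            return null; // data shorter than declared, or CRLF missing
        }
        $res .= substr($str, $offset, $len);
        $offset += $len + 2;
    }
}

$wire = "25\r\nThis is the data in the first chunk\r\n\r\n"
      . "1C\r\nand this is the second one\r\n\r\n"
      . "3\r\ncon\r\n8\r\nsequence\r\n0\r\n\r\n";
$decoded = decode_chunked_strict($wire);
$json = decode_chunked_strict('{"ok": true}'); // null: use the raw body instead
```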

@rdlowrey I've edited my answer. Thanks for the correction. – flowfree


Thanks - that was quick :) – rdlowrey


It deletes my entire string (my string is JSON). – user3770797


I don't know whether it suits your needs, but if you specify HTTP/1.0 instead of HTTP/1.1, you won't get a chunked response.
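This works because chunked transfer coding is an HTTP/1.1 feature. RFC 7230 §3.3.1 forbids a server from sending Transfer-Encoding in response to a request that does not indicate HTTP/1.1 or later. A minimal sketch of the change is to make the version in the request line a parameter. build_get_head() is a hypothetical name, and the Authorization header from the question is omitted for brevity.

```php
<?php
// Hypothetical helper: the request head from http_get_request(), with the
// protocol version as a parameter. With "1.0" the server must not reply
// with Transfer-Encoding: chunked.
function build_get_head($path, $host, $version = '1.0')
{
    return "GET $path HTTP/$version\r\n"
        . "Host: $host\r\n"
        . "Connection: close\r\n\r\n";
}

echo build_get_head('/index.html', 'example.com');
```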


Yes. But HTTP/1.1 has some cool features that I want to implement in my function. – flowfree


This function is used in WordPress.

function decode_chunked($data) { 
    if (!preg_match('/^([0-9a-f]+)(?:;(?:[\w-]*)(?:=(?:(?:[\w-]*)*|"(?:[^\r\n])*"))?)*\r\n/i', trim($data))) { 
     return $data; 
    } 
    $decoded = ''; 
    $encoded = $data; 

    while (true) { 
     $is_chunked = (bool) preg_match('/^([0-9a-f]+)(?:;(?:[\w-]*)(?:=(?:(?:[\w-]*)*|"(?:[^\r\n])*"))?)*\r\n/i', $encoded, $matches); 
     if (!$is_chunked) { 
      // Looks like it's not chunked after all 
      return $data; 
     } 

     $length = hexdec(trim($matches[1])); 
     if ($length === 0) { 
      // Ignore trailer headers 
      return $decoded; 
     } 

     $chunk_length = strlen($matches[0]); 
     $decoded .= substr($encoded, $chunk_length, $length); 
     $encoded = substr($encoded, $chunk_length + $length + 2); 

     if (trim($encoded) === '0' || empty($encoded)) { 
      return $decoded; 
     } 
    } 

    // We'll never actually get down here 
}