2015-11-27

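Parsing HTTP requests and responses from a text file

Given the following file, which contains an HTTP-pipelined stream of HTTP requests and HTTP responses, how can I parse it into my stream variable?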

type Connection struct{ 
    Request *http.Request 
    Response *http.Response 
} 
stream := make([]Connection, 0) 

Original file:

GET /ubuntu/dists/trusty/InRelease HTTP/1.1 
Host: archive.ubuntu.com 
Cache-Control: max-age=0 
Accept: text/* 
User-Agent: Debian APT-HTTP/1.3 (1.0.1ubuntu2) 

HTTP/1.1 404 Not Found 
Date: Thu, 26 Nov 2015 18:26:36 GMT 
Server: Apache/2.2.22 (Ubuntu) 
Vary: Accept-Encoding 
Content-Length: 311 
Content-Type: text/html; charset=iso-8859-1 

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> 
<html><head> 
<title>404 Not Found</title> 
</head><body> 
<h1>Not Found</h1> 
<p>The requested URL /ubuntu/dists/trusty/InRelease was not found on this server.</p> 
<hr> 
<address>Apache/2.2.22 (Ubuntu) Server at archive.ubuntu.com Port 80</address> 
</body></html> 
GET /ubuntu/dists/trusty-updates/InRelease HTTP/1.1 
Host: archive.ubuntu.com 
Cache-Control: max-age=0 
Accept: text/* 
User-Agent: Debian APT-HTTP/1.3 (1.0.1ubuntu2) 

HTTP/1.1 200 OK 
Date: Thu, 26 Nov 2015 18:26:37 GMT 
Server: Apache/2.2.22 (Ubuntu) 
Last-Modified: Thu, 26 Nov 2015 18:03:00 GMT 
ETag: "fbb7-5257562a5fd00" 
Accept-Ranges: bytes 
Content-Length: 64439 
Cache-Control: max-age=382, proxy-revalidate 
Expires: Thu, 26 Nov 2015 18:33:00 GMT 

-----BEGIN PGP SIGNED MESSAGE----- 
Hash: SHA512 

Origin: Ubuntu 
Label: Ubuntu 
Suite: trusty-updates 
Version: 14.04 
Codename: trusty 
[... truncated by author] 

I know there is http.ReadRequest. What about responses? Any ideas, feedback, or thoughts are appreciated.

Answer


This is actually fairly straightforward:

package main 

import (
    "bufio" 
    "bytes" 
    "fmt" 
    "io" 
    "io/ioutil" 
    "log" 
    "net/http" 
    "net/http/httputil" 
    "os" 
) 

type Connection struct { 
    Request *http.Request 
    Response *http.Response 
} 

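// ReadHTTPFromFile parses alternating HTTP requests and responses from r
// until EOF and returns them as request/response pairs.
func ReadHTTPFromFile(r io.Reader) ([]Connection, error) {
    buf := bufio.NewReader(r)
    stream := make([]Connection, 0)

    for {
        req, err := http.ReadRequest(buf)
        if err == io.EOF {
            // Clean end of input: there are no more requests to read.
            break
        }
        if err != nil {
            return stream, err
        }

        resp, err := http.ReadResponse(buf, req)
        if err != nil {
            return stream, err
        }

        // Save the response body. It has to be consumed before the next
        // message can be parsed, and it must be closed afterwards.
        b := new(bytes.Buffer)
        _, err = io.Copy(b, resp.Body)
        resp.Body.Close()
        if err != nil {
            return stream, err
        }
        resp.Body = ioutil.NopCloser(b)

        stream = append(stream, Connection{Request: req, Response: resp})
    }
    return stream, nil
}
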
func main() {
    f, err := os.Open("/tmp/test.http")
    if err != nil {
        log.Fatalln(err)
    }
    defer f.Close()
    stream, err := ReadHTTPFromFile(f)
    if err != nil {
        log.Fatalln(err)
    }
    for _, c := range stream {
        b, err := httputil.DumpRequest(c.Request, true)
        if err != nil {
            log.Fatalln(err)
        }
        fmt.Println(string(b))
        b, err = httputil.DumpResponse(c.Response, true)
        if err != nil {
            log.Fatalln(err)
        }
        fmt.Println(string(b))
    }
}

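A few notes:

  • There are http.ReadRequest and http.ReadResponse.
  • http.ReadRequest and http.ReadResponse can be called over and over on the same bufio.Reader until EOF, and it will "just work".
    • "Just working" depends on the Content-Length header being present and correct, so that reading the body leaves the reader at the start of the next request/response. This is the key to understanding exactly what will and won't work.
  • resp.Body must be closed, per the documentation, so we have to copy it into another buffer to keep it.
  • With your example data (with the Content-Length modified to match your truncation), this code outputs the same requests and responses as given.
  • httputil.DumpRequest and httputil.DumpResponse will not necessarily dump the HTTP headers in the same order as the input file, so don't expect a diff to be perfect.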

This is great! Thanks a lot. I must have missed the http.ReadResponse function. I like the fact that calling the read functions over and over just works! – mattes


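Yes, I was surprised that it works too. But it basically comes down to handling Content-Length, so it makes sense. It would probably be better if you had some kind of delimiter between the requests/responses and used something like 'io.LimitedReader' to make sure you don't get thrown off by a malformed HTTP response. – korylprince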
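
One possible reading of the delimiter idea in the comment above, as a rough sketch rather than a tested recipe. It assumes a hypothetical separator line ("--pair--") between request/response pairs, which the file in the question does not have, and it assumes the separator never occurs inside a body. The result type and both functions are invented for illustration. Because the data is split in memory, each record gets its own bounded reader; when streaming from a large file, an io.LimitedReader over the shared reader could play the same bounding role.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"net/http"
)

// delim is a hypothetical record separator, not something net/http defines.
var delim = []byte("\n--pair--\n")

// sample is made-up data: the first record claims 50 body bytes but only
// carries 5, the second record is well-formed.
var sample = []byte("GET /a HTTP/1.1\r\nHost: example.com\r\n\r\n" +
	"HTTP/1.1 200 OK\r\nContent-Length: 50\r\n\r\nhello" +
	"\n--pair--\n" +
	"GET /b HTTP/1.1\r\nHost: example.com\r\n\r\n" +
	"HTTP/1.1 404 Not Found\r\nContent-Length: 4\r\n\r\ngone")

// result holds what could be parsed from one record, plus any error.
type result struct {
	Method string
	Path   string
	Status int
	Body   []byte
	Err    error
}

// parseRecord parses one request followed by one response from rec.
// The reader is bounded by the record, so a bad Content-Length cannot
// consume bytes belonging to the next record.
func parseRecord(rec []byte) result {
	buf := bufio.NewReader(bytes.NewReader(rec))
	req, err := http.ReadRequest(buf)
	if err != nil {
		return result{Err: err}
	}
	res := result{Method: req.Method, Path: req.URL.Path}
	resp, err := http.ReadResponse(buf, req)
	if err != nil {
		res.Err = err
		return res
	}
	defer resp.Body.Close()
	res.Status = resp.StatusCode
	res.Body, res.Err = io.ReadAll(resp.Body)
	return res
}

// parseDelimited splits data on delim and parses each record independently,
// so one malformed record does not prevent parsing the others.
func parseDelimited(data []byte) []result {
	var out []result
	for _, rec := range bytes.Split(data, delim) {
		if len(bytes.TrimSpace(rec)) == 0 {
			continue // skip empty records, e.g. after a trailing delimiter
		}
		out = append(out, parseRecord(rec))
	}
	return out
}

func main() {
	for i, r := range parseDelimited(sample) {
		if r.Err != nil {
			fmt.Printf("record %d: %s %s failed: %v\n", i, r.Method, r.Path, r.Err)
			continue
		}
		fmt.Printf("record %d: %s %s -> %d (%d body bytes)\n", i, r.Method, r.Path, r.Status, len(r.Body))
	}
}
```

With this framing, the bad Content-Length in the first record surfaces as an error for that record only, and the second record still parses normally.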