Parsing an HTTP response captured by tcpdump: entity is null, but there is data after the headers

I am trying to parse HTTP response messages from a .pcap file captured with tcpdump, using pkts.io to parse the capture file and Apache HttpComponents to parse the messages.

While parsing the capture file, I append the payload of every packet that belongs to the message (obtained with Packet.getPayload(), doc) to a byte[] data.

If I print new String(data, "UTF-8"), I get this:

HTTP/1.1 200 OK 
    Server: nginx 
    Date: Fri, 10 Apr 2015 04:00:04 GMT 
    Content-Type: text/html; charset=utf-8 
    Transfer-Encoding: chunked 
    Connection: keep-alive 
    Keep-Alive: timeout=300 
    Vary: Accept-Encoding 
    Content-Encoding: gzip 
    1dd 
    ��������������S�n�0��+X_��� 
��q�b�a���������Ȓf�q��G�K�I��=���������χ/�rg�f�d"kʌ\�+1l���P 
]�\^�@r�{�k��;pģ﷐�7�=t� `C+5qg� 
...

Full response on pastebin

When I try to parse the HTTP message (code below), I get all of the headers fine, but resp.getEntity() returns null.

SessionInputBufferImpl inBuffer = new SessionInputBufferImpl(new HttpTransportMetricsImpl(), packet.getData().length); 
InputStream inStream = new ByteArrayInputStream(packet.getData()); 
inBuffer.bind(inStream); 
DefaultHttpResponseParser respParser = new DefaultHttpResponseParser(inBuffer); 
HttpResponse resp = (HttpResponse) respParser.parse();

Where can I go from here to get the response body as text?


2015-04-12 Michelle

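When getting the entity body, you need to look at Transfer-Encoding and Content-Encoding and decode accordingly. See section 4 "Transfer Codings" of RFC 7230.

Look at the classes in HttpComponents, such as ChunkedInputStream (for the chunked transfer encoding), and look for code that can decompress gzip data (for the gzip content encoding).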
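As a rough illustration of that decoding order (transfer coding first, then content coding), here is a minimal sketch that uses only the JDK rather than HttpComponents. The names BodyDecoder, readLine, dechunk, gunzip and decodeBody are made up for this example. It assumes the byte array starts right after the blank line that ends the headers, that the capture is complete, and that the decompressed text is UTF-8; chunk extensions and trailer fields are simply ignored.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

final class BodyDecoder {

    /** Reads one CRLF-terminated line and returns it without the line ending. */
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') {
                line.write(b);
            }
        }
        return line.toString("ISO-8859-1");
    }

    /** Undoes the chunked transfer coding: hex size line, chunk data, CRLF, until a size of 0. */
    static byte[] dechunk(byte[] body) throws IOException {
        InputStream in = new ByteArrayInputStream(body);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';'); // drop optional chunk extensions
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            int size = Integer.parseInt(hex, 16);
            if (size == 0) {
                break; // last chunk; any trailer fields are ignored in this sketch
            }
            byte[] chunk = new byte[size];
            int off = 0;
            while (off < size) {
                int n = in.read(chunk, off, size - off);
                if (n < 0) {
                    throw new EOFException("chunk is shorter than its declared size");
                }
                off += n;
            }
            out.write(chunk, 0, size);
            readLine(in); // consume the CRLF that follows the chunk data
        }
        return out.toByteArray();
    }

    /** Undoes the gzip content coding. */
    static byte[] gunzip(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }

    /** Chunked first, gzip second, then decode the text once. */
    static String decodeBody(byte[] body) throws IOException {
        return new String(gunzip(dechunk(body)), StandardCharsets.UTF_8);
    }
}
```

With the response from the question, BodyDecoder.decodeBody(bodyBytes) would be called on the bytes that begin at the "1dd" chunk-size line. A truncated capture makes dechunk throw rather than return partial data.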


2015-04-13 01:10:12

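Those two only apply to the content of the message, though, don't they? (That is, if the entity weren't null, I would want to wrap the 'InputStream' returned by 'someHttpResponse.getEntity().getContent()'.) – Michelle

If the HTTP response really has nothing between 'Content-Encoding: gzip' and '1dd', it is malformed, because there should be an empty line (CRLF) between the header fields and the body. If that is the case, you shouldn't expect it to be parseable, although HttpComponents should signal an error. If it does have the blank line, you should get an entity, and if HttpComponents doesn't dechunk and decompress it for you, you need to set things up so that it does. – 2015-04-14 18:34:35

Huh. I tried getting the content manually, and I do see the double CRLF, and I can parse it using a byte array stream wrapped in a chunked stream wrapped in a gzip stream. But the 'HttpResponse' object still returns null for the entity. – Michelle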

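I couldn't get HttpResponse.getEntity() to work, so I had to parse the response myself. Here is the code I threw together. It walks the byte[] that holds the entire response, looks for the empty line that separates the header fields from the body, and copies everything after it: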

// requires: import java.util.Arrays;
private byte[] getContent(byte[] message) {
    // Look for CR LF CR LF, the blank line that ends the header fields.
    // The loop bound keeps message[i + 3] inside the array.
    for (int i = 0; i + 3 < message.length; ++i) {
        if (message[i] == '\r' && message[i + 1] == '\n'
                && message[i + 2] == '\r' && message[i + 3] == '\n') {
            // Everything after the blank line is the (still encoded) body.
            return Arrays.copyOfRange(message, i + 4, message.length);
        }
    }
    // No blank line found: return an empty body instead of null.
    return new byte[0];
}

Then, if the response has Transfer-Encoding: chunked and Content-Encoding: gzip, I use ChunkedInputStream (from HttpComponents) and GZIPInputStream (from java.util.zip) to get the actual content.

byte[] content = getContent(packet.getData());
if (content.length > 0) {
    SessionInputBufferImpl contentBuf = new SessionInputBufferImpl(new HttpTransportMetricsImpl(), content.length);
    contentBuf.bind(new ByteArrayInputStream(content));

    // Undo the chunked transfer coding first, then the gzip content coding.
    ByteArrayOutputStream decoded = new ByteArrayOutputStream();
    try (InputStream gzipIS = new GZIPInputStream(new ChunkedInputStream(contentBuf))) {
        byte[] buf = new byte[4096];
        int n;
        // Use the count returned by read(); the buffer is rarely filled completely.
        while ((n = gzipIS.read(buf)) != -1) {
            decoded.write(buf, 0, n);
        }
    }
    // Decode once at the end so multi-byte UTF-8 characters are not split across reads.
    String contentString = decoded.toString("UTF-8");
}
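The check for the two header fields is not shown above. One possible way to do it on the raw bytes, without HttpComponents, is sketched below. HeadInspector, parseHeaders and hasToken are hypothetical names for this example; the sketch assumes CRLF line endings and does not handle obsolete folded header lines.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class HeadInspector {

    /** Parses the header fields of a raw HTTP message into a map keyed by lower-case field name. */
    static Map<String, String> parseHeaders(byte[] message) {
        // ISO-8859-1 maps every byte to exactly one char, so binary body bytes cannot break decoding.
        String text = new String(message, StandardCharsets.ISO_8859_1);
        int end = text.indexOf("\r\n\r\n");
        String head = end >= 0 ? text.substring(0, end) : text;
        Map<String, String> headers = new LinkedHashMap<>();
        String[] lines = head.split("\r\n");
        for (int i = 1; i < lines.length; i++) { // line 0 is the status line
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                String name = lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT);
                String value = lines[i].substring(colon + 1).trim();
                // A repeated field is combined into one comma-separated list.
                headers.merge(name, value, (a, b) -> a + ", " + b);
            }
        }
        return headers;
    }

    /** Returns true if the comma-separated value of the given field contains the token. */
    static boolean hasToken(Map<String, String> headers, String name, String token) {
        String value = headers.get(name.toLowerCase(Locale.ROOT));
        if (value == null) {
            return false;
        }
        for (String part : value.split(",")) {
            if (part.trim().equalsIgnoreCase(token)) {
                return true;
            }
        }
        return false;
    }
}
```

The decoding code would then run only when hasToken(headers, "Transfer-Encoding", "chunked") and hasToken(headers, "Content-Encoding", "gzip") are both true, and a response without those fields could be read as plain bytes instead.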


2015-08-22 14:00:52 Michelle

Thanks for this. It's so lame that there's no obvious way to easily read the body with DefaultHttpResponseParser. – Kyle
