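Parsing raw HTTP requests

I am working on an HTTP traffic dataset that consists of complete POST and GET requests like the one below. I have written Java code that separates the requests and stores each one as a String element in an ArrayList. Now I am unsure how to parse these raw HTTP requests in Java. Is there a better way than parsing them by hand?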

    GET http://localhost:8080/tienda1/imagenes/3.gif/ HTTP/1.1
    User-Agent: Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.8 (like Gecko)
    Pragma: no-cache
    Cache-control: no-cache
    Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
    Accept-Encoding: x-gzip, x-deflate, gzip, deflate
    Accept-Charset: utf-8, utf-8;q=0.5, *;q=0.5
    Accept-Language: en
    Host: localhost:8080
    Cookie: JSESSIONID=FB018FFB06011CFABD60D8E8AD58CA21
    Connection: close

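I am working on [a] HTTP traffic dataset which is composed of complete POST and GET request[s]

So you want to parse a file or list that contains multiple HTTP requests. Which data do you want to extract? In any case, here is a Java HTTP parsing class that reads the method, version, and URI from the request line and reads all the headers into a Hashtable.

You can use that one, or write your own if you feel like reinventing the wheel. Look at the RFC to see what a request looks like so that you parse it correctly: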

    Request = Request-Line              ; Section 5.1
              *(( general-header        ; Section 4.5
               | request-header         ; Section 5.3
               | entity-header ) CRLF)  ; Section 7.1
              CRLF
              [ message-body ]          ; Section 4.3
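As a small illustration of the first production in that grammar, the Request-Line can be split on single spaces into method, Request-URI, and protocol version. This is only a minimal sketch: the class name `RequestLineSketch` and its fields are made up for this example, and it deliberately rejects anything that is not exactly three space-separated tokens.

```java
/** Hypothetical helper, not part of any library: splits a Request-Line into its three parts. */
class RequestLineSketch {
    final String method;
    final String uri;
    final String version;

    private RequestLineSketch(String method, String uri, String version) {
        this.method = method;
        this.uri = uri;
        this.version = version;
    }

    /** Parses "METHOD SP Request-URI SP HTTP-Version" (without the trailing CRLF). */
    static RequestLineSketch parse(String requestLine) {
        if (requestLine == null) {
            throw new IllegalArgumentException("Request-Line is null");
        }
        // The three elements are separated by single SP characters.
        String[] parts = requestLine.split(" ");
        if (parts.length != 3 || parts[0].isEmpty() || !parts[2].startsWith("HTTP/")) {
            throw new IllegalArgumentException("Malformed Request-Line: " + requestLine);
        }
        return new RequestLineSketch(parts[0], parts[1], parts[2]);
    }

    public static void main(String[] args) {
        RequestLineSketch rl =
                parse("GET http://localhost:8080/tienda1/imagenes/3.gif/ HTTP/1.1");
        System.out.println(rl.method);
        System.out.println(rl.uri);
        System.out.println(rl.version);
    }
}
```

Being strict here is a design choice for the sketch; a dataset with sloppy requests may need a more lenient split.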

For your convenience, here is a general-purpose HTTP request parser that works for all method types (GET, POST, etc.):

    package util.dpi.capture;

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.StringReader;
    import java.util.Hashtable;

    /**
     * Class for HTTP request parsing as defined by RFC 2616:
     *
     * Request = Request-Line ; Section 5.1
     *           *(( general-header ; Section 4.5
     *            | request-header ; Section 5.3
     *            | entity-header ) CRLF) ; Section 7.1
     *           CRLF
     *           [ message-body ] ; Section 4.3
     *
     * Note: HttpFormatException is a custom checked exception that is not shown here.
     *
     * @author izelaya
     */
    public class HttpRequestParser {

        private String _requestLine;
        private Hashtable<String, String> _requestHeaders;
        private StringBuffer _messagetBody;

        public HttpRequestParser() {
            _requestHeaders = new Hashtable<String, String>();
            _messagetBody = new StringBuffer();
        }

        /**
         * Parse an HTTP request.
         *
         * @param request
         *            String holding http request.
         * @throws IOException
         *             If an I/O error occurs reading the input stream.
         * @throws HttpFormatException
         *             If HTTP Request is malformed
         */
        public void parseRequest(String request) throws IOException, HttpFormatException {
            BufferedReader reader = new BufferedReader(new StringReader(request));

            setRequestLine(reader.readLine()); // Request-Line ; Section 5.1

            // Read headers until the blank line that ends them, or until end of input.
            String header = reader.readLine();
            while (header != null && header.length() > 0) {
                appendHeaderParameter(header);
                header = reader.readLine();
            }

            // Everything after the blank line is the message body.
            String bodyLine = reader.readLine();
            while (bodyLine != null) {
                appendMessageBody(bodyLine);
                bodyLine = reader.readLine();
            }
        }

        /**
         * 5.1 Request-Line The Request-Line begins with a method token, followed by
         * the Request-URI and the protocol version, and ending with CRLF. The
         * elements are separated by SP characters. No CR or LF is allowed except in
         * the final CRLF sequence.
         *
         * @return String with Request-Line
         */
        public String getRequestLine() {
            return _requestLine;
        }

        private void setRequestLine(String requestLine) throws HttpFormatException {
            if (requestLine == null || requestLine.length() == 0) {
                throw new HttpFormatException("Invalid Request-Line: " + requestLine);
            }
            _requestLine = requestLine;
        }

        private void appendHeaderParameter(String header) throws HttpFormatException {
            int idx = header.indexOf(":");
            if (idx == -1) {
                throw new HttpFormatException("Invalid Header Parameter: " + header);
            }
            // Trim the value so the space after the colon is not stored.
            _requestHeaders.put(header.substring(0, idx), header.substring(idx + 1).trim());
        }

        /**
         * The message-body (if any) of an HTTP message is used to carry the
         * entity-body associated with the request or response. The message-body
         * differs from the entity-body only when a transfer-coding has been
         * applied, as indicated by the Transfer-Encoding header field (section
         * 14.41).
         *
         * @return String with message-body
         */
        public String getMessageBody() {
            return _messagetBody.toString();
        }

        private void appendMessageBody(String bodyLine) {
            _messagetBody.append(bodyLine).append("\r\n");
        }

        /**
         * For list of available headers refer to sections: 4.5, 5.3, 7.1 of RFC 2616
         *
         * @param headerName Name of header
         * @return String with the value of the header or null if not found.
         */
        public String getHeaderParam(String headerName) {
            return _requestHeaders.get(headerName);
        }
    }
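One thing to keep in mind with the parser above: `Hashtable` keys are case-sensitive, while HTTP header field names are case-insensitive (RFC 2616, Section 4.2), so looking up `host` would miss a header stored as `Host`. If you are free to swap the map type, one possible workaround is a `TreeMap` ordered by `String.CASE_INSENSITIVE_ORDER`. The sketch below is standalone and hypothetical (the class name `HeaderMapSketch` and its methods are invented for the example); it is not the original author's code.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: header storage with case-insensitive name lookup. */
class HeaderMapSketch {
    // The comparator makes "Host", "host" and "HOST" refer to the same key.
    private final Map<String, String> headers =
            new TreeMap<String, String>(String.CASE_INSENSITIVE_ORDER);

    /** Adds one "Name: value" header line. */
    void add(String headerLine) {
        int idx = headerLine.indexOf(':');
        if (idx <= 0) {
            throw new IllegalArgumentException("Invalid header: " + headerLine);
        }
        headers.put(headerLine.substring(0, idx).trim(),
                    headerLine.substring(idx + 1).trim());
    }

    /** Returns the header value, or null if the header is absent. */
    String get(String name) {
        return headers.get(name);
    }

    public static void main(String[] args) {
        HeaderMapSketch h = new HeaderMapSketch();
        h.add("Host: localhost:8080");
        h.add("Accept-Language: en");
        System.out.println(h.get("host"));            // localhost:8080
        System.out.println(h.get("ACCEPT-LANGUAGE")); // en
        System.out.println(h.get("Cookie"));          // null
    }
}
```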

If you just want to send the raw request as-is, it is very simple: just send the actual string over a TCP socket!

Something like this:

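    Socket socket = new Socket(host, port);
    BufferedWriter out = new BufferedWriter(
            new OutputStreamWriter(socket.getOutputStream(), "UTF8"));

    // getContents(request) is a helper (not shown) that yields the request's lines.
    for (String line : getContents(request)) {
        System.out.println(line);
        out.write(line + "\r\n");
    }

    // A blank line marks the end of the headers.
    out.write("\r\n");
    out.flush();

For the full code, see this blog post by JoeJag.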
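To see the framing that this loop produces without opening a socket, the same idea can be sketched as a pure function: every line is terminated with CRLF, and one extra CRLF closes the header section. The class and method names (`WireFormatSketch`, `toWireBytes`) are made up for illustration, and the sketch assumes a request without a body.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: builds the bytes that would be written to the socket. */
class WireFormatSketch {

    /** Joins the request lines with CRLF and appends the blank line that ends the headers. */
    static byte[] toWireBytes(List<String> lines) {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append("\r\n");
        }
        sb.append("\r\n");
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "GET /tienda1/imagenes/3.gif HTTP/1.1",
                "Host: localhost:8080",
                "Connection: close");
        byte[] bytes = toWireBytes(lines);
        System.out.println(bytes.length + " bytes ready to send");
    }
}
```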

UPDATE

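I started a project, RawHTTP, which provides HTTP parsers for requests, responses, headers, and so on. It turned out well enough that writing HTTP servers and clients on top of it is quite easy. If you are looking for something low-level, check it out.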