Studying How Tomcat Works (3): The Connector (1): Parsing the HTTP Request Path and Parameters

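Preface

This article is based on my study of Chapter 3 of How Tomcat Works. The program in that chapter is fairly large; it implements simple request parsing, request header parsing, and more. Because it is rather complex, I wrote a simpler version of my own that skips many special cases but keeps the code short. Chapter 3 also covers a lot of ground, so I have decided to split it across several articles. The program in this article is meant to show what a web server actually does when it parses the path and parameters of a request.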

The program

Bootstrap class

We create a bootstrap class to serve as the main entry point. It creates a connector and starts it.

public final class Bootstrap {
    public static void main(String[] args) {
        HttpConnector connector = new HttpConnector();
        connector.start();
    }
}

Connector

The connector is a class that implements Runnable, which means its run method is meant to be executed by a thread.

import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class HttpConnector implements Runnable {
    boolean stopped = false;
    private String scheme = "http";

    public String getScheme() {
        return scheme;
    }

    public void run() {
        ServerSocket serverSocket = null;
        int port = 8080;
        try {
            serverSocket = new ServerSocket(port, 1, InetAddress.getByName("127.0.0.1"));
        } catch (IOException e) {
            e.printStackTrace();
            System.exit(1);
        }
        while (!stopped) {
            Socket socket = null;
            try {
                // Block until a client connects
                socket = serverSocket.accept();
            } catch (Exception e) {
                continue;
            }
            // Hand this socket off to an HttpProcessor
            HttpProcessor processor = new HttpProcessor(this);
            processor.process(socket);
        }
    }

    public void start() {
        Thread thread = new Thread(this);
        thread.start();
    }
}

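The start method is called by the bootstrap class and starts a new thread. That thread executes the run method, which creates a ServerSocket and accepts client connections through accept, much like the HttpServer class in the previous article. Each accepted socket is then handed to the HttpProcessor class, which handles all request processing.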

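Processor class

The processor takes over the part of the HttpServer class from the previous article that distinguishes servlet requests from static resource requests. The code is essentially the same.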

import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;

public class HttpProcessor {
    private HttpConnector httpConnector = null;

    public HttpProcessor(HttpConnector httpConnector) {
        this.httpConnector = httpConnector;
    }

    public void process(Socket socket) {
        InputStream input = null;
        OutputStream output = null;
        try {
            input = socket.getInputStream();
            output = socket.getOutputStream();
            HttpRequest request = new HttpRequest(input);
            request.parseRequest();
            Response response = new Response(output);
            response.setRequest(request);
            // URIs under /servlet/ go to the servlet processor, everything else is a static resource
            if (request.getRequestURI().startsWith("/servlet/")) {
                ServletProcessor processor = new ServletProcessor();
                processor.process(request, response);
            } else {
                StaticResourceProcessor processor = new StaticResourceProcessor();
                processor.process(request, response);
            }
            socket.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

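The HttpRequest class created in the process method is an enhanced version of the Request class from the previous article. It is the focus of this article.

The HttpRequest class

HttpRequest implements the HttpServletRequest interface, which declares a large number of methods. The code shown here omits some of them; the methods that are not used can simply be left as the default stubs generated by your IDE.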

import java.io.IOException;
import java.io.InputStream;
import java.util.Map;
import javax.servlet.http.HttpServletRequest;

public class HttpRequest implements HttpServletRequest {
    private InputStream input = null;
    private String method = null;
    private String requestURI = null;
    private String protocol = null;
    private String queryString = null;
    private String requestedSessionId = null;
    private boolean requestedSessionURL = false;
    // Whether the request parameters have been parsed
    protected boolean parsed = false;
    protected Map<String, String[]> parameters = null;

    public HttpRequest(InputStream input) {
        this.input = input;
    }

    public void parseRequest() {
        StringBuffer request = new StringBuffer(2048);
        int i = 0;
        byte[] buffer = new byte[2048];
        try {
            // Read one byte at a time until the CRLF that ends the first line.
            // Stop early at end of stream or when the buffer is nearly full.
            while (i < buffer.length - 1) {
                if (input.read(buffer, i, 1) == -1) {
                    break;
                }
                if (buffer[i] == '\r') {
                    if (input.read(buffer, i + 1, 1) == -1 || buffer[i + 1] == '\n') {
                        break; // i stays on '\r', so the line terminator is not copied
                    }
                    i++;
                }
                i++;
            }
        } catch (IOException e) {
            e.printStackTrace();
            i = -1;
        }
        // Copy the first line of the HTTP request into a string
        for (int j = 0; j < i; j++) {
            request.append((char) buffer[j]);
        }
        System.out.println(request);
        method = parseMethod(request.toString());
        protocol = parseProtocol(request.toString());
        String requestString = parseRequestString(request.toString());
        requestString = normalize(requestString);
        parseQueryString(requestString);
        parseSessionId(); // part of the code omitted from this article
    }
    //......
}

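parseRequest is the main method that parses the request. It first reads the request data byte by byte, and the for loop then copies the first line of the HTTP request into a string. parseMethod extracts the client's request method, such as "GET" or "POST". parseProtocol extracts the protocol, such as "HTTP/1.1". parseRequestString extracts the request string from the HTTP request, normalize normalizes it, and parseQueryString splits the request path from the request parameters. Each step is shown below.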
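To make the reading step concrete, here is a minimal sketch that reads the first line of a request from an in-memory stream instead of a socket. RequestLineReader and readRequestLine are hypothetical names used only for illustration; they are not part of the article's HttpRequest class, and the sample request text is made up.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the article's HttpRequest class.
final class RequestLineReader {

    // Reads bytes up to (not including) the first CRLF, or until the stream
    // ends or maxLength characters have been collected.
    static String readRequestLine(InputStream input, int maxLength) {
        StringBuilder line = new StringBuilder();
        int previous = -1;
        int current;
        try {
            while (line.length() < maxLength && (current = input.read()) != -1) {
                if (previous == '\r' && current == '\n') {
                    // Drop the '\r' that was appended on the previous iteration.
                    line.setLength(line.length() - 1);
                    break;
                }
                line.append((char) current);
                previous = current;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // Made-up sample request: request line, one header, blank line.
        String raw = "GET /servlet/ModernServlet?username=jack HTTP/1.1\r\n"
                + "Host: localhost:8080\r\n\r\n";
        InputStream input = new ByteArrayInputStream(raw.getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(readRequestLine(input, 2048));
    }
}
```

Using a ByteArrayInputStream makes it possible to try the parsing logic without starting the server, since the parsing code only depends on InputStream.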

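Getting the request method

Getting the method is very simple: take everything before the first space on the first line of the HTTP request. We also override getMethod so that callers can retrieve the parsed request method.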

public class HttpRequest implements HttpServletRequest {
    private String method = null;
    //......

    private String parseMethod(String requestString) {
        int index1;
        index1 = requestString.indexOf(' ');
        if (index1 != -1) {
            return requestString.substring(0, index1);
        }
        return null;
    }

    @Override
    public String getMethod() {
        return method;
    }
    //......
}

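Getting the protocol

This works much like getting the request method, except that we take everything after the second space on the first line of the HTTP request. We also override getProtocol.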

public class HttpRequest implements HttpServletRequest {
    private String protocol = null;
    //......

    private String parseProtocol(String requestString) {
        int index1, index2;
        index1 = requestString.indexOf(' ');
        if (index1 != -1) {
            index2 = requestString.indexOf(' ', index1 + 1);
            if (index2 > index1)
                return requestString.substring(index2 + 1);
        }
        return null;
    }

    @Override
    public String getProtocol() {
        return protocol;
    }
    //......
}

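Getting the request URL

Here we take the content between the first and second spaces on the first line of the HTTP request. This is not necessarily a bare path; it may also contain request parameters and a session ID, so it has to be split further later on.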

public class HttpRequest implements HttpServletRequest {
    //......

    private String parseRequestString(String requestString) {
        int index1, index2;
        index1 = requestString.indexOf(' ');
        if (index1 != -1) {
            index2 = requestString.indexOf(' ', index1 + 1);
            if (index2 > index1) {
                return requestString.substring(index1 + 1, index2);
            }
        }
        return null;
    }
    //......
}
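The parseSessionId method called from parseRequest is not shown in this article. As a rough sketch only, the following assumes the session ID is carried in the path as a ";jsessionid=" path parameter, which is the usual servlet convention for URL rewriting. SessionIdSketch and splitSessionId are hypothetical names, and the real method may well work differently.

```java
// Hypothetical sketch; the article's own parseSessionId method is not shown.
final class SessionIdSketch {
    private static final String MATCH = ";jsessionid=";

    // Returns { pathWithoutSessionId, sessionId }; sessionId is null if absent.
    static String[] splitSessionId(String path) {
        int start = path.indexOf(MATCH);
        if (start < 0) {
            return new String[] { path, null };
        }
        String rest = path.substring(start + MATCH.length());
        // The session ID ends at the next ';' (another path parameter) or at the end.
        int end = rest.indexOf(';');
        String sessionId = (end >= 0) ? rest.substring(0, end) : rest;
        String remainder = (end >= 0) ? rest.substring(end) : "";
        return new String[] { path.substring(0, start) + remainder, sessionId };
    }

    public static void main(String[] args) {
        String[] parts = splitSessionId("/servlet/ModernServlet;jsessionid=ABC123");
        System.out.println(parts[0]); // /servlet/ModernServlet
        System.out.println(parts[1]); // ABC123
    }
}
```

The sketch expects the path with the query string already removed, which matches the order of the calls in parseRequest, where parseQueryString runs before parseSessionId.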

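Normalization

The code here is fairly long. It basically normalizes the request path; for example, a request path of "/." is changed to "/". It also returns null for paths it treats as invalid, such as paths containing an encoded "%", "/", "." or "\", or a "/../" that would climb above the root.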

protected String normalize(String path) {
    if (path == null)
        return null;
    String normalized = path;
    // Turn a leading "/%7E" or "/%7e" into "/~"
    if (normalized.startsWith("/%7E") || normalized.startsWith("/%7e"))
        normalized = "/~" + normalized.substring(4);
    // Reject encoded '%', '/', '.' and '\' characters
    if ((normalized.indexOf("%25") >= 0) || (normalized.indexOf("%2F") >= 0)
            || (normalized.indexOf("%2E") >= 0) || (normalized.indexOf("%5C") >= 0)
            || (normalized.indexOf("%2f") >= 0) || (normalized.indexOf("%2e") >= 0)
            || (normalized.indexOf("%5c") >= 0)) {
        return null;
    }
    if (normalized.equals("/."))
        return "/";
    // Replace backslashes with forward slashes
    if (normalized.indexOf('\\') >= 0)
        normalized = normalized.replace('\\', '/');
    if (!normalized.startsWith("/"))
        normalized = "/" + normalized;
    // Collapse "//" into "/"
    while (true) {
        int index = normalized.indexOf("//");
        if (index < 0)
            break;
        normalized = normalized.substring(0, index) + normalized.substring(index + 1);
    }
    // Collapse "/./" into "/"
    while (true) {
        int index = normalized.indexOf("/./");
        if (index < 0)
            break;
        normalized = normalized.substring(0, index) + normalized.substring(index + 2);
    }
    // Resolve "/../" by removing the preceding path segment
    while (true) {
        int index = normalized.indexOf("/../");
        if (index < 0)
            break;
        if (index == 0)
            return (null); // would climb above the root
        int index2 = normalized.lastIndexOf('/', index - 1);
        normalized = normalized.substring(0, index2) + normalized.substring(index + 3);
    }
    // Reject "/..." (three or more dots)
    if (normalized.indexOf("/...") >= 0)
        return (null);
    return (normalized);
}
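To see what the three loops in normalize do, here is a simplified sketch that contains only the "//", "/./" and "/../" handling and runs it on a few made-up paths. PathCollapseSketch and collapse are hypothetical names, and the sketch assumes the path already starts with "/", which normalize guarantees before its loops run.

```java
// Simplified, hypothetical extract of the three loops in normalize.
final class PathCollapseSketch {

    // Assumes path starts with "/".
    static String collapse(String path) {
        String normalized = path;
        int index;
        // "//" becomes "/"
        while ((index = normalized.indexOf("//")) >= 0) {
            normalized = normalized.substring(0, index) + normalized.substring(index + 1);
        }
        // "/./" becomes "/"
        while ((index = normalized.indexOf("/./")) >= 0) {
            normalized = normalized.substring(0, index) + normalized.substring(index + 2);
        }
        // "/segment/../" becomes "/"; a leading "/../" would climb above the root
        while ((index = normalized.indexOf("/../")) >= 0) {
            if (index == 0) {
                return null;
            }
            int previousSlash = normalized.lastIndexOf('/', index - 1);
            normalized = normalized.substring(0, previousSlash) + normalized.substring(index + 3);
        }
        return normalized;
    }

    public static void main(String[] args) {
        System.out.println(collapse("/a//b"));      // /a/b
        System.out.println(collapse("/a/./b"));     // /a/b
        System.out.println(collapse("/a/b/../c"));  // /a/c
        System.out.println(collapse("/../secret")); // null
    }
}
```

Returning null for a leading "/../" is what keeps a request from reaching files above the web root.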

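Splitting the request path and the request parameters

As mentioned earlier, the request string may contain request parameters, so it has to be split. The path and the parameters are separated by a question mark "?", so we only need to find that character and split there.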

public class HttpRequest implements HttpServletRequest {
    private String requestURI = null;
    private String queryString = null;
    //......

    private void parseQueryString(String requestString) {
        int index;
        index = requestString.indexOf('?');
        if (index != -1) {
            requestURI = requestString.substring(0, index);
            queryString = requestString.substring(index + 1);
        } else {
            requestURI = requestString;
        }
    }

    @Override
    public String getRequestURI() {
        return requestURI;
    }

    @Override
    public String getQueryString() {
        return queryString;
    }
    //......
}
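HttpRequest declares a parameters field of type Map<String, String[]>, but this article stops at the raw query string. As a minimal sketch of one way the query string could be turned into such a map, assuming "&" separates the pairs, "=" separates name from value, and values are URL-encoded as UTF-8: QueryStringSketch and parseParameters are hypothetical names, not the article's code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not part of the article's HttpRequest class.
final class QueryStringSketch {

    // Splits "a=1&b=2" into name -> values; a repeated name keeps all its values.
    static Map<String, String[]> parseParameters(String queryString) {
        Map<String, List<String>> collected = new LinkedHashMap<>();
        if (queryString != null && !queryString.isEmpty()) {
            for (String pair : queryString.split("&")) {
                if (pair.isEmpty()) {
                    continue;
                }
                int eq = pair.indexOf('=');
                String name = decode(eq >= 0 ? pair.substring(0, eq) : pair);
                String value = decode(eq >= 0 ? pair.substring(eq + 1) : "");
                collected.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
            }
        }
        Map<String, String[]> parameters = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : collected.entrySet()) {
            parameters.put(entry.getKey(), entry.getValue().toArray(new String[0]));
        }
        return parameters;
    }

    // Decodes %XX escapes and '+'; malformed escapes raise IllegalArgumentException.
    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        Map<String, String[]> params = parseParameters("username=jack&password=123456");
        System.out.println(params.get("username")[0]); // jack
        System.out.println(params.get("password")[0]); // 123456
    }
}
```

The values are stored as String[] because the same parameter name may appear more than once in a query string.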

Testing

Write a servlet class to test it.

import javax.servlet.*;
import javax.servlet.http.*;
import java.io.*;
import java.util.*;

public class ModernServlet extends HttpServlet {
    public void init(ServletConfig config) {
        System.out.println("ModernServlet -- init");
    }

    public void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        // Write the status line and the blank line that ends the headers directly
        out.print("HTTP/1.1 200 OK\r\n\r\n");
        out.println("<html>");
        out.println("<head>");
        out.println("<title>Modern Servlet</title>");
        out.println("</head>");
        out.println("<body>");
        out.println("<br><h2>Method</h2>");
        out.println("<br>" + request.getMethod());
        out.println("<br><h2>Query String</h2>");
        out.println("<br>" + request.getQueryString());
        out.println("<br><h2>Request URI</h2>");
        out.println("<br>" + request.getRequestURI());
        out.println("</body>");
        out.println("</html>");
    }
}

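Start the Bootstrap class, then open a browser and go to http://localhost:8080/servlet/ModernServlet?username=jack&password=123456

We successfully get the parsed content: the page displays the request method, the query string, and the request URI.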

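Summary

The program in this article is a stripped-down version meant only to show roughly how a web server works, and it does not handle many special cases. Even the code that accompanies the book is far more complex than what is shown here, and Tomcat's own handling is more complex still. Readers can take it step by step: first understand the general principle, then dig into the source code.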