I. Introduction
Can the following servlet display Chinese?
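import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class SimpleServlet extends HttpServlet {
    public void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // No charset is named here, so the container falls back to its default.
        resp.setContentType("text/html");
        PrintWriter w = resp.getWriter();
        System.out.println("Response character encoding: " + resp.getCharacterEncoding());
        w.println("<html>");
        w.println("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">");
        w.println("<head>");
        // The sample text below means "text and pictures".
        w.println("<title>文字与图片</title>");
        w.println("</head>");
        w.println("<body>");
        w.println("文字与图片");
        w.println("<hr />");
        //w.println("<img src=\"gif/aa.jpg\" />");
        w.println("</body>");
        w.println("</html>");
        w.flush();
    }
}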
The result is:
?????
II. The raw packet the server sends back to the browser
[Part 1: headers (shown as text)]
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: text/html;charset=ISO-8859-1
Transfer-Encoding: chunked
Date: Wed, 22 Oct 2008 08:37:29 GMT
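[Part 2: body (shown as hex; the dump below starts with the header bytes, and the body follows the blank line)]
c7 [length of the first chunk, in hex]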
485454502F312E3120323030204F4B0D0A5365727665723A204170616368652D436F796F74652F312E310D0A436F6E74656E742D547970653A20746578742F68746D6C3B636861727365743D49534F2D383835392D310D0A5472616E736665722D456E636F64696E673A206368756E6B65640D0A446174653A205765642C203232204F637420323030382030383A33373A323920474D540D0A0D0A63370D0A3C68746D6C3E0D0A3C21444F43545950452068746D6C205055424C494320222D2F2F5733432F2F445444205848544D4C20312E30205374726963742F2F454E222022687474703A2F2F7777772E77332E6F72672F54522F7868746D6C312F4454442F7868746D6C312D7374726963742E647464223E0D0A3C686561643E0D0A3C7469746C653E3F3F3F3F3F3C2F7469746C653E0D0A3C2F686561643E0D0A3C626F64793E0D0A3F3F3F3F3F0D0A3C6872202F3E0D0A3C2F626F64793E0D0A3C2F68746D6C3E0D0A0D0A300D0A0D0A
0 [length of the second chunk]
The Content-Type header shows that the body is encoded as ISO-8859-1, which is the servlet default. Decoding the first chunk as ISO-8859-1 gives:
<html>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<head>
<title>?????</title>
</head>
<body>
?????
<hr />
</body>
</html>
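The c7 and 0 lines above are chunk sizes from Transfer-Encoding: chunked. Each chunk is a hexadecimal length, a CRLF, that many bytes of data, and another CRLF, and a zero-length chunk ends the body (0xc7 is 199 bytes). The following is a minimal sketch of that framing, assuming no chunk extensions or trailers; the class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class ChunkedBodyDemo {

    /** Decodes a chunked body. Chunk extensions and trailers are not handled. */
    public static byte[] decodeChunked(byte[] raw) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        int pos = 0;
        while (true) {
            // Read the chunk-size line: hex digits terminated by CRLF.
            int lineEnd = pos;
            while (raw[lineEnd] != '\r') {
                lineEnd++;
            }
            String sizeLine = new String(raw, pos, lineEnd - pos, StandardCharsets.US_ASCII);
            int size = Integer.parseInt(sizeLine.trim(), 16);
            pos = lineEnd + 2; // skip the CRLF after the size
            if (size == 0) {
                break; // a zero-length chunk ends the body
            }
            body.write(raw, pos, size);
            pos += size + 2; // skip the chunk data and its trailing CRLF
        }
        return body.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = "5\r\nhello\r\n0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(decodeChunked(raw), StandardCharsets.US_ASCII));
    }
}
```

The decoder returns raw bytes on purpose: turning them into text is a separate step that depends on the charset named in Content-Type.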
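III. Cause
Although the servlet writes Chinese text, the container encodes the output as ISO-8859-1. ISO-8859-1 has no Chinese characters, so each one is replaced with ? (byte 0x3F), which gives ?????.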
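The replacement can be reproduced outside a container. The sketch below encodes the same five characters with String.getBytes, which substitutes the charset's replacement byte for characters it cannot map. It is an assumption that the container's writer takes the same path internally, but the result matches the 3F3F3F3F3F bytes in the dump above. The class and helper names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class Latin1ReplacementDemo {

    /** Formats bytes as uppercase hex, in the same style as the dumps in this article. */
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The article's five-character sample string, written with Unicode
        // escapes so the result does not depend on this file's own encoding.
        String text = "\u6587\u5b57\u4e0e\u56fe\u7247";
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(toHex(latin1)); // 3F3F3F3F3F
    }
}
```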
IV. Solution
1. Set the encoding explicitly:
public class SimpleServlet extends HttpServlet {
    public void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, java.io.IOException {
        resp.setContentType("text/html;charset=gb2312");
        ...
    }
}
Result:
文字与图片
Raw packet:
485454502F312E3120323030204F4B0D0A5365727665723A204170616368652D436F796F74652F312E310D0A436F6E74656E742D547970653A20746578742F68746D6C3B636861727365743D6762323331320D0A5472616E736665722D456E636F64696E673A206368756E6B65640D0A446174653A205765642C203232204F637420323030382030383A34343A343020474D540D0A0D0A64310D0A3C68746D6C3E0D0A3C21444F43545950452068746D6C205055424C494320222D2F2F5733432F2F445444205848544D4C20312E30205374726963742F2F454E222022687474703A2F2F7777772E77332E6F72672F54522F7868746D6C312F4454442F7868746D6C312D7374726963742E647464223E0D0A3C686561643E0D0A3C7469746C653ECEC4D7D6D3EBCDBCC6AC3C2F7469746C653E0D0A3C2F686561643E0D0A3C626F64793E0D0ACEC4D7D6D3EBCDBCC6AC0D0A3C6872202F3E0D0A3C2F626F64793E0D0A3C2F68746D6C3E0D0A0D0A
CEC4D7D6D3EBCDBCC6AC is exactly the gb2312 encoding of "文字与图片".
2. Does it have to be gb2312? Try UTF-8:
public class SimpleServlet extends HttpServlet {
    public void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, java.io.IOException {
        resp.setContentType("text/html;charset=UTF-8");
        ...
    }
}
Result:
文字与图片
The output is still correct, because UTF-8 can represent Chinese characters, so nothing is replaced with 3F3F3F3F3F.
Raw packet:
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: text/html;charset=UTF-8
Transfer-Encoding: chunked
Date: Wed, 22 Oct 2008 08:48:14 GMT
db
485454502F312E3120323030204F4B0D0A5365727665723A204170616368652D436F796F74652F312E310D0A436F6E74656E742D547970653A20746578742F68746D6C3B636861727365743D5554462D380D0A5472616E736665722D456E636F64696E673A206368756E6B65640D0A446174653A205765642C203232204F637420323030382030383A34383A313420474D540D0A0D0A64620D0A3C68746D6C3E0D0A3C21444F43545950452068746D6C205055424C494320222D2F2F5733432F2F445444205848544D4C20312E30205374726963742F2F454E222022687474703A2F2F7777772E77332E6F72672F54522F7868746D6C312F4454442F7868746D6C312D7374726963742E647464223E0D0A3C686561643E0D0A3C7469746C653EE69687E5AD97E4B88EE59BBEE789873C2F7469746C653E0D0A3C2F686561643E0D0A3C626F64793E0D0AE69687E5AD97E4B88EE59BBEE789870D0A3C6872202F3E0D0A3C2F626F64793E0D0A3C2F68746D6C3E0D0A0D0A
0
E69687E5AD97E4B88EE59BBEE78987 is the UTF-8 encoding of "文字与图片".
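The byte sequences in both dumps can be checked with a few lines of Java. This is a sketch for verification only; the class and method names are made up. UTF-8 is always available, while GB2312 is not among the charsets every Java platform must support, so that call is guarded.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCompareDemo {

    /** Encodes text with the given charset and returns the bytes as uppercase hex. */
    public static String encodeToHex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The article's sample string, as Unicode escapes.
        String text = "\u6587\u5b57\u4e0e\u56fe\u7247";
        // Three bytes per character in UTF-8.
        System.out.println("UTF-8:  " + encodeToHex(text, StandardCharsets.UTF_8));
        // Two bytes per character in GB2312, when the runtime provides it.
        if (Charset.isSupported("GB2312")) {
            System.out.println("GB2312: " + encodeToHex(text, Charset.forName("GB2312")));
        }
    }
}
```

The same five characters take 15 bytes in UTF-8 and 10 bytes in gb2312, which is why the chunk length changes from d1 to db between the two packets.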
V. Content-Type in meta data
1. The encoding can also be declared with a <meta> element in the page's <head>:
<html>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8"/>
<title>图片与文字</title>
</head>
<body>
图片与文字
<hr />
</body>
</html>
The browser then parses the page as UTF-8.
2. What if resp.setContentType("text/html;charset=gb2312") and <meta> are both present and disagree? Which one takes effect?
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
author: holly
Content-Type: text/html;charset=gb2312
Transfer-Encoding: chunked
Date: Thu, 23 Oct 2008 07:45:49 GMT
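In testing, the browser displayed the page according to the former, that is, the charset in the HTTP Content-Type header.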
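The precedence seen in that test can be written down as a small rule: use the charset from the HTTP header if there is one, otherwise fall back to the <meta> value. The sketch below only models that observation and is not how any particular browser is implemented. The names charsetOf and effectiveCharset are made up for illustration, and quoted charset values are not handled.

```java
import java.util.Locale;

public class CharsetPrecedenceDemo {

    /** Extracts the charset parameter from a Content-Type value, or null if absent. */
    public static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                return p.substring("charset=".length()).trim();
            }
        }
        return null;
    }

    /** The HTTP header wins; the meta value is only a fallback. */
    public static String effectiveCharset(String headerContentType, String metaContentType) {
        String fromHeader = charsetOf(headerContentType);
        return fromHeader != null ? fromHeader : charsetOf(metaContentType);
    }

    public static void main(String[] args) {
        // Header and meta disagree: the header value is used.
        System.out.println(effectiveCharset("text/html;charset=gb2312", "text/html;charset=UTF-8"));
        // Header names no charset: the meta value is used.
        System.out.println(effectiveCharset("text/html", "text/html;charset=UTF-8"));
    }
}
```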
VI. Form data submitted by the client
One might hope that the Content-Type header of the browser's request would also state the encoding of the form data. It does not:
Content-Type: application/x-www-form-urlencoded
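When the servlet decodes the parameters, it likewise uses the default encoding, which may be ISO-8859-1, and the text comes out garbled.
So HttpServletRequest.setCharacterEncoding("...") must be called before the first call to HttpServletRequest.getParameter.
How can the server know which encoding the browser used for the submitted data? The page that produced the submission was itself sent down by the server, and the browser encodes the form data with the charset given in that page's Content-Type. This is why every page should set its character encoding explicitly, from start to finish.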
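The effect of decoding form data with the wrong charset can be shown with java.net.URLEncoder and URLDecoder, which implement the application/x-www-form-urlencoded format. This is a sketch that stands in for the browser and the container; it is an assumption that a given container decodes parameters in an equivalent way. The class and helper names are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class FormDecodingDemo {

    /** Encodes a form value the way a browser would for a page in the given charset. */
    public static String formEncode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unsupported charset: " + charset, e);
        }
    }

    /** Decodes a form value using the charset the server assumes. */
    public static String formDecode(String wire, String charset) {
        try {
            return URLDecoder.decode(wire, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unsupported charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String text = "\u6587\u5b57"; // first two characters of the sample string
        String wire = formEncode(text, "UTF-8");
        System.out.println(wire); // %E6%96%87%E5%AD%97

        // Server assumes ISO-8859-1: six unrelated characters instead of two.
        String wrong = formDecode(wire, "ISO-8859-1");
        System.out.println(wrong.length() + " chars, equal to original: " + wrong.equals(text));

        // Server uses the page's charset: the original text comes back.
        String right = formDecode(wire, "UTF-8");
        System.out.println(right.length() + " chars, equal to original: " + right.equals(text));
    }
}
```

Nothing in the encoded string says which charset produced it, which is why the server has to be told through setCharacterEncoding.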
VII. The JSP page directive
<%@page contentType="text/html;charset=gb2312" pageEncoding="gb2312"%>
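1. contentType
Tomcat translates a JSP file into a servlet.
The contentType attribute of the page directive corresponds to the HttpServletResponse.setContentType() call, and therefore to the response header Content-Type: text/html;charset=gb2312.
2. pageEncoding
This is the encoding in which the JSP file itself is saved. With it, the Chinese characters in the file are read correctly when the file is translated into a servlet.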
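VIII. Conclusion
1. Use HttpServletResponse.setContentType("text/html;charset=gb2312") or HttpServletResponse.setCharacterEncoding("gb2312") to specify the encoding of the content the server sends.
2. Any encoding that supports Chinese characters will do; it does not have to be gb2312.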