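While crawling I hit a bad case. The requested URL is: http://food.39.net/a/2012322/1995027.html
At first I assumed it was a problem with HttpClient, but switching to Java's built-in HttpURLConnection throws the same exception. The page opens fine in a browser, and a Ruby script also fetches the data without trouble, so I can't tell what goes wrong when Java's HttpURLConnection reads the response. Googling didn't turn up a clear cause; some posts say it comes from Java being strict about checking HTTP requests. Has anyone here run into this???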
------Solution--------------------
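Java code (Commons HttpClient 3.x):
```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.httpclient.Header;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.methods.GetMethod;

public class FetchPage {
    public static void main(String[] args) {
        String str = "http://food.39.net/a/2012322/1995027.html";
        HttpClient client = new HttpClient();
        GetMethod method = new GetMethod(str);

        // Browser-like default headers, sent with every request made through this client
        List<Header> headers = new ArrayList<Header>();
        headers.add(new Header("Host", "food.39.net"));
        headers.add(new Header("User-Agent", "Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20100101 Firefox/5.0"));
        headers.add(new Header("Accept", "*/*"));
        headers.add(new Header("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8"));
        headers.add(new Header("Connection", "keep-alive"));
        client.getHostConfiguration().getParams().setParameter("http.default-headers", headers);

        try {
            client.executeMethod(method);
            System.out.println(method.getResponseBodyAsString());
        } catch (HttpException e) {
            // HTTP protocol violation
            e.printStackTrace();
        } catch (IOException e) {
            // Network / transport error
            e.printStackTrace();
        } finally {
            // Return the connection to the connection manager
            method.releaseConnection();
        }
    }
}
```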
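The question also mentions plain HttpURLConnection. The sketch below carries the same idea (send browser-like request headers) over to `java.net.HttpURLConnection`, and prints the status code so you can see what the server actually returns. It is an assumption-laden illustration, not a confirmed fix for this site: the class and the helpers `openBrowserLike` and `charsetOf` are hypothetical names, the UTF-8 fallback charset is an assumption, and `Host`/`Connection` are left out because HttpURLConnection treats them as restricted headers and ignores them by default.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class BrowserLikeFetch {

    // Hypothetical helper: opens (but does not yet connect) a connection that
    // sends browser-like headers. Host/Connection are restricted, so not set here.
    static HttpURLConnection openBrowserLike(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("User-Agent",
                "Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20100101 Firefox/5.0");
        conn.setRequestProperty("Accept", "*/*");
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        return conn;
    }

    // Hypothetical helper: charset from a Content-Type header, falling back to UTF-8 (assumption).
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    try {
                        return Charset.forName(p.substring(8).replace("\"", "").trim());
                    } catch (IllegalArgumentException e) {
                        break; // malformed or unsupported charset name
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) throws IOException {
        URL url = URI.create("http://food.39.net/a/2012322/1995027.html").toURL();
        HttpURLConnection conn = openBrowserLike(url);
        try {
            int status = conn.getResponseCode();
            System.out.println("HTTP status: " + status);
            // For 4xx/5xx responses the body (if any) is on the error stream
            InputStream in = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
            if (in == null) {
                return;
            }
            try (InputStream body = in) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = body.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                System.out.println(new String(buf.toByteArray(), charsetOf(conn.getContentType())));
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

Printing the status code first makes it easier to tell whether the server is rejecting the request (for example on the User-Agent) or whether the failure happens while reading the body.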