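Take the page http://www.appannie.com/top/iphone/united-states/games/ as an example. I want to extract the names of all games in the FREE column whose rank has risen by more than 30. How should I do this? It seems jsoup can do it, but I have looked at many examples and can't quite follow them...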
------Solution--------------------
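jsoup's connect-then-disconnect style of access is easily mistaken for a network attack, so the server responds with a 503 error. The site the OP mentions cannot be scraped directly with jsoup. You can, however, first save the page to local disk with HttpClient and then parse the saved file with jsoup.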
- Java code
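import java.io.FileWriter;
import java.io.IOException;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.HttpMethod;
import org.apache.commons.httpclient.methods.GetMethod;

// The statements below belong inside a method body.
// First save the page to the local disk (Apache Commons HttpClient 3.x API)
HttpClient client = new HttpClient();
String htmlurl = "http://www.appannie.com/top/iphone/united-states/games/";
System.out.println(htmlurl);
HttpMethod method = new GetMethod(htmlurl);
try {
    client.executeMethod(method);
    System.out.println(method.getStatusLine());
    String html = method.getResponseBodyAsString();
    // try-with-resources closes the writer even if write() fails
    try (FileWriter fw = new FileWriter("C:\\download\\Top Charts - iPhone - United States - Games App Annie.htm")) {
        fw.write(html);
    }
} catch (HttpException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
} finally {
    // Release the connection whether or not the request succeeded
    method.releaseConnection();
}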
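
For the second step, parsing the saved file with jsoup, a minimal sketch might look like the following. The page's real markup is not shown in this thread, so the CSS selectors (`tr.app-row`, `td.name`, `td.change`), the class name `RankRisers`, and the helper `risers` are all hypothetical placeholders. Inspect the saved HTML and replace them with whatever actually identifies a FREE-column row, the game name, and the rank change. The UTF-8 charset is also an assumption about how the file was saved.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class RankRisers {

    // ASSUMPTION: hypothetical selectors, not taken from the real page.
    static final String ROW = "tr.app-row";   // one row per game
    static final String NAME = "td.name";     // cell holding the game name
    static final String CHANGE = "td.change"; // cell holding text like "+35" or "-3"

    /** Returns the names of games whose rank change is strictly greater than threshold. */
    static List<String> risers(Document doc, int threshold) {
        List<String> names = new ArrayList<>();
        for (Element row : doc.select(ROW)) {
            Element name = row.selectFirst(NAME);
            Element change = row.selectFirst(CHANGE);
            if (name == null || change == null) {
                continue; // row does not have the expected cells
            }
            try {
                int delta = Integer.parseInt(change.text().trim().replace("+", ""));
                if (delta > threshold) {
                    names.add(name.text());
                }
            } catch (NumberFormatException e) {
                // Non-numeric change (for example a "new" marker): skip the row
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Same path the download step wrote to
        File saved = new File("C:\\download\\Top Charts - iPhone - United States - Games App Annie.htm");
        Document doc = Jsoup.parse(saved, "UTF-8"); // ASSUMPTION: file is UTF-8
        for (String game : risers(doc, 30)) {
            System.out.println(game);
        }
    }
}
```

Keeping the filter in `risers` separate from the file loading means the same logic can be tried on a small HTML string via `Jsoup.parse(String)` before pointing it at the saved page.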