- Java code
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) throws Exception {
        String[] urls = {
            "http://mil.news.sina.com.cn/2012-04-10/0428687123.html",
            "http://mil.news.sina.com.cn/2012-04-12/0731687387.html",
            "http://news.sina.com.cn/c/2012-04-13/044224264609.shtml"
        };
        final Pattern titlePattern = Pattern.compile("<h1 id=\"artibodyTitle\".*?>(.*?)</h1>");
        final Pattern wordCountPattern = Pattern.compile("\u515a|\u56fd\u5bb6");
        for (final String url : urls) {
            // Fetch and scan each page on its own thread.
            new Thread() {
                @Override
                public void run() {
                    // try-with-resources closes the reader even if reading fails.
                    try (BufferedReader reader = new BufferedReader(
                            new InputStreamReader(new URL(url).openStream(), "GB2312"))) {
                        String line;
                        String title = null;
                        int[] count = new int[2]; // [0]: one-character word, [1]: two-character word
                        while ((line = reader.readLine()) != null) {
                            if (title == null) {
                                Matcher titleMatcher = titlePattern.matcher(line);
                                if (titleMatcher.find()) {
                                    title = titleMatcher.group(1);
                                }
                            }
                            Matcher wordCountMatcher = wordCountPattern.matcher(line);
                            while (wordCountMatcher.find()) {
                                String word = wordCountMatcher.group();
                                // Match length 1 maps to index 0, length 2 to index 1.
                                count[word.length() >> 1]++;
                            }
                        }
                        if (count[0] > count[1]) {
                            throw new RuntimeException(String.format(
                                    "%s[%s] \u515a:%d > \u56fd\u5bb6:%d", title, url, count[0], count[1]));
                        }
                        System.out.printf("%s[%s] is good!%n", title, url);
                    } catch (IOException ex) {
                        ex.printStackTrace();
                    }
                }
            }.start();
        }
    }
}
------Solution--------------------
Solution: add the following line before Matcher wordCountMatcher = wordCountPattern.matcher(line);
line = line.replaceAll("\u515a", "\u4EBA\u6C11");
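To see what that one-line change does to the counts, here is a minimal offline sketch. It assumes nothing about the real pages: the class name WordCountSketch, the helper methods, and the sample line are all made up for illustration, and only the regex, the length-based indexing, and the replaceAll call come from the post above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCountSketch {
    // Same alternation as in the original post: a one-character word or a two-character word.
    private static final Pattern WORDS = Pattern.compile("\u515a|\u56fd\u5bb6");

    /** Returns {matches of the one-character word, matches of the two-character word}. */
    public static int[] countWords(String line) {
        int[] count = new int[2];
        Matcher m = WORDS.matcher(line);
        while (m.find()) {
            // A match of length 1 lands in index 0, a match of length 2 in index 1.
            count[m.group().length() >> 1]++;
        }
        return count;
    }

    /** Applies the replacement suggested in the reply, then counts. */
    public static int[] countAfterReplace(String line) {
        return countWords(line.replaceAll("\u515a", "\u4EBA\u6C11"));
    }

    public static void main(String[] args) {
        // Hypothetical sample line, not taken from any of the pages.
        String line = "\u515a and \u56fd\u5bb6, \u515a";
        int[] before = countWords(line);
        int[] after = countAfterReplace(line);
        System.out.println(before[0] + " " + before[1]); // prints "2 1"
        System.out.println(after[0] + " " + after[1]);   // prints "0 1"
    }
}
```

With the replacement in place the first counter can never be incremented, so the count[0] > count[1] check in the original code never throws.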
------Solution--------------------
OP, which organization do you work for?!
Are you speaking for \u515a, or for \u56fd\u5bb6?!
------Solution--------------------
There seems to be a saying that covers this: without \u515a there would be no New China...
So \u515a is definitely greater than \u56fd\u5bb6.
------Solution--------------------
I'm not very familiar with this; I'll have to study it.
------Solution--------------------
...You really do spare no effort for certain things.
------Solution--------------------
Harmonized, truly harmonized.
------Solution--------------------
Once \u515a falls over, the problem is solved.
------Solution--------------------
\u515a = 党
\u56fd = 国
\u5bb6 = 家
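To double-check what the escapes in the original code stand for, here is a minimal sketch (the class name EscapeSketch and the helper codePoints are made up for illustration). The Java compiler translates \uXXXX escapes before the string is built, so "\u56fd\u5bb6" is a plain two-character string, which is what the count[word.length() >> 1] indexing in the original code relies on.

```java
public class EscapeSketch {
    /** Formats each char of s as a U+XXXX label, separated by spaces. */
    public static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Each escape becomes exactly one char in the compiled string.
        System.out.println(codePoints("\u515a"));       // prints "U+515A"
        System.out.println(codePoints("\u56fd\u5bb6")); // prints "U+56FD U+5BB6"
        System.out.println("\u56fd\u5bb6".length());    // prints "2"
    }
}
```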
------Solution--------------------
Looks like an awesome thread!!
------Solution--------------------
Posts containing sensitive words get harmonized (censored).