?? 本文经历的事情发生在2008年
?? 本人比较懒,碰到问题才会想到去解决, 一次偶然的机会,发现线上环境的tomcat自从把连接方式改成nio以后,时不时的发现有服务器CPU占用100%的情况。很奇怪的现象,为整个系统带来了不稳定因素,需要尽快解决。于是去网上查找资料,但是没有找到现成的答案。
?? 定下心来自己分析,先需要知道tomcat中的线程在干什么,于是找到了jstack命令。
命令格式:jstack ${pid}?
?? 其中pid可以通过jps命令获得。
?? 于是我们可以得到当前应用中工作线程的堆栈, 各个线程分别在做什么 , 比如很多线程都执行在同一个代码块,那这部分就可能有问题。
??? 结果如下:
?
"MaintThread" daemon prio=10 tid=0x91cd3400 nid=0x4f40 waiting on condition [0x8faec000..0x8faecfb0] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at com.danga.MemCached.SockIOPool$MaintThread.run(SockIOPool.java:1492) "Thread-3" prio=10 tid=0x8f9ff800 nid=0x4f3f waiting on condition [0x8fb2d000..0x8fb2df30] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at com.tudou.usersrv.client.HotSpareProxyMemcachedClient$1.run(HotSpareProxyMemcachedClient.java:71) "MaintThread" daemon prio=10 tid=0x91dcfc00 nid=0x4f3e waiting on condition [0x8fb6e000..0x8fb6eeb0] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at com.danga.MemCached.SockIOPool$MaintThread.run(SockIOPool.java:1492) "MaintThread" daemon prio=10 tid=0x91cd2800 nid=0x4f3d waiting on condition [0x8fbaf000..0x8fbafe30] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at com.danga.MemCached.SockIOPool$MaintThread.run(SockIOPool.java:1492) "Low Memory Detector" daemon prio=10 tid=0x09a16c00 nid=0x4f37 runnable [0x00000000..0x00000000] java.lang.Thread.State: RUNNABLE "CompilerThread1" daemon prio=10 tid=0x09a15000 nid=0x4f36 waiting on condition [0x00000000..0x9250f688] java.lang.Thread.State: RUNNABLE "CompilerThread0" daemon prio=10 tid=0x09a13800 nid=0x4f35 waiting on condition [0x00000000..0x92590608] java.lang.Thread.State: RUNNABLE "Signal Dispatcher" daemon prio=10 tid=0x09a12800 nid=0x4f34 runnable [0x00000000..0x925d1c10] java.lang.Thread.State: RUNNABLE "Finalizer" daemon prio=10 tid=0x099ff800 nid=0x4f33 in Object.wait() [0x92812000..0x92812eb0] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116) - locked <0x978b2300> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132) at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159) "Reference Handler" daemon prio=10 tid=0x099ff000 nid=0x4f32 in Object.wait() [0x92853000..0x92853e30] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:485) at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116) - locked <0x998d4160> (a java.lang.ref.Reference$Lock) "main" prio=10 tid=0x09918800 nid=0x4f2c runnable [0xf7fb9000..0xf7fba208] java.lang.Thread.State: RUNNABLE at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:384) - locked <0x978b24f8> (a java.net.SocksSocketImpl) at java.net.ServerSocket.implAccept(ServerSocket.java:453) at java.net.ServerSocket.accept(ServerSocket.java:421) at org.apache.catalina.core.StandardServer.await(StandardServer.java:389) at org.apache.catalina.startup.Catalina.await(Catalina.java:642) at org.apache.catalina.startup.Catalina.start(Catalina.java:602) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288) at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:413) "VM Thread" prio=10 tid=0x099fc400 nid=0x4f31 runnable "GC task thread#0 (ParallelGC)" prio=10 tid=0x0991f400 nid=0x4f2d runnable "GC task thread#1 (ParallelGC)" prio=10 tid=0x09920400 nid=0x4f2e runnable "GC task thread#2 (ParallelGC)" prio=10 tid=0x09921400 nid=0x4f2f runnable "GC task thread#3 (ParallelGC)" prio=10 tid=0x09922400 nid=0x4f30 runnable "VM Periodic Task Thread" prio=10 tid=0x09a18400 nid=0x4f38 waiting on condition?
???
??? 使用linux自带的ps命令:
ps -eL -o pid,%cpu,lwp|grep -i 16907?
??? 列出子线程的cpu占用率等情况,再对照jstack命令得到的线程号,就可以知道cpu占用高的线程执行到了那里。
??? 通过这种方式,确定了是由于tomcat的nio用到了jdk6的nio包,命中了其中的一个死锁bug。apache组织的人也建议在高并发的情况下不要开启nio.
?
???? 在生产环境中,jstack命令帮助解决了内存泄露引起的垃圾回收线程占用高CPU, 以及服务线程阻塞在某个远程接口处导致无响应等问题。
?
???? 希望这篇文章可以帮助到大家解决更多问题,谢谢!
?