1. Preface
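I used to always start debugging with gdb filename. Yesterday I wanted to stop one particular thread, got stuck, and spent a long time fiddling with it. Since I was digging into gdb anyway, I decided to study the attach way of starting it properly as well. Once I understood it, I felt pretty dumb: to stop a specific thread in non-stop mode, attach is the right answer.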
2. Starting gdb with attach
Startup methods
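Assumptions:
- The program was compiled with -g and carries debug information.
  How do you check whether a program is debuggable?
  What do you do with a program that has no debug information?
  These questions are covered later and are left aside for now.
- Non-stop mode is the default here, though all-stop mode doesn't seem to differ much in how you operate it.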
Attaching requires the process ID, so first find it with:
ps -ef | grep programName
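The ps | grep pipeline also matches the grep process itself, as the ps output in section 3 shows. On Linux, another approach is to look the PID up directly by reading each /proc/<pid>/comm file. That file holds the executable name, truncated to 15 characters. The C sketch below does this. find_pid_by_comm is a hypothetical helper name, and it returns only the first match it sees.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: return the first PID whose /proc/<pid>/comm equals
   `name`, or -1 if none matches. Linux-only; comm holds at most 15 chars. */
static pid_t find_pid_by_comm(const char *name)
{
    DIR *proc = opendir("/proc");
    if (proc == NULL)
        return -1;

    pid_t found = -1;
    struct dirent *ent;
    while (found == -1 && (ent = readdir(proc)) != NULL) {
        /* Process directories are the numeric entries of /proc. */
        if (!isdigit((unsigned char)ent->d_name[0]))
            continue;

        char path[64];
        snprintf(path, sizeof path, "/proc/%s/comm", ent->d_name);
        FILE *f = fopen(path, "r");
        if (f == NULL)
            continue;                       /* process may have exited */

        char comm[32] = {0};
        if (fgets(comm, sizeof comm, f) != NULL) {
            comm[strcspn(comm, "\n")] = '\0';   /* drop trailing newline */
            if (strcmp(comm, name) == 0)
                found = (pid_t)atol(ent->d_name);
        }
        fclose(f);
    }
    closedir(proc);
    return found;
}
```

For the process in section 3, the lookup would be find_pid_by_comm("dcr_hashcat.bin"). That name is exactly 15 characters long, so it fits in comm without truncation.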
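- gdb attach pid (gdb -p pid is the documented form): gdb starts with the settings from ~/.gdbinit and immediately attaches to the process with that PID. The drawback is that if the settings in ~/.gdbinit are not what this session needs, you can't get what you want. gdb is already attached as soon as it starts, and options such as non-stop cannot be changed after attaching;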
- Attaching from inside gdb:
# The difference from the method above is that you can adjust some options first and then attach.
# For example, set non-stop off.
gdb
(gdb) set <option> <value>
(gdb) file filename
(gdb) attach pid
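- gdb filename pid (or gdb filename --pid pid): as far as I can tell, this has the same effect as attaching directly. Here is my understanding of why it exists. Programs running for real are almost never built with debug information (that is, not compiled with -g or -ggdb). So what do you do?
You build a separate copy with debug information and attach with gdb filename pid, and debugging works normally. This only works if that copy is built from the same source with the same compiler options apart from -g, so that its code matches the running binary. That, as I see it, is the point of this startup method.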
# Of course, you can also start it this way;
# as mentioned above, the difference is that you can change some options first
# and then attach.
gdb
(gdb) set <option> <value>
(gdb) file filename
(gdb) attach pid
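You don't need a real service to try these startup methods. Any long-running multithreaded process will do. Below is a minimal C sketch of such a target. start_idle_workers and idle_worker are hypothetical names. The 500 ms sleep loop only loosely imitates the sleepMs(500) loop that appears in the backtrace in section 3.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Hypothetical worker: loops forever, sleeping 500 ms per iteration. */
static void *idle_worker(void *arg)
{
    (void)arg;
    const struct timespec half_second = { .tv_sec = 0, .tv_nsec = 500000000L };
    for (;;)
        nanosleep(&half_second, NULL);
    return NULL;
}

/* Start `n` detached idle workers; returns how many were actually created. */
static int start_idle_workers(int n)
{
    int started = 0;
    for (int i = 0; i < n; ++i) {
        pthread_t tid;
        if (pthread_create(&tid, NULL, idle_worker, NULL) != 0)
            break;
        pthread_detach(tid);
        ++started;
    }
    return started;
}
```

To get a target process, have main call start_idle_workers(4), print getpid(), and then block (for example in pause()). Link with -pthread. To reproduce the gdb filename pid scenario, build one copy with -g and one without.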
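When attach is a good fit
Below is a StackOverflow answer about gdb attach that I'm reposting. It describes two situations where attach fits. Put differently, it explains how to adapt a program that doesn't suit attach so that it does.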
- If the program to debug (in gdb lingo, “the inferior”) is long-running – for example, a GUI or a server of some kind – then the simplest way is to just run the script, wait for the inferior to start, and then attach to it. You can attach using the PID, either with gdb -p PID or using attach PID at the gdb prompt.
  In other words, programs that run for a long time, or forever, are the simplest case: just let them start, then attach. (Why does the answer mention a script? The original asker's startup script was too complicated, which is why they chose attach for debugging.)
- If the program is short-lived, then another classic approach is to add a call to sleep early in the program’s startup; say as the first line of main. Then, continue with the attach plan.
  In other words, some programs exit shortly after starting, possibly before you even get a chance to attach. To debug those with attach, modify the source so the program sleeps for a while at startup, then attach while it sleeps. A night raid, so to speak (said with a straight face).
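The sleep-at-startup trick from the second answer can be sketched as follows. This sketch assumes the pause is gated by an environment variable so that normal runs are not slowed down. The variable is the hypothetical WAIT_FOR_GDB and holds a number of seconds. wait_for_debugger is also a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: if WAIT_FOR_GDB is set to a number of seconds,
   print our PID and sleep that long so a debugger can attach.
   Returns the seconds slept, or -1 if the variable is unset. */
static int wait_for_debugger(void)
{
    const char *env = getenv("WAIT_FOR_GDB");
    if (env == NULL)
        return -1;

    int secs = atoi(env);
    if (secs < 0)
        secs = 0;
    fprintf(stderr, "pid %ld: waiting %d s for a debugger to attach\n",
            (long)getpid(), secs);
    sleep((unsigned)secs);
    return secs;
}
```

To use it, call it as the first statement of main and start the program with WAIT_FOR_GDB=30. While it sleeps, attach with gdb -p and the printed PID, then set breakpoints and continue.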
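Debugging a running program that has no debug information
As described above: build another copy with debug information, then run gdb filename pid (or gdb filename --pid pid). I picked this up from a Zhihu column article.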
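Before building a second copy, you may want to confirm that the running binary really lacks debug information. One way is to look for a .debug_info section, which is where DWARF debug information from -g normally lives. The C sketch below assumes a 64-bit ELF file in native byte order. has_debug_info is a hypothetical name, and the sketch does not look at debug information shipped in separate debug files.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: 1 if the ELF file at `path` has a .debug_info
   section, 0 if not, -1 on error or if it is not a 64-bit ELF file. */
static int has_debug_info(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    int result = -1;
    Elf64_Ehdr eh;
    Elf64_Shdr *sh = NULL;
    const Elf64_Shdr *strtab = NULL;
    char *names = NULL;

    /* Read and sanity-check the ELF header. */
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shentsize != sizeof(Elf64_Shdr) ||
        eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum)
        goto out;

    /* Load the whole section header table. */
    sh = calloc(eh.e_shnum, sizeof *sh);
    if (sh == NULL || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(sh, sizeof *sh, eh.e_shnum, f) != eh.e_shnum)
        goto out;

    /* Section names are stored in the string table at index e_shstrndx. */
    strtab = &sh[eh.e_shstrndx];
    names = malloc(strtab->sh_size + 1);
    if (names == NULL || fseek(f, (long)strtab->sh_offset, SEEK_SET) != 0 ||
        fread(names, 1, strtab->sh_size, f) != strtab->sh_size)
        goto out;
    names[strtab->sh_size] = '\0';

    result = 0;
    for (int i = 0; i < eh.e_shnum; ++i) {
        if (sh[i].sh_name < strtab->sh_size &&
            strcmp(names + sh[i].sh_name, ".debug_info") == 0) {
            result = 1;
            break;
        }
    }

out:
    free(names);
    free(sh);
    fclose(f);
    return result;
}
```

For a running process, the path /proc/<pid>/exe refers to the executable it was started from, so has_debug_info("/proc/20526/exe") would check the binary from section 3.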
3. Example
- First find the process ID with ps. Here the target process is 20526:
[ray@masaike bin]$ ps -ef | grep hashcat
ray 20526 19856 0 13:59 pts/4 00:00:01 ./bin/dcr_hashcat.bin s
ray 20626 19778 0 14:43 pts/2 00:00:00 grep --color=auto hashcat
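- To mimic a real deployment, I built two executables. The one currently running has no debug information (compiled without -g), and the one used for debugging has it. ~/.gdbinit already enables non-stop mode and nothing needs changing, so I attach directly with gdb filename pid (or --pid pid):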
[ray@masaike bin]$ gdb dcr_hashcat.bin 20526
- info thread shows that all threads are now stopped:
(gdb) info thread
  Id   Target Id                                           Frame
  15   Thread 0x7fbf11392700 (LWP 20527) "dcr_hashcat.bin" 0x00007fbf11491e63 in epoll_wait () from /lib64/libc.so.6
  14   Thread 0x7fbf10b91700 (LWP 20528) "dcr_hashcat.bin" 0x00007fbf1145880d in nanosleep () from /lib64/libc.so.6
  13   Thread 0x7fbf0bfff700 (LWP 20529) "dcr_hashcat.bin" 0x00007fbf1145880d in nanosleep () from /lib64/libc.so.6
  12   Thread 0x7fbf0b7fe700 (LWP 20530) "dcr_hashcat.bin" 0x00007fbf1145880d in nanosleep () from /lib64/libc.so.6
  11   Thread 0x7fbf0affd700 (LWP 20531) "dcr_hashcat.bin" 0x00007fbf1145880d in nanosleep () from /lib64/libc.so.6
  10   Thread 0x7fbf0a7fc700 (LWP 20532) "dcr_hashcat.bin" 0x00007fbf1176c9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  9    Thread 0x7fbf09ffb700 (LWP 20533) "dcr_hashcat.bin" 0x00007fbf1145880d in nanosleep () from /lib64/libc.so.6
  8    Thread 0x7fbf097fa700 (LWP 20534) "dcr_hashcat.bin" 0x00007fbf11491e63 in epoll_wait () from /lib64/libc.so.6
  7    Thread 0x7fbf08ff9700 (LWP 20535) "dcr_hashcat.bin" 0x00007fbf1176c9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  6    Thread 0x7fbf03fff700 (LWP 20536) "dcr_hashcat.bin" 0x00007fbf11491e63 in epoll_wait () from /lib64/libc.so.6
  5    Thread 0x7fbf037fe700 (LWP 20537) "dcr_hashcat.bin" 0x00007fbf1176c9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4    Thread 0x7fbf02ffd700 (LWP 20538) "dcr_hashcat.bin" 0x00007fbf1176c9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3    Thread 0x7fbf027fc700 (LWP 20539) "dcr_hashcat.bin" 0x00007fbf11491e63 in epoll_wait () from /lib64/libc.so.6
  2    Thread 0x7fbf01ffb700 (LWP 20540) "dcr_hashcat.bin" 0x00007fbf11491e63 in epoll_wait () from /lib64/libc.so.6
* 1    Thread 0x7fbf125c3a00 (LWP 20526) "dcr_hashcat.bin" 0x00007fbf11491e63 in epoll_wait () from /lib64/libc.so.6
- In this example I located the decomposer thread by switching threads and printing each stack. Any method that finds the target thread is fine:
(gdb) thread 13
[Switching to thread 13 (Thread 0x7fbf0bfff700 (LWP 20529))]
#0 0x00007fbf1145880d in nanosleep () from /lib64/libc.so.6
(gdb) where
#0 0x00007fbf1145880d in nanosleep () from /lib64/libc.so.6
#1 0x00007fbf114890e4 in usleep () from /lib64/libc.so.6
#2 0x000000000045903b in dcr::core::sleepMs (micro_secs=500) at src/dcr/core/time_util.h:12
#3 0x00000000004e6c88 in dcr::core::BasicComponentExecutor::onRun (this=0x2005830) at src/dcr/core/component_executor.cpp:633
#4 0x00000000004e336b in dcr::core::ComponentRunnable::run (this=0x20055a0) at src/dcr/core/component_executor.cpp:18
#5 0x000000000040a0f4 in dcr::Thread::ThreadData::runInThread (this=0x2005940) at src/dcr/base/thread.cpp:53
#6 0x000000000040a11c in dcr::startThread (arg=0x2005940) at src/dcr/base/thread.cpp:58
#7 0x00007fbf11768e65 in start_thread () from /lib64/libpthread.so.0
#8 0x00007fbf1149188d in clone () from /lib64/libc.so.6
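- Having found that the target is thread 13, set a breakpoint for thread 13. The breakpoint is set successfully, which confirms that gdb filename pid (or --pid pid) did load the debug information: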
(gdb) b component_executor.cpp:633 thread 13
Breakpoint 1 at 0x4e6c7e: file src/dcr/core/component_executor.cpp, line 633.
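- Finally, thread apply all continue resumes the threads: thread 13 stops at component_executor.cpp:633 while the others keep running. Running info thread again shows that, thanks to non-stop mode, the other threads carry on undisturbed (state "running"), while the target thread is stopped: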
(gdb) thread apply all continue

Thread 15 (Thread 0x7fbf11392700 (LWP 20527)):
Continuing.

Thread 14 (Thread 0x7fbf10b91700 (LWP 20528)):
(output too long, some of it omitted)

Breakpoint 1, dcr::core::BasicComponentExecutor::onRun (this=0x2005830) at src/dcr/core/component_executor.cpp:633
633     sleepMs(executor_sleep_ms);
(gdb) info thread
  Id   Target Id                                           Frame
  15   Thread 0x7fbf11392700 (LWP 20527) "dcr_hashcat.bin" (running)
  14   Thread 0x7fbf10b91700 (LWP 20528) "dcr_hashcat.bin" (running)
* 13   Thread 0x7fbf0bfff700 (LWP 20529) "dcr_hashcat.bin" dcr::core::BasicComponentExecutor::onRun (this=0x2005830) at src/dcr/core/component_executor.cpp:633
  12   Thread 0x7fbf0b7fe700 (LWP 20530) "dcr_hashcat.bin" (running)
  11   Thread 0x7fbf0affd700 (LWP 20531) "dcr_hashcat.bin" (running)
  10   Thread 0x7fbf0a7fc700 (LWP 20532) "dcr_hashcat.bin" (running)
  9    Thread 0x7fbf09ffb700 (LWP 20533) "dcr_hashcat.bin" (running)
  8    Thread 0x7fbf097fa700 (LWP 20534) "dcr_hashcat.bin" (running)
  7    Thread 0x7fbf08ff9700 (LWP 20535) "dcr_hashcat.bin" (running)
  6    Thread 0x7fbf03fff700 (LWP 20536) "dcr_hashcat.bin" (running)
  5    Thread 0x7fbf037fe700 (LWP 20537) "dcr_hashcat.bin" (running)
  4    Thread 0x7fbf02ffd700 (LWP 20538) "dcr_hashcat.bin" (running)
  3    Thread 0x7fbf027fc700 (LWP 20539) "dcr_hashcat.bin" (running)
  2    Thread 0x7fbf01ffb700 (LWP 20540) "dcr_hashcat.bin" (running)
  1    Thread 0x7fbf125c3a00 (LWP 20526) "dcr_hashcat.bin" (running)
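The LWP numbers in these listings are Linux kernel thread IDs. The main thread's LWP equals the process ID: thread 1 is LWP 20526. Outside gdb, the same IDs can be listed from /proc/<pid>/task, for example to see how many threads a process has before attaching. Here is a C sketch, with list_lwps as a hypothetical name:

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: store up to `max` thread IDs (gdb's "LWP n") of
   process `pid` in `tids` by listing /proc/<pid>/task (Linux-only).
   Returns the total number of threads found, or -1 on error. */
static int list_lwps(pid_t pid, pid_t *tids, int max)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%ld/task", (long)pid);
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int n = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (!isdigit((unsigned char)ent->d_name[0]))
            continue;                       /* skip "." and ".." */
        if (n < max)
            tids[n] = (pid_t)atol(ent->d_name);
        ++n;
    }
    closedir(dir);
    return n;
}
```

The return value can exceed max, in which case only the first max IDs are stored. The caller can then retry with a larger buffer.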
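Correction:
You don't actually need a thread-specific breakpoint. A plain b component_executor.cpp:633 sets the breakpoint for all threads, and thread apply all cont then stops whichever thread reaches it. You only need a way to tell the threads apart when there is a special requirement. For example, if the decomposer thread and the reducer thread run the same thread function, you must work out which thread is the decomposer and which is the reducer.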
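When several threads share one thread function, one way to tell them apart without walking through every backtrace is to name each thread. On Linux with glibc, pthread_setname_np sets the name that gdb shows in quotes in the Target Id column of info thread. By default that name is the executable name, which is why "dcr_hashcat.bin" appears on every thread above. In the sketch below, set_thread_role, struct worker_args and worker_main are hypothetical.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>

/* Hypothetical helper: name the calling thread "<role>-<index>". Linux
   limits thread names to 15 characters plus NUL, so snprintf truncates
   longer names. Returns 0 or the error number from pthread_setname_np. */
static int set_thread_role(const char *role, int index)
{
    char name[16];
    snprintf(name, sizeof name, "%s-%d", role, index);
    return pthread_setname_np(pthread_self(), name);
}

/* Hypothetical arguments for one thread function shared by both roles. */
struct worker_args {
    const char *role;   /* e.g. "decomposer" or "reducer" */
    int index;
};

static void *worker_main(void *p)
{
    const struct worker_args *a = p;
    set_thread_role(a->role, a->index);
    /* ... the shared work loop would run here ... */
    return NULL;
}
```

With names such as decomposer-1 and reducer-1, the gdb command thread find decomposer lists the matching threads directly.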