Shared libraries can be confusing and error-prone, and "undefined symbols" errors are especially common. Several good tools can analyze a .so file and help locate the error.
A. Tools for working with ELF files (binaries and shared libraries)
1. ldd xxx.so
ldd prints the shared library dependencies of a binary or shared library.
See man ldd or ldd --help for details.
2. readelf xxx.so
readelf prints information about an ELF file; it supports both binaries and shared libraries.
See the man page or readelf --help for details.
-s (--symbols) displays the symbols in the ELF file.
3. Remember to declare exported functions with EXPORT_API; otherwise they might be treated as local functions.
You can use readelf -s xxx.so to see the symbols in a shared library. Exported APIs are GLOBAL and private functions are LOCAL.
4. nm xxx.so
nm lists the symbols in object files.
nm -gD xxxx.so
lists the external (global) symbols in the dynamic symbol table of shared library xxxx.so.
5. objdump xxx.so
objdump dumps information about object files.
6. strip xxx.so
strip discards symbol tables from object files.
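As an illustration of item 3, here is a minimal sketch of what an EXPORT_API macro could look like with GCC or Clang. The macro definition, the function names mylib_add and clamp_to_byte, and the -fvisibility=hidden build flag are all assumptions made for illustration; the real EXPORT_API in your project may be defined differently.

```c
/* Assumption: EXPORT_API is a project-specific macro. This definition is
 * only one plausible form, using the GCC/Clang visibility attribute. */
#if defined(__GNUC__)
#define EXPORT_API __attribute__((visibility("default")))
#else
#define EXPORT_API
#endif

/* Public entry point (hypothetical name). With default visibility it stays
 * exported even if the library is built with -fvisibility=hidden. */
EXPORT_API int mylib_add(int a, int b);

/* Private helper (hypothetical name). `static` gives it local binding,
 * so other libraries cannot link against it. */
static int clamp_to_byte(int v)
{
    if (v < 0) {
        return 0;
    }
    if (v > 255) {
        return 255;
    }
    return v;
}

EXPORT_API int mylib_add(int a, int b)
{
    /* The exported function may freely call private helpers. */
    return clamp_to_byte(a + b);
}
```

If this file is built into a shared library, for example with gcc -shared -fPIC -fvisibility=hidden, readelf -s should report mylib_add as GLOBAL, and clamp_to_byte as LOCAL if the compiler has not inlined it away.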
B. Common problems
1. How to resolve the "undefined symbols" runtime error from dlopen(xxx.so, RTLD_NOW)
There are three objects: A (a binary), B (a shared library), and C (a shared library).
B depends on C (a link dependency).
A calls dlopen(B, RTLD_NOW), which fails with an error like "B has undefined symbols", where the symbols are defined in C.
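For this kind of problem, first check the symbols in B and in C: the symbols that are undefined in B should be defined in C. You can inspect them with nm. If C really lacks the symbols B needs, then C was built incorrectly, or the correct library was not produced. It is also possible that B misspelled the function name. These have to be checked one by one. Start with B; if the caller is fine, the problem is on the implementing side: check whether the build linked against the correct library, whether it produced the correct library, whether all the source files were compiled in, and whether the declarations are wrong. That resolves the problem.
If C does contain the symbols B needs but B still reports undefined symbols, B probably does not record a dependency on C. Use ldd to view B's dependencies; if the output does not include C, B was built incorrectly: its build did not specify the dependency on C. Modify its Makefile so that B depends on C, and the problem is solved.
The general principle: library dependencies come in two kinds. One is a compile-time dependency, where you usually include the library's headers and specify the library in the Makefile with the -l option. The other is a run-time dependency, where there is no dependency at build time, so the build succeeds even without the library or its headers; at run time the program opens the library with dlopen and looks up the functions it needs with dlsym.
The tools: use ldd to view a library's link dependencies; use nm, readelf, or objdump to check whether the library's symbol tables contain the required symbols.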
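To see the actual loader message while debugging, a small helper along these lines can be useful. This is only a sketch: the function name check_library and its return codes are made up for illustration, and the library path and symbol name you pass in are whatever applies to your case.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: try to load a shared library and look up one symbol.
 * Returns 0 on success, -1 if dlopen fails, -2 if dlsym fails.
 * On older glibc versions you may need to link with -ldl. */
int check_library(const char *lib_path, const char *symbol_name)
{
    /* RTLD_NOW resolves all undefined symbols at load time, so a missing
     * dependency is reported here instead of at the first call. */
    void *handle = dlopen(lib_path, RTLD_NOW);
    if (handle == NULL) {
        /* dlerror() returns the loader's message, which usually names
         * the undefined symbol or the file that could not be found. */
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    /* Clear any stale error, then look up the symbol. Checking dlerror()
     * afterwards is the documented way to detect a dlsym failure. */
    dlerror();
    void *sym = dlsym(handle, symbol_name);
    const char *err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "dlsym failed: %s\n", err);
        dlclose(handle);
        return -2;
    }

    (void)sym; /* a real caller would cast this to a function pointer */
    dlclose(handle);
    return 0;
}
```

Calling check_library with the path to B and one of the functions it should export prints the loader's own error text, which tells you which symbol to look for with nm in B and C.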
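2. Compile-time "undefined symbols"
This one is much easier and very common. The usual cause is that the -l option does not link the required library; adding -lxxx fixes it.
The root cause is that ld cannot find the library. One reason is that the library was not specified in the options. The other is that ld genuinely cannot find it, for one of two reasons: either the library is missing from the system, in which case install it; or the library is on the system but not in ld's search path, in which case pass its directory to the linker with -L. If the program links but the library cannot be found when it runs, add the library's directory to the LD_LIBRARY_PATH environment variable.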
3. How to replace a shared library
To replace a shared library and have the new one take effect, first stop all processes that use the old library. Otherwise they may keep using the old version.