How to troubleshoot “ASM does not discover disks”
Original link: http://www.dbaleet.org/how_to_troubleshoot_asm_does_not_discover_disks/
This scenario is far from new in a conventional Oracle database environment. There are a few items to check if you come across this issue.
1. Check the ownership and permissions of the ASM candidate disks. They should be owned by the RDBMS software owner, e.g. oracle:dba, and their permissions should be 660. In most cases you can check this with the “ls -ltr” command.
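When there are many candidate disks, the ownership and mode check can be scripted. The following is a minimal sketch, not part of the original procedure: the function name check_disk_perms is hypothetical, and it assumes GNU stat as found on Linux (the -c format option differs on other platforms).

```shell
# Hypothetical helper: report whether a candidate disk has the expected
# owner:group and mode. Assumes GNU stat (Linux).
check_disk_perms() {
    dev=$1
    want_owner=${2:-oracle:dba}   # default taken from the article's example
    want_mode=${3:-660}
    # -L follows symlinks, in case the device path is a link to the real node
    actual=$(stat -L -c '%U:%G %a' "$dev") || return 2
    if [ "$actual" = "$want_owner $want_mode" ]; then
        echo "OK $dev $actual"
    else
        echo "MISMATCH $dev $actual (expected $want_owner $want_mode)"
        return 1
    fi
}

# Example (path taken from the article):
# check_disk_perms /dev/asm/ocr1 oracle:dba 660
```

The function returns 0 on a match, 1 on a mismatch, and 2 if the path cannot be examined, so it can be used in a loop over all candidate devices.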
2. If both ownership and permissions are correct, try reading the disk manually with the OS command “dd” as user “oracle”. For example, if the name of the LUN to be used by ASM is “/dev/asm/ocr1”, you can read this disk with:
# su - oracle
$ dd if=/dev/asm/ocr1 of=/dev/null bs=8192 count=10
If the output of the above command looks like “xxx in, xxx out”, then the problem is most likely not with the disk itself.
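The same read test can be wrapped so that it reports a clear result for each device. This is only a sketch: can_read_disk is a hypothetical name, and it relies on dd's exit status rather than parsing its output. It should be run as the oracle user, as in the step above.

```shell
# Hypothetical helper: try to read up to 10 blocks of 8 KiB from a device
# and report success or failure based on dd's exit status.
can_read_disk() {
    dev=$1
    if dd if="$dev" of=/dev/null bs=8192 count=10 2>/dev/null; then
        echo "READ_OK $dev"
    else
        echo "READ_FAILED $dev"
        return 1
    fi
}

# Example, run as the oracle user (path taken from the article):
# can_read_disk /dev/asm/ocr1
```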
3. If you are using a multipathing technology, do not forget to check the certification information before making a plan. Certification is well documented in the MOS note Oracle ASM and Multi-Pathing Technologies [ID 294869.1]. Be aware that IBM VPath is not supported with ASM; you should use the alternative solution, MPIO, instead.
4. If you are using ASMLib, first make sure that ASMLib has been properly configured. ASMLib depends on specific Linux kernel versions, and a mismatch between ASMLib and the Linux kernel will cause the ASMLib installation to fail. Second, try running:
/etc/init.d/oracleasm scandisks
/etc/init.d/oracleasm listdisks
If no disks are found, do not rush into building the ASM instance and ASM disk groups; investigating the reason first will save you time.
5. Also make sure that your asm_diskstring parameter is set properly. ASM only discovers devices under the paths that asm_diskstring specifies.
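To get a rough idea of which device paths a given discovery string would cover, the pattern can be expanded from the shell. This is a sketch under a loud assumption: it only approximates the discovery string with ordinary shell globbing over a comma-separated list of patterns and is not ASM's own matching logic. The function name list_diskstring_matches is hypothetical.

```shell
# Hypothetical helper: expand a comma-separated list of glob patterns and
# print every existing path. Approximation only; this is shell globbing,
# not ASM's discovery code.
list_diskstring_matches() {
    old_ifs=$IFS
    IFS=,
    set -f                 # no globbing while splitting on commas
    set -- $1
    set +f
    IFS=$old_ifs
    for pattern in "$@"; do
        for path in $pattern; do    # unquoted on purpose: expand the glob
            [ -e "$path" ] && echo "$path"
        done
    done
    return 0
}

# Example (path taken from the article):
# list_diskstring_matches '/dev/asm/*'
```

An empty result suggests that the pattern does not cover the candidate disks at all, which is worth ruling out before looking at anything more involved.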
6. Last but not least, kfod is a friend you can count on.
$ export LD_LIBRARY_PATH=/tmp/OraInstall2013-09-12_06-25-45PM/ext/lib
$ cd /tmp/OraInstall2013-09-12_06-25-45PM/ext/bin
$ ./kfod op=disks disks=all
If the above command returns nothing, try tracing the process:
$ strace -f ./kfod op=disks disks=all
Then investigate further using the output; it should contain all the detail needed to diagnose this issue.
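The strace output can be long, so it often helps to save it to a file and look at the failed file-access calls first. The sketch below assumes the trace was written with strace's -o option; the helper name failed_opens and the trace file path are hypothetical, and which system calls matter on a given system is a guess.

```shell
# Hypothetical helper: print open/openat/stat/access calls that failed
# (returned -1 with an errno name) from a saved strace log.
failed_opens() {
    grep -E '(open|openat|stat|access)\(.*= -1 E[A-Z]+' "$1"
}

# Example:
# strace -f -o /tmp/kfod.trace ./kfod op=disks disks=all
# failed_opens /tmp/kfod.trace
```

Lines ending in errors such as EACCES or ENOENT on the candidate disk paths point back to the permission and discovery-path checks in the earlier steps.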
ON EXADATA:
Normally you can skip the first 5 steps above; the 6th step alone should be enough if your griddisks are all online.
On some rare occasions, a few trace environment variables need to be set before tracing kfod.
$ export CELLCLIENT_TRACE_LEVEL="all,4"
$ export CELLCLIENT_AUTOFLUSH_LEVEL="all,4"
$ export CELLCLIENT_TRACE_INFO="autoflush_sync,on"
$ cd /tmp/OraInstall2013-09-12_06-25-45PM/ext/bin
$ ./kfod op=disks disks=all
$ strace -f ./kfod op=disks disks=all
We recently found that the ASM instance and cellsrv should not run on the same node; otherwise, the ASM instance won’t find any disks if cellsrv on that node is already up.
It seems the ASM instance searches for a library called “libcell11.so”; if a cell version of this file exists and cellsrv is up, the ASM instance stops discovering the griddisks.
Juan Mosqueda contributed the Exadata part of this article.
Thank you, Juan.