A backup policy failed today with the following error:
04/24/2008 00:03:00 - requesting resource bfepdb-hcart2
04/24/2008 00:03:00 - requesting resource bfbkup.NBU_CLIENT.MAXJOBS.bfepdb
04/24/2008 00:03:00 - requesting resource bfbkup.NBU_POLICY.MAXJOBS.bfep_db
04/24/2008 00:03:00 - awaiting resource bfepdb-hcart2. Waiting for resources.
Reason: Tape media server is not active, Media server: erpdb,
Robot Type(Number): TLD(0), Media ID: N/A, Drive Name: N/A,
Volume Pool: DB_ep_full, Storage Unit: bfepdb-hcart2, Drive Scan Host: N/A
client backup was not attempted because backup window closed (196)
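On the master server I opened the console and went to Device - Host, where the status of this media server was shown as "active for disk". Because a tape library is attached to this media server, the status should normally be "active for tape and disk". I then ran tpconfig -d on that machine, but it never returned any output.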
1. Check the processes with bpps:
NB Processes
------------
root 11379 1 0 15:24:23 ? 0:00 /usr/openv/netbackup/bin/bpcompatd
root 11384 1 0 15:24:28 ? 0:00 /usr/openv/netbackup/bin/nbsl
root 11356 1 0 15:24:17 ? 0:00 /usr/openv/netbackup/bin/nbnos
root 11391 1 0 15:24:29 ? 0:00 /usr/openv/netbackup/bin/nbsvcmon
MM Processes
------------
root 11494 11364 0 15:30:37 ? 0:00 tldd
root 11364 1 0 15:24:20 ? 0:00 /usr/openv/volmgr/bin/ltid
root 11497 11364 0 15:30:39 ? 0:00 avrd
root 11480 1 0 15:30:34 ? 0:00 vmd
Nothing abnormal was found.
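The visual check of the bpps listing can be sketched as a small shell helper. This is only an illustration: the function name missing_daemons is hypothetical, and the list of expected Media Manager daemons (ltid, vmd, tldd, avrd) is an assumption taken from the listing above, not a complete list for every NetBackup setup.

```shell
#!/bin/sh
# Hypothetical helper: reads bpps-style output on stdin and prints the
# expected Media Manager daemons that do not appear in it.
# The daemon list is an assumption based on the listing in this post.
missing_daemons() {
    input=$(cat)
    missing=""
    for d in ltid vmd tldd avrd; do
        # Match the daemon name as a whole word at the end of a path
        # or as a bare command, optionally followed by arguments.
        if ! printf '%s\n' "$input" | grep -Eq "(^|[ /])$d( |\$)"; then
            missing="$missing $d"
        fi
    done
    # Print the missing names without the leading space.
    echo "$missing" | sed 's/^ //'
    # Return success only when nothing is missing.
    [ -z "$missing" ]
}
```

Usage would look like `bpps -a | missing_daemons`. An empty result only says the daemons are running; as this case shows, it does not prove they are healthy.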
2. On the master server, run:
vmdareq -a
The output contained no information for the media server bfepdb, so I decided to restart the NBU processes. After netbackup stop, bpps -a showed:
NB Processes
------------
MM Processes
------------
3. Check the tape devices with ioscan -fnC tape:
Class I H/W Path Driver S/W State H/W Type Description
=========================================================================
tape 0 0/0/1/0.4.0 stape CLAIMED DEVICE HP C5683A
/dev/rmt/0m /dev/rmt/0mnb /dev/rmt/c0t4d0BESTn /dev/rmt/c0t4d0DDSb
/dev/rmt/0mb /dev/rmt/c0t4d0BEST /dev/rmt/c0t4d0BESTnb /dev/rmt/c0t4d0DDSn
/dev/rmt/0mn /dev/rmt/c0t4d0BESTb /dev/rmt/c0t4d0DDS /dev/rmt/c0t4d0DDSnb
tape 7 0/10/0/0.97.26.255.1.3.0 stape CLAIMED DEVICE HP Ultrium 2-SCSI
/dev/rmt/7m /dev/rmt/7mn /dev/rmt/c16t3d0BEST /dev/rmt/c16t3d0BESTn
/dev/rmt/7mb /dev/rmt/7mnb /dev/rmt/c16t3d0BESTb /dev/rmt/c16t3d0BESTnb
tape 8 0/10/0/0.97.26.255.1.3.1 stape CLAIMED DEVICE HP Ultrium 2-SCSI
/dev/rmt/8m /dev/rmt/8mn /dev/rmt/c16t3d1BEST /dev/rmt/c16t3d1BESTn
/dev/rmt/8mb /dev/rmt/8mnb /dev/rmt/c16t3d1BESTb /dev/rmt/c16t3d1BESTnb
tape 3 0/12/0/0.97.25.255.1.3.1 stape CLAIMED DEVICE HP Ultrium 2-SCSI
/dev/rmt/3m /dev/rmt/3mn /dev/rmt/c14t3d1BEST /dev/rmt/c14t3d1BESTn
/dev/rmt/3mb /dev/rmt/3mnb /dev/rmt/c14t3d1BESTb /dev/rmt/c14t3d1BESTnb
tape 4 0/12/0/0.97.25.255.1.3.2 stape CLAIMED DEVICE HP Ultrium 2-SCSI
/dev/rmt/4m /dev/rmt/4mn /dev/rmt/c14t3d2BEST /dev/rmt/c14t3d2BESTn
/dev/rmt/4mb /dev/rmt/4mnb /dev/rmt/c14t3d2BESTb /dev/rmt/c14t3d2BESTnb
The devices showed nothing abnormal either.
4. Next I ran netbackup start and started the policy manually again, but the problem persisted.
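5. I ran netbackup stop again, then ran bp.kill_all to kill off the remaining NBU processes completely, and started NBU again with netbackup start. This time vmdareq -a showed everything as normal.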
# netbackup start
NetBackup Database Server started.
NetBackup Notification Service started.
NetBackup Enterprise Media Manager started.
NetBackup Resource Broker started.
Media Manager daemons started.
NetBackup request daemon started.
NetBackup compatibility daemon started.
NetBackup Job Manager started.
NetBackup Policy Execution Manager started.
NetBackup Service Layer started.
NetBackup is not configured for clustering.
NetBackup Service Monitor started.
# vmdareq -a
Drive2 - AVAILABLE
bfbkup UP
erpdb UP
Drive3 - AVAILABLE
bfbkup UP
erpdb UP
HPUltrium2-SCSI0 - AVAILABLE
bfbkup UP
erpdb UP
HPUltrium2-SCSI1 - AVAILABLE
bfbkup UP
erpdb UP
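The vmdareq -a listing can also be checked mechanically. The sketch below assumes the output layout shown above (a "name - STATE" line per drive, followed by one "host UP" line per registered host); the function name host_missing_drives is hypothetical, and other NetBackup versions may format this output differently.

```shell
#!/bin/sh
# Hypothetical helper: reads vmdareq -a style output on stdin and prints
# every drive for which the given host is not listed as UP.
# Assumes the layout shown in this post.
host_missing_drives() {
    awk -v host="$1" '
        # A line such as "Drive2 - AVAILABLE" starts a new drive block;
        # report the previous drive first if the host was never seen.
        $2 == "-" {
            if (drive != "" && !seen) print drive
            drive = $1
            seen = 0
            next
        }
        # A line such as "erpdb UP" registers the host for this drive.
        $1 == host && $2 == "UP" { seen = 1 }
        # Report the last drive block as well.
        END { if (drive != "" && !seen) print drive }
    '
}
```

For example, `vmdareq -a | host_missing_drives erpdb` would print nothing for the healthy listing above, and would print the drive names for a media server that is not registered.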
6. Conclusion
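This kind of problem may be caused by NBU processes being in an abnormal state. A normal restart may still fail to fix it; in that case, run the bp.kill_all script to stop NBU's resident background processes.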
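The sequence that finally worked can be written down as a small script. This is a sketch under assumptions, not a vendor procedure: the directories in NBU_BIN and VM_BIN are assumed default install paths, the function name hard_restart_nbu and the RUN variable are hypothetical, and netbackup is invoked the same way as in the steps above. By default the script only prints the commands.

```shell
#!/bin/sh
# Assumed default install locations; adjust for the actual system.
NBU_BIN=${NBU_BIN:-/usr/openv/netbackup/bin}
VM_BIN=${VM_BIN:-/usr/openv/volmgr/bin}
# Dry run by default: RUN=echo only prints each command.
# Export RUN as an empty string to really execute them.
RUN=${RUN-echo}

# Hypothetical wrapper around the steps described in this post.
hard_restart_nbu() {
    $RUN netbackup stop            # normal shutdown first
    $RUN "$NBU_BIN/bp.kill_all"    # kill whatever is still running
    $RUN "$NBU_BIN/bpps" -a        # confirm that nothing is left
    $RUN netbackup start           # start NBU again
    $RUN "$VM_BIN/vmdareq" -a      # check that the hosts are registered
}
```

Calling hard_restart_nbu as is lists the five commands in order, which makes it easy to review the sequence before running it for real on a production media server.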