对于存在NAT或防火墙的RAC数据库,在启用了服务器端的load balance后,经常会碰到ORA-12545连接错误,这是因为服务器端转发客户端连接请求到其它节点后,客户端使用返回的IP再次发出连接请求而出现不可识别的IP地址或主机名而造成的。本文描述了这个问题并给出了解决方案。
有关RAC监听配置请参考
ORACLE RAC 监听配置 (listener.ora tnsnames.ora)
ORACLE RAC 下非缺省端口监听配置(listener.ora tnsnames.ora)
一、NAT下监听描述图
二、配置情况
1、服务器端的配置情况
################Server node 1 listener.ora########################
LISTENER_VMDB01P =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = vmdb01pvip)(PORT = 1314)(IP = FIRST))
(ADDRESS = (PROTOCOL = TCP)(HOST = vmdb01p)(PORT = 1314)(IP = FIRST))
)
)
SID_LIST_LISTENER_VMDB01P =
(SID_LIST =
(SID_DESC =
(SID_NAME = PLSExtProc)
(ORACLE_HOME = /u01/oracle/db)
(PROGRAM = extproc)
)
)
################Server node 2 listener.ora########################
LISTENER_VMDB02P =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = vmdb02pvip)(PORT = 1314)(IP = FIRST))
(ADDRESS = (PROTOCOL = TCP)(HOST = vmdb02p)(PORT = 1314)(IP = FIRST))
)
)
SID_LIST_LISTENER_VMDB02P =
(SID_LIST =
(SID_DESC =
(SID_NAME = PLSExtProc)
(ORACLE_HOME = /u01/oracle/db)
(PROGRAM = extproc)
)
)
################Server side tnsnames.ora, same for both nodes########################
remote_lsnr_lm5330 =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = vmdb01pvip)(PORT = 1314))
(ADDRESS = (PROTOCOL = TCP)(HOST = vmdb02pvip)(PORT = 1314))
)
local_lsnr_lm5330a =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = vmdb01pvip)(PORT = 1314))
)
local_lsnr_lm5330b =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = vmdb02pvip)(PORT = 1314))
)
################Server side host table, same for both nodes########################
127.0.0.1 localhost
10.200.48.17 vmdb01p.oradb.com vmdb01p
10.200.48.18 vmdb02p.oradb.com vmdb02p
10.200.48.15 vmdb01pvip.oradb.com vmdb01pvip
10.200.48.16 vmdb02pvip.oradb.com vmdb02pvip
192.168.48.17 vmdb01pph.oradb.com vmdb01pph
192.168.48.18 vmdb02pph.oradb.com vmdb02pph
SQL> show parameter instance_name
NAME TYPE VALUE
------------------------------------ ------------------- ------------------
instance_name string LM5330A
SQL> show parameter listener
NAME TYPE VALUE
------------------------------------ ------------------- -------------------
local_listener string local_lsnr_lm5330a
remote_listener string remote_lsnr_lm5330
SQL> show parameter instance_name
NAME TYPE VALUE
------------------------------------ ------------------- ------------------
instance_name string LM5330B
SQL> show parameter listener
NAME TYPE VALUE
------------------------------------ ------------------- ------------------
local_listener string local_lsnr_lm5330b
remote_listener string remote_lsnr_lm5330
2、客户端tnsnames.ora配置
#For NAT client
LM5330TKO =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = 10.255.48.15)(PORT = 1314))
(ADDRESS = (PROTOCOL = TCP)(HOST = 10.255.48.16)(PORT = 1314))
(LOAD_BALANCE = yes)
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = LM5330)
(FAILOVER_MODE =
(TYPE = SELECT)
(METHOD = BASIC)
(RETRIES = 180)
(DELAY = 5)
)
)
)
#For non NAT client
LM5330VIP =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = 10.200.48.15)(PORT = 1314))
(ADDRESS = (PROTOCOL = TCP)(HOST = 10.200.48.16)(PORT = 1314))
(LOAD_BALANCE = yes)
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = LM5330)
(FAILOVER_MODE =
(TYPE = SELECT)
(METHOD = BASIC)
(RETRIES = 180)
(DELAY = 5)
)
)
)
3、客户端hosts文件配置
c:\windows\system32\drivers\etc\hosts
10.200.48.15 vmdb01pvip.oradb.com vmdb01pvip
10.200.48.16 vmdb02pvip.oradb.com vmdb02pvip
三、ORA-12545错误信息
C:\Users\Administrator>sqlplus system/system@lm5330tko
SQL*Plus: Release 10.2.0.3.0 - Production on 星期二 11月 27 16:01:23 2012
Copyright (c) 1982, 2006, Oracle. All Rights Reserved.
ERROR:
ORA-12545: Connect failed because target host or object does not exist
对于客户端发起的连接请求,有时候可以成功建立连接,有时候无法建立连接并伴随给出ORA-12545错误
oracle@bo2dbp:~> oerr ora 12545
12545, 00000, "Connect failed because target host or object does not exist"
// *Cause: The address specified is not valid, or the program being
// connected to does not exist.
// *Action: Ensure the ADDRESS parameters have been entered correctly; the
// most likely incorrect parameter is the node name. Ensure that the
// executable for the server exists (perhaps "oracle" is missing.)
// If the protocol is TCP/IP, edit the TNSNAMES.ORA file to change the
// host name to a numeric IP address and try again.
从ORA-12545错误描述来看,主要是由于解析的问题产生该类错误
四、分析与解决
1、启用sqlnet trace
为客户端的sqlnet.ora文件添加下列项来启用sqlnet trace,trace后,建议移除这些选项以避免额外的网络开销
Trace_level_client=16
Trace_directory_client=<path_to_the_trace_directory> # use the full path to the trace directory
Trace_unique_client=on
Trace_timestamp_client=on
Diag_adr_enabled=off
#Author : Robinson
#Blog : http://blog.csdn.net/robinson_0612
下面是trace文件提供的信息,关于如何做 oracle net trace 请参考:http://blog.csdn.net/robinson_0612/article/details/8254720
文件名:cli_10648_1.trc
[27-11-2012 17:59:01:694] nsc2addr: (ADDRESS=(PROTOCOL=TCP)(HOST=vmdb02pvip)(PORT=1314))
[27-11-2012 17:59:01:694] nttbnd2addr: entry
[27-11-2012 17:59:01:694] snlinGetAddrInfo: entry
[27-11-2012 17:59:01:694] snlinGetAddrInfo: Invalid IP address string vmdb02pvip
[27-11-2012 17:59:01:694] snlinFreeAddrInfo: entry
[27-11-2012 17:59:01:694] snlinFreeAddrInfo: exit
[27-11-2012 17:59:01:694] snlinGetAddrInfo: exit
[27-11-2012 17:59:01:694] nttbnd2addr: looking up IP addr for host: vmdb02pvip
[27-11-2012 17:59:01:694] snlinGetAddrInfo: entry
[27-11-2012 17:59:04:259] snlinGetAddrInfo: Name resolution failed for vmdb02pvip
[27-11-2012 17:59:04:259] snlinFreeAddrInfo: entry
[27-11-2012 17:59:04:259] snlinFreeAddrInfo: exit
[27-11-2012 17:59:04:259] snlinGetAddrInfo: exit
[27-11-2012 17:59:04:259] nttbnd2addr: *** hostname lookup failure! ***
[27-11-2012 17:59:04:259] nttbnd2addr: exit
[27-11-2012 17:59:04:259] nserror: entry
[27-11-2012 17:59:04:259] nserror: nsres: id=0, op=77, ns=12545, ns2=12560; nt[0]=515, nt[1]=1004, nt[2]=0;
ora[0]=0, ora[1]=0, ora[2]=0
[27-11-2012 17:59:04:259] nsc2addr: error exit
2、故障分析
#分析上面的trace信息,trace信息中指出了无法解析主机vmdb02pvip
#客户端显示的12545错误的发生正式由于这个vmdb02pvip无法解析。
#那有时候可以连接成功是怎么一回事呢?
#当客户端发起连接请求,由于客户端配置了load balance,因此客户端会随机从ADDRESS列表中挑选一个IP进行连接。
#假定当前选择从NAT IP 10.255.48.15发出连接请求,10.255.48.15会被自动映射到相应的virtual ip,即10.200.48.15。
#当10.200.48.15上的监听器捕获到该请求,服务器端的load balance生效,pmon进程根据监听信息发现当前节点并不繁忙,直接建立连接。
#上面的描述就是可以成功建立连接的情形。
#当服务器的监听发现其余的节点空闲时,而当前节点繁忙,监听器将实现转发。
#也就是说服务器端返回一个新的IP地址给客户端,说我比较忙,你连接到10.200.48.16吧。
#于是客户端根据所提供的IP地址重新发起连接请求,而此时客户端到10.200.48.16根本不可达,即使将ADDRESS列表改为该IP。
#所以上面的分析是产生ORA-12545的原因。
3、解决
#下面通过tracert命令来检查节点是否可达
C:\Users\Administrator>tracert vmdb02pvip
Tracing route to vmdb02pvip [10.200.48.16]
over a maximum of 30 hops:
1 1 ms 1 ms 1 ms 192.168.9.254
2 <1 ms <1 ms <1 ms 10.97.5.1
3 * * * Request timed out. #超时,节点不可达
4 * * * Request timed out.
5 * * * Request timed out.
6 * * * Request timed out.
#考虑将客户端hosts文件作下列更改,即将映射关系的IP直接改为NAT的IP
c:\windows\system32\drivers\etc\hosts
#10.200.48.15 vmdb01pvip.oradb.com vmdb01pvip
#10.200.48.16 vmdb02pvip.oradb.com vmdb02pvip
10.255.48.15 vmdb01pvip.oradb.com vmdb01pvip
10.255.48.16 vmdb02pvip.oradb.com vmdb02pvip
#再次tracert
C:\Users\Administrator>tracert vmdb02pvip
Tracing route to v2048db02pvip [10.255.48.16]
over a maximum of 30 hops:
1 <1 ms 1 ms 1 ms 192.168.9.254
2 <1 ms <1 ms <1 ms 10.97.5.1
3 8 ms 5 ms 4 ms 10.100.34.4
4 6 ms 4 ms 4 ms vmdb02pvip [10.255.48.16]
5 8 ms 4 ms 4 ms vmdb02pvip [10.255.48.16]
Trace complete.
#再次建立连接时,不再出现ORA-12545
C:\Users\Administrator>netstat
Active Connections
Proto Local Address Foreign Address State
TCP 192.168.9.12:59541 vmdb01pvip:1314 ESTABLISHED
TCP 192.168.9.12:59558 vmdb02pvip:1314 ESTABLISHED
其次也可以通过禁用服务器端的load balance来避免该错误,当然此方法不被推荐。
alter system set remote_listener='' sid='*';
也可以通过配置CMAN(Oracle Connection Manager)来解决该问题
五、更多参考
有关Oracle RAC请参考
使用crs_setperm修改RAC资源的所有者及权限
使用crs_profile管理RAC资源配置文件
RAC 数据库的启动与关闭
再说 Oracle RAC services
Services in Oracle Database 10g
Migrate datbase from single instance to Oracle RAC
Oracle RAC 连接到指定实例
Oracle RAC 负载均衡测试(结合服务器端与客户端)
Oracle RAC 服务器端连接负载均衡(Load Balance)
Oracle RAC 客户端连接负载均衡(Load Balance)
ORACLE RAC 下非缺省端口监听配置(listener.ora tnsnames.ora)
ORACLE RAC 监听配置 (listener.ora tnsnames.ora)
配置 RAC 负载均衡与故障转移
CRS-1006 , CRS-0215 故障一例
基于Linux (RHEL 5.5) 安装Oracle 10g RAC
使用 runcluvfy 校验Oracle RAC安装环境
有关Oracle 网络配置相关基础以及概念性的问题请参考:
配置非默认端口的动态服务注册
配置sqlnet.ora限制IP访问Oracle
Oracle 监听器日志配置与管理
设置 Oracle 监听器密码(LISTENER)
配置ORACLE 客户端连接到数据库
有关基于用户管理的备份和备份恢复的概念请参考
Oracle 冷备份
Oracle 热备份
Oracle 备份恢复概念
Oracle 实例恢复
Oracle 基于用户管理恢复的处理
SYSTEM 表空间管理及备份恢复
SYSAUX表空间管理及恢复
Oracle 基于备份控制文件的恢复(unsing backup controlfile)
有关RMAN的备份恢复与管理请参考
RMAN 概述及其体系结构
RMAN 配置、监控与管理
RMAN 备份详解
RMAN 还原与恢复
RMAN catalog 的创建和使用
基于catalog 创建RMAN存储脚本
基于catalog 的RMAN 备份与恢复
RMAN 备份路径困惑
使用RMAN实现异机备份恢复(WIN平台)
使用RMAN迁移文件系统数据库到ASM
linux 下RMAN备份shell脚本
使用RMAN迁移数据库到异机
有关ORACLE体系结构请参考
Oracle 表空间与数据文件
Oracle 密码文件
Oracle 参数文件
Oracle 联机重做日志文件(ONLINE LOG FILE)
Oracle 控制文件(CONTROLFILE)
Oracle 归档日志
Oracle 回滚(ROLLBACK)和撤销(UNDO)
Oracle 数据库实例启动关闭过程
Oracle 10g SGA 的自动化管理
Oracle 实例和Oracle数据库(Oracle体系结构)