Author: Zhang Hua    Published: 2014-10-29
Copyright: this article may be reproduced freely, but any reproduction must include a hyperlink to the original source together with the author information and this copyright notice
( http://blog.csdn.net/quqi99 )
As we know, the OpenStack Juno release implements the VRRP feature to provide HA for the neutron l3-agent (see my other post: http://blog.csdn.net/quqi99/article/details/18799877). The Icehouse release, however, does not support VRRP, so we use Corosync + Pacemaker to provide Active/Passive HA for the Icehouse L3-agent.
1, Theoretical analysis
a, When one L3-agent node fails and another L3-agent node comes up, you can make neutron reschedule the routers by unbinding them with the 'neutron l3-agent-router-remove <l3-agent-id> <router-id>' command (there are similar scripts online, e.g. https://github.com/stackforge/cookbook-openstack-network/blob/master/files/default/neutron-ha-tool.py ). Or, more simply, give both L3-agent nodes the same hostname, so that whichever L3-agent is running picks up the routers again.
Using the same hostname is the simplest way to solve the l3-agent HA problem in Icehouse, but corosync must be working:
(1) At the very beginning of the start method in the l3-agent OCF file, adjust the hostname and /etc/hosts:
# Switch this node to the shared hostname and make sure it resolves via /etc/hosts
hostname neutron-gateway
old_hostname=`cat /etc/hostname`
# Resolve the original hostname to its IP address
ip=`ping -c 1 $old_hostname |grep 'icmp_seq' |awk -F ')' '{print $1}' |awk -F '(' '{print $2}'`
record="$ip neutron-gateway"
is_in_hosts=`grep 'neutron-gateway' /etc/hosts`
# Append the record only if it is not already present
[[ "$is_in_hosts" = "$record" ]] && echo yes || sh -c "echo $record >> /etc/hosts"
If you use the router-rescheduling approach instead, add the following script at the beginning of the start method in the OCF file:
export OS_USERNAME=admin
export OS_PASSWORD=openstack
export OS_TENANT_NAME=admin
export OS_REGION_NAME=RegionOne
export OS_AUTH_URL=${OS_AUTH_PROTOCOL:-http}://10.5.0.18:5000/v2.0
# Find L3 agents that are reported as dead ('xxx' in the alive column) and unbind their routers
l3_agent_ids=$(neutron agent-list |grep 'L3 agent' |grep 'xxx' |awk -F '|' '{print $2}')
for l3_agent_id in $l3_agent_ids; do
    router_ids=$(neutron router-list-on-l3-agent $l3_agent_id |grep 'network_id' |awk -F '|' '{print $2}')
    if [ -n "$router_ids" ]; then
        for router_id in $router_ids; do
            neutron l3-agent-router-remove $l3_agent_id $router_id
        done
    fi
done
b, Also, if an L3-agent node has not actually been powered off but only its process has died, the qrouter-<router-id> namespaces, the VM subnet gateway interfaces and the floating IP interfaces should be removed as well, so add the following to the stop method of the OCF file:
neutron-ovs-cleanup  # removes the qr- and qg- gateway ports inside the qrouter- namespaces, together with their peer devices in the global namespace
for ns in $(ip netns list |grep 'qrouter-'); do ip netns delete $ns; done;  # we avoid neutron-netns-cleanup here so that the dhcp- namespaces are not deleted as well
c, The HA stack is structured as follows:
1) Messaging Layer: the heartbeat/message transport layer, e.g. corosync
2) CRM (Cluster Resource Manager) and LRM (Local Resource Manager): once a heartbeat event arrives from the Messaging Layer, they act on it through resource agents (RA). Pacemaker belongs to this layer and its configuration interface is crmsh. Cluster services should be managed by the CRM rather than by the init system, so their automatic start at boot can be disabled (see the example after this list).
3) RA (Resource Agent): the scripts that do the actual work. The types are:
(1) heartbeat legacy: the traditional heartbeat type, listening on UDP port 694
(2) LSB (Linux Standard Base): the scripts under /etc/rc.d/init.d/* are LSB scripts
(3) OCF (Open Cluster Framework): the organizations that provide OCF RA scripts are called providers; pacemaker is one such provider
(4) STONITH (shoot the other node in the head): this RA type is used for node fencing. Suppose a 5-node cluster is split by a network failure into 3 nodes on one side and 2 on the other that cannot talk to each other; each side then elects its own DC (Designated Coordinator), and the two resulting clusters contend for the same resources (for example, if both write to shared storage at the same time the filesystem is corrupted). This is called split-brain. To avoid it a quorum is used: only the partition holding more than half of the votes (the 3-node side) is the legitimate cluster, and the STONITH device forces the illegitimate 2-node partition out of the cluster, releasing its resources and powering the nodes off. A two-node cluster is a special case: after a split neither side can reach quorum, no legitimate cluster can be elected and no resources can be moved, so everything fails because there is no arbitration device. Therefore set the no-quorum-policy property to ignore this (crm configure property no-quorum-policy="ignore").
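As an illustration of 2) above: on Ubuntu 14.04 the OpenStack agents are upstart jobs, so once pacemaker manages the l3-agent its automatic start can be disabled with an override file (a minimal sketch, assuming the upstart job is named neutron-l3-agent as in the Ubuntu packages):
sudo service neutron-l3-agent stop
echo manual | sudo tee /etc/init/neutron-l3-agent.override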
2, Environment and installation
Environment: two nodes, node1 (172.16.1.122) and node2 (172.16.1.123), both using the same hostname. The root user is not strictly required. Make sure the clocks on the two nodes are synchronized.
Installation: sudo apt-get install pacemaker corosync heartbeat crmsh cluster-glue resource-agents
Configure password-less SSH between the two nodes:
Edit /etc/ssh/sshd_config to permit root login, then restart sshd with: service ssh restart
#PermitRootLogin without-password
PermitRootLogin yes
Node 1: ssh-keygen -t rsa -P '' && ssh-copy-id -i ~/.ssh/id_rsa.pub root@172.16.1.123
Node 2: ssh-keygen -t rsa -P '' && ssh-copy-id -i ~/.ssh/id_rsa.pub root@172.16.1.122
Test: ssh root@172.16.1.123 -- ip addr show
Generate the corosync key on one node, then copy it to all other nodes. Key generation reads /dev/random by default; when the system does not have enough interrupt entropy this blocks for a long time, so temporarily replace random with urandom:
mv /dev/{random,random.bak}
ln -s /dev/urandom /dev/random
corosync-keygen
scp /etc/corosync/authkey root@172.16.1.123:/etc/corosync/
chown root:root /etc/corosync/authkey
chmod 400 /etc/corosync/authkey
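Once the key has been generated and copied, the real random device can be restored:
rm /dev/random
mv /dev/random.bak /dev/random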
3, Configure corosync on both nodes
# cat /etc/default/corosync
START=yes
# mkdir /var/log/corosync
# cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf, then set bindnetaddr: 172.16.1.0 inside it and keep the rest at the defaults:
totem {
    version: 2
    secauth: off
    threads: 0
    interface {
        ringnumber: 0
        bindnetaddr: 172.16.1.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
        ttl: 1
    }
}
logging {
    fileline: off
    to_stderr: no
    to_logfile: yes
    to_syslog: yes
    logfile: /var/log/corosync/corosync.log
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
    }
}
service {
    # Load the Pacemaker Cluster Resource Manager
    name: pacemaker
    ver: 1
}
amf {
    mode: disabled
}
quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
}
On both nodes, expected_votes: 2 and two_node: 1 must be configured.
4, Start and verify corosync on both nodes
# service corosync restart
# corosync-cfgtool -s
Printing ring status.
Local node ID -1408237190
RING ID 0
id = 172.16.1.122
status = ring 0 active with no faults
# corosync-cmapctl |grep member |grep ip
runtime.totem.pg.mrp.srp.members.2886730106.ip (str) = r(0) ip(172.16.1.122)
runtime.totem.pg.mrp.srp.members.2886730107.ip (str) = r(0) ip(172.16.1.123)
$ sudo grep TOTEM /var/log/corosync/corosync.log
# Check the cluster nodes and their votes:
# corosync-quorumtool -l
Nodeid Votes Name
16779274 1 server1
33556490 1 server2
Modify the quorum values online:
corosync-cmapctl -s quorum.expected_votes u32 3
corosync-cmapctl -s runtime.votequorum.ev_barrier u32 3
corosync-cmapctl | grep quorum
5, Start and verify pacemaker on both nodes
# service pacemaker restart
# crm_mon --one-shot -V
error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Last updated: Wed Oct 29 02:52:08 2014
Last change: Wed Oct 29 02:20:43 2014
Current DC: NONE
0 Nodes configured
0 Resources configured
The 'Current DC: NONE' above (no DC could be elected) is because there are two nodes and no-quorum-policy has not been set. Start only one node first, set no-quorum-policy, then start the other node (and remember to set the two_node parameter in the corosync configuration shown above). For example:
sudo crm status
Last updated: Fri Oct 31 03:30:04 2014
Last change: Fri Oct 31 03:29:16 2014 via crm_attribute on node1
Stack: corosync
Current DC: node122 (739246458) - partition with quorum
Version: 1.1.10-42f2063
2 Nodes configured
0 Resources configured
Online: [ node1 node2 ]
sudo crm configure property stonith-enabled=false
sudo crm configure property no-quorum-policy=ignore
Add the VIP: crm configure primitive FAILOVER-ADDR ocf:heartbeat:IPaddr2 params ip="172.16.1.100" nic="eth0" op monitor interval="10s" meta is-managed="true"
View the configuration: crm configure show
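To confirm which node currently holds the VIP (a quick check using the resource name and NIC from the example above):
sudo crm resource status FAILOVER-ADDR
ip addr show eth0 | grep 172.16.1.100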
6, Integration with OpenStack
sudo mkdir /usr/lib/ocf/resource.d/openstack
cd /usr/lib/ocf/resource.d/openstack
sudo wget https://raw.github.com/madkiss/openstack-resource-agents/master/ocf/neutron-agent-l3
sudo chmod a+rx neutron-agent-l3
primitive p_neutron-l3-agent ocf:openstack:neutron-agent-l3 \
params config="/etc/neutron/neutron.conf" \
plugin_config="/etc/neutron/l3_agent.ini" \
op monitor interval="30s" timeout="30s"
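Optionally the l3-agent can be tied to the VIP defined in section 5 so that both always run on, and fail over to, the same node. A hedged sketch using the resource names from this post (the constraint names col_l3_vip and ord_vip_l3 are my own):
sudo crm configure colocation col_l3_vip inf: p_neutron-l3-agent FAILOVER-ADDR
sudo crm configure order ord_vip_l3 inf: FAILOVER-ADDR p_neutron-l3-agent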
Failover can be tested by putting one node into standby and then bringing it back online:
sudo crm node standby node1
sudo crm node online node1
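A rough way to verify the failover (resource names as above; the qrouter- namespaces reappear on the surviving node only after the routers have been rescheduled):
sudo crm_mon --one-shot | grep -E 'p_neutron-l3-agent|FAILOVER-ADDR'
ip netns list | grep qrouter-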
Troubleshooting
1, Attempting connection to the cluster...Could not establish cib_ro connection: Connection refused
iptables -I INPUT 1 --protocol udp --dport 5405 -j ACCEPT
iptables -I INPUT 1 --protocol udp --sport 5404 -j ACCEPT
iptables -I OUTPUT 1 --protocol udp --dport 5405 -j ACCEPT
iptables -I OUTPUT 1 --protocol udp --sport 5404 -j ACCEPT
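If this fixes it, the rules can be made persistent across reboots (a sketch assuming the iptables-persistent package is used):
sudo apt-get install iptables-persistent
sudo sh -c 'iptables-save > /etc/iptables/rules.v4'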
2, If both the metadata-agent and the l3-agent are made highly available, a pacemaker colocation constraint can keep the two scheduled on the same node, see: http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resource-sets-colocation.html
sudo crm configure show xml > /tmp/crm_conf.xml
Update the /tmp/crm_conf.xml file with the configuration below:
<constraints>
<rsc_colocation id="coloc-1" score="INFINITY" >
<resource_set id="collocated-set-1" sequential="false">
<resource_ref id="res_neutron-metadata-agent"/>
<resource_ref id="res_neutron-l3-agent"/>
</resource_set>
</rsc_colocation>
</constraints>
sudo crm configure load update /tmp/crm_conf.xml
Or use a group constraint, which achieves the same effect:
sudo crm configure group res_group_l3_metadata res_neutron-l3-agent res_neutron-metadata-agent
Finally, restart the agents:
sudo crm resource restart res_neutron-l3-agent
sudo crm resource restart res_neutron-metadata-agent
sudo crm resource cleanup res_neutron-l3-agent
Defining and deleting a resource:
sudo crm configure primitive res_ceilometer_agent_central ocf:openstack:ceilometer-agent-central op monitor interval="30s"
sudo crm -w -F resource stop res_ceilometer_agent_central
sudo crm resource cleanup res_ceilometer_agent_central
sudo crm -w -F configure delete res_ceilometer_agent_central
Debugging an OCF script on its own:
sudo -i
export OCF_ROOT=/usr/lib/ocf
export OCF_RESOURCE_INSTANCE=res_ceilometer_agent_central
$OCF_ROOT/resource.d/openstack/ceilometer-agent-central monitor
cat /var/run/resource-agents/res_ceilometer_agent_central.pid
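Alternatively, the ocf-tester utility shipped with the resource agents packages can exercise every action of an RA in one run (a sketch; extra -o param=value options can be passed if the RA needs them):
sudo ocf-tester -n res_ceilometer_agent_central /usr/lib/ocf/resource.d/openstack/ceilometer-agent-central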
Upgrading the charm:
juju set ceilometer openstack-origin=cloud:trusty-liberty
#juju resolved -r ceilometer/0
juju upgrade-charm ceilometer
Update 2020-08-18
When a unit is deleted and then re-added, if the number of nodes after the deletion happens to be 2 you may hit this bug (https://bugs.launchpad.net/charm-hacluster/+bug/1400481). The state below appears, and even re-adding the unit at that point leaves things split into two clusters that cannot recover:
Online: [ juju-7d2712-gnocchi-24 ]
OFFLINE: [ juju-7d2712-gnocchi-12 juju-7d2712-gnocchi-13 juju-7d2712-gnocchi-23 ]
Stopped: [ juju-7d2712-gnocchi-12 juju-7d2712-gnocchi-13 juju-7d2712-gnocchi-23 juju-7d2712-gnocchi-24 ]
In this situation, delete the stale node from both clusters (crm node delete juju-7d2712-gnocchi-24), after which the two clusters look like this:
root@juju-7d2712-gnocchi-24:~# crm status
Stack: corosync
Current DC: juju-7d2712-gnocchi-24 (version 1.1.18-2b07d5c5a9) - partition WITHOUT quorum
Last updated: Fri Oct 16 09:54:47 2020
Last change: Fri Oct 16 09:54:11 2020 by root via crm_attribute on juju-7d2712-gnocchi-24

1 node configured
2 resources configured

Online: [ juju-7d2712-gnocchi-24 ]

Full list of resources:

 Resource Group: grp_gnocchi_vips
     res_gnocchi_bf9661e_vip    (ocf::heartbeat:IPaddr2):    Stopped
 Clone Set: cl_res_gnocchi_haproxy [res_gnocchi_haproxy]
     Stopped: [ juju-7d2712-gnocchi-24 ]

root@juju-7d2712-gnocchi-13:~# crm status
Stack: corosync
Current DC: juju-7d2712-gnocchi-25 (version 1.1.18-2b07d5c5a9) - partition with quorum
Last updated: Fri Oct 16 09:55:44 2020
Last change: Fri Oct 16 09:49:28 2020 by root via crm_node on juju-7d2712-gnocchi-25

2 nodes configured
3 resources configured

Online: [ juju-7d2712-gnocchi-13 juju-7d2712-gnocchi-25 ]

Full list of resources:

 Resource Group: grp_gnocchi_vips
     res_gnocchi_bf9661e_vip    (ocf::heartbeat:IPaddr2):    Started juju-7d2712-gnocchi-13
 Clone Set: cl_res_gnocchi_haproxy [res_gnocchi_haproxy]
     Started: [ juju-7d2712-gnocchi-13 juju-7d2712-gnocchi-25 ]

root@juju-7d2712-gnocchi-25:~# crm status
Stack: corosync
Current DC: juju-7d2712-gnocchi-25 (version 1.1.18-2b07d5c5a9) - partition with quorum
Last updated: Fri Oct 16 09:56:13 2020
Last change: Fri Oct 16 09:49:28 2020 by root via crm_node on juju-7d2712-gnocchi-25

2 nodes configured
3 resources configured

Online: [ juju-7d2712-gnocchi-13 juju-7d2712-gnocchi-25 ]

Full list of resources:

 Resource Group: grp_gnocchi_vips
     res_gnocchi_bf9661e_vip    (ocf::heartbeat:IPaddr2):    Started juju-7d2712-gnocchi-13
 Clone Set: cl_res_gnocchi_haproxy [res_gnocchi_haproxy]
     Started: [ juju-7d2712-gnocchi-13 juju-7d2712-gnocchi-25 ]
Give it more time: I initially thought it was permanently broken, but after one night the two clusters had merged back into one:
ubuntu@zhhuabj-bastion:~$ juju status gnocchi
Model    Controller  Cloud/Region       Version  SLA          Timestamp
gnocchi  zhhuabj     stsstack/stsstack  2.8.1    unsupported  02:01:43Z

App                Version  Status  Scale  Charm      Store       Rev  OS      Notes
gnocchi            4.2.5    active      3  gnocchi    jujucharms  113  ubuntu
gnocchi-hacluster           active      3  hacluster  jujucharms  150  ubuntu

Unit                    Workload  Agent  Machine  Public address  Ports     Message
gnocchi/1*              active    idle   13       10.5.3.80       8041/tcp  Unit is ready
  gnocchi-hacluster/0*  active    idle            10.5.3.80                 Unit is ready and clustered
gnocchi/4               active    idle   24       10.5.3.149      8041/tcp  Unit is ready
  gnocchi-hacluster/4   active    idle            10.5.3.149                Unit is ready and clustered
gnocchi/5               active    idle   25       10.5.1.40       8041/tcp  Unit is ready
  gnocchi-hacluster/5   active    idle            10.5.1.40                 Unit is ready and clustered

Machine  State    DNS         Inst id                               Series  AZ    Message
13       started  10.5.3.80   63b667b9-0aa1-4188-b03c-89f117140572  bionic  nova  ACTIVE
24       started  10.5.3.149  19f405e4-62ed-4117-847f-dc4a5696fbeb  bionic  nova  ACTIVE
25       started  10.5.1.40   991d7693-b449-4955-bae7-e734688e9151  bionic  nova  ACTIVE

root@juju-7d2712-gnocchi-25:~# corosync-quorumtool -l

Membership information
----------------------
    Nodeid      Votes Name
      1005          1 10.5.1.40 (local)
      1000          1 10.5.3.80
      1004          1 10.5.3.149

root@juju-7d2712-gnocchi-25:~# crm status
Stack: corosync
Current DC: juju-7d2712-gnocchi-25 (version 1.1.18-2b07d5c5a9) - partition with quorum
Last updated: Sun Oct 18 02:02:34 2020
Last change: Fri Oct 16 10:42:03 2020 by hacluster via crmd on juju-7d2712-gnocchi-24

3 nodes configured
4 resources configured

Online: [ juju-7d2712-gnocchi-13 juju-7d2712-gnocchi-24 juju-7d2712-gnocchi-25 ]

Full list of resources:

 Resource Group: grp_gnocchi_vips
     res_gnocchi_bf9661e_vip    (ocf::heartbeat:IPaddr2):    Started juju-7d2712-gnocchi-24
 Clone Set: cl_res_gnocchi_haproxy [res_gnocchi_haproxy]
     Started: [ juju-7d2712-gnocchi-13 juju-7d2712-gnocchi-24 juju-7d2712-gnocchi-25 ]

root@juju-7d2712-gnocchi-13:~# crm status
Stack: corosync
Current DC: juju-7d2712-gnocchi-25 (version 1.1.18-2b07d5c5a9) - partition with quorum
Last updated: Sun Oct 18 02:03:00 2020
Last change: Fri Oct 16 10:42:03 2020 by hacluster via crmd on juju-7d2712-gnocchi-24

3 nodes configured
4 resources configured

Online: [ juju-7d2712-gnocchi-13 juju-7d2712-gnocchi-24 juju-7d2712-gnocchi-25 ]

Full list of resources:

 Resource Group: grp_gnocchi_vips
     res_gnocchi_bf9661e_vip    (ocf::heartbeat:IPaddr2):    Started juju-7d2712-gnocchi-24
 Clone Set: cl_res_gnocchi_haproxy [res_gnocchi_haproxy]
     Started: [ juju-7d2712-gnocchi-13 juju-7d2712-gnocchi-24 juju-7d2712-gnocchi-25 ]

root@juju-7d2712-gnocchi-24:~# crm status
Stack: corosync
Current DC: juju-7d2712-gnocchi-25 (version 1.1.18-2b07d5c5a9) - partition with quorum
Last updated: Sun Oct 18 02:03:29 2020
Last change: Fri Oct 16 10:42:03 2020 by hacluster via crmd on juju-7d2712-gnocchi-24

3 nodes configured
4 resources configured

Online: [ juju-7d2712-gnocchi-13 juju-7d2712-gnocchi-24 juju-7d2712-gnocchi-25 ]

Full list of resources:

 Resource Group: grp_gnocchi_vips
     res_gnocchi_bf9661e_vip    (ocf::heartbeat:IPaddr2):    Started juju-7d2712-gnocchi-24
 Clone Set: cl_res_gnocchi_haproxy [res_gnocchi_haproxy]
     Started: [ juju-7d2712-gnocchi-13 juju-7d2712-gnocchi-24 juju-7d2712-gnocchi-25 ]