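Copyright notice: this article may be freely reposted; when reposting, be sure to indicate the original source of the article, the author information and this copyright notice with a hyperlink (Author: Zhang Hua, published: 2021-07-22)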
debug juju-controller
juju ssh -m controller 0
juju_engine_report | pastebinit -f yaml
e.g.:
instance-poller:
  inputs:
  - api-caller
  - environ-tracker
  - clock
  - valid-credential-flag
  - migration-fortress
  - migration-inactive-flag
  start-count: 19847
  state: starting
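A full engine report is long, so a small filter can help spot workers that keep restarting. The following is only a sketch: the function name list_restarting_workers and the default threshold of 100 are hypothetical, and it assumes the report is block-style YAML in which each worker is a key that contains a start-count field, as in the excerpt above.

```shell
# Print "<worker> <start-count>" for every worker whose start-count is at
# least $1 (default 100, a hypothetical threshold). Reads the report on stdin.
list_restarting_workers() {
  awk -v min="${1:-100}" '
    {
      match($0, /^ */)          # leading spaces give the nesting depth
      indent = RLENGTH
    }
    # A line holding only "key:" opens a nested map; remember it per depth.
    NF == 1 && $1 ~ /^[A-Za-z0-9_-]+:$/ {
      key = $1
      sub(/:$/, "", key)
      parent[indent] = key
      next
    }
    $1 == "start-count:" && $2 + 0 >= min {
      # The owning worker is the closest key at a shallower depth.
      for (d = indent - 1; d >= 0; d--) {
        if (d in parent) { print parent[d], $2; break }
      }
    }
  '
}

# Usage on the controller machine (after "juju ssh -m controller 0"):
#   juju_engine_report | list_restarting_workers 1000
```

A high start-count only says that the worker has been started many times; the reason still has to come from the controller logs or from commands such as the ones that follow.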
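The report above shows that instance-poller has a problem: it is still in the starting state with a start-count of 19847. instance-poller is the worker that the juju controller uses to periodically poll the MAAS API for machine/instance data. As shown below, the failure may be related to spaces.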
juju reload-spaces --debug --verbose --show-log
ERROR could not reload spaces: unexpected: Get "http://10.91.42.24/MAAS/api/2.0/spaces/": EOF
16:16:21 DEBUG cmd supercommand.go:537 error stack:
unexpected: Get "http://10.91.42.24/MAAS/api/2.0/spaces/": EOF
Check whether the spaces information on the MAAS side and on the juju side is consistent. If it is not, try restarting the juju controller.
maas admin spaces read
juju spaces
# output of juju spaces:
ceph-access-space 5 10.91.246.0/23
ceph-replica-space 4 10.91.250.0/23
external-space 3 10.91.248.0/23
internal-space 2 10.91.244.0/23
oam-space 1 10.91.42.0/23
#cat maas-admin-space-read.txt | jq '.[] | {name:.name,id:.id,url:.resource_uri,subnet:.subnets[].cidr}' -c
{"name":"oam-space","id":1,"url":"/MAAS/api/2.0/spaces/1/","subnet":"10.91.42.0/23"}
{"name":"internal-space","id":2,"url":"/MAAS/api/2.0/spaces/2/","subnet":"10.91.244.0/23"}
{"name":"external-space","id":3,"url":"/MAAS/api/2.0/spaces/3/","subnet":"10.91.248.0/23"}
{"name":"ceph-replica-space","id":4,"url":"/MAAS/api/2.0/spaces/4/","subnet":"10.91.250.0/23"}
{"name":"ceph-access-space","id":5,"url":"/MAAS/api/2.0/spaces/5/","subnet":"10.91.246.0/23"}
{"name":"undefined","id":-1,"url":"/MAAS/api/2.0/spaces/undefined/","subnet":"192.168.122.0/24"}
# Why is the undefined space missing on the juju side? Check the controller database:
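The manual comparison of the two listings can also be scripted. This is a minimal sketch, assuming each side has first been reduced to a file with one space name per line; the function name names_missing_from and the file names in the usage comments are hypothetical.

```shell
# Print the names that appear in file $1 but not in file $2
# (one name per line in both files, whole-line exact match).
names_missing_from() {
  grep -vxF -f "$2" "$1" || true   # grep exits 1 when nothing differs
}

# Usage, assuming the MAAS JSON shown above and the three-column
# "juju spaces" listing shown above (drop a header line first if your
# juju version prints one):
#   maas admin spaces read | jq -r '.[].name' > maas-spaces.txt
#   juju spaces | awk '{ print $1 }' > juju-spaces.txt
#   names_missing_from maas-spaces.txt juju-spaces.txt
```

With the data listed above this would report only the undefined space as missing on the juju side.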
juju ssh -m controller 0
mongo --sslAllowInvalidHostnames --sslAllowInvalidCertificates localhost:37017/admin --ssl -u "$(ls /var/lib/juju/agents/)" -p "$(sudo grep statepassword /var/lib/juju/agents/$(ls /var/lib/juju/agents/)/agent.conf|awk '{print $2}')" --authenticationDatabase admin
use juju;
db.spaces.find().pretty()
mongodump --port 37017 --sslAllowInvalidCertificates --ssl --authenticationDatabase admin -u "$(sudo awk '/^tag:/ { print $2 }' /var/lib/juju/agents/machine-*/agent.conf)" -p "$(sudo awk '/^statepassword:/ { print $2 }' /var/lib/juju/agents/machine-*/agent.conf)" --db juju --gzip
tar -czf mongodump-`date '+%F_%T'| tr -s ':' '-'`.tgz dump
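Regardless of whether the controller side is missing the undefined space, at least:
- instance-poller got an EOF error while accessing /MAAS/api/2.0/machines
- reload-spaces also got an EOF error
Both points suggest that the controller cannot reach ports 80/5240 on MAAS, so the next step could be to install maas-cli on maas and retry, in order to rule out a network problem. However, running 'maas admin spaces read' produces the request log below, while reload-spaces does not seem to produce any log entry at all.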
... regiond: [info] 10.91.42.24 GET /MAAS/api/2.0/spaces/ HTTP/1.1 --> 200 OK (referrer: -; agent: Python-httplib2/0.9.2 (gzip))
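Before installing extra tools, the same reachability question can be probed with curl from the controller machine. The sketch below is an assumption-laden helper, not part of the original procedure: the function names explain_curl_exit and probe_url are hypothetical, and the exit statuses it interprets are curl's documented codes 7, 28, 52 and 56. An unauthenticated request will probably be rejected by MAAS, but any HTTP reply at all shows that the path to the server works, whereas status 52 corresponds to the EOF seen above.

```shell
# Map a curl exit status to a short diagnosis.
explain_curl_exit() {
  case "$1" in
    0)  echo "HTTP response received" ;;
    7)  echo "could not connect (port closed or filtered)" ;;
    28) echo "timed out" ;;
    52) echo "connected, but the server closed without replying (EOF)" ;;
    56) echo "connection dropped while receiving data" ;;
    *)  echo "other curl error $1" ;;
  esac
}

# Probe URL $1 with a timeout of $2 seconds (default 10) and print the
# diagnosis. Any HTTP status, even an error status, counts as reachable.
probe_url() {
  curl -s -o /dev/null --max-time "${2:-10}" "$1"
  explain_curl_exit "$?"
}

# Usage on the controller machine:
#   probe_url http://10.91.42.24/MAAS/api/2.0/spaces/
```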
Could the connection be going through a proxy (check with juju model-config -m controller)? Capture packets to find out; 10.91.43.2 is one of the controllers.
sudo tcpdump -i any src host 10.91.43.2 -w pcap-02.pcap
juju reload-spaces --debug --verbose --show-log
Capture some more packets on both ends:
tcpdump -i any dst host 10.91.42.24 -w pcap-controller.pcap   # on machine 10.91.43.2
tcpdump -i any src host 10.91.43.2 -w pcap-maas.pcap   # on machine 10.91.42.24
juju reload-spaces
maas admin spaces read
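When reading the captures back, it helps to look only at the segments that tear a connection down, since an EOF on the client usually means one side sent FIN or RST earlier than expected. The helper below is a sketch; the function name teardown_filter is hypothetical, and the expression it builds uses the standard pcap-filter syntax for TCP flags.

```shell
# Build a pcap filter that matches FIN or RST segments exchanged with host $1.
teardown_filter() {
  printf 'host %s and (tcp[tcpflags] & (tcp-fin|tcp-rst) != 0)' "$1"
}

# Usage against the capture taken on the MAAS side:
#   tcpdump -nn -r pcap-maas.pcap "$(teardown_filter 10.91.43.2)"
```

Note that each capture above records one direction only (src or dst), so seeing who closes first may require comparing both files.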
Could haproxy be the problem? Try bypassing haproxy as follows:
sudo bash -c 'cat > testcloud.yaml' << EOF
clouds:
  foundations-maas:
    type: maas
    auth-types: [oauth1]
    endpoint: http://10.91.42.23:5240/MAAS
EOF
juju update-cloud maas-cloud -f testcloud.yaml --client --controller new_maas_cloud_controller
juju reload-spaces --debug --verbose --show-log
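In the end the cause was the following haproxy setting, which the customer had added themselves. After commenting it out, everything worked.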
#defaults
# timeout connect 10s
# timeout client 30s
# timeout server 30s
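To check which timeout directives are still active after such a change, the configuration can be filtered for uncommented lines. This is a sketch only: the function name active_timeouts is hypothetical and the path /etc/haproxy/haproxy.cfg in the usage comment is an assumption about where the configuration lives.

```shell
# Print the uncommented "timeout ..." directives of an haproxy config
# file, prefixed with their line numbers.
active_timeouts() {
  grep -nE '^[[:space:]]*timeout[[:space:]]+' "$1" || true
}

# Usage (path is an assumption):
#   active_timeouts /etc/haproxy/haproxy.cfg
#   haproxy -c -f /etc/haproxy/haproxy.cfg   # validate before reloading
```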