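Background
The cluster originally used flannel as its network plugin. Later I wanted to use the calico network plugin to set up network rules in the cluster, but a few mistakes during installation (I had not properly understood the official installation instructions) brought the cluster down, and the coredns pods stayed stuck in the ContainerCreating state.
Error messages:
First, a note: coredns cannot start until the network plugin (flannel in this cluster) has set up the pod network, so its status is closely tied to flannel. Start by listing the pods in kube-system to see which ones are failing.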
[root@master ~]# kubectl get pods -n kube-system
NAME                             READY   STATUS              RESTARTS   AGE
coredns-5bfd685c78-mmjxc         0/1     ContainerCreating   0          9s
coredns-5bfd685c78-zmmpv         0/1     ContainerCreating   0          39s
etcd-master                      1/1     Running             5          47d
kube-apiserver-master            1/1     Running             5          47d
kube-controller-manager-master   1/1     Running             5          47d
kube-flannel-ds-8vzsv            1/1     Running             1          15m
kube-flannel-ds-zbqt9            1/1     Running             1          15m
kube-flannel-ds-zwxrh            1/1     Running             1          15m
kube-proxy-6r25s                 1/1     Running             5          47d
kube-proxy-m8gxx                 1/1     Running             5          47d
kube-proxy-s2jb8                 1/1     Running             5          47d
kube-scheduler-master            1/1     Running             5          47d
Use describe to view the error details of one of the coredns pods.
[root@master ~]# kubectl describe pods -n kube-system coredns-5bfd685c78-mmjxc
Name:                 coredns-5bfd685c78-mmjxc
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 node2/192.168.10.220
Start Time:           Tue, 14 Jul 2020 22:11:33 +0800
Labels:               k8s-app=kube-dns
                      pod-template-hash=5bfd685c78
Annotations:          <none>
Status:               Pending
IP:
Controlled By:        ReplicaSet/coredns-5bfd685c78
Containers:
  coredns:
    Container ID:
    Image:          k8s.gcr.io/coredns:1.3.1
    Image ID:
    Ports:          53/UDP, 53/TCP, 9153/TCP
    Host Ports:     0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8080/health delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-vcfp8 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-vcfp8:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-vcfp8
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age   From               Message
  ----     ------                  ----  ----               -------
  Normal   Scheduled               25s   default-scheduler  Successfully assigned kube-system/coredns-5bfd685c78-mmjxc to node2
  Warning  FailedCreatePodSandBox  24s   kubelet, node2     Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "071c8fc6acb87838fd4ee341479a0769a97401c481a93b5b54f8812ba6fa0ed4" network for pod "coredns-5bfd685c78-mmjxc": NetworkPlugin cni failed to set up pod "coredns-5bfd685c78-mmjxc_kube-system" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
  Warning  FailedCreatePodSandBox  23s   kubelet, node2     Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "e35ce83b7c4f18d1aa8c2a57cea6ade03b33ad98041d1b5c0d218ddfc300f23e" network for pod "coredns-5bfd685c78-mmjxc": NetworkPlugin cni failed to set up pod "coredns-5bfd685c78-mmjxc_kube-system" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
  Warning  FailedCreatePodSandBox  22s   kubelet, node2     Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "e043cc0de7e693fa36d8115ae07b44560ab1883363127fca15becc36b7976ecc" network for pod "coredns-5bfd685c78-mmjxc": NetworkPlugin cni failed to set up pod "coredns-5bfd685c78-mmjxc_kube-system" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
  Warning  FailedCreatePodSandBox  21s   kubelet, node2     Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "d128ed26899ab9cf2bf9f62784a3bb723b63141cd3fff82f860325ad3433c322" network for pod "coredns-5bfd685c78-mmjxc": NetworkPlugin cni failed to set up pod "coredns-5bfd685c78-mmjxc_kube-system" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
  Warning  FailedCreatePodSandBox  20s   kubelet, node2     Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "045f258154329ba4e46192a01df441cf04c395a3599fd68ebd5a5bf7c648fcd1" network for pod "coredns-5bfd685c78-mmjxc": NetworkPlugin cni failed to set up pod "coredns-5bfd685c78-mmjxc_kube-system" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
  Warning  FailedCreatePodSandBox  19s   kubelet, node2     Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "dc776f01abbbf1a219f8014c377f0359db90c4ec74cfe8baa993ff63a0b805f7" network for pod "coredns-5bfd685c78-mmjxc": NetworkPlugin cni failed to set up pod "coredns-5bfd685c78-mmjxc_kube-system" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
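The events show that pod networking on the worker node is broken: the kubelet on node2 is still calling the calico CNI plugin, which fails because /var/lib/calico/nodename does not exist (the calico/node container is not running), so no pod sandbox can be created.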
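To confirm which CNI configuration the kubelet is picking up, it can help to look at /etc/cni/net.d on the affected node. The sketch below assumes the kubelet uses the config file that sorts first by name when several are present, and that leftover calico and flannel files carry the names from their default manifests (for example 10-calico.conflist and 10-flannel.conflist); the function name first_cni_conf is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the CNI config file that sorts first by name.
# Assumption: when several CNI configs exist, the kubelet uses the one that
# comes first in lexicographic order.
first_cni_conf() {
  local dir="${1:-/etc/cni/net.d}"
  # Consider only regular files with the usual CNI config extensions.
  find "$dir" -maxdepth 1 -type f \
    \( -name '*.conf' -o -name '*.conflist' -o -name '*.json' \) \
    | LC_ALL=C sort | head -n 1
}

# Example usage on a worker node (commented out so sourcing has no side effects):
# first_cni_conf /etc/cni/net.d
# ls -l /var/lib/calico/nodename
```

If the first file turns out to be a calico one while only flannel is actually running, that would be consistent with the FailedCreatePodSandBox events.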
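Solution
Remove everything that the calico installation left behind on the master and on the worker nodes.
Connect to each node, use the following commands to delete the calico configuration on that node, and then restart the kubelet service.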
rm -rf /etc/cni/net.d/*
rm -rf /var/lib/cni/calico
systemctl restart kubelet
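After the cleanup, one way to check the result is to recreate the coredns pods and list kube-system again. This is only a sketch: the kubelet retries sandbox creation on its own, so deleting the pods is optional, and the helper names run and recreate_coredns as well as the DRY_RUN variable are invented for this example. The label k8s-app=kube-dns is the one shown in the pod's Labels field.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Delete the coredns pods so the ReplicaSet recreates them, then list the
# kube-system pods to see whether they reach Running.
recreate_coredns() {
  run kubectl delete pod -n kube-system -l k8s-app=kube-dns
  run kubectl get pods -n kube-system -o wide
}

# Example usage on the master (commented out so sourcing has no side effects):
# DRY_RUN=1 recreate_coredns   # preview the commands
# recreate_coredns             # actually run them
```

The dry-run switch is there so the commands can be reviewed before anything is deleted.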