您现在的位置是:首页 > 文章详情

K8S自己动手系列 - 1.2 - 节点管理

日期:2019-06-08点击:501

节点管理

节点状态

please refer: https://kubernetes.io/docs/concepts/architecture/nodes/#node-status
这里我们重点关注下Condition部分
如文档描述

节点失联 or 节点宕机

查看节点列表

 ~ kubectl get node -o wide NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME worker01 Ready master 21h v1.14.3 192.168.101.113 <none> Ubuntu 16.04.6 LTS 4.4.0-150-generic docker://18.9.5 worker02 Ready <none> 21h v1.14.3 192.168.100.117 <none> Ubuntu 16.04.6 LTS 4.4.0-150-generic docker://18.9.5 # 查看pod和其运行所在节点 ~ kubectl get pods -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES nginx-6657c9ffc-6q2pt 1/1 Running 1 19h 10.244.0.11 worker01 <none> <none> nginx-6657c9ffc-msd6t 1/1 Running 0 19h 10.244.1.17 worker02 <none> <none>

现在我们使worker02关机

 ~ kubectl get pod -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES nginx-6657c9ffc-6q2pt 1/1 Running 1 20h 10.244.0.11 worker01 <none> <none> nginx-6657c9ffc-msd6t 1/1 Running 0 20h 10.244.1.17 worker02 <none> <none> ~ kubectl get node -o wide NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME worker01 Ready master 22h v1.14.3 192.168.101.113 <none> Ubuntu 16.04.6 LTS 4.4.0-150-generic docker://18.9.5 worker02 Ready <none> 21h v1.14.3 192.168.100.117 <none> Ubuntu 16.04.6 LTS 4.4.0-150-generic docker://18.9.5 ~ ping 192.168.100.117 PING 192.168.100.117 (192.168.100.117) 56(84) bytes of data. ^C --- 192.168.100.117 ping statistics --- 2 packets transmitted, 0 received, 100% packet loss, time 1007ms

可以看到虽然查看状态还是正常,实际上节点已经无法ping通,再次查看,发现节点已经处于NotReady状态

 ~ kubectl get node -o wide NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME worker01 Ready master 22h v1.14.3 192.168.101.113 <none> Ubuntu 16.04.6 LTS 4.4.0-150-generic docker://18.9.5 worker02 NotReady <none> 21h v1.14.3 192.168.100.117 <none> Ubuntu 16.04.6 LTS 4.4.0-150-generic docker://18.9.5 ~ kubectl get pod -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES nginx-6657c9ffc-6q2pt 1/1 Running 1 20h 10.244.0.11 worker01 <none> <none> nginx-6657c9ffc-msd6t 1/1 Running 0 20h 10.244.1.17 worker02 <none> <none>

查看节点详情,发现节点已经被打上node.kubernetes.io/unreachable:NoExecute和node.kubernetes.io/unreachable:NoSchedule两个污点,conditions也随之变化

 ~ kubectl describe node worker02 Name: worker02 Roles: <none> Labels: beta.kubernetes.io/arch=amd64 ... Annotations: flannel.alpha.coreos.com/backend-data: {"VtepMAC":"4e:0a:b4:88:0d:83"} ... CreationTimestamp: Sat, 08 Jun 2019 12:42:00 +0800 Taints: node.kubernetes.io/unreachable:NoExecute node.kubernetes.io/unreachable:NoSchedule Unschedulable: false Conditions: Type Status LastHeartbeatTime LastTransitionTime Reason Message ---- ------ ----------------- ------------------ ------ ------- MemoryPressure Unknown Sun, 09 Jun 2019 10:03:52 +0800 Sun, 09 Jun 2019 10:04:52 +0800 NodeStatusUnknown Kubelet stopped posting node status. DiskPressure Unknown Sun, 09 Jun 2019 10:03:52 +0800 Sun, 09 Jun 2019 10:04:52 +0800 NodeStatusUnknown Kubelet stopped posting node status. PIDPressure Unknown Sun, 09 Jun 2019 10:03:52 +0800 Sun, 09 Jun 2019 10:04:52 +0800 NodeStatusUnknown Kubelet stopped posting node status. Ready Unknown Sun, 09 Jun 2019 10:03:52 +0800 Sun, 09 Jun 2019 10:04:52 +0800 NodeStatusUnknown Kubelet stopped posting node status.

再次查看pod,发现已经被调度到其他节点

 ~ kubectl get pods -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES nginx-6657c9ffc-6q2pt 1/1 Running 1 20h 10.244.0.11 worker01 <none> <none> nginx-6657c9ffc-msd6t 1/1 Terminating 0 20h 10.244.1.17 worker02 <none> <none> nginx-6657c9ffc-n9s4p 1/1 Running 0 71s 10.244.0.14 worker01 <none> <none>

查看正在terminating的pod状态:

 ~ kubectl describe pod/nginx-6657c9ffc-msd6t Name: nginx-6657c9ffc-msd6t Node: worker02/192.168.100.117 Status: Terminating (lasts 2m5s) Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s node.kubernetes.io/unreachable:NoExecute for 300s Events: <none>

发现两个污点容忍tolerations选项,解释了为什么节点处于NotReady后,pod还是running状态一段时间后被terminate重建
更多细节,please refer https://kubernetes.io/docs/concepts/architecture/nodes/#node-controller

移除节点

为了实验效果,先把pod扩容到10个

 ~ kubectl scale --replicas=10 deployment/nginx deployment.extensions/nginx scaled ~ kubectl get pod -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES nginx-6657c9ffc-6lfbp 1/1 Running 0 8s 10.244.1.20 worker02 <none> <none> nginx-6657c9ffc-6q2pt 1/1 Running 1 20h 10.244.0.11 worker01 <none> <none> nginx-6657c9ffc-8cbpl 1/1 Running 0 8s 10.244.1.23 worker02 <none> <none> nginx-6657c9ffc-dbr7c 1/1 Running 0 8s 10.244.1.22 worker02 <none> <none> nginx-6657c9ffc-kk84d 1/1 Running 0 8s 10.244.1.21 worker02 <none> <none> nginx-6657c9ffc-kp64c 1/1 Running 0 8s 10.244.0.16 worker01 <none> <none> nginx-6657c9ffc-n9s4p 1/1 Running 0 40m 10.244.0.14 worker01 <none> <none> nginx-6657c9ffc-pxbs2 1/1 Running 0 8s 10.244.1.19 worker02 <none> <none> nginx-6657c9ffc-sb85v 1/1 Running 0 8s 10.244.1.18 worker02 <none> <none> nginx-6657c9ffc-sxb25 1/1 Running 0 8s 10.244.0.15 worker01 <none> <none>

使用drain放干节点,然后删除节点

 ~ kubectl drain worker02 --force --grace-period=900 --ignore-daemonsets node/worker02 already cordoned WARNING: ignoring DaemonSet-managed Pods: kube-system/kube-flannel-ds-amd64-crq7k, kube-system/kube-proxy-x9h7m evicting pod "nginx-6657c9ffc-sb85v" evicting pod "nginx-6657c9ffc-dbr7c" evicting pod "nginx-6657c9ffc-6lfbp" evicting pod "nginx-6657c9ffc-8cbpl" evicting pod "nginx-6657c9ffc-kk84d" evicting pod "nginx-6657c9ffc-pxbs2" pod/nginx-6657c9ffc-kk84d evicted pod/nginx-6657c9ffc-dbr7c evicted pod/nginx-6657c9ffc-8cbpl evicted pod/nginx-6657c9ffc-pxbs2 evicted pod/nginx-6657c9ffc-sb85v evicted pod/nginx-6657c9ffc-6lfbp evicted node/worker02 evicted # 执行完后发现所有pod已经运行在worker01上了 ~ kubectl get pod -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES nginx-6657c9ffc-45svt 1/1 Running 0 63s 10.244.0.18 worker01 <none> <none> nginx-6657c9ffc-489zz 1/1 Running 0 63s 10.244.0.21 worker01 <none> <none> nginx-6657c9ffc-49vqv 1/1 Running 0 63s 10.244.0.20 worker01 <none> <none> nginx-6657c9ffc-6q2pt 1/1 Running 1 20h 10.244.0.11 worker01 <none> <none> nginx-6657c9ffc-fq7gg 1/1 Running 0 63s 10.244.0.22 worker01 <none> <none> nginx-6657c9ffc-kl6j9 1/1 Running 0 63s 10.244.0.19 worker01 <none> <none> nginx-6657c9ffc-kp64c 1/1 Running 0 4m31s 10.244.0.16 worker01 <none> <none> nginx-6657c9ffc-n9s4p 1/1 Running 0 44m 10.244.0.14 worker01 <none> <none> nginx-6657c9ffc-ssmph 1/1 Running 0 63s 10.244.0.17 worker01 <none> <none> nginx-6657c9ffc-sxb25 1/1 Running 0 4m31s 10.244.0.15 worker01 <none> <none> # 删除节点 ~ kubectl delete node worker02 node "worker02" deleted ~ kubectl get node NAME STATUS ROLES AGE VERSION worker01 Ready master 22h v1.14.3

更多详情 please refer https://kubernetes.io/docs/concepts/workloads/pods/disruptions/

加入节点

使用kubeadm初始化master节点后,会有一段提示,说明如何加入新节点,那个务必要保存好
接下来我们将刚刚被下线的worker02重置,然后重新加入集群

 ~ kubeadm reset [reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted. [reset] Are you sure you want to proceed? [y/N]: y [preflight] Running pre-flight checks W0609 11:00:23.206928 7376 reset.go:234] [reset] No kubeadm config, using etcd pod spec to get data directory [reset] No etcd config found. Assuming external etcd [reset] Please manually reset etcd to prevent further issues [reset] Stopping the kubelet service [reset] unmounting mounted directories in "/var/lib/kubelet" [reset] Deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes] [reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki] [reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf] The reset process does not reset or clean up iptables rules or IPVS tables. If you wish to reset iptables, you must do so manually. For example: iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar) to reset your system's IPVS tables. ~ iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X

执行kubeadm join

 ~ kubeadm join 192.168.101.113:6443 --token ss6flg.csw4u0ok134n2fy1 \ --discovery-token-ca-cert-hash sha256:bac9a150228342b7cdedf39124ef2108653db1f083e9f547d251e08f03c41945 [preflight] Running pre-flight checks [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/ [preflight] Reading configuration from the cluster... [preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml' [kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.14" ConfigMap in the kube-system namespace [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml" [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env" [kubelet-start] Activating the kubelet service [kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap... This node has joined the cluster: * Certificate signing request was sent to apiserver and a response was received. * The Kubelet was informed of the new secure connection details. Run 'kubectl get nodes' on the control-plane to see this node join the cluster. 

查看节点,发现已经添加成功

 ~ kubectl get node -o wide NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME worker01 Ready master 23h v1.14.3 192.168.101.113 <none> Ubuntu 16.04.6 LTS 4.4.0-150-generic docker://18.9.5 worker02 Ready <none> 33s v1.14.3 192.168.100.117 <none> Ubuntu 16.04.6 LTS 4.4.0-150-generic docker://18.9.5 
原文链接:https://yq.aliyun.com/articles/704920
关注公众号

低调大师中文资讯倾力打造互联网数据资讯、行业资源、电子商务、移动互联网、网络营销平台。

持续更新报道IT业界、互联网、市场资讯、驱动更新,是最及时权威的产业资讯及硬件资讯报道平台。

转载内容版权归作者及来源网站所有,本站原创内容转载请注明来源。

文章评论

共有0条评论来说两句吧...

文章二维码

扫描即可查看该文章

点击排行

推荐阅读

最新文章