K8S Do-It-Yourself Series - 1.2 - Node Management
Node Management
Node Status
See: https://kubernetes.io/docs/concepts/architecture/nodes/#node-status
Here we focus on the Conditions section, as described in that document.
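As a quick way to inspect just the conditions, kubectl's jsonpath output can pull them out directly (a sketch using the worker02 node from this walkthrough; it requires a running cluster):

```shell
# Print each condition's type and status for one node
kubectl get node worker02 \
  -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\n"}{end}'
```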
Node unreachable or node down
List the nodes:
```
~ kubectl get node -o wide
NAME       STATUS   ROLES    AGE   VERSION   INTERNAL-IP       EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
worker01   Ready    master   21h   v1.14.3   192.168.101.113   <none>        Ubuntu 16.04.6 LTS   4.4.0-150-generic   docker://18.9.5
worker02   Ready    <none>   21h   v1.14.3   192.168.100.117   <none>        Ubuntu 16.04.6 LTS   4.4.0-150-generic   docker://18.9.5

# List the pods and the nodes they run on
~ kubectl get pods -o wide
NAME                    READY   STATUS    RESTARTS   AGE   IP            NODE       NOMINATED NODE   READINESS GATES
nginx-6657c9ffc-6q2pt   1/1     Running   1          19h   10.244.0.11   worker01   <none>           <none>
nginx-6657c9ffc-msd6t   1/1     Running   0          19h   10.244.1.17   worker02   <none>           <none>
```
Now shut down worker02:
```
~ kubectl get pod -o wide
NAME                    READY   STATUS    RESTARTS   AGE   IP            NODE       NOMINATED NODE   READINESS GATES
nginx-6657c9ffc-6q2pt   1/1     Running   1          20h   10.244.0.11   worker01   <none>           <none>
nginx-6657c9ffc-msd6t   1/1     Running   0          20h   10.244.1.17   worker02   <none>           <none>
~ kubectl get node -o wide
NAME       STATUS   ROLES    AGE   VERSION   INTERNAL-IP       EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
worker01   Ready    master   22h   v1.14.3   192.168.101.113   <none>        Ubuntu 16.04.6 LTS   4.4.0-150-generic   docker://18.9.5
worker02   Ready    <none>   21h   v1.14.3   192.168.100.117   <none>        Ubuntu 16.04.6 LTS   4.4.0-150-generic   docker://18.9.5
~ ping 192.168.100.117
PING 192.168.100.117 (192.168.100.117) 56(84) bytes of data.
^C
--- 192.168.100.117 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1007ms
```
Although the node still reports Ready, it can no longer be pinged. Checking again shortly afterwards, the node has moved to NotReady:
```
~ kubectl get node -o wide
NAME       STATUS     ROLES    AGE   VERSION   INTERNAL-IP       EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
worker01   Ready      master   22h   v1.14.3   192.168.101.113   <none>        Ubuntu 16.04.6 LTS   4.4.0-150-generic   docker://18.9.5
worker02   NotReady   <none>   21h   v1.14.3   192.168.100.117   <none>        Ubuntu 16.04.6 LTS   4.4.0-150-generic   docker://18.9.5
~ kubectl get pod -o wide
NAME                    READY   STATUS    RESTARTS   AGE   IP            NODE       NOMINATED NODE   READINESS GATES
nginx-6657c9ffc-6q2pt   1/1     Running   1          20h   10.244.0.11   worker01   <none>           <none>
nginx-6657c9ffc-msd6t   1/1     Running   0          20h   10.244.1.17   worker02   <none>           <none>
```
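How quickly the node flips to NotReady is controlled by kube-controller-manager timing flags. The values below are the upstream defaults around v1.14, not values read from this cluster, so check your own control plane's settings:

```shell
# kube-controller-manager flags (upstream defaults shown):
#   --node-monitor-period=5s          how often the node controller syncs node state
#   --node-monitor-grace-period=40s   how long a node may miss heartbeats before its
#                                     conditions are marked Unknown (NotReady)
#   --pod-eviction-timeout=5m         grace period before pods on a lost node are evicted
```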
Describing the node shows that it has been tainted with node.kubernetes.io/unreachable:NoExecute and node.kubernetes.io/unreachable:NoSchedule, and its conditions have changed accordingly:
```
~ kubectl describe node worker02
Name:               worker02
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    ...
Annotations:        flannel.alpha.coreos.com/backend-data: {"VtepMAC":"4e:0a:b4:88:0d:83"}
                    ...
CreationTimestamp:  Sat, 08 Jun 2019 12:42:00 +0800
Taints:             node.kubernetes.io/unreachable:NoExecute
                    node.kubernetes.io/unreachable:NoSchedule
Unschedulable:      false
Conditions:
  Type             Status    LastHeartbeatTime                 LastTransitionTime                Reason              Message
  ----             ------    -----------------                 ------------------                ------              -------
  MemoryPressure   Unknown   Sun, 09 Jun 2019 10:03:52 +0800   Sun, 09 Jun 2019 10:04:52 +0800   NodeStatusUnknown   Kubelet stopped posting node status.
  DiskPressure     Unknown   Sun, 09 Jun 2019 10:03:52 +0800   Sun, 09 Jun 2019 10:04:52 +0800   NodeStatusUnknown   Kubelet stopped posting node status.
  PIDPressure      Unknown   Sun, 09 Jun 2019 10:03:52 +0800   Sun, 09 Jun 2019 10:04:52 +0800   NodeStatusUnknown   Kubelet stopped posting node status.
  Ready            Unknown   Sun, 09 Jun 2019 10:03:52 +0800   Sun, 09 Jun 2019 10:04:52 +0800   NodeStatusUnknown   Kubelet stopped posting node status.
```
Checking the pods again, the workload has been rescheduled onto the other node: the pod on worker02 is terminating and a replacement is running on worker01:
```
~ kubectl get pods -o wide
NAME                    READY   STATUS        RESTARTS   AGE   IP            NODE       NOMINATED NODE   READINESS GATES
nginx-6657c9ffc-6q2pt   1/1     Running       1          20h   10.244.0.11   worker01   <none>           <none>
nginx-6657c9ffc-msd6t   1/1     Terminating   0          20h   10.244.1.17   worker02   <none>           <none>
nginx-6657c9ffc-n9s4p   1/1     Running       0          71s   10.244.0.14   worker01   <none>           <none>
```
Inspect the terminating pod:
```
~ kubectl describe pod/nginx-6657c9ffc-msd6t
Name:           nginx-6657c9ffc-msd6t
Node:           worker02/192.168.100.117
Status:         Terminating (lasts 2m5s)
Tolerations:    node.kubernetes.io/not-ready:NoExecute for 300s
                node.kubernetes.io/unreachable:NoExecute for 300s
Events:         <none>
```
Note the two tolerations: the pod tolerates the not-ready and unreachable NoExecute taints for 300 seconds each. This explains why, after the node went NotReady, the pod stayed Running for a while before being terminated and recreated on another node.
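These 300-second defaults are injected into every pod by the DefaultTolerationSeconds admission plugin, and a pod spec can override them to fail over faster or slower. A minimal sketch (the 60-second values are illustrative, not taken from the original deployment):

```yaml
# Pod spec fragment: evict this pod sooner when its node becomes unreachable
tolerations:
- key: node.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 60
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 60
```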
For more details, see https://kubernetes.io/docs/concepts/architecture/nodes/#node-controller
Removing a node
To make the effect easier to observe, first scale the deployment to 10 replicas:
```
~ kubectl scale --replicas=10 deployment/nginx
deployment.extensions/nginx scaled
~ kubectl get pod -o wide
NAME                    READY   STATUS    RESTARTS   AGE   IP            NODE       NOMINATED NODE   READINESS GATES
nginx-6657c9ffc-6lfbp   1/1     Running   0          8s    10.244.1.20   worker02   <none>           <none>
nginx-6657c9ffc-6q2pt   1/1     Running   1          20h   10.244.0.11   worker01   <none>           <none>
nginx-6657c9ffc-8cbpl   1/1     Running   0          8s    10.244.1.23   worker02   <none>           <none>
nginx-6657c9ffc-dbr7c   1/1     Running   0          8s    10.244.1.22   worker02   <none>           <none>
nginx-6657c9ffc-kk84d   1/1     Running   0          8s    10.244.1.21   worker02   <none>           <none>
nginx-6657c9ffc-kp64c   1/1     Running   0          8s    10.244.0.16   worker01   <none>           <none>
nginx-6657c9ffc-n9s4p   1/1     Running   0          40m   10.244.0.14   worker01   <none>           <none>
nginx-6657c9ffc-pxbs2   1/1     Running   0          8s    10.244.1.19   worker02   <none>           <none>
nginx-6657c9ffc-sb85v   1/1     Running   0          8s    10.244.1.18   worker02   <none>           <none>
nginx-6657c9ffc-sxb25   1/1     Running   0          8s    10.244.0.15   worker01   <none>           <none>
```
Drain the node, then delete it:
```
~ kubectl drain worker02 --force --grace-period=900 --ignore-daemonsets
node/worker02 already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/kube-flannel-ds-amd64-crq7k, kube-system/kube-proxy-x9h7m
evicting pod "nginx-6657c9ffc-sb85v"
evicting pod "nginx-6657c9ffc-dbr7c"
evicting pod "nginx-6657c9ffc-6lfbp"
evicting pod "nginx-6657c9ffc-8cbpl"
evicting pod "nginx-6657c9ffc-kk84d"
evicting pod "nginx-6657c9ffc-pxbs2"
pod/nginx-6657c9ffc-kk84d evicted
pod/nginx-6657c9ffc-dbr7c evicted
pod/nginx-6657c9ffc-8cbpl evicted
pod/nginx-6657c9ffc-pxbs2 evicted
pod/nginx-6657c9ffc-sb85v evicted
pod/nginx-6657c9ffc-6lfbp evicted
node/worker02 evicted

# After the drain completes, all pods are running on worker01
~ kubectl get pod -o wide
NAME                    READY   STATUS    RESTARTS   AGE     IP            NODE       NOMINATED NODE   READINESS GATES
nginx-6657c9ffc-45svt   1/1     Running   0          63s     10.244.0.18   worker01   <none>           <none>
nginx-6657c9ffc-489zz   1/1     Running   0          63s     10.244.0.21   worker01   <none>           <none>
nginx-6657c9ffc-49vqv   1/1     Running   0          63s     10.244.0.20   worker01   <none>           <none>
nginx-6657c9ffc-6q2pt   1/1     Running   1          20h     10.244.0.11   worker01   <none>           <none>
nginx-6657c9ffc-fq7gg   1/1     Running   0          63s     10.244.0.22   worker01   <none>           <none>
nginx-6657c9ffc-kl6j9   1/1     Running   0          63s     10.244.0.19   worker01   <none>           <none>
nginx-6657c9ffc-kp64c   1/1     Running   0          4m31s   10.244.0.16   worker01   <none>           <none>
nginx-6657c9ffc-n9s4p   1/1     Running   0          44m     10.244.0.14   worker01   <none>           <none>
nginx-6657c9ffc-ssmph   1/1     Running   0          63s     10.244.0.17   worker01   <none>           <none>
nginx-6657c9ffc-sxb25   1/1     Running   0          4m31s   10.244.0.15   worker01   <none>           <none>

# Delete the node
~ kubectl delete node worker02
node "worker02" deleted
~ kubectl get node
NAME       STATUS   ROLES    AGE   VERSION
worker01   Ready    master   22h   v1.14.3
```
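If the intent is maintenance rather than permanent removal, the drain can simply be reversed once the node is healthy again, instead of deleting the node:

```shell
# Mark the node schedulable again after maintenance
kubectl uncordon worker02
```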
For more details, see https://kubernetes.io/docs/concepts/workloads/pods/disruptions/
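One detail from the disruptions doc worth trying here: drain respects PodDisruptionBudgets, so evictions that would drop availability below the budget are refused and retried. A hypothetical budget for this deployment (it assumes the nginx pods carry an app=nginx label, which the original manifests do not show):

```yaml
# Hypothetical PDB: keep at least 5 of the 10 nginx replicas up during a drain
apiVersion: policy/v1beta1   # the PDB API group/version on a v1.14 cluster
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb
spec:
  minAvailable: 5
  selector:
    matchLabels:
      app: nginx
```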
Joining a node
When kubeadm init finishes on the master, it prints instructions for joining new nodes; be sure to save that output.
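If that output is lost, it is not a problem: the bootstrap token expires after 24 hours by default, and a fresh token plus the full join command can be printed again on the master:

```shell
# Run on the master; prints a complete 'kubeadm join ...' command with a new token
kubeadm token create --print-join-command
```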
Next, reset the worker02 node we just removed and rejoin it to the cluster:
```
~ kubeadm reset
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W0609 11:00:23.206928    7376 reset.go:234] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] No etcd config found. Assuming external etcd
[reset] Please manually reset etcd to prevent further issues
[reset] Stopping the kubelet service
[reset] unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes]
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually.
For example:
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

~ iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
```
Run kubeadm join:
```
~ kubeadm join 192.168.101.113:6443 --token ss6flg.csw4u0ok134n2fy1 \
    --discovery-token-ca-cert-hash sha256:bac9a150228342b7cdedf39124ef2108653db1f083e9f547d251e08f03c41945
[preflight] Running pre-flight checks
	[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.14" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Activating the kubelet service
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
```
Checking the nodes, worker02 has been added back successfully:
```
~ kubectl get node -o wide
NAME       STATUS   ROLES    AGE   VERSION   INTERNAL-IP       EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
worker01   Ready    master   23h   v1.14.3   192.168.101.113   <none>        Ubuntu 16.04.6 LTS   4.4.0-150-generic   docker://18.9.5
worker02   Ready    <none>   33s   v1.14.3   192.168.100.117   <none>        Ubuntu 16.04.6 LTS   4.4.0-150-generic   docker://18.9.5
```