Kubernetes管理经验-低调大师

Kubernetes管理经验

2019-05-28 786

集群管理相关命令

kubectl get cs

# 查看节点
kubectl get nodes

kubectl get ing pdd --n java
# 不调度
kubectl taint nodes node1 key=value:NoSchedule
kubectl cluster-info dump


kubectl get svc --sort-by=.metadata.creationTimestamp
kubectl get no --sort-by=.metadata.creationTimestamp
kubectl get po --field-selector spec.nodeName=xxxx
kubectl get events  --field-selector involvedObject.kind=Service --sort-by='.metadata.creationTimestamp'

参考链接:

kubernetes 节点维护 cordon, drain, uncordon

应用管理相关

kubectl top pod
kubectl delete deployment,services -l app=nginx 
kubectl scale deployment/nginx-deployment --replicas=2
kubectl get svc --all-namespaces=true

强制删除

有时删除pv/pvc时会有问题,这个使用得加2个命令参数--grace-period=0 --force

删除所有失败的pod

  kubectl get po --all-namespaces --field-selector 'status.phase==Failed'
  kubectl delete po  --field-selector 'status.phase==Failed'

一些技巧

k8s目前没有没有类似docker-compose的depends_on依赖启动机制,建议使用wait-for-it重写镜像的command.

集群管理经(教)验(训)

节点问题

taint别乱用

kubectl taint nodes xx  elasticsearch-test-ready=true:NoSchedule
kubectl taint nodes xx  elasticsearch-test-ready:NoSchedule-

master节点本身就自带taint,所以才会导致我们发布的容器不会在master节点上面跑.但是如果自定义taint的话就要注意了!所有DaemonSet和kube-system,都需要带上相应的tolerations.不然该节点会驱逐所有不带这个tolerations的容器,甚至包括网络插件,kube-proxy,后果相当严重,请注意

taint跟tolerations是结对对应存在的,操作符也不能乱用

NoExecute

      tolerations:
        - key: "elasticsearch-exclusive"
          operator: "Equal"
          value: "true"
          effect: "NoExecute"

kubectl taint node cn-shenzhen.xxxx elasticsearch-exclusive=true:NoExecute

NoExecute是立刻驱逐不满足容忍条件的pod,该操作非常凶险,请务必先行确认系统组件有对应配置tolerations.

特别注意用Exists这个操作符是无效的,必须用Equal

NoSchedule

      tolerations:
        - key: "elasticsearch-exclusive"
          operator: "Exists"
          effect: "NoSchedule"
        - key: "elasticsearch-exclusive"
          operator: "Equal"
          value: "true"
          effect: "NoExecute"

kubectl taint node cn-shenzhen.xxxx elasticsearch-exclusive=true:NoSchedule

是尽量不往这上面调度,但实际上还是会有pod在那上面跑

Exists和Exists随意使用,不是很影响

值得一提的是,同一个key可以同时存在多个effect

Taints:             elasticsearch-exclusive=true:NoExecute
                    elasticsearch-exclusive=true:NoSchedule

其他参考链接：

隔离节点的正确步骤

# 驱逐除了ds以外所有的pod
kubectl drain <node name>   --ignore-daemonsets
kubectl cordon <node name>

这个时候运行get node命令,状态会变

node.xx   Ready,SchedulingDisabled   <none>   189d   v1.11.5

最后

kubectl delete <node name>

维护节点的正确步骤

kubectl drain <node name> --ignore-daemonsets
kubectl uncordon <node name>

节点出现磁盘压力(DiskPressure)

--eviction-hard=imagefs.available<15%,memory.available<300Mi,nodefs.available<10%,nodefs.inodesFree<5%

kubelet在启动时指定了磁盘压力,以阿里云为例,imagefs.available<15%意思是说容器的读写层少于15%的时候,节点会被驱逐.节点被驱逐的后果就是产生DiskPressure这种状况,并且节点上再也不能运行任何镜像,直至磁盘问题得到解决.如果节点上容器使用了宿主目录,这个问题将会是致命的.因为你不能把目录删除掉,但是真是这些宿主机的目录堆积,导致了节点被驱逐.

所以,平时要养好良好习惯,容器里面别瞎写东西(容器里面写文件会占用ephemeral-storage,ephemeral-storage过多pod会被驱逐),多使用无状态型容器,谨慎选择存储方式,尽量别用hostpath这种存储

出现状况时,真的有种欲哭无泪的感觉.

Events:
  Type     Reason                 Age                   From                                            Message
  ----     ------                 ----                  ----                                            -------
  Warning  FreeDiskSpaceFailed    23m                   kubelet, node.xxxx1     failed to garbage collect required amount of images. Wanted to free 5182058496 bytes, but freed 0 bytes
  Warning  FreeDiskSpaceFailed    18m                   kubelet, node.xxxx1     failed to garbage collect required amount of images. Wanted to free 6089891840 bytes, but freed 0 bytes
  Warning  ImageGCFailed          18m                   kubelet, node.xxxx1     failed to garbage collect required amount of images. Wanted to free 6089891840 bytes, but freed 0 bytes
  Warning  FreeDiskSpaceFailed    13m                   kubelet, node.xxxx1     failed to garbage collect required amount of images. Wanted to free 4953321472 bytes, but freed 0 bytes
  Warning  ImageGCFailed          13m                   kubelet, node.xxxx1     failed to garbage collect required amount of images. Wanted to free 4953321472 bytes, but freed 0 bytes
  Normal   NodeHasNoDiskPressure  10m (x5 over 47d)     kubelet, node.xxxx1     Node node.xxxx1 status is now: NodeHasNoDiskPressure
  Normal   Starting               10m                   kube-proxy, node.xxxx1  Starting kube-proxy.
  Normal   NodeHasDiskPressure    10m (x4 over 42m)     kubelet, node.xxxx1     Node node.xxxx1 status is now: NodeHasDiskPressure
  Warning  EvictionThresholdMet   8m29s (x19 over 42m)  kubelet, node.xxxx1     Attempting to reclaim ephemeral-storage
  Warning  ImageGCFailed          3m4s                  kubelet, node.xxxx1     failed to garbage collect required amount of images. Wanted to free 4920913920 bytes, but freed 0 bytes

参考链接:

节点CPU彪高

有可能是节点在进行GC(container GC/image GC),用describe node查查.我有次遇到这种状况,最后节点上的容器少了很多,也是有点郁闷

Events:
  Type     Reason                 Age                 From                                         Message
  ----     ------                 ----                ----
  Warning  ImageGCFailed          45m                 kubelet, cn-shenzhen.xxxx  failed to get image stats: rpc error: code = DeadlineExceeded desc = context deadline exceeded

参考:

kubelet 源码分析：Garbage Collect

对象问题

pod

pod频繁重启

原因有多种,不可一概而论

资源达到limit设置值

调高limit或者检查应用

Readiness/Liveness connection refused

Readiness检查失败的也会重启,但是Readiness检查失败不一定是应用的问题,如果节点本身负载过重,也是会出现connection refused或者timeout

这个问题要上节点排查

pod被驱逐(Evicted)

节点加了污点导致pod被驱逐
ephemeral-storage超过限制被驱逐
1. EmptyDir 的使用量超过了他的 SizeLimit，那么这个 pod 将会被驱逐
2. Container 的使用量（log，如果没有 overlay 分区，则包括 imagefs）超过了他的 limit，则这个 pod 会被驱逐
3. Pod 对本地临时存储总的使用量（所有 emptydir 和 container）超过了 pod 中所有container 的 limit 之和，则 pod 被驱逐

ephemeral-storage是一个pod用的临时存储.

resources:
       requests: 
           ephemeral-storage: "2Gi"
       limits:
           ephemeral-storage: "3Gi"

节点被驱逐后通过get po还是能看到,用describe命令,可以看到被驱逐的历史原因

Message: The node was low on resource: ephemeral-storage. Container codis-proxy was using 10619440Ki, which exceeds its request of 0.

参考:

kubectl exec 进入容器失败

这种问题我在搭建codis-server的时候遇到过,当时没有配置就绪以及健康检查.但获取pod描述的时候,显示running.其实这个时候容器以及不正常了.

~ kex codis-server-3 sh
rpc error: code = 2 desc = containerd: container not found
command terminated with exit code 126

解决办法:删了这个pod,配置livenessProbe

pod的virtual host name

Deployment衍生的pod,virtual host name就是pod name.

StatefulSet衍生的pod,virtual host name是<pod name>.<svc name>.<namespace>.svc.cluster.local.相比Deployment显得更有规律一些.而且支持其他pod访问

pod接连Crashbackoff

Crashbackoff有多种原因.

沙箱创建(FailedCreateSandBox)失败,多半是cni网络插件的问题

镜像拉取,有中国特色社会主义的问题,可能太大了,拉取较慢

也有一种可能是容器并发过高,流量雪崩导致.

比如,现在有3个容器abc,a突然遇到流量洪峰导致内部奔溃,继而Crashbackoff,那么a就会被service剔除出去,剩下的bc也承载不了那么多流量,接连崩溃,最终网站不可访问.这种情况,多见于高并发网站+低效率web容器.

在不改变代码的情况下,最优解是增加副本数,并且加上hpa,实现动态伸缩容.

deploy

MinimumReplicationUnavailable

如果deploy配置了SecurityContext,但是api-server拒绝了,就会出现这个情况,在api-server的容器里面,去掉SecurityContextDeny这个启动参数.

具体见Using Admission Controllers

service

建了一个服务,但是没有对应的po,会出现什么情况?

请求时一直不会有响应,直到request timeout

参考

Configure Out Of Resource Handling

service connection refuse

原因可能有

pod没有设置readinessProbe,请求到未就绪的pod
kube-proxy宕机了(kube-proxy负责转发请求)
网络过载

service没有负载均衡

检查一下是否用了headless service.headless service是不会自动负载均衡的...

kind: Service
spec:
# clusterIP: None的即为`headless service`
  type: ClusterIP
  clusterIP: None

具体表现service没有自己的虚拟IP,nslookup会出现所有pod的ip.但是ping的时候只会出现第一个pod的ip

/ # nslookup consul
nslookup: can't resolve '(null)': Name does not resolve

Name:      consul
Address 1: 172.31.10.94 172-31-10-94.consul.default.svc.cluster.local
Address 2: 172.31.10.95 172-31-10-95.consul.default.svc.cluster.local
Address 3: 172.31.11.176 172-31-11-176.consul.default.svc.cluster.local

/ # ping consul
PING consul (172.31.10.94): 56 data bytes
64 bytes from 172.31.10.94: seq=0 ttl=62 time=0.973 ms
64 bytes from 172.31.10.94: seq=1 ttl=62 time=0.170 ms
^C
--- consul ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.170/0.571/0.973 ms

/ # ping consul
PING consul (172.31.10.94): 56 data bytes
64 bytes from 172.31.10.94: seq=0 ttl=62 time=0.206 ms
64 bytes from 172.31.10.94: seq=1 ttl=62 time=0.178 ms
^C
--- consul ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.178/0.192/0.206 ms

普通的type: ClusterIP service,nslookup会出现该服务自己的IP

/ # nslookup consul
nslookup: can't resolve '(null)': Name does not resolve

Name:      consul
Address 1: 172.30.15.52 consul.default.svc.cluster.local

ReplicationController不更新

ReplicationController不是用apply去更新的,而是kubectl rolling-update,但是这个指令也废除了,取而代之的是kubectl rollout.所以应该使用kubectl rollout作为更新手段,或者懒一点,apply file之后,delete po.

尽量使用deploy吧.

StatefulSet更新失败

StatefulSet是逐一更新的,观察一下是否有Crashbackoff的容器,有可能是这个容器导致更新卡住了,删掉即可.

进阶调度

使用亲和度确保节点在目标节点上运行

        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: elasticsearch-test-ready
                operator: Exists

参考链接:

使用反亲和度确保每个节点只跑同一个应用

      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: 'app'
                operator: In
                values:
                - nginx-test2
            topologyKey: "kubernetes.io/hostname"
            namespaces: 
            - test

容忍运行

master节点之所以不允许普通镜像,是因为master节点带了污点,如果需要强制在master上面运行镜像,则需要容忍相应的污点.

      tolerations:
        - effect: NoSchedule
          key: node-role.kubernetes.io/master
          operator: Exists
        - effect: NoSchedule
          key: node.cloudprovider.kubernetes.io/uninitialized
          operator: Exists

阿里云Kubernetes问题

修改默认ingress

新建一个指向ingress的负载均衡型svc,然后修改一下kube-system下nginx-ingress-controller启动参数.

        - args:
            - /nginx-ingress-controller
            - '--configmap=$(POD_NAMESPACE)/nginx-configuration'
            - '--tcp-services-configmap=$(POD_NAMESPACE)/tcp-services'
            - '--udp-services-configmap=$(POD_NAMESPACE)/udp-services'
            - '--annotations-prefix=nginx.ingress.kubernetes.io'
            - '--publish-service=$(POD_NAMESPACE)/<自定义svc>'
            - '--v=2'

LoadBalancer服务一直没有IP

具体表现是EXTERNAL-IP一直显示pending.

~ kg svc consul-web
NAME         TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
consul-web   LoadBalancer   172.30.13.122   <pending>     443:32082/TCP   5m

这问题跟Alibaba Cloud Provider这个组件有关,cloud-controller-manager有3个组件,他们需要内部选主,可能哪里出错了,当时我把其中一个出问题的pod删了,就好了.

清理Statefulset动态PVC

目前阿里云Statefulset动态PVC用的是nas。

对于这种存储，需要先把容器副本将为0，或者整个Statefulset删除。
删除PVC
把nas挂载到任意一台服务器上面，然后删除pvc对应nas的目录。

升级到v1.12.6-aliyun.1之后节点可分配内存变少

该版本每个节点保留了1Gi,相当于整个集群少了N GB(N为节点数)供Pod分配.

如果节点是4G的,Pod请求3G,极其容易被驱逐.

建议提高节点规格.

Server Version: version.Info{Major:"1", Minor:"12+", GitVersion:"v1.12.6-aliyun.1", GitCommit:"8cb561c", GitTreeState:"", BuildDate:"2019-04-22T11:34:20Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}

新加节点出现NetworkUnavailable

RouteController failed to create a route

看一下kubernetes events,是否出现了

timed out waiting for the condition -> WaitCreate: ceate route for table vtb-wz9cpnsbt11hlelpoq2zh error, Aliyun API Error: RequestId: 7006BF4E-000B-4E12-89F2-F0149D6688E4 Status Code: 400 Code: QuotaExceeded Message: Route entry quota exceeded in this route table

出现这个问题是因为达到了VPC的自定义路由条目限制,默认是48,需要提高vpc_quota_route_entrys_num的配额

参考(应用调度相关):

微信关注我们

原文链接：https://yq.aliyun.com/articles/703971

转载内容版权归作者及来源网站所有！

低调大师中文资讯倾力打造互联网数据资讯、行业资源、电子商务、移动互联网、网络营销平台。持续更新报道IT业界、互联网、市场资讯、驱动更新,是最及时权威的产业资讯及硬件资讯报道平台。

ACK security enhancement

BYOK https://github.com/AliyunContainerService/ack-kms-plugin This is KMS provider plugin for Alibaba Cloud - Enable encryption at rest of Kubernetes secret backed by Alibaba Cloud Key Management Service Here let us verify the secret encryption on ACK cluster. Firstly create one ACK cluster on Alibaba Cloud Container Service console, refine the apiserver configuration an

2019-05-28

509

5月29日，阿里云发布企业级云灾备解决方案：为制造、金融、医疗等企业提供一键容灾能力，例如业务恢复、数据保护和网络自愈，最大程度保护本地和云上业务稳定运行，此外，云上灾备成本相对传统线下节省50%。据第三方数据显示，到2020年，全球产生的数据量将达到40ZB，灾备是保障数据和业务安全的关键一环。而传统灾备高成本、高浪费、低利用率，且建设时间长，对运维人员要求极高，云灾备成大势所趋。Gartner预计，到2020年，90%的容灾操作会发生在云端。此次发布的企业级云灾备解决方案来自阿里巴巴IT基础设施云化的灾备经验，完全省去灾备机房的建设规划，大幅节约建设成本与软硬件运维成本。该方案采用了国内首个磁盘级数据持续复制技术，同时支持混合云和跨云的多平台融合架构，为企业提供五大能力：用户数据中心和公共云的相互容灾；业务不停机，完成容灾演

2019-05-29

640

资源下载

更多资源

优质分享App

近一个月的开发和优化，本站点的第一个app全新上线。该app采用极致压缩，本体才4.36MB。系统里面做了大量数据访问、缓存优化。方便用户在手机上查看文章。后续会推出HarmonyOS的适配版本。

Spring

Spring框架（Spring Framework）是由Rod Johnson于2002年提出的开源Java企业级应用框架，旨在通过使用JavaBean替代传统EJB实现方式降低企业级编程开发的复杂性。该框架基于简单性、可测试性和松耦合性设计理念，提供核心容器、应用上下文、数据访问集成等模块，支持整合Hibernate、Struts等第三方框架，其适用范围不仅限于服务器端开发，绝大多数Java应用均可从中受益。

Rocky Linux

Rocky Linux（中文名：洛基）是由Gregory Kurtzer于2020年12月发起的企业级Linux发行版，作为CentOS稳定版停止维护后与RHEL（Red Hat Enterprise Linux）完全兼容的开源替代方案，由社区拥有并管理，支持x86_64、aarch64等架构。其通过重新编译RHEL源代码提供长期稳定性，采用模块化包装和SELinux安全架构，默认包含GNOME桌面环境及XFS文件系统，支持十年生命周期更新。

Sublime Text

Sublime Text具有漂亮的用户界面和强大的功能，例如代码缩略图，Python的插件，代码段等。还可自定义键绑定，菜单和工具栏。Sublime Text 的主要功能包括：拼写检查，书签，完整的 Python API ， Goto 功能，即时项目切换，多选择，多窗口等等。Sublime Text 是一个跨平台的编辑器，同时支持Windows、Linux、Mac OS X等操作系统。