Extending Kubernetes: kubectl Plugins

王磊-AI基础 • 2023-01-02 • 云技术社区

Update: cluster-group, the example plugin from this article, has been merged into the krew index.

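Introduction

kubectl is an essential tool for managing and operating Kubernetes.

kubectl is very powerful. For common usage, see kubectl --help or this article.

This article first covers a few kubectl tricks you may not know, then spends most of its length on kubectl plugins.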

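kubectl tips

Enable shell completion: kubectl completion zsh
Inspect a resource's spec (ever had to dig through API docs or source code just to read a spec?): kubectl explain [--recursive]
Set aliases for commands you use often. The author uses kns="kubectl -n kube-system", kna="kubectl --all-namespaces=true", kcc="kubectl config use-context", and kgy="kubectl get -o yaml". Alternatively, use the aliases generated by this project, which applies a set of rules to produce more than 800 aliases.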

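kubectl plugin

kubectl supports a simple plugin mechanism: kubectl invokes another binary that performs some Kubernetes-related task (in practice there is no restriction on what the binary does).

Currently this mechanism passes no special information between kubectl and the plugin beyond the remaining command-line arguments and the environment. A plugin only has to meet two requirements:

The plugin is an executable file.
The executable is named kubectl-$plugin_name.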
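
The two requirements above are easy to try out. The following is a minimal sketch, assuming a made-up plugin name (hello) and a throwaway directory created with mktemp; neither comes from a real plugin.

```shell
#!/usr/bin/env bash
# Sketch: create a throwaway plugin named "hello" (hypothetical name).

plugin_dir="$(mktemp -d)"

# Requirement 2: the file name must be kubectl-$plugin_name.
cat > "${plugin_dir}/kubectl-hello" <<'EOF'
#!/usr/bin/env bash
# kubectl hands the remaining command-line arguments to the plugin.
echo "hello from a plugin, args: $*"
EOF

# Requirement 1: the file must be executable.
chmod +x "${plugin_dir}/kubectl-hello"

# kubectl looks for plugins on PATH, so the directory has to be on it.
export PATH="${plugin_dir}:${PATH}"

# With kubectl installed, "kubectl hello a b" would now run the file above,
# and "kubectl plugin list" would include it. Calling the file directly
# shows the same behaviour without needing kubectl or a cluster:
kubectl-hello a b
```

Because the plugin is just a program on PATH, it can be written in any language; a shell script is simply the shortest way to show the naming rule.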

krew

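Installing a plugin locally is simple: move the executable into a directory on your PATH, such as /usr/local/bin, and name it kubectl-$plugin_name. But how do you share a plugin you have written, and how do you get plugins that other people have published? krew, a plugin manager for kubectl that is itself a plugin, covers both: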

Available Commands:
  help        Help about any command
  info        Show information about a kubectl plugin
  install     Install kubectl plugins
  list        List installed kubectl plugins
  search      Discover kubectl plugins
  uninstall   Uninstall plugins
  update      Update the local copy of the plugin index
  upgrade     Upgrade installed plugins to newer versions
  version     Show krew version and diagnostics

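Searching for plugins

You can run kubectl krew search, but the descriptions it prints are terse. A better approach is to browse the index page for an overview and then read the details in each plugin's GitHub repository.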

➜ kubectl krew search
NAME                            DESCRIPTION                                         INSTALLED
access-matrix                   Show an RBAC access matrix for server resources     no
advise-psp                      Suggests PodSecurityPolicies for cluster.           no
auth-proxy                      Authentication proxy to a pod or service            no
bulk-action                     Do bulk actions on Kubernetes resources.            no
ca-cert                         Print the PEM CA certificate of the current clu...  no
capture                         Triggers a Sysdig capture to troubleshoot the r...  no
...

Installing plugins

Use kubectl krew install:

➜ kubectl krew install custom-cols
Updated the local copy of plugin index.
Installing plugin: custom-cols
Installed plugin: custom-cols
\
 | Use this plugin:
 | 	kubectl custom-cols
 | Documentation:
 | 	https://github.com/webofmars/kubectl-custom-cols
 | Caveats:
 | \
 |  | The list of templates is for now limited and can be retrieved with the --help option.
 |  | Please feel free to submit any PR upstream (see github repo) to add more.
 | /
/
WARNING: You installed a plugin from the krew-index plugin repository.
   These plugins are not audited for security by the Krew maintainers.
   Run them at your own risk.

Recommended plugins

change-ns

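Switches the current namespace. The choice is written into your kubeconfig, so later commands no longer need --namespace. Keep in mind that every subsequent command then defaults to that namespace: if a YAML manifest does not specify a namespace, the resource may not be created in the default namespace as you expected.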

➜ kubectl change-ns kube-system
namespace changed to "kube-system"
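
For reference, kubectl itself can record a default namespace in the current context with kubectl config set-context. The sketch below wraps that call in a function. The function name change_ns and the KUBECTL override variable are made up for this example, and this is not necessarily how the change-ns plugin is implemented.

```shell
#!/usr/bin/env bash
# Sketch: set the default namespace of the current kubeconfig context.
# KUBECTL is a hypothetical override so the function can be dry-run
# (for example KUBECTL=echo) without touching a real kubeconfig.
change_ns() {
  if [ "$#" -ne 1 ]; then
    echo "usage: change_ns <namespace>" >&2
    return 1
  fi
  "${KUBECTL:-kubectl}" config set-context --current "--namespace=$1"
}
```

Running KUBECTL=echo change_ns kube-system prints the kubectl arguments instead of executing them, which is a cheap way to check what would be written. Switching back is the same call with default as the argument.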

cssh

SSH into Kubernetes nodes. The plugin extracts the external IP from each node's information and opens tmux panes that attempt the SSH login.

 > kubectl cssh --help
Allows users to SSH into Kubernetes nodes by opening a new tmux pane for each matching node

Options:
  -a, --address-type='ExternalIP': Node address type to query for (e.g. InternalIP/ExternalIP)
  -i, --identity-file='': Selects a file from which the identity (private key) for public key authentication is read
  -l, --selector='': Selector (label query) to filter on, supports '=', '==', and '!='.(e.g. -l key1=value1,key2=value2)
  -p, --port='': SSH port
  -u, --username='': SSH Username
  
> kubectl cssh --username=ubuntu


debug/spy

The two plugins do much the same thing: both enter a container's namespaces for debugging. The difference is that debug depends on the EphemeralContainers feature, while spy does not.

➜ kubectl spy kube-dns-d5876cbfd-r8kh4
loading spy pod...
If you don't see a command prompt, try pressing enter.
/ # ps
PID   USER     TIME  COMMAND
    1 root     12:56 /dnsmasq-nanny -v=2 -logtostderr -configDir=/etc/k8s/dns/dnsmasq-nanny -restartDnsmasq=true -- -k --cache-size=1000 --log-facility=- --server=/cluster.local/12
   16 root     14:55 /usr/sbin/dnsmasq -k --cache-size=1000 --log-facility=- --server=/cluster.local/127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0
   21 root      0:00 sh
   28 root      0:00 ps

Another handy debug command that is worth turning into an alias:

kubectl run --rm -i -t test --image=byrnedo/alpine-curl --restart=Never  --limits=cpu=10m,memory=10Mi --command=true /bin/sh

tree

Shows Kubernetes objects as a tree.

➜ kubectl tree deployment kube-dns
NAMESPACE    NAME                              READY  REASON  AGE
kube-system  Deployment/kube-dns               -              286d
kube-system  ├─ReplicaSet/kube-dns-898dbbfc6   -              286d
kube-system  └─ReplicaSet/kube-dns-d5876cbfd   -              141d
kube-system    ├─Pod/kube-dns-d5876cbfd-r8kh4  True           141d
kube-system    └─Pod/kube-dns-d5876cbfd-w8xvh  True           141d

trace/sniff

These use bpftrace and tcpdump, respectively, to debug pods.

kubectl trace run ip-180-12-0-152.ec2.internal -f read.bt


➜ kubectl sniff prometheus-k8s-0 -n monitoring
INFO[0000] sniffing method: upload static tcpdump
INFO[0000] using tcpdump path at: '/Users/leiwang/.krew/store/sniff/v1.3.1/static-tcpdump'
INFO[0000] no container specified, taking first container we found in pod.
INFO[0000] selected container: 'prometheus'
INFO[0000] sniffing on pod: 'prometheus-k8s-0' [namespace: 'monitoring', container: 'prometheus', filter: '', interface: 'any']
INFO[0000] uploading static tcpdump binary from: '/Users/leiwang/.krew/store/sniff/v1.3.1/static-tcpdump' to: '/tmp/static-tcpdump'
INFO[0000] uploading file: '/Users/leiwang/.krew/store/sniff/v1.3.1/static-tcpdump' to '/tmp/static-tcpdump' on container: 'prometheus'
INFO[0000] executing command: '[/bin/sh -c ls -alt /tmp/static-tcpdump]' on container: 'prometheus', pod: 'prometheus-k8s-0', namespace: 'monitoring'
INFO[0000] command: '[/bin/sh -c ls -alt /tmp/static-tcpdump]' executing successfully exitCode: '0', stdErr :''

dig

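Shows everything about a Kubernetes node.

# This tool is not published in the krew index, so install it with go get.
# On Go 1.18 and later, go get no longer installs binaries; the equivalent form
# should be: go install github.com/sysdiglabs/kubectl-dig/cmd/kubectl-dig@latest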
go get -u github.com/sysdiglabs/kubectl-dig/cmd/kubectl-dig


warp

Combines kubectl run with sshd-rsync, which makes it easy to run local files inside a pod.

# Start nodejs project in node container
cd examples/nodejs
kubectl warp -i -t --image node testing-node -- npm run watch

get-all

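Gets everything. Use with caution: unlike kubectl get all, which only covers a subset of resource types, this command really does return all of them.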

➜ kubectl get-all
NAME                                                                                                        NAMESPACE     AGE
componentstatus/scheduler                                                                                                 <unknown>
componentstatus/controller-manager                                                                                        <unknown>
componentstatus/etcd-0                                                                                                    <unknown>
configmap/token-100001343833                                                                                100001343833  69d
configmap/token-100002873007                                                                                100002873007  77d
...
...
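
To see why "really all" is a different thing from kubectl get all, it helps to look at how one could enumerate every listable resource type by hand. The sketch below does that with kubectl api-resources and one kubectl get per type. The function name get_everything and the KUBECTL override are invented for this example, it only covers namespaced resources in a single namespace, and it is not necessarily how the get-all plugin works internally.

```shell
#!/usr/bin/env bash
# Sketch: list every namespaced resource type in one namespace.
# KUBECTL is a hypothetical override that allows substituting a stub.
get_everything() {
  local ns="$1"
  local kc="${KUBECTL:-kubectl}"
  local kind
  # Ask the API server which namespaced types support "list" ...
  "${kc}" api-resources --verbs=list --namespaced -o name |
    while read -r kind; do
      # ... then query each type; stdin is redirected so the inner
      # command cannot swallow the remaining type names.
      "${kc}" get "${kind}" -n "${ns}" --show-kind --ignore-not-found </dev/null
    done
}
```

This issues one request per resource type, so on a cluster with many CRDs it can be slow and noisy, which is one more reason to treat such commands with care.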

grep

grep by name

➜ kubectl grep pods -A nginx
NAMESPACE    NAME                                READY   STATUS    RESTART   AGE
default      nginx-6dc5bfc797-vwdz7              1/1     Running   0         31d
monitoring   prometheus-nginx-656ddc9c86-nq9dk   1/1     Running   1         181d

konfig

Useful when you create Kubernetes clusters frequently and need to import their configs.

kubectl konfig import --save ~/Downloads/cls-5en24mcc-config

doctor

Diagnoses a Kubernetes cluster. It currently performs the following checks:

core component health (etcd cluster members, scheduler, controller-manager)
orphan endpoints (endpoints with no ipv4 attached)
persistent-volume available & unclaimed
persistent-volume-claim in lost state
k8s nodes that are not in ready state
orphan replicasets (desired number of replicas are bigger than 0 but the available replicas are 0)
leftover replicasets (desired number of replicas and the available # of replicas are 0)
orphan deployments (desired number of replicas are bigger than 0 but the available replicas are 0)
leftover deployments (desired number of replicas and the available # of replicas are 0)
leftover cronjobs (last active date is more than 30 days)

open-svc

Uses kubectl proxy to forward a remote service to your local machine, which is handy for debugging.

➜ kubectl open-svc prometheus-k8s -n monitoring
Starting to serve on 127.0.0.1:8001
Opening service/prometheus-k8s in the default browser...
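
Once kubectl proxy is serving on 127.0.0.1:8001, a service is reachable through the API server's service proxy path. The helper below only builds that URL; the function name svc_proxy_url, the PROXY_ADDR variable, and the port name web in the usage note are assumptions made for illustration, and the plugin may construct its URL differently.

```shell
#!/usr/bin/env bash
# Sketch: build the API server proxy URL for a service.
# Arguments: namespace, service name, service port (name or number).
svc_proxy_url() {
  local namespace="$1" service="$2" port="$3"
  # Address where "kubectl proxy" listens; 127.0.0.1:8001 is its default.
  local proxy_addr="${PROXY_ADDR:-127.0.0.1:8001}"
  printf 'http://%s/api/v1/namespaces/%s/services/%s:%s/proxy/\n' \
    "${proxy_addr}" "${namespace}" "${service}" "${port}"
}
```

For example, svc_proxy_url monitoring prometheus-k8s web yields a URL you could open in a browser while kubectl proxy is running in another terminal.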


resource-capacity/view-utilization

Show resource requests, limits, and utilization for nodes, pods, namespaces, and so on.

➜ kubectl resource-capacity --pods
NODE          NAMESPACE      POD                                         CPU REQUESTS   CPU LIMITS     MEMORY REQUESTS   MEMORY LIMITS
*             *              *                                           39728m (45%)   52108m (59%)   85069Mi (43%)     111368Mi (56%)

10.0.0.10     *              *                                           762m (19%)     1252m (31%)    585Mi (8%)        2399Mi (34%)
10.0.0.10     kube-system    ccs-log-collector-r4w6m                     300m (7%)      1000m (25%)    238Mi (3%)        1907Mi (27%)
10.0.0.10     kube-system    etcd-10.0.0.10                              0m (0%)        0m (0%)        0Mi (0%)          0Mi (0%)
10.0.0.10     kube-system    gpu-quota-admission-10.0.0.10               0m (0%)        0m (0%)        0Mi (0%)          0Mi (0%)
10.0.0.10     kube-system    kube-apiserver-10.0.0.10                    0m (0%)        0m (0%)        0Mi (0%)          0Mi (0%)
10.0.0.10     kube-system    kube-controller-manager-10.0.0.10           0m (0%)        0m (0%)        0Mi (0%)          0Mi (0%)
10.0.0.10     kube-system    kube-dns-d5876cbfd-w8xvh                    260m (6%)      0m (0%)        66Mi (0%)         162Mi (2%)
10.0.0.10     kube-system    kube-proxy-8m2cf                            0m (0%)        0m (0%)        0Mi (0%)          0Mi (0%)
10.0.0.10     kube-system    kube-router-n86t5                           100m (2%)      150m (3%)      100Mi (1%)        150Mi (2%)
10.0.0.10     monitoring     node-exporter-hdspv                         102m (2%)      102m (2%)      180Mi (2%)        180Mi (2%)
10.0.0.10     kube-system    tke-bridge-agent-dlmhs                      0m (0%)        0m (0%)        0Mi (0%)          0Mi (0%)
10.0.0.10     kube-system    tke-cni-agent-cdfhs                         0m (0%)        0m (0%)        0Mi (0%)          0Mi (0%)

10.0.0.13     *              *                                           7632m (48%)    9782m (61%)    24290Mi (80%)     28981Mi (96%)
10.0.0.13     100010987341   a2sc75rcc7x8x664-64ff96c4d5-wsks6           2000m (12%)    2000m (12%)    2048Mi (6%)       2048Mi (6%)
10.0.0.13     100010987341   a2sc75rcc7x8x664-8459df9d4d-dc48m           2000m (12%)    2000m (12%)    10240Mi (34%)     10240Mi (34%)
10.0.0.13     100010987341   a2sc75rcc7x8x664-8459df9d4d-kktgx           2000m (12%)    2000m (12%)    10240Mi (34%)     10240Mi (34%)
10.0.0.13     kube-system    ccs-log-collector-flmcq                     300m (1%)      1000m (6%)     238Mi (0%)        1907Mi (6%)
10.0.0.13     kube-system    ip-masq-agent-lnq4c                         0m (0%)        0m (0%)        0Mi (0%)          0Mi (0%)
10.0.0.13     kube-system    kube-proxy-sjn7j                            0m (0%)        0m (0%)        0Mi (0%)          0Mi (0%)
10.0.0.13     kube-system    kube-router-d8dgw                           100m (0%)      150m (0%)      100Mi (0%)        150Mi (0%)
10.0.0.13     monitoring     node-exporter-jths9                         102m (0%)      102m (0%)      180Mi (0%)        180Mi (0%)
10.0.0.13     monitoring     prometheus-k8s-test-3-0                     65m (0%)       15m (0%)       110Mi (0%)        60Mi (0%)
10.0.0.13     monitoring     prometheus-k8s-test-5-0                     65m (0%)       15m (0%)       110Mi (0%)        60Mi (0%)
10.0.0.13     kube-system    service-controller-6fcb5fc4f4-cvmfz         250m (1%)      1000m (6%)     256Mi (0%)        1024Mi (3%)
10.0.0.13     541004974      test-75c58cd5f7-qgm7m                       750m (4%)      1500m (9%)     768Mi (2%)        3072Mi (10%)
10.0.0.13     kube-system    tke-bridge-agent-9n2k9                      0m (0%)        0m (0%)        0Mi (0%)          0Mi (0%)
10.0.0.13     kube-system    tke-cni-agent-pmwpf                         0m (0%)        0m (0%)        0Mi (0%)          0Mi (0%)

➜ kubectl view-utilization node -h
CPU   : ▄▆▄▇▃▃▆▃▅▂▆▃
Memory: ▇▅▂▆▁▃▃▄▇▂▇▃
             CPU                   Memory
Node          Req   %R  Lim    %L   Req   %R   Lim    %L
10.0.0.10    0.76  19%  1.3   31%  600M   8%  2.5G   36%
10.0.0.13     6.9  43%  8.3   52%   23G  78%   25G   86%
10.0.0.17       1  25%  3.3   82%    1G  15%  4.3G   63%
10.0.0.31     1.5  38%  2.3   57%  2.5G  13%  4.3G   23%
10.0.0.4       11  75%   13   86%   23G  74%   25G   83%
10.0.0.44    0.57  14%  1.3   32%  640M   3%  2.4G   13%

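Hands-on

In this part we build a simple plugin that makes multi-cluster operations easier. Switching contexts when you work with many clusters can be tedious, so, borrowing from the design of Ansible's inventory, the plugin reads a config file that organizes clusters into groups and runs a kubectl command against the clusters of one group or of all groups.

The plugin code is in kubectl-clusters.

Download the script, install it to /usr/local/bin, and run it: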

# exec in all groups 
➜ kubectl clusters all get pod
[GROUP]: test --------------------------------------------------------------------------------
[CLUSTER]: cls-test1 --------------------------------------------------
[DEBUG] kubectl --context=cls-test1 --namespace=default get pod
NAME                                      READY   STATUS    RESTARTS   AGE
nginx-6dc5bfc797-vwdz7                    1/1     Running   0          32d


[GROUP]: prod --------------------------------------------------------------------------------
[CLUSTER]: cls-prod --------------------------------------------------
[DEBUG] kubectl --context=cls-qcvhpqog get pod
NAME                                READY   STATUS    RESTARTS   AGE
a4cgfxv7srbfhbsn-78479b5cf7-f85d8   2/2     Running   0          37d


# exec in single group
➜ kubectl clusters prod get pod
[GROUP]: prod --------------------------------------------------------------------------------
[CLUSTER]: cls-prod --------------------------------------------------
[DEBUG] kubectl --context=cls-qcvhpqog get pod
NAME                                READY   STATUS    RESTARTS   AGE
a4cgfxv7srbfhbsn-78479b5cf7-f85d8   2/2     Running   0          37d
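
The core loop of such a plugin is small. The following is a rough sketch of the idea, not the actual kubectl-clusters script: the inventory format (one "group: context1 context2" entry per line), the function name run_in_groups, and the KUBECTL override are all assumptions made for this example.

```shell
#!/usr/bin/env bash
# Sketch: run a kubectl command against every context in a group.
# Assumed inventory format, one group per line:
#   test: cls-test1 cls-test2
#   prod: cls-prod
# Usage: run_in_groups <inventory-file> <group|all> <kubectl args...>
run_in_groups() {
  local inventory="$1" target="$2"
  shift 2
  local group contexts ctx
  while IFS=':' read -r group contexts; do
    # Skip blank lines and comment lines.
    if [ -z "${group}" ]; then continue; fi
    case "${group}" in \#*) continue ;; esac
    # Skip groups that were not selected.
    if [ "${target}" != "all" ] && [ "${target}" != "${group}" ]; then
      continue
    fi
    echo "[GROUP]: ${group}"
    # Intentionally unquoted: split the context list on whitespace.
    for ctx in ${contexts}; do
      echo "[CLUSTER]: ${ctx}"
      # stdin is redirected so kubectl cannot consume the inventory.
      "${KUBECTL:-kubectl}" "--context=${ctx}" "$@" </dev/null
    done
  done < "${inventory}"
}
```

Saved as kubectl-clusters with a final line such as run_in_groups "${HOME}/.kube/clusters" "$@" (the path is again hypothetical), the function would be invoked as kubectl clusters prod get pod. Setting KUBECTL=echo gives a dry run that prints the commands instead of executing them.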

Submitting to the krew index