02 k8s installation v2

Testing a k8s deployment on a home network

Reference: official container runtime docs: https://kubernetes.io/zh-cn/docs/setup/production-environment/container-runtimes/

Reference: official kubeadm cluster-creation docs: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/

Reference: a community cluster-creation walkthrough: https://blog.csdn.net/qq_44956318/article/details/121335756

Reference: a community cluster-creation walkthrough: https://www.cnblogs.com/RainingNight/p/using-kubeadm-to-create-a-cluster-1-12.html

Reference: Alibaba Cloud ACK cluster network planning: https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/plan-cidr-blocks-for-an-ack-cluster-1?spm=a2c4g.11186623.0.0.43b77678ng1wZI

Preliminary notes

Scenarios

Scenario: managed k8s from cloud vendors is not cost-effective for me. (A standard Alibaba Cloud ACK cluster with three idle nodes runs around 100 RMB per day. Cloud vendors do offer edge-cloud products, but I have not looked into them yet.)

Scenario: you run your own IDC, or your company has a pool of public IPs, and you want to set up k8s on it.

Scenario: you want to build a k8s cluster by hand to get an overall understanding of k8s.

I had already installed a cluster once, but accidentally broke the control plane (no HA, overwrote the configuration, and kubelet could no longer connect), so I simply reinstalled from scratch.

Prerequisites

You have installed and briefly used kubectl and minikube (https://kubernetes.io/docs/tasks/tools/).

You have skimmed the Concepts and Tutorials sections of the official docs (https://kubernetes.io/docs/concepts/, https://kubernetes.io/docs/tutorials/).

You have used Alibaba Cloud ACK or another vendor's managed k8s service and created or experimented with basic StorageClass, ConfigMap, Secret, Pod, Deployment, StatefulSet, Service, and Ingress resources.

About this article

This article mainly uses kubeadm to initialize k8s, but some OS-level settings, the container runtime, and networking may still need to be configured by hand.

Note that the steps below assume a Linux distribution with systemd as PID 1 (SysV init may not work as-is). The Kubernetes version installed is 1.30, the latest at the time of writing.

| hostname | ip | OS | CRI | k8s role | ceph role |
| --- | --- | --- | --- | --- | --- |
| wangjm-B550M-K-1 | 192.168.1.8 | ubuntu22 | containerd | control plane, worker | mon, osd, mgr |
| wangjm-B550M-K-2 | 192.168.1.9 | ubuntu22 | containerd | control plane, worker | mon, osd |
| wangjm-B550M-K-3 | 192.168.1.10 | ubuntu22 | cri-o | control plane, worker | mon, osd |
| jingmin-kube-master1 | 192.168.1.1 | centos 8 stream | containerd | control plane | rgw gateway, mgr |
| jingmin-kube-archlinux | 192.168.1.7 | archlinux | containerd | worker | |

Prerequisite setup (required on every host)

These are mostly operating-system-level settings.

Set the hostname

# host 1
hostnamectl set-hostname jingmin-kube-master1

# host 2
hostnamectl set-hostname jingmin-kube-archlinux

...

Enable NTP time synchronization

vim /etc/systemd/timesyncd.conf
cat /etc/systemd/timesyncd.conf
[Time]
NTP=ntp.ntsc.ac.cn


timedatectl set-ntp true
systemctl restart systemd-timesyncd.service
journalctl -u systemd-timesyncd --no-hostname --since "1 day ago"

Load kernel modules

Enable kernel IP forwarding and load the modules needed for forwarding and for containers.

Reference: https://kubernetes.io/zh-cn/docs/setup/production-environment/container-runtimes/#install-and-configure-prerequisites

cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

# Set the required sysctl parameters; they persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF

# Apply the sysctl parameters without rebooting
sudo sysctl --system

Confirm that the br_netfilter and overlay modules are loaded by running:

lsmod | grep br_netfilter
lsmod | grep overlay

Confirm that the net.bridge.bridge-nf-call-iptables, net.bridge.bridge-nf-call-ip6tables, and net.ipv4.ip_forward sysctl variables are set to 1 in your sysctl configuration:

sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward

Disable swap

Check whether swap is currently in use:

free -m

Tell the kernel not to swap:

echo "vm.swappiness = 0" >> /etc/sysctl.d/k8s.conf
sysctl --system

Turn off all swap (effective until the next reboot):

swapoff -a

Disable swap for future boots:

vim /etc/fstab

Many guides say that commenting out the swap line here is enough.

On some systems that is not sufficient: systemd automatically scans GPT partitions and activates any swap partition it finds. Either delete the swap partition with fdisk, or change the options column of the swap entry in /etc/fstab from defaults to noauto so it is not mounted automatically.
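For illustration, the changed fstab entry would look roughly like this (a sketch; the UUID is a placeholder), and after the next reboot you can confirm that no swap comes back:

# /etc/fstab: swap entry with "defaults" changed to "noauto" (UUID is a placeholder)
# UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  none  swap  noauto  0  0

# after rebooting, both of these should show no active swap
swapon --show
free -m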

Install the k8s tools

Namely kubeadm, kubelet, and kubectl.

Official docs: https://kubernetes.io/zh-cn/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#installing-kubeadm-kubelet-and-kubectl

Reference: https://discuss.kubernetes.io/t/facing-challanges-for-installation-of-kubernest-cluster-setup-through-kubeadm-on-rhel8-9-vm-404-for-https-packages-cloud-google-com-yum-repos-kubernetes-el7-x86-64-repodata-repomd-xml-ip-142-250-115-139/27345/2

On CentOS/Fedora/RHEL:

# Use this repository
cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.30/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.30/rpm/repodata/repomd.xml.key
EOF


On Ubuntu:

Update the apt package index and install the packages needed to use the Kubernetes apt repository:

sudo apt-get update
# apt-transport-https may be a dummy package; if so, you can skip installing it
sudo apt-get install -y apt-transport-https ca-certificates curl gpg

Download the public signing key for the Kubernetes package repositories. The same signing key is used for all repositories, so you can ignore the version in the URL:

# If the `/etc/apt/keyrings` directory does not exist, create it before running the curl command; see the note below.
# sudo mkdir -p -m 755 /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.30/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg

Note: on releases older than Debian 12 and Ubuntu 22.04, /etc/apt/keyrings does not exist by default; create it before running the curl command.

Add the Kubernetes apt repository. Note that this repository only contains packages for Kubernetes 1.30; for other minor versions, change the minor version in the URL to the one you want (and make sure you are reading the install docs for the version you plan to install).

# This overwrites any existing configuration in /etc/apt/sources.list.d/kubernetes.list.
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.30/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list

Update the apt package index, install kubelet, kubeadm, and kubectl, and pin their versions:

sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl

Disable SELinux (CentOS/RHEL)

# Set SELinux to permissive mode (effectively disabling it)
sudo setenforce 0
sudo sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config

sudo yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes

sudo systemctl enable --now kubelet

Install a container environment (optional)

docker/docker.io/podman-docker/podman

Strictly speaking, if the host is only used for k8s, configuring the container runtime (CRI, Container Runtime Interface) in the next section is enough; installing the higher-level container tooling here is optional.

Configure the container runtime (CRI)

Container runtime overview

Containerization roughly evolved like this: applications on bare metal → single-host containers (docker run, one image) → single-host container groups (docker compose) → multi-node container orchestration with k8s (cross-node scheduling, scaling, ...).

Containers started with Docker alone: open source, but single-host only. As Red Hat, Google, and other companies and developers joined in, container orchestration matured, and the most widely adopted standard today is Google's Kubernetes, available both as the community-maintained distribution and as managed services from the cloud vendors.

Docker and k8s converge at the OCI level (OCI, the Open Container Initiative, standardizes container images and runtimes). In other words they use the same images and the same low-level runtime underneath.

On top of the OCI standard, Docker and k8s each have their own runtime stacks. On hosts that already run Docker, k8s can use Docker's runtime, containerd, directly; on hosts without Docker you can install a runtime such as CRI-O instead. Above containerd/CRI-O, k8s defines a uniform interface layer, the CRI, which the upper k8s layers (the kubelet) talk to.
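To see this CRI layer directly, you can point crictl (from the cri-tools package, typically pulled in alongside kubeadm) at the runtime's CRI socket once one of the runtimes below is installed and running. This is only a sketch; the socket paths are the usual defaults and should be adjusted to your setup:

# containerd's default CRI socket
cat <<EOF | sudo tee /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
EOF
# for CRI-O the endpoint would be unix:///var/run/crio/crio.sock

sudo crictl info    # runtime status, as reported over the CRI
sudo crictl ps -a   # containers listed through the CRI, independent of the docker CLI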

An infographic illustrating the container ecosystem.

Based on the explanation here: https://phoenixnap.com/kb/docker-vs-containerd-vs-cri-o

Based on the explanation here: https://vineetcic.medium.com/the-differences-between-docker-containerd-cri-o-and-runc-a93ae4c9fdac

Install and configure the containerd runtime (Docker hosts)

Install containerd

Install docker-ce, the community edition of Docker (note that on CentOS, installing "docker" may by default give you the open-source Podman instead):

yum install docker-ce

As the Docker/k8s layering diagram above shows, installing Docker already installs containerd.

Configure containerd

Reference: https://kubernetes.io/zh-cn/docs/setup/production-environment/container-runtimes/#containerd

Reference: https://kubernetes.io/docs/setup/production-environment/container-runtimes/#override-pause-image-containerd

You can start from the default template:

containerd config default > /etc/containerd/config.toml

Or create /etc/containerd/config.toml directly.

Then adjust or add the following settings:

vim /etc/containerd/config.toml

#disabled_plugins = ["cri"]
enabled_plugins = ["cri"]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true

[plugins."io.containerd.grpc.v1.cri"]
  sandbox_image = "registry.k8s.io/pause:3.9"

When containerd comes in via a Docker install, the CRI plugin that k8s needs is disabled by default; enable it here and set the runtime to use systemd-managed cgroups (SystemdCgroup = true).

The pause image bundled on my host was old and kubeadm warned about it during init, so I switched to the newer 3.9.
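To check which image versions (including pause) the installed kubeadm actually expects, and optionally pre-pull them, a quick check with kubeadm itself works:

kubeadm config images list   # lists the registry.k8s.io images for this kubeadm version, including pause
kubeadm config images pull   # optional: pre-pull them before running 'kubeadm init'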

Configure a proxy for containerd (optional)

If some images cannot be pulled later during kubeadm init, you can optionally add a proxy here.

vim /lib/systemd/system/containerd.service

Temporarily configure a proxy in containerd.service (the Environment= lines added below); comment them out once you are done.

# Copyright The containerd Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target local-fs.target

[Service]
Environment="HTTP_PROXY=http://192.168.1.7:8889"
Environment="HTTPS_PROXY=http://192.168.1.7:8889"
Environment="NO_PROXY=ole12138.top,localhost,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/bin/containerd

Type=notify
Delegate=yes
KillMode=process
Restart=always
RestartSec=5
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity
LimitNOFILE=infinity
# Comment TasksMax if your systemd version does not supports it.
# Only systemd 226 and above support this version.
TasksMax=infinity
OOMScoreAdjust=-999

[Install]
WantedBy=multi-user.target

Reload the unit definitions and restart the service:

systemctl daemon-reload
systemctl restart containerd

Enable the relevant services at boot:

systemctl enable containerd
systemctl enable kubelet
#systemctl enable docker

Configure a Docker proxy (optional)

If you also pull images manually with docker pull, you can configure a proxy for Docker itself (kubeadm talks to containerd, so it does not use this proxy):

https://docs.docker.com/config/daemon/systemd/#httphttps-proxy

https://docs.docker.com/network/proxy/
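Following the Docker docs linked above, the daemon-side proxy can be set with a systemd drop-in; this is a sketch using this lab's proxy address, adjust it to your environment:

sudo mkdir -p /etc/systemd/system/docker.service.d
cat <<EOF | sudo tee /etc/systemd/system/docker.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=http://192.168.1.7:8889"
Environment="HTTPS_PROXY=http://192.168.1.7:8889"
Environment="NO_PROXY=localhost,127.0.0.1,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker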

Install and configure the CRI-O runtime (Podman hosts)

One of the hosts runs Podman rather than Docker.

Reference: https://github.com/cri-o/cri-o/blob/main/tutorials/kubeadm.md

Reference: https://github.com/cri-o/cri-o/blob/main/install.md

Reference: https://github.com/cri-o/packaging/blob/main/README.md

Reference: https://kubernetes.io/blog/2023/10/10/cri-o-community-package-infrastructure/

Reference: https://kubernetes.io/blog/2023/08/15/pkgs-k8s-io-introduction/

On Ubuntu, the Kubernetes community repository was already configured above:

# This overwrites any existing configuration in /etc/apt/sources.list.d/kubernetes.list.
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.30/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list

You also need to add the CRI-O community repository (still relatively new at the time of writing):

curl -fsSL https://pkgs.k8s.io/addons:/cri-o:/stable:/v1.30/deb/Release.key |
    gpg --dearmor -o /etc/apt/keyrings/cri-o-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/cri-o-apt-keyring.gpg] https://pkgs.k8s.io/addons:/cri-o:/stable://v1.30/deb/ /" |
    tee /etc/apt/sources.list.d/cri-o.list

Update the package metadata and install:

apt-get update
apt-get install -y cri-o 

Start and enable the service:

systemctl start crio.service
systemctl enable crio.service
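A couple of quick sanity checks at this point (a sketch; crictl comes from the cri-tools package, and CRI-O defaults to the systemd cgroup driver, so no extra cgroup configuration was needed here):

systemctl status crio --no-pager
crictl --runtime-endpoint unix:///var/run/crio/crio.sock version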

It turned out that the installed Podman did not play well with CRI-O; Podman was too old.

Update the Podman repository and upgrade Podman:

echo 'deb http://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/unstable/xUbuntu_22.04/ /' | sudo tee /etc/apt/sources.list.d/devel:kubic:libcontainers:unstable.list
curl -fsSL https://download.opensuse.org/repositories/devel:kubic:libcontainers:unstable/xUbuntu_22.04/Release.key | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/devel_kubic_libcontainers_unstable.gpg > /dev/null
sudo apt update
sudo apt install podman

Reference: https://github.com/cri-o/cri-o/blob/main/tutorials/kubernetes.md

Reference: https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-join/

Configure a proxy for CRI-O

Reference: https://jbn1233.medium.com/docker-cri-o-behind-http-proxy-4a5645a9ff7b

root@wangjm-B550M-K-3:~# vim /lib/systemd/system/crio.service 
root@wangjm-B550M-K-3:~# cat /lib/systemd/system/crio.service 
[Unit]
Description=Container Runtime Interface for OCI (CRI-O)
Documentation=https://github.com/cri-o/cri-o
Wants=network-online.target
Before=kubelet.service
After=network-online.target

[Service]
Type=notify
EnvironmentFile=-/etc/sysconfig/crio
Environment=GOTRACEBACK=crash
Environment="HTTP_PROXY=http://192.168.1.7:8889"
Environment="HTTPS_PROXY=http://192.168.1.7:8889"
Environment="NO_PROXY=ole12138.top,localhost,127.0.0.1,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"
ExecStart=/usr/bin/crio \
          $CRIO_CONFIG_OPTIONS \
          $CRIO_RUNTIME_OPTIONS \
          $CRIO_STORAGE_OPTIONS \
          $CRIO_NETWORK_OPTIONS \
          $CRIO_METRICS_OPTIONS
ExecReload=/bin/kill -s HUP $MAINPID
TasksMax=infinity
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
OOMScoreAdjust=-999
TimeoutStartSec=0
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
Alias=cri-o.service

Configure the first control plane node

Note: some sources call the control plane node the "master" node. It was indeed called master originally and was later renamed to control plane. Reference: https://stackoverflow.com/questions/68860301/what-is-the-difference-between-master-node-and-control-plane-on-kubernetes

Initialize the master (control plane) node

Now install k8s itself, using kubeadm.

Official kubeadm cluster-creation docs: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/

Official kubeadm cluster-creation docs (Chinese): https://kubernetes.io/zh-cn/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/

Option 1: follow this blogger's approach and adjust the kubeadm configuration file directly.

Reference: https://www.cnblogs.com/RainingNight/p/using-kubeadm-to-create-a-cluster-1-12.html

Option 2: pass the options on the command line (this uses the default Google registry, so the proxy configured earlier is needed):

Reference: https://blog.csdn.net/qq_44956318/article/details/121335756

kubeadm init \
--apiserver-advertise-address=192.168.1.8 \
--kubernetes-version v1.30.0 \
--service-cidr=172.31.0.0/20 \
--pod-network-cidr=172.30.0.0/16

Note: there is also a --control-plane-endpoint option. It is meant for an HA api-server (it requires an additional load balancer that provides a single internal/external entry point; configuring it incorrectly causes failures).

Reference: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#considerations-about-apiserver-advertise-address-and-controlplaneendpoint

Reference: https://kubernetes.io/zh-cn/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#considerations-about-apiserver-advertise-address-and-controlplaneendpoint
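For later reference, an HA-style init would look roughly like this; this is only a sketch, assuming a stable load-balanced DNS name or VIP (control.example.com here is a placeholder) already sits in front of the api-servers:

kubeadm init \
  --control-plane-endpoint=control.example.com:6443 \
  --upload-certs \
  --apiserver-advertise-address=192.168.1.8 \
  --kubernetes-version v1.30.0 \
  --service-cidr=172.31.0.0/20 \
  --pod-network-cidr=172.30.0.0/16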

Record the generated output; it is needed later.

root@wangjm-B550M-K-1:~# kubeadm init \
--apiserver-advertise-address=192.168.1.8 \
--kubernetes-version v1.30.0 \
--service-cidr=172.31.0.0/20 \
--pod-network-cidr=172.30.0.0/16
[init] Using Kubernetes version: v1.30.0
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local wangjm-b550m-k-1] and IPs [172.31.0.1 192.168.1.8]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost wangjm-b550m-k-1] and IPs [192.168.1.8 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost wangjm-b550m-k-1] and IPs [192.168.1.8 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "super-admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests"
[kubelet-check] Waiting for a healthy kubelet. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 501.532835ms
[api-check] Waiting for a healthy API server. This can take up to 4m0s
[api-check] The API server is healthy after 3.501293034s
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node wangjm-b550m-k-1 as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node wangjm-b550m-k-1 as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
[bootstrap-token] Using token: o6g6nc.p444h6gg0rzu0kzl
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.1.8:6443 --token o6g6nc.p444h6gg0rzu0kzl \
        --discovery-token-ca-cert-hash sha256:4de7f3170388d48445b64c4d0c10529cf901523a8a27e1d0b1b202f11f697673 

Copy the kubeconfig for kubectl

Following the hint above, run on the master node:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

After that, kubectl works and you can view and manage the cluster's resources.

Allow the control plane to schedule pods

If you want to be able to schedule Pods on control plane nodes, for example in a single-machine Kubernetes cluster, run the commands below.

The version found around the web (works on older releases):

Reference: https://stackoverflow.com/questions/68860301/what-is-the-difference-between-master-node-and-control-plane-on-kubernetes

By default the control plane does not schedule regular pods; to let the master node run pods as well, execute:

kubectl taint nodes --all node-role.kubernetes.io/master-

The trailing minus sign removes the taint.

The official instruction (for current releases):

Official reference: https://kubernetes.io/zh-cn/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#control-plane-node-isolation

Reference: https://stackoverflow.com/questions/56162944/master-tainted-no-pods-can-be-deployed

kubectl taint nodes --all node-role.kubernetes.io/control-plane-
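A quick way to verify the taints are gone (node name as in the table above):

kubectl get nodes
kubectl describe node wangjm-b550m-k-1 | grep -i taints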

Install the network add-on (install once on the first control plane; new nodes get it automatically as the cluster grows)

Install the flannel network add-on

Reference: https://kubernetes.io/zh-cn/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#pod-network

Reference: https://kubernetes.io/zh-cn/docs/concepts/cluster-administration/networking/#how-to-implement-the-kubernetes-networking-model

Reference: https://kubernetes.io/zh-cn/docs/concepts/cluster-administration/addons/#networking-and-network-policy

Reference: https://github.com/flannel-io/flannel#deploying-flannel-manually

Reference: https://github.com/flannel-io/flannel?tab=readme-ov-file#deploying-flannel-with-kubectl

Reference: https://github.com/flannel-io/flannel/blob/master/Documentation/kubernetes.md

wget https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

vim kube-flannel.yml

Change the pod CIDR settings so that the network matches the --pod-network-cidr used above (the net-conf.json section):

  net-conf.json: |
    {
      "Network": "172.30.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }

Reference: https://github.com/flannel-io/flannel/blob/master/Documentation/configuration.md#key-command-line-options

Reference: https://github.com/containernetworking/cni/issues/486

By default flannel uses the interface that holds the default route.

In my case, k8s is also installed on the gateway router, whose default route points at the ppp0 interface created by the PPPoE dial-up connection, while the interface that talks to the other k8s nodes is a different NIC. Whenever the dial-up connection drops and reconnects, the k8s network breaks.

(In other words: on my home gateway router, dial-up creates a ppp0 interface which a default flannel install may pick; after ppp0 reconnects and the public IP changes, k8s services misbehave.)

So append two lines of --iface-can-reach configuration to the container args, as follows:

      containers:
      - args:
        - --ip-masq
        - --kube-subnet-mgr
        - --iface-can-reach
        - 192.168.1.1

Looking at the flannel logs with kubectl logs, flannel on the gateway router appears to have picked the lo interface this way. Oh well, it still works. If it really does not, the --iface-regex option can be used to match the desired address range with a regex (not tried yet).

The --iface-regex variant would look like this:

      containers:
      - args:
        - --ip-masq
        - --kube-subnet-mgr
        - --iface-regex
        - 192.168.*

The final, complete flannel YAML manifest:

apiVersion: v1
kind: Namespace
metadata:
  labels:
    k8s-app: flannel
    pod-security.kubernetes.io/enforce: privileged
  name: kube-flannel
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: flannel
  name: flannel
  namespace: kube-flannel
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: flannel
  name: flannel
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
- apiGroups:
  - networking.k8s.io
  resources:
  - clustercidrs
  verbs:
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: flannel
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-flannel
---
apiVersion: v1
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "172.30.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
kind: ConfigMap
metadata:
  labels:
    app: flannel
    k8s-app: flannel
    tier: node
  name: kube-flannel-cfg
  namespace: kube-flannel
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app: flannel
    k8s-app: flannel
    tier: node
  name: kube-flannel-ds
  namespace: kube-flannel
spec:
  selector:
    matchLabels:
      app: flannel
      k8s-app: flannel
  template:
    metadata:
      labels:
        app: flannel
        k8s-app: flannel
        tier: node
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
      containers:
      - args:
        - --ip-masq
        - --kube-subnet-mgr
        - --iface-can-reach
        - 192.168.1.1
        command:
        - /opt/bin/flanneld
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: EVENT_QUEUE_DEPTH
          value: "5000"
        image: docker.io/flannel/flannel:v0.22.1
        name: kube-flannel
        resources:
          requests:
            cpu: 100m
            memory: 50Mi
        securityContext:
          capabilities:
            add:
            - NET_ADMIN
            - NET_RAW
          privileged: false
        volumeMounts:
        - mountPath: /run/flannel
          name: run
        - mountPath: /etc/kube-flannel/
          name: flannel-cfg
        - mountPath: /run/xtables.lock
          name: xtables-lock
      hostNetwork: true
      initContainers:
      - args:
        - -f
        - /flannel
        - /opt/cni/bin/flannel
        command:
        - cp
        image: docker.io/flannel/flannel-cni-plugin:v1.2.0
        name: install-cni-plugin
        volumeMounts:
        - mountPath: /opt/cni/bin
          name: cni-plugin
      - args:
        - -f
        - /etc/kube-flannel/cni-conf.json
        - /etc/cni/net.d/10-flannel.conflist
        command:
        - cp
        image: docker.io/flannel/flannel:v0.22.1
        name: install-cni
        volumeMounts:
        - mountPath: /etc/cni/net.d
          name: cni
        - mountPath: /etc/kube-flannel/
          name: flannel-cfg
      priorityClassName: system-node-critical
      serviceAccountName: flannel
      tolerations:
      - effect: NoSchedule
        operator: Exists
      volumes:
      - hostPath:
          path: /run/flannel
        name: run
      - hostPath:
          path: /opt/cni/bin
        name: cni-plugin
      - hostPath:
          path: /etc/cni/net.d
        name: cni
      - configMap:
          name: kube-flannel-cfg
        name: flannel-cfg
      - hostPath:
          path: /run/xtables.lock
          type: FileOrCreate
        name: xtables-lock

Apply the network add-on:

kubectl apply -f ./kube-flannel.yml
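A quick way to confirm the add-on is healthy (run on the control plane):

kubectl -n kube-flannel get pods -o wide                 # flannel pods should become Running
kubectl -n kube-flannel logs ds/kube-flannel-ds | head   # shows which interface flannel picked
kubectl -n kube-system get pods -o wide                  # coredns should leave Pending once the CNI works
kubectl get nodes                                        # nodes should report Ready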

Important tips

Resetting after a failed install

If something goes wrong during installation, check the logs and search the error first.

journalctl -xeu containerd

If all else fails, tear it down and start over.

Reference: https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-reset/

Reference: https://stackoverflow.com/questions/57648829/how-to-fix-timeout-at-waiting-for-the-kubelet-to-boot-up-the-control-plane-as-st

kubeadm reset

This can also be used to uninstall the cluster.

Reference: https://www.cnblogs.com/RainingNight/p/using-kubeadm-to-create-a-cluster-1-12.html

Reference: https://kubernetes.io/zh-cn/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#remove-the-node

To undo what kubeadm has done, first drain the node and make sure it is empty, then take it down.

On the master node run:

kubectl drain <node name> --delete-emptydir-data --force --ignore-daemonsets
kubectl delete node <node name>
kubectl delete node <node name>

Then, on the node being removed, reset kubeadm's state:

sudo kubeadm reset

If you want to reconfigure the cluster, just run kubeadm init or kubeadm join again with the new parameters.

If your cluster was set up to use IPVS, run ipvsadm --clear (or similar) to reset your system's IPVS tables.

 ipvsadm --clear

Kubernetes uses iptables for network isolation and forwarding. If you need to reset the iptables rules, you can run:

iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X

This clears all iptables rules and resets the counters to zero. It affects every running container, so use it with caution. Before running it, make sure you have backed up your iptables configuration and any other important configuration, to avoid losing or corrupting anything.

Note that in Kubernetes the iptables rules are managed automatically by the kube-proxy process: if you flush them, kube-proxy will rebuild them. Resetting iptables is therefore only a temporary measure; for a permanent change, modify the kube-proxy configuration and restart the kube-proxy processes.
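If you do flush the rules, one way to make kube-proxy repopulate them promptly is to restart its DaemonSet (a sketch):

kubectl -n kube-system rollout restart daemonset kube-proxy
kubectl -n kube-system get pods -l k8s-app=kube-proxy -o wide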

Rejoining a node to the cluster

Reference: https://aiops.com/news/post/13773.html

Removing a node from a Kubernetes cluster

All of the following is run on the master node.

1. Put the node into maintenance mode

List the node names:

kubectl get nodes

The node to delete is k8s-node2.

Drain k8s-node2 (maintenance mode):

kubectl drain k8s-node2 --delete-emptydir-data --force --ignore-daemonsets

2. Delete the node

kubectl delete node k8s-node2

3. Confirm it is gone

kubectl get nodes

That completes the node removal.

If the node should rejoin the cluster later, proceed as follows.

1. Generate a token

On the master node, regenerate a join token:

kubeadm token create --print-join-command

2. Rejoin the node to the cluster

Run the following on the node itself.

Stop kubelet:

systemctl stop kubelet

Remove the old files:

rm -rf /etc/kubernetes/*

Rejoin the cluster by running the join command generated in the previous step.

Add worker nodes

Configure the runtime on the worker node

The container runtime needs to be confirmed and installed here as well: repeat the "Configure the container runtime (CRI)" section on the worker node.

Docker was already installed, which appears to include containerd:

ls /run/containerd/containerd.sock

The socket exists, confirming containerd is available (explained here: https://kubernetes.io/zh-cn/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#installing-runtime)

mkdir -p /etc/containerd
vim /etc/containerd/config.toml

Adjust the configuration as shown in the earlier section (contents omitted here).

Or simply copy it over from the master:

scp 192.168.1.1:/etc/containerd/config.toml /etc/containerd/

Later, during kubeadm join on this worker, there was a problem here: gcr.io was unreachable, so the kube-proxy and flannel pods kept failing to be created.

You can configure the proxy and then kubectl delete pod the failing pods; replacements are created automatically.

Or follow this blogger's approach and adjust the kubeadm configuration file directly. Reference: https://www.cnblogs.com/RainingNight/p/using-kubeadm-to-create-a-cluster-1-12.html

systemctl enable kubelet
systemctl enable containerd
#systemctl enable docker

Join the worker node to the k8s cluster

Use the join command from the output of the successful kubeadm init on the master.

kubeadm join 192.168.1.8:6443 --token o6g6nc.p444h6gg0rzu0kzl \
        --discovery-token-ca-cert-hash sha256:4de7f3170388d48445b64c4d0c10529cf901523a8a27e1d0b1b202f11f697673 

Add a Docker proxy (it can also be copied from the master: scp 192.168.1.8:/etc/docker/daemon.json ./):

vim /etc/docker/daemon.json
{
  "proxies": {
    "http-proxy": "http://192.168.1.7:8889",
    "https-proxy": "http://192.168.1.7:8889",
    "no-proxy": "*.test.example.com,.example.org,*.ole12138.top,.ole12138.top,127.0.0.0/8,10.0.0.0,172.16.0.0/12,192.168.0.0/16"
  }
}

Restart the service:

systemctl daemon-reload 
systemctl restart docker

Copy the kubectl config from the master

mkdir -p /root/.kube
scp 192.168.1.8:/root/.kube/config /root/.kube/

Or:

mkdir -p $HOME/.kube
scp root@192.168.1.8:/etc/kubernetes/admin.conf $HOME/.kube/config

Refresh the token and add more worker nodes

Reference: https://www.cnblogs.com/hongdada/p/9854696.html

Reference: https://blog.csdn.net/mailjoin/article/details/79686934

After kubeadm init, a token for joining nodes is printed:

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of machines by running the following on each node
as root:

  kubeadm join 18.16.202.35:6443 --token zr8n5j.yfkanjio0lfsupc0 --discovery-token-ca-cert-hash sha256:380b775b7f9ea362d45e4400be92adc4f71d86793ba6aae091ddb53c489d218c

The default token is valid for 24 hours; once it expires it can no longer be used.

The fix is as follows:

  1. Generate a new token

    [root@node1 flannel]# kubeadm  token create
    kiyfhw.xiacqbch8o8fa8qj
    [root@node1 flannel]# kubeadm  token list
    TOKEN                     TTL         EXPIRES                     USAGES                   DESCRIPTION   EXTRA GROUPS
    gvvqwk.hn56nlsgsv11mik6   <invalid>   2018-10-25T14:16:06+08:00   authentication,signing   <none>        system:bootstrappers:kubeadm:default-node-token
    kiyfhw.xiacqbch8o8fa8qj   23h         2018-10-27T06:39:24+08:00   authentication,signing   <none>        system:bootstrappers:kubeadm:default-node-token
  2. Compute the sha256 hash of the CA certificate

    [root@node1 flannel]# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
    5417eb1b68bd4e7a4c82aded83abc55ec91bd601e45734d6aba85de8b1ebb057
  3. Join the node to the cluster

      kubeadm join 18.16.202.35:6443 --token kiyfhw.xiacqbch8o8fa8qj --discovery-token-ca-cert-hash sha256:5417eb1b68bd4e7a4c82aded83abc55ec91bd601e45734d6aba85de8b1ebb057

After a few seconds the node should appear in the output of kubectl get nodes on the master.

All of the above is tedious; one command does it in one step:

kubeadm token create --print-join-command

Add control plane nodes

For docker/containerd nodes

Start the same way as adding an ordinary worker node.

First, on the existing control plane, generate the join command:

root@wangjm-B550M-K-1:~/k8s/cni# kubeadm token create --print-join-command
kubeadm join 192.168.1.8:6443 --token 2v0g5t.o2xz7jdnw2uagx5d --discovery-token-ca-cert-hash sha256:4de7f3170388d48445b64c4d0c10529cf901523a8a27e1d0b1b202f11f697673 

Reference: adding worker and control plane nodes: https://blog.slys.dev/adding-worker-and-control-plane-nodes-to-the-kubernetes-cluster/

Reference: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/

Reference: https://www.reddit.com/r/kubernetes/comments/18xddaj/how_do_i_add_another_control_plane_node_using/

On the existing control plane, re-upload the certificates to obtain a certificate key (important: required):

root@wangjm-B550M-K-1:~/k8s/cni# kubeadm init phase upload-certs --upload-certs
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
0aaf6a3a45a766033aaebd91d7e8e511fd0e801961b2895228a0a8989b976ee3

Then, combining the two commands above,

run the following on the new node (the one being added as a control plane):

root@wangjm-B550M-K-2:~# kubeadm join 192.168.1.8:6443 --token 2v0g5t.o2xz7jdnw2uagx5d --discovery-token-ca-cert-hash sha256:4de7f3170388d48445b64c4d0c10529cf901523a8a27e1d0b1b202f11f697673  --control-plane --certificate-key 0aaf6a3a45a766033aaebd91d7e8e511fd0e801961b2895228a0a8989b976ee3
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
error execution phase preflight: 
One or more conditions for hosting a new control plane instance is not satisfied.

unable to add a new control plane instance to a cluster that doesn't have a stable controlPlaneEndpoint address

Please ensure that:
* The cluster has a stable controlPlaneEndpoint address.
* The certificates that must be shared among control plane instances are provided.


To see the stack trace of this error execute with --v=5 or higher

Here the --control-plane and --certificate-key options were added.

The join failed because the cluster was initialized on the first node without the --control-plane-endpoint option, which is meant for an HA api-server setup (it needs an additional load balancer providing a single internal/external entry point; configuring it incorrectly causes failures).

So for now I will not add new control plane nodes this way.

Reference: https://www.reddit.com/r/kubernetes/comments/18xddaj/how_do_i_add_another_control_plane_node_using/

For podman + CRI-O nodes

Reference: https://github.com/cri-o/cri-o/blob/main/tutorials/kubeadm.md

Reference: https://github.com/cri-o/cri-o/blob/main/install.md

Reference: https://github.com/cri-o/packaging/blob/main/README.md

Reference: https://kubernetes.io/blog/2023/10/10/cri-o-community-package-infrastructure/

Reference: https://kubernetes.io/blog/2023/08/15/pkgs-k8s-io-introduction/

With podman + CRI-O you also need to add --cri-socket unix:///var/run/crio/crio.sock:

root@wangjm-B550M-K-3:~# kubeadm join 192.168.1.8:6443 --token k24o0h.n7ewbmkrr07tuvvu --discovery-token-ca-cert-hash sha256:109777566281a9d7222ddb1c4f6ff151b818996d27bb74e549741623d047326c --control-plane --certificate-key 15e8796e94f4106f6ba01dcd78068104f9240d9e5b38ed5c1285141118958dba --cri-socket unix:///var/run/crio/crio.sock
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[download-certs] Saving the certificates to the folder: "/etc/kubernetes/pki"
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local wangjm-b550m-k-3] and IPs [172.31.0.1 192.168.1.10 192.168.1.8]
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost wangjm-b550m-k-3] and IPs [192.168.1.10 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost wangjm-b550m-k-3] and IPs [192.168.1.10 127.0.0.1 ::1]
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-check] Waiting for a healthy kubelet. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 501.675331ms
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap
[etcd] Announced new etcd member joining to the existing etcd cluster
[etcd] Creating static Pod manifest for "etcd"
{"level":"warn","ts":"2024-05-06T20:11:43.46408+0800","logger":"etcd-client","caller":"v3@v3.5.10/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000503880/192.168.1.8:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: can only promote a learner member which is in sync with leader"}
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
The 'update-status' phase is deprecated and will be removed in a future release. Currently it performs no operation
[mark-control-plane] Marking the node wangjm-b550m-k-3 as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node wangjm-b550m-k-3 as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]

This node has joined the cluster and a new control plane instance was created:

* Certificate signing request was sent to apiserver and approval was received.
* The Kubelet was informed of the new secure connection details.
* Control plane label and taint were applied to the new node.
* The Kubernetes control plane instances scaled up.
* A new etcd member was added to the local/stacked etcd cluster.

To start administering your cluster from this node, you need to run the following as a regular user:

        mkdir -p $HOME/.kube
        sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
        sudo chown $(id -u):$(id -g) $HOME/.kube/config

Run 'kubectl get nodes' to see this node join the cluster.

Configure kubectl

Following the hint in the output above, run:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

After that, kubectl works on this node as well and you can view and manage the cluster's resources.

Scheduling Pods on control plane nodes

The version found around the web:

By default the control plane does not schedule regular pods; to let these nodes run pods as well, execute:

kubectl taint nodes --all node-role.kubernetes.io/master-

The trailing minus sign removes the taint.

Official reference: https://kubernetes.io/zh-cn/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#control-plane-node-isolation

If you want to be able to schedule Pods on control plane nodes, for example in a single-machine Kubernetes cluster, run:

kubectl taint nodes --all node-role.kubernetes.io/control-plane-

Finally, check the current nodes:

root@wangjm-B550M-K-3:~# kubectl get nodes
NAME               STATUS   ROLES           AGE     VERSION
wangjm-b550m-k-1   Ready    control-plane   3h13m   v1.30.0
wangjm-b550m-k-2   Ready    control-plane   153m    v1.30.0
wangjm-b550m-k-3   Ready    control-plane   5m17s   v1.30.0

Container storage configuration

K8s pools and manages the CPU and memory of every node, but the nodes' disk storage is not scheduled or managed in the same way.

With the master and worker nodes set up as above, each node can still use its own local disk through hostPath volumes. This has a limitation: when a pod is rescheduled onto another node, the data written via hostPath on the previous node does not move with it.

hostPath is therefore only suitable for temporary data.

Data that must persist should go to NFS network storage, Ceph distributed storage, a cloud vendor's storage service, or similar.
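For illustration, this is roughly what a hostPath volume looks like (a sketch; the pod name, image, and path are made up). The file ends up on whichever node the pod happens to be scheduled to, which is exactly why it only suits temporary data:

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-demo
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "echo hello > /data/hello.txt && sleep 3600"]
    volumeMounts:
    - name: local-data
      mountPath: /data
  volumes:
  - name: local-data
    hostPath:
      path: /var/lib/hostpath-demo
      type: DirectoryOrCreate
EOF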

Configure Ceph network storage // TODO: see the separate article on using an external Ceph cluster with k8s

Reference: https://github.com/ceph/ceph-csi?tab=readme-ov-file#overview

Reference: https://github.com/ceph/ceph-csi/blob/devel/docs/deploy-rbd.md

Reference: https://github.com/ceph/ceph-csi/blob/devel/docs/deploy-rbd.md#deployment-with-kubernetes

Reference: https://github.com/ceph/ceph-csi/tree/devel/deploy/rbd/kubernetes

Reference: https://www.cnblogs.com/hukey/p/17828946.html

Reference: https://juejin.cn/post/7296756504912330767

Reference: https://github.com/ceph/ceph-csi/blob/devel/examples/rbd/storageclass.yaml

wget https://raw.githubusercontent.com/ceph/ceph-csi/devel/deploy/rbd/kubernetes/csidriver.yaml
wget https://raw.githubusercontent.com/ceph/ceph-csi/devel/deploy/rbd/kubernetes/csi-rbdplugin.yaml
wget https://raw.githubusercontent.com/ceph/ceph-csi/devel/deploy/rbd/kubernetes/csi-rbdplugin-provisioner.yaml
wget https://raw.githubusercontent.com/ceph/ceph-csi/devel/deploy/rbd/kubernetes/csi-provisioner-rbac.yaml
wget https://raw.githubusercontent.com/ceph/ceph-csi/devel/deploy/rbd/kubernetes/csi-nodeplugin-rbac.yaml
wget https://github.com/ceph/ceph-csi/blob/devel/deploy/rbd/kubernetes/csi-config-map.yaml

Create a new namespace (not sure whether it is required; create it just in case):

root@wangjm-B550M-K-1:~/k8s/csi/ceph/rbd# kubectl create ns ceph-rbd
namespace/ceph-rbd created
root@wangjm-B550M-K-1:~/k8s/csi/ceph/rbd# kubectl config set-context --current --namespace ceph-rbd
Context "kubernetes-admin@kubernetes" modified.

Create CSIDriver object:

kubectl create -f csidriver.yaml

Deploy RBACs for sidecar containers and node plugins:

vim csi-provisioner-rbac.yaml
:%s/namespace\: default/namespace\: ceph-rbd/g

vim csi-nodeplugin-rbac.yaml
:%s/namespace\: default/namespace\: ceph-rbd/g

kubectl create -f csi-provisioner-rbac.yaml
kubectl create -f csi-nodeplugin-rbac.yaml

Those manifests deploy service accounts, cluster roles and cluster role bindings. These are shared for both RBD and CephFS CSI plugins, as they require the same permissions.

//todo https://github.com/ceph/ceph-csi/tree/devel/deploy/rbd/kubernetes

//todo https://juejin.cn/post/7296756504912330767

Configure NFS network storage

NFS server

Pick a host on the LAN, install the NFS service, and export NFS storage for k8s to use.

This machine runs Arch Linux; the installation command is:

pacman -Syy nfs-utils

NFS comes in two main versions, v3 and v4. Note that v4 is not automatically better; the protocols and mechanisms differ somewhat, so both versions are in use today.

The server's global configuration lives in /etc/nfs.conf; the filesystems to export are configured in /etc/exports, after which exportfs -arv applies the changes.

# Start the service
systemctl enable nfsv4-server
systemctl start nfsv4-server

# Edit the config and export a few directories under /ole/data/nfs for LAN clients to mount
vim /etc/exports
cat /etc/exports
# /etc/exports - exports(5) - directories exported to NFS clients
#
# Example for NFSv3:
#  /srv/home        hostname1(rw,sync) hostname2(ro,sync)
# Example for NFSv4:
#  /srv/nfs4        hostname1(rw,sync,fsid=0)
#  /srv/nfs4/home   hostname1(rw,sync,nohide)
# Using Kerberos and integrity checking:
#  /srv/nfs4        *(rw,sync,sec=krb5i,fsid=0)
#  /srv/nfs4/home   *(rw,sync,sec=krb5i,nohide)
#
# Use `exportfs -arv` to reload.

/ole/data/nfs/public 192.168.0.0/16(rw) 172.16.0.0/12(rw) 10.0.0.0/8(rw)
/ole/data/nfs/sync   192.168.0.0/16(rw,sync) 172.16.0.0/12(rw,sync) 10.0.0.0/8(rw,sync)
/ole/data/nfs/async  192.168.0.0/16(rw,async) 172.16.0.0/12(rw,async) 10.0.0.0/8(rw,async)
/ole/data/nfs/no_root_squash   192.168.0.0/16(rw,sync,no_root_squash) 172.16.0.0/12(rw,sync,no_root_squash) 10.0.0.0/8(rw,sync,no_root_squash)

K8s later uses the /ole/data/nfs/no_root_squash export here. Note that this configuration is extremely insecure; since I am the only user for now, I am letting it have these permissions...

What is root_squash?

no_root_squash allows the root user on the client to both access and create files on the NFS server as root. root_squash, in contrast, forces NFS to map the client's root to an anonymous ID, which improves security by preventing root ownership on one system from carrying over to the other. no_root_squash is needed if you are hosting root filesystems on the NFS server (especially for diskless clients); with this in mind, it can be used (sparingly) for selected hosts, but you should not use it unless you are aware of the consequences.

Reference: https://www.thegeekdiary.com/basic-nfs-security-nfs-no_root_squash-and-suid/

In short: with no_root_squash, a client user who is root keeps root privileges on the exported directory, which is highly insecure and not recommended; with root_squash, a root client is squashed to an anonymous user, typically mapped to the nobody system account's UID and GID.

I also loosened the directory permissions (apparently they were too restrictive):

chmod 777 /ole/data/nfs/no_root_squash

Note that all of the above is sloppy and insecure... to be revisited later.

Apply the exports:

exportfs -arv

List the NFS exports visible on the network:

showmount -e

They are now available for mounting:

[root@jingmin-kube-archlinux k8s]# showmount -e
Export list for jingmin-kube-archlinux:
/ole/data/nfs/no_root_squash 10.0.0.0/8,172.16.0.0/12,192.168.0.0/16
/ole/data/nfs/async          10.0.0.0/8,172.16.0.0/12,192.168.0.0/16
/ole/data/nfs/sync           10.0.0.0/8,172.16.0.0/12,192.168.0.0/16
/ole/data/nfs/public         10.0.0.0/8,172.16.0.0/12,192.168.0.0/16

Configure the NFS StorageClass and provisioner in k8s

Create the NFS provisioner, the StorageClass, and the related RBAC objects.

Reference: https://www.cnblogs.com/devopsyyds/p/16246116.html

Reference: https://jimmysong.io/kubernetes-handbook/practice/using-nfs-for-persistent-storage.html

Reference: https://medium.com/@myte/kubernetes-nfs-and-dynamic-nfs-provisioning-97e2afb8b4a9

Note that it is made the default storage class via the annotation storageclass.kubernetes.io/is-default-class: "true".

[root@jingmin-kube-archlinux k8s]# cat ./nfs-storageclass.yaml 
## Create a StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-storage
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: fuseim.pri/ifs  
parameters:
  archiveOnDelete: "true"  ## whether to archive the PV contents when the PV is deleted

---
apiVersion: v1
kind: Namespace
metadata:
  labels:
    # replace with namespace where provisioner is deployed
    kubernetes.io/metadata.name: nfs
  # replace with namespace where provisioner is deployed
  name: nfs
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-client-provisioner
  labels:
    app: nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: nfs
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: nfs-client-provisioner
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      serviceAccountName: nfs-client-provisioner
      containers:
        - name: nfs-client-provisioner
          image: registry.cn-hangzhou.aliyuncs.com/lfy_k8s_images/nfs-subdir-external-provisioner:v4.0.2
          # resources:
          #    limits:
          #      cpu: 10m
          #    requests:
          #      cpu: 10m
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: fuseim.pri/ifs  
            - name: NFS_SERVER
              value: 192.168.1.7 ## your NFS server address
            - name: NFS_PATH  
              value: /ole/data/nfs/no_root_squash  ## the directory exported by the NFS server
      volumes:
        - name: nfs-client-root
          nfs:
            server: 192.168.1.7
            path: /ole/data/nfs/no_root_squash
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: nfs
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nfs-client-provisioner-runner
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: run-nfs-client-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    # replace with namespace where provisioner is deployed
    namespace: nfs
roleRef:
  kind: ClusterRole
  name: nfs-client-provisioner-runner
  apiGroup: rbac.authorization.k8s.io
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: nfs
rules:
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: nfs
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    # replace with namespace where provisioner is deployed
    namespace: nfs
roleRef:
  kind: Role
  name: leader-locking-nfs-client-provisioner
  apiGroup: rbac.authorization.k8s.io

Confirm that the StorageClass was created:

[root@jingmin-kube-archlinux k8s]# kubectl get sc
NAME                    PROVISIONER      RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
nfs-storage (default)   fuseim.pri/ifs   Delete          Immediate           false                  15m

Confirm that the corresponding resources were created in the namespace:

[root@jingmin-kube-archlinux k8s]# kubectl get all -n nfs
NAME                                          READY   STATUS    RESTARTS   AGE
pod/nfs-client-provisioner-5c7b77d69d-26498   1/1     Running   0          15m

NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nfs-client-provisioner   1/1     1            1           15m

NAME                                                DESIRED   CURRENT   READY   AGE
replicaset.apps/nfs-client-provisioner-5c7b77d69d   1         1         1       15m

Test that the NFS storage class works

Create a test PVC (PersistentVolumeClaim, i.e. a request for a piece of persistent storage):

[root@jingmin-kube-archlinux k8s]# cat ./test-pvc.yaml 
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
  namespace: default
spec:
  #storageClassName: nfs-storage
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 200Mi

Apply it and confirm that storage was provisioned:

[root@jingmin-kube-archlinux k8s]# kubectl apply -f ./test-pvc.yaml 
persistentvolumeclaim/test-pvc created
[root@jingmin-kube-archlinux k8s]# kubectl get pvc
NAME       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
test-pvc   Bound    pvc-c559dceb-0965-4fbc-ba2d-2d11b76e0751   200Mi      RWX            nfs-storage    6s
[root@jingmin-kube-archlinux k8s]# kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM              STORAGECLASS   REASON   AGE
pvc-c559dceb-0965-4fbc-ba2d-2d11b76e0751   200Mi      RWX            Delete           Bound    default/test-pvc   nfs-storage             9s
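Optionally, before deleting the claim, you can mount it from a throwaway pod to confirm that writes land on the NFS export (a sketch; the pod name and image are arbitrary):

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: test-pvc-pod
  namespace: default
spec:
  containers:
  - name: writer
    image: busybox
    command: ["sh", "-c", "echo hello-from-k8s > /mnt/hello.txt && sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /mnt
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: test-pvc
EOF

# the file should show up in the provisioner's subdirectory under /ole/data/nfs/no_root_squash on the NFS server; then clean up:
kubectl delete pod test-pvc-pod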

Delete the test PVC:

[root@jingmin-kube-archlinux k8s]# kubectl delete pvc test-pvc

Node-pressure eviction

Reference: https://kubernetes.io/zh-cn/docs/setup/production-environment/tools/kubeadm/kubelet-integration/

Reference: https://kubernetes.io/zh-cn/docs/concepts/scheduling-eviction/node-pressure-eviction/

Reference: https://cloud.tencent.com/developer/article/1664414

Reference: https://www.cnblogs.com/lianngkyle/p/16582132.html

The kubelet configured by kubeadm works without extra flags, but here I set node-pressure eviction parameters so the system does not hang when memory or disk runs low.

The following was the wrong first attempt (note the drop-in directory next to the unit file):

[root@jingmin-kube-master1 system]# cd /usr/lib/systemd/system
[root@jingmin-kube-master1 system]# ls
...
kubelet.service
kubelet.service.d
...

There is a drop-in file, which almost certainly overrides the default configuration:

[root@jingmin-kube-master1 system]# cd kubelet.service.d/
[root@jingmin-kube-master1 kubelet.service.d]# ls
10-kubeadm.conf
[root@jingmin-kube-master1 kubelet.service.d]# cat 10-kubeadm.conf 
# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/sysconfig/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS

Sure enough, the last two lines override the original ExecStart command.

It also shows that user-defined options should go into /etc/sysconfig/kubelet.

OK, so restore the original /usr/lib/systemd/system/kubelet.service. (This path applies to Red Hat-family distributions; other distributions set the same environment variable in a similar file.)

Edit /etc/sysconfig/kubelet:

[root@jingmin-kube-master1 kubelet.service.d]# vim /etc/sysconfig/kubelet 
[root@jingmin-kube-master1 kubelet.service.d]# cat /etc/sysconfig/kubelet 
KUBELET_EXTRA_ARGS="--node-ip=192.168.1.1 \
--kube-reserved=cpu=200m,memory=250Mi \
--eviction-hard=memory.available<5%,nodefs.available<10%,imagefs.available<10% \
--eviction-soft=memory.available<10%,nodefs.available<15%,imagefs.available<15% \
--eviction-soft-grace-period=memory.available=2m,nodefs.available=2m,imagefs.available=2m \
--eviction-max-pod-grace-period=30 \
--eviction-minimum-reclaim=memory.available=0Mi,nodefs.available=500Mi,imagefs.available=500Mi"

Reload the unit configuration and restart the service:

[root@jingmin-kube-master1 kubelet.service.d]# systemctl daemon-reload 
[root@jingmin-kube-master1 kubelet.service.d]# systemctl restart kubelet.service 
[root@jingmin-kube-master1 kubelet.service.d]# systemctl status kubelet.service 

If the service is unhealthy, check the logs:

[root@jingmin-kube-master1 kubelet.service.d]# journalctl -exu kubelet

// TODO: CoreDNS fails to start; flannel issue

Reference: https://stackoverflow.com/questions/61373366/networkplugin-cni-failed-to-set-up-pod-xxxxx-network-failed-to-set-bridge-add

Reference: https://github.com/kubernetes/kubernetes/issues/39557

Not resolved yet; will redeploy later.

Troubleshooting a failed cluster re-initialization

root@wangjm-B550M-K-1:~# kubeadm init \
--apiserver-advertise-address=192.168.1.8 \
--control-plane-endpoint=control.ole12138.top \
--kubernetes-version v1.30.0 \
--service-cidr=172.31.0.0/20 \
--pod-network-cidr=172.30.0.0/16
[init] Using Kubernetes version: v1.30.0
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [control.ole12138.top kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local wangjm-b550m-k-1] and IPs [172.31.0.1 192.168.1.8]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost wangjm-b550m-k-1] and IPs [192.168.1.8 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost wangjm-b550m-k-1] and IPs [192.168.1.8 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "super-admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests"
[kubelet-check] Waiting for a healthy kubelet. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 501.668546ms
[api-check] Waiting for a healthy API server. This can take up to 4m0s
[api-check] The API server is not healthy after 4m0.000489035s

Unfortunately, an error has occurred:
        context deadline exceeded

This error is likely caused by:
        - The kubelet is not running
        - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
        - 'systemctl status kubelet'
        - 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
        - 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
        Once you have found the failing container, you can inspect its logs with:
        - 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher

Reference: https://github.com/kubernetes/kubeadm/blob/main/docs/ha-considerations.md#options-for-software-load-balancing

Reference: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/

It seems the --control-plane-endpoint option is meant to provide a single load-balanced entry point for the api-servers, which I do not have here (or rather, it would be somewhat involved and require keepalived + haproxy).

Try again without that option.

root@wangjm-B550M-K-1:~# kubeadm init \
--apiserver-advertise-address=192.168.1.8 \
--kubernetes-version v1.30.0 \
--service-cidr=172.31.0.0/20 \
--pod-network-cidr=172.30.0.0/16
[init] Using Kubernetes version: v1.30.0
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local wangjm-b550m-k-1] and IPs [172.31.0.1 192.168.1.8]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost wangjm-b550m-k-1] and IPs [192.168.1.8 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost wangjm-b550m-k-1] and IPs [192.168.1.8 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "super-admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests"
[kubelet-check] Waiting for a healthy kubelet. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 501.532835ms
[api-check] Waiting for a healthy API server. This can take up to 4m0s
[api-check] The API server is healthy after 3.501293034s
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node wangjm-b550m-k-1 as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node wangjm-b550m-k-1 as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
[bootstrap-token] Using token: o6g6nc.p444h6gg0rzu0kzl
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.1.8:6443 --token o6g6nc.p444h6gg0rzu0kzl \
        --discovery-token-ca-cert-hash sha256:4de7f3170388d48445b64c4d0c10529cf901523a8a27e1d0b1b202f11f697673 

It succeeded.

