Testing a Kubernetes Deployment on a Home Network
Reference: official container runtime docs: https://kubernetes.io/zh-cn/docs/setup/production-environment/container-runtimes/
Reference: official kubeadm cluster creation docs: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
Reference: a community cluster-creation walkthrough: https://blog.csdn.net/qq_44956318/article/details/121335756
Reference: a community cluster-creation walkthrough: https://www.cnblogs.com/RainingNight/p/using-kubeadm-to-create-a-cluster-1-12.html
Reference: Alibaba Cloud ACK cluster network planning: https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/plan-cidr-blocks-for-an-ack-cluster-1?spm=a2c4g.11186623.0.0.43b77678ng1wZI
Preliminary notes
Scenarios
Scenario: managed Kubernetes is poor value for money (an ACK standard cluster with three idle nodes costs around 100 RMB a day; cloud vendors do offer edge services, which I have not looked into yet).
Scenario: you have your own IDC, or your company has a pool of public IPs, and you want to run Kubernetes on it.
Scenario: you want to build a cluster by hand to understand Kubernetes as a whole.
I had already installed this once, but accidentally broke the control plane (no HA, overwrote its configuration, and kubelet could no longer connect)... so I simply reinstalled from scratch.
Prerequisites
You have installed and briefly used kubectl and minikube (https://kubernetes.io/docs/tasks/tools/).
You have skimmed the Concepts and Tutorials sections of the official docs (https://kubernetes.io/docs/concepts/, https://kubernetes.io/docs/tutorials/).
You have used Alibaba Cloud ACK or another vendor's managed Kubernetes and created or worked with simple StorageClass, ConfigMap, Secret, Pod, Deployment, StatefulSet, Service, and Ingress resources.
About this article
Kubernetes is initialized here mainly with kubeadm, but some OS-level settings, the container runtime, and networking still need to be configured by hand.
Note that the steps below assume a Linux distribution with systemd as PID 1 (SysV init may not work the same way). The Kubernetes version installed is the current latest, 1.30.
hostname | ip | OS | CRI | k8s role | ceph role |
---|---|---|---|---|---|
wangjm-B550M-K-1 | 192.168.1.8 | ubuntu22 | containerd | control plane, worker | mon,osd,mgr |
wangjm-B550M-K-2 | 192.168.1.9 | ubuntu22 | containerd | control plane, worker | mon,osd |
wangjm-B550M-K-3 | 192.168.1.10 | ubuntu22 | cri-o | control plane, worker | mon,osd |
jingmin-kube-master1 | 192.168.1.1 | centos 8 stream | containerd | control plane | rgw gateway, mgr |
jingmin-kube-archlinux | 192.168.1.7 | archlinux | containerd | worker |
Prerequisite setup (required on every host)
These are mostly operating-system-level settings.
Set the hostname
# host 1
hostnamectl set-hostname jingmin-kube-master1
# host 2
hostnamectl set-hostname jingmin-kube-archlinux
...
Enable NTP time synchronization
vim /etc/systemd/timesyncd.conf
cat /etc/systemd/timesyncd.conf
[Time]
NTP=ntp.ntsc.ac.cn
timedatectl set-ntp true
systemctl restart systemd-timesyncd.service
journalctl -u systemd-timesyncd --no-hostname --since "1 day ago"
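To double-check that synchronization is actually active, something like the following should work (timesync-status needs a reasonably recent systemd):
timedatectl status                # "System clock synchronized: yes" means NTP is working
timedatectl timesync-status       # shows which server systemd-timesyncd is using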
Load kernel modules
Enable kernel packet forwarding and load the modules required for forwarding and containers.
Reference: https://kubernetes.io/zh-cn/docs/setup/production-environment/container-runtimes/#install-and-configure-prerequisites
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
# Set the required sysctl parameters; they persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
# Apply the sysctl parameters without rebooting
sudo sysctl --system
Confirm that the br_netfilter and overlay modules are loaded by running:
lsmod | grep br_netfilter
lsmod | grep overlay
Confirm that the net.bridge.bridge-nf-call-iptables, net.bridge.bridge-nf-call-ip6tables, and net.ipv4.ip_forward sysctl variables are set to 1 in your sysctl configuration by running:
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward
Disable swap
Check whether swap is currently in use:
free -m
Configure the system to avoid using swap:
echo "vm.swappiness = 0" >> /etc/sysctl.d/k8s.conf
sysctl --system
Turn off all swap (effective for the current boot only):
swapoff -a
Disable swap permanently (takes effect on the next boot):
vim /etc/fstab
Many guides say it is enough to comment out the swap line here.
That does not work on every system: systemd scans GPT partitions automatically and will activate any swap partition it finds. Either delete the partition with fdisk, or change the options column of the swap entry in /etc/fstab from defaults to noauto so it is not mounted automatically.
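As a sketch, the changed /etc/fstab entry would look roughly like this (the UUID is a placeholder, not one of my real partitions):
# swap entry with its options changed from "defaults" to "noauto" so it is never mounted automatically
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  none  swap  noauto  0  0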
Install the Kubernetes tools
Namely kubeadm, kubelet, and kubectl.
Official docs: https://kubernetes.io/zh-cn/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#installing-kubeadm-kubelet-and-kubectl
Reference: https://discuss.kubernetes.io/t/facing-challanges-for-installation-of-kubernest-cluster-setup-through-kubeadm-on-rhel8-9-vm-404-for-https-packages-cloud-google-com-yum-repos-kubernetes-el7-x86-64-repodata-repomd-xml-ip-142-250-115-139/27345/2
On CentOS/Fedora/RHEL:
# Use this repository
cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.30/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.30/rpm/repodata/repomd.xml.key
EOF
On Ubuntu:
Update the apt package index and install the packages needed to use the Kubernetes apt repository:
sudo apt-get update
# apt-transport-https may be a dummy package; if so, you can skip installing it
sudo apt-get install -y apt-transport-https ca-certificates curl gpg
Download the public signing key for the Kubernetes package repositories. The same key is used for all repositories, so the version in the URL can be ignored:
# If the /etc/apt/keyrings directory does not exist, create it before running curl; see the note below.
# sudo mkdir -p -m 755 /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.30/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
Note:
On releases older than Debian 12 and Ubuntu 22.04, /etc/apt/keyrings does not exist by default; create it before running curl.
Add the Kubernetes apt repository. Note that this repository only contains packages for Kubernetes 1.30; for another minor version, change the minor version in the URL to the one you want (and make sure you are reading the install docs for the version you plan to install).
# This overwrites any existing configuration in /etc/apt/sources.list.d/kubernetes.list
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.30/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
Update the apt package index, install kubelet, kubeadm, and kubectl, and pin their versions:
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
Disable SELinux (CentOS/Fedora/RHEL)
# Set SELinux to permissive mode (effectively disabling it)
sudo setenforce 0
sudo sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
sudo yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
sudo systemctl enable --now kubelet
Install a container environment (optional)
docker/docker.io/podman-docker/podman
Strictly speaking, if the host is only used for Kubernetes, configuring the container runtime (CRI) in the next section is enough; installing a higher-level container environment here is optional.
Configure the container runtime (CRI)
Container runtime overview
Containerization has roughly evolved like this: bare-metal application deployment -> single-host containers (docker run, one image) -> single-host composition (docker compose, a group of services) -> multi-node containerized deployment with Kubernetes (cross-node scheduling and scaling).
Containerization started with Docker alone: open source, but single-host only. As Red Hat, Google, and other companies and contributors joined, container orchestration matured, and Kubernetes is the approach the market has most clearly embraced, available both as the community distribution and as managed services from cloud vendors.
Docker and Kubernetes agree at the OCI level (OCI, the Open Container Initiative, standardizes container images and runtimes), so they use the same images on top of the same kind of low-level runtime.
On top of the OCI standard, Docker and Kubernetes each have their own runtime integration. On hosts that already have Docker, Kubernetes can use Docker's runtime, containerd, directly; on hosts without Docker, a runtime such as CRI-O can be installed instead. Above containerd/CRI-O, Kubernetes defines a unified interface layer, the CRI, for its upper layers to use.

Reference: https://phoenixnap.com/kb/docker-vs-containerd-vs-cri-o
Reference: https://vineetcic.medium.com/the-differences-between-docker-containerd-cri-o-and-runc-a93ae4c9fdac
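Assuming crictl (from the cri-tools package) is installed, a quick way to see which CRI socket a node exposes is something like:
# containerd (Docker / containerd hosts)
crictl --runtime-endpoint unix:///run/containerd/containerd.sock version
# CRI-O hosts
crictl --runtime-endpoint unix:///var/run/crio/crio.sock version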
Install and configure the containerd runtime (Docker hosts)
Install containerd
Install docker-ce, the community edition of Docker (note that on CentOS, installing "docker" may silently give you the open-source Podman instead).
yum install docker-ce
As the Docker/Kubernetes layering diagram above shows, installing Docker already installs containerd.
Configure containerd
Reference: https://kubernetes.io/zh-cn/docs/setup/production-environment/container-runtimes/#containerd
Reference: https://kubernetes.io/docs/setup/production-environment/container-runtimes/#override-pause-image-containerd
You can start from the default template:
containerd config default > /etc/containerd/config.toml
Or create /etc/containerd/config.toml directly.
Then adjust or add the following settings:
vim /etc/containerd/config.toml
#disabled_plugins = ["cri"]
enabled_plugins = ["cri"]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
[plugins."io.containerd.grpc.v1.cri"]
sandbox_image = "registry.k8s.io/pause:3.9"
When containerd comes with Docker, the CRI plugin that Kubernetes needs is disabled by default; enable it here and switch to the systemd-managed cgroup driver.
The pause image shipped on my host was old enough that kubeadm warned about it during init, so I switched to the newer 3.9.
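After editing, restarting containerd and dumping its effective configuration is a quick sanity check that the changes took hold, for example:
systemctl restart containerd
containerd config dump | grep -E 'SystemdCgroup|sandbox_image'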
Configure a proxy for containerd (optional)
Later, when kubeadm initializes the cluster, some images may fail to pull; adding a proxy here is one way around that.
vim /lib/systemd/system/containerd.service
Temporarily configure a proxy in containerd.service (the added Environment lines below); comment them out when no longer needed.
# Copyright The containerd Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target local-fs.target
[Service]
Environment="HTTP_PROXY=http://192.168.1.7:8889"
Environment="HTTPS_PROXY=http://192.168.1.7:8889"
Environment="NO_PROXY=ole12138.top,localhost,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/bin/containerd
Type=notify
Delegate=yes
KillMode=process
Restart=always
RestartSec=5
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity
LimitNOFILE=infinity
# Comment TasksMax if your systemd version does not supports it.
# Only systemd 226 and above support this version.
TasksMax=infinity
OOMScoreAdjust=-999
[Install]
WantedBy=multi-user.target
Reload the unit files and restart the service
systemctl daemon-reload
systemctl restart containerd
Enable the services at boot
systemctl enable containerd
systemctl enable kubelet
#systemctl enable docker
Configure a Docker proxy (optional)
If you pull images manually with docker pull, you can also set a proxy for Docker (kubeadm talks to containerd, so it does not use this proxy):
https://docs.docker.com/config/daemon/systemd/#httphttps-proxy
https://docs.docker.com/network/proxy/
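Following those Docker docs, one option is a systemd drop-in for the docker service; the sketch below reuses the proxy address from the containerd example above (adjust to your own environment):
sudo mkdir -p /etc/systemd/system/docker.service.d
cat <<EOF | sudo tee /etc/systemd/system/docker.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=http://192.168.1.7:8889"
Environment="HTTPS_PROXY=http://192.168.1.7:8889"
Environment="NO_PROXY=localhost,127.0.0.1,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker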
Install and configure the CRI-O runtime (Podman hosts)
One of the hosts runs Podman instead of Docker.
Reference: https://github.com/cri-o/cri-o/blob/main/tutorials/kubeadm.md
Reference: https://github.com/cri-o/cri-o/blob/main/install.md
Reference: https://github.com/cri-o/packaging/blob/main/README.md
Reference: https://kubernetes.io/blog/2023/10/10/cri-o-community-package-infrastructure/
Reference: https://kubernetes.io/blog/2023/08/15/pkgs-k8s-io-introduction/
On Ubuntu, the Kubernetes community repository was already configured above:
# This overwrites any existing configuration in /etc/apt/sources.list.d/kubernetes.list
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.30/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
The CRI-O community repository also needs to be added (these packages are still fairly new):
curl -fsSL https://pkgs.k8s.io/addons:/cri-o:/stable:/v1.30/deb/Release.key |
gpg --dearmor -o /etc/apt/keyrings/cri-o-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/cri-o-apt-keyring.gpg] https://pkgs.k8s.io/addons:/cri-o:/stable://v1.30/deb/ /" |
tee /etc/apt/sources.list.d/cri-o.list
Update the package metadata and install:
apt-get update
apt-get install -y cri-o
Start and enable the service:
systemctl start crio.service
systemctl enable crio.service
It turned out Podman and CRI-O did not quite get along; the installed Podman was too old.
Update the Podman repository:
echo 'deb http://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/unstable/xUbuntu_22.04/ /' | sudo tee /etc/apt/sources.list.d/devel:kubic:libcontainers:unstable.list
curl -fsSL https://download.opensuse.org/repositories/devel:kubic:libcontainers:unstable/xUbuntu_22.04/Release.key | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/devel_kubic_libcontainers_unstable.gpg > /dev/null
sudo apt update
sudo apt install podman
Reference: https://github.com/cri-o/cri-o/blob/main/tutorials/kubernetes.md
Reference: https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-join/
Configure a proxy for CRI-O
Reference: https://jbn1233.medium.com/docker-cri-o-behind-http-proxy-4a5645a9ff7b
root@wangjm-B550M-K-3:~# vim /lib/systemd/system/crio.service
root@wangjm-B550M-K-3:~# cat /lib/systemd/system/crio.service
[Unit]
Description=Container Runtime Interface for OCI (CRI-O)
Documentation=https://github.com/cri-o/cri-o
Wants=network-online.target
Before=kubelet.service
After=network-online.target
[Service]
Type=notify
EnvironmentFile=-/etc/sysconfig/crio
Environment=GOTRACEBACK=crash
Environment="HTTP_PROXY=http://192.168.1.7:8889"
Environment="HTTPS_PROXY=http://192.168.1.7:8889"
Environment="NO_PROXY=ole12138.top,localhost,127.0.0.1,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"
ExecStart=/usr/bin/crio \
$CRIO_CONFIG_OPTIONS \
$CRIO_RUNTIME_OPTIONS \
$CRIO_STORAGE_OPTIONS \
$CRIO_NETWORK_OPTIONS \
$CRIO_METRICS_OPTIONS
ExecReload=/bin/kill -s HUP $MAINPID
TasksMax=infinity
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
OOMScoreAdjust=-999
TimeoutStartSec=0
Restart=on-failure
RestartSec=10
[Install]
WantedBy=multi-user.target
Alias=cri-o.service
Set up the first control plane node
Note: some sources call the control plane node the master node. It was indeed named master originally and was later renamed to control plane. Reference: https://stackoverflow.com/questions/68860301/what-is-the-difference-between-master-node-and-control-plane-on-kubernetes
Initialize the master (control plane) node
Now install Kubernetes itself, here using kubeadm.
Official kubeadm cluster creation docs: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
Official kubeadm cluster creation docs (Chinese): https://kubernetes.io/zh-cn/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/
Option 1: follow this blogger's approach and adjust the kubeadm configuration file directly.
Reference: https://www.cnblogs.com/RainingNight/p/using-kubeadm-to-create-a-cluster-1-12.html
Option 2: pass the options on the command line (this uses the default Google registry, gcr.io, so the proxy configured earlier is needed).
Reference: https://blog.csdn.net/qq_44956318/article/details/121335756
kubeadm init \
--apiserver-advertise-address=192.168.1.8 \
--kubernetes-version v1.30.0 \
--service-cidr=172.31.0.0/20 \
--pod-network-cidr=172.30.0.0/16
Note: there is also a --control-plane-endpoint option. It is meant for an HA api-server setup (it requires an extra load balancer that provides a single external/internal entry point; configuring it incorrectly causes problems).
Reference: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#considerations-about-apiserver-advertise-address-and-controlplaneendpoint
Reference: https://kubernetes.io/zh-cn/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#considerations-about-apiserver-advertise-address-and-controlplaneendpoint
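For reference only: if a stable endpoint (a DNS name backed by a load balancer) were available, an HA-style init would look roughly like this, per the official high-availability guide; control.example.com is a placeholder, not something used in this setup:
kubeadm init \
  --control-plane-endpoint "control.example.com:6443" \
  --upload-certs \
  --apiserver-advertise-address=192.168.1.8 \
  --kubernetes-version v1.30.0 \
  --service-cidr=172.31.0.0/20 \
  --pod-network-cidr=172.30.0.0/16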
Save the output below; it will be needed later.
root@wangjm-B550M-K-1:~# kubeadm init \
--apiserver-advertise-address=192.168.1.8 \
--kubernetes-version v1.30.0 \
--service-cidr=172.31.0.0/20 \
--pod-network-cidr=172.30.0.0/16
[init] Using Kubernetes version: v1.30.0
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local wangjm-b550m-k-1] and IPs [172.31.0.1 192.168.1.8]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost wangjm-b550m-k-1] and IPs [192.168.1.8 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost wangjm-b550m-k-1] and IPs [192.168.1.8 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "super-admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests"
[kubelet-check] Waiting for a healthy kubelet. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 501.532835ms
[api-check] Waiting for a healthy API server. This can take up to 4m0s
[api-check] The API server is healthy after 3.501293034s
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node wangjm-b550m-k-1 as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node wangjm-b550m-k-1 as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
[bootstrap-token] Using token: o6g6nc.p444h6gg0rzu0kzl
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.1.8:6443 --token o6g6nc.p444h6gg0rzu0kzl \
--discovery-token-ca-cert-hash sha256:4de7f3170388d48445b64c4d0c10529cf901523a8a27e1d0b1b202f11f697673
Copy the kubectl configuration
Following the hint above, run this on the master node:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
After this, kubectl works and you can inspect and manage the cluster's resources.
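A quick sanity check at this point:
kubectl get nodes      # the control plane node is listed (NotReady until a network add-on is installed)
kubectl get pods -A    # core components in kube-system should be Running or Pending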
Allow the control plane to schedule pods
If you want to be able to schedule pods on control plane nodes, for example for a single-machine cluster, run the commands below.
The version found around the web (works on older releases):
Reference: https://stackoverflow.com/questions/68860301/what-is-the-difference-between-master-node-and-control-plane-on-kubernetes
By default the master node does not schedule ordinary pods; to let it run pods, execute:
kubectl taint nodes --all node-role.kubernetes.io/master-
The trailing minus sign removes the corresponding taint.
The official instructions (for current releases):
Official reference: https://kubernetes.io/zh-cn/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#control-plane-node-isolation
Reference: https://stackoverflow.com/questions/56162944/master-tainted-no-pods-can-be-deployed
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
Install the network add-on (installed once on the first control plane; new nodes get it automatically as the cluster grows)
Install the Flannel network add-on
Reference: https://kubernetes.io/zh-cn/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#pod-network
Reference: https://kubernetes.io/zh-cn/docs/concepts/cluster-administration/networking/#how-to-implement-the-kubernetes-networking-model
Reference: https://kubernetes.io/zh-cn/docs/concepts/cluster-administration/addons/#networking-and-network-policy
Reference: https://github.com/flannel-io/flannel#deploying-flannel-manually
Reference: https://github.com/flannel-io/flannel?tab=readme-ov-file#deploying-flannel-with-kubectl
Reference: https://github.com/flannel-io/flannel/blob/master/Documentation/kubernetes.md
wget https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
vim kube-flannel.yml
Change the pod CIDR settings so the network matches the --pod-network-cidr used above (the net-conf.json section):
net-conf.json: |
{
"Network": "172.30.0.0/16",
"Backend": {
"Type": "vxlan"
}
}
Reference: https://github.com/flannel-io/flannel/blob/master/Documentation/configuration.md#key-command-line-options
Reference: https://github.com/containernetworking/cni/issues/486
By default, Flannel uses the interface that carries the default route.
I also installed Kubernetes on the gateway router, and its default route goes through the ppp0 interface created by the PPPoE connection, while Kubernetes traffic uses a different NIC. When the PPPoE session reconnects and the public IP changes, the cluster network breaks.
So append two lines of --iface-can-reach configuration to the container args, like this:
containers:
- args:
- --ip-masq
- --kube-subnet-mgr
- --iface-can-reach
- 192.168.1.1
Checking the Flannel logs with kubectl logs, Flannel on the gateway router seems to have picked the lo interface this way. It still works, so I left it. If that ever becomes a problem, the --iface-regex option can be used to select the interface by subnet with a regex (not tried yet).
The --iface-regex variant looks like this:
containers:
- args:
- --ip-masq
- --kube-subnet-mgr
- --iface-regex
- 192.168.*
Finally, the complete Flannel YAML:
apiVersion: v1
kind: Namespace
metadata:
labels:
k8s-app: flannel
pod-security.kubernetes.io/enforce: privileged
name: kube-flannel
---
apiVersion: v1
kind: ServiceAccount
metadata:
labels:
k8s-app: flannel
name: flannel
namespace: kube-flannel
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
labels:
k8s-app: flannel
name: flannel
rules:
- apiGroups:
- ""
resources:
- pods
verbs:
- get
- apiGroups:
- ""
resources:
- nodes
verbs:
- get
- list
- watch
- apiGroups:
- ""
resources:
- nodes/status
verbs:
- patch
- apiGroups:
- networking.k8s.io
resources:
- clustercidrs
verbs:
- list
- watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
labels:
k8s-app: flannel
name: flannel
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: flannel
subjects:
- kind: ServiceAccount
name: flannel
namespace: kube-flannel
---
apiVersion: v1
data:
cni-conf.json: |
{
"name": "cbr0",
"cniVersion": "0.3.1",
"plugins": [
{
"type": "flannel",
"delegate": {
"hairpinMode": true,
"isDefaultGateway": true
}
},
{
"type": "portmap",
"capabilities": {
"portMappings": true
}
}
]
}
net-conf.json: |
{
"Network": "172.30.0.0/16",
"Backend": {
"Type": "vxlan"
}
}
kind: ConfigMap
metadata:
labels:
app: flannel
k8s-app: flannel
tier: node
name: kube-flannel-cfg
namespace: kube-flannel
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
labels:
app: flannel
k8s-app: flannel
tier: node
name: kube-flannel-ds
namespace: kube-flannel
spec:
selector:
matchLabels:
app: flannel
k8s-app: flannel
template:
metadata:
labels:
app: flannel
k8s-app: flannel
tier: node
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/os
operator: In
values:
- linux
containers:
- args:
- --ip-masq
- --kube-subnet-mgr
- --iface-can-reach
- 192.168.1.1
command:
- /opt/bin/flanneld
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: EVENT_QUEUE_DEPTH
value: "5000"
image: docker.io/flannel/flannel:v0.22.1
name: kube-flannel
resources:
requests:
cpu: 100m
memory: 50Mi
securityContext:
capabilities:
add:
- NET_ADMIN
- NET_RAW
privileged: false
volumeMounts:
- mountPath: /run/flannel
name: run
- mountPath: /etc/kube-flannel/
name: flannel-cfg
- mountPath: /run/xtables.lock
name: xtables-lock
hostNetwork: true
initContainers:
- args:
- -f
- /flannel
- /opt/cni/bin/flannel
command:
- cp
image: docker.io/flannel/flannel-cni-plugin:v1.2.0
name: install-cni-plugin
volumeMounts:
- mountPath: /opt/cni/bin
name: cni-plugin
- args:
- -f
- /etc/kube-flannel/cni-conf.json
- /etc/cni/net.d/10-flannel.conflist
command:
- cp
image: docker.io/flannel/flannel:v0.22.1
name: install-cni
volumeMounts:
- mountPath: /etc/cni/net.d
name: cni
- mountPath: /etc/kube-flannel/
name: flannel-cfg
priorityClassName: system-node-critical
serviceAccountName: flannel
tolerations:
- effect: NoSchedule
operator: Exists
volumes:
- hostPath:
path: /run/flannel
name: run
- hostPath:
path: /opt/cni/bin
name: cni-plugin
- hostPath:
path: /etc/cni/net.d
name: cni
- configMap:
name: kube-flannel-cfg
name: flannel-cfg
- hostPath:
path: /run/xtables.lock
type: FileOrCreate
name: xtables-lock
Apply the manifest (the network plugin):
kubectl apply -f ./kube-flannel.yml
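Then confirm that the Flannel DaemonSet pods come up and the nodes turn Ready, for example:
kubectl get pods -n kube-flannel -o wide
kubectl get nodes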
Important tips
Resetting after a failed install
If something goes wrong during installation, check the logs and search the web before anything else.
journalctl -xeu containerd
If all else fails, tear it down and start over.
Reference: https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-reset/
参考: https://stackoverflow.com/questions/57648829/how-to-fix-timeout-at-waiting-for-the-kubelet-to-boot-up-the-control-plane-as-st
kubeadm reset
This can also be used to uninstall a cluster.
Reference: https://www.cnblogs.com/RainingNight/p/using-kubeadm-to-create-a-cluster-1-12.html
Reference: https://kubernetes.io/zh-cn/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#remove-the-node
To undo what kubeadm has done, first drain the node and make sure it is empty, then shut it down.
On the master node, run:
kubectl drain <node name> --delete-emptydir-data --force --ignore-daemonsets
kubectl delete node <node name>
Then, on the node being removed, reset kubeadm's installed state:
sudo kubeadm reset
If you want to reconfigure the cluster, just rerun kubeadm init or kubeadm join with the new parameters.
If your cluster was set up to use IPVS, run ipvsadm --clear (or similar) to reset your system's IPVS tables.
ipvsadm --clear
Kubernetes uses iptables for network isolation and forwarding. If you need to reset the iptables rules, you can run:
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
This clears all iptables rules and resets the counters to zero. It affects every running container, so use it with care, and back up your iptables configuration (and anything else important) beforehand.
Note that in Kubernetes the iptables rules are managed automatically by kube-proxy: if you wipe them, kube-proxy rebuilds them. Resetting iptables is therefore only a temporary measure; for permanent changes, modify the kube-proxy configuration and restart kube-proxy.
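For example, assuming the default kubeadm labels, the kube-proxy pods can be bounced like this so that they recreate their iptables rules:
kubectl -n kube-system delete pod -l k8s-app=kube-proxy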
Rejoining a node to the cluster
Reference: https://aiops.com/news/post/13773.html
Removing a node from a Kubernetes cluster
All of the following is run on the master node.
1. Put the node into maintenance mode
List the node names with:
kubectl get nodes
The node to remove is k8s-node2;
put it into maintenance mode with:
kubectl drain k8s-node2 --delete-emptydir-data --force --ignore-daemonsets
2. Delete the node
kubectl delete node k8s-node2
3. Confirm the node is gone
kubectl get nodes
That completes removing the node from the cluster.
If the node should later rejoin the cluster, proceed as follows.
1. Generate a token
On the master node, regenerate a join token:
kubeadm token create --print-join-command
2. Rejoin the node to the cluster
The following is run on the node itself.
Stop kubelet:
systemctl stop kubelet
Remove the old configuration files:
rm -rf /etc/kubernetes/*
Rejoin the cluster by running the join command generated in the previous step.
Adding worker nodes
Configure the runtime on the worker node
The container runtime must be confirmed and installed here as well; repeat the "Configure the container runtime (CRI)" section on the node.
Docker was already installed, which appears to include containerd:
ls /run/containerd/containerd.sock
The socket exists, confirming containerd is present (explained here: https://kubernetes.io/zh-cn/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#installing-runtime).
mkdir -p /etc/containerd
vim /etc/containerd/config.toml
Adjust the configuration as described earlier (content omitted; see the previous section).
Copying it straight from the master also works:
scp 192.168.1.1:/etc/containerd/config.toml /etc/containerd/
Later, when kubeadm join was run on this worker, there was a problem here: gcr.io was unreachable, so the kube-proxy and Flannel pods kept failing to be created.
You can configure a proxy and then delete the old pods with kubectl delete pod; replacements are created automatically.
Alternatively, follow this blogger's approach and adjust the kubeadm configuration file directly. Reference: https://www.cnblogs.com/RainingNight/p/using-kubeadm-to-create-a-cluster-1-12.html
systemctl enable kubelet
systemctl enable containerd
#systemctl enable docker
Join the worker node to the cluster
Use the join command printed when kubeadm init succeeded on the master node earlier.
kubeadm join 192.168.1.8:6443 --token o6g6nc.p444h6gg0rzu0kzl \
--discovery-token-ca-cert-hash sha256:4de7f3170388d48445b64c4d0c10529cf901523a8a27e1d0b1b202f11f697673
Add a proxy for Docker (or simply copy the file from the master: scp 192.168.1.8:/etc/docker/daemon.json ./):
vim /etc/docker/daemon.json
{
"proxies": {
"http-proxy": "http://192.168.1.7:8889",
"https-proxy": "http://192.168.1.7:8889",
"no-proxy": "*.test.example.com,.example.org,*.ole12138.top,.ole12138.top,127.0.0.0/8,10.0.0.0,172.16.0.0/12,192.168.0.0/16"
}
}
Restart the service
systemctl daemon-reload
systemctl restart docker
Copy the kubectl configuration from the master node
mkdir -p /root/.kube
scp 192.168.1.8:/root/.kube/config /root/.kube/
Or into $HOME/.kube:
scp root@192.168.1.8:/etc/kubernetes/admin.conf $HOME/.kube/config
Refresh the token and add more worker nodes
Reference: https://www.cnblogs.com/hongdada/p/9854696.html
Reference: https://blog.csdn.net/mailjoin/article/details/79686934
After kubeadm init, a token for joining nodes is printed:
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of machines by running the following on each node
as root:
kubeadm join 18.16.202.35:6443 --token zr8n5j.yfkanjio0lfsupc0 --discovery-token-ca-cert-hash sha256:380b775b7f9ea362d45e4400be92adc4f71d86793ba6aae091ddb53c489d218c
The default token is valid for 24 hours; once it expires it can no longer be used.
The fix is as follows:
1. Generate a new token:
[root@node1 flannel]# kubeadm token create kiyfhw.xiacqbch8o8fa8qj
[root@node1 flannel]# kubeadm token list
TOKEN                     TTL         EXPIRES                     USAGES                   DESCRIPTION   EXTRA GROUPS
gvvqwk.hn56nlsgsv11mik6   <invalid>   2018-10-25T14:16:06+08:00   authentication,signing   <none>        system:bootstrappers:kubeadm:default-node-token
kiyfhw.xiacqbch8o8fa8qj   23h         2018-10-27T06:39:24+08:00   authentication,signing   <none>        system:bootstrappers:kubeadm:default-node-token
2. Get the SHA-256 hash of the CA certificate:
[root@node1 flannel]# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
5417eb1b68bd4e7a4c82aded83abc55ec91bd601e45734d6aba85de8b1ebb057
3. Join the node to the cluster:
kubeadm join 18.16.202.35:6443 --token kiyfhw.xiacqbch8o8fa8qj --discovery-token-ca-cert-hash sha256:5417eb1b68bd4e7a4c82aded83abc55ec91bd601e45734d6aba85de8b1ebb057
A few seconds later, the node should show up in the output of kubectl get nodes on the master.
The steps above are tedious; this single command does it all:
kubeadm token create --print-join-command
Adding control plane nodes
For docker/containerd nodes
The first steps are the same as for adding an ordinary worker node.
First, generate the join command on the existing control plane:
root@wangjm-B550M-K-1:~/k8s/cni# kubeadm token create --print-join-command
kubeadm join 192.168.1.8:6443 --token 2v0g5t.o2xz7jdnw2uagx5d --discovery-token-ca-cert-hash sha256:4de7f3170388d48445b64c4d0c10529cf901523a8a27e1d0b1b202f11f697673
Reference (adding worker and control plane nodes): https://blog.slys.dev/adding-worker-and-control-plane-nodes-to-the-kubernetes-cluster/
Reference: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/
Reference: https://www.reddit.com/r/kubernetes/comments/18xddaj/how_do_i_add_another_control_plane_node_using/
On the existing control plane, upload the shared certificates and obtain the certificate key (important: required):
root@wangjm-B550M-K-1:~/k8s/cni# kubeadm init phase upload-certs --upload-certs
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
0aaf6a3a45a766033aaebd91d7e8e511fd0e801961b2895228a0a8989b976ee3
Then, combining the commands above,
run the following on the new node (the one to be added as a control plane):
root@wangjm-B550M-K-2:~# kubeadm join 192.168.1.8:6443 --token 2v0g5t.o2xz7jdnw2uagx5d --discovery-token-ca-cert-hash sha256:4de7f3170388d48445b64c4d0c10529cf901523a8a27e1d0b1b202f11f697673 --control-plane --certificate-key 0aaf6a3a45a766033aaebd91d7e8e511fd0e801961b2895228a0a8989b976ee3
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
error execution phase preflight:
One or more conditions for hosting a new control plane instance is not satisfied.
unable to add a new control plane instance to a cluster that doesn't have a stable controlPlaneEndpoint address
Please ensure that:
* The cluster has a stable controlPlaneEndpoint address.
* The certificates that must be shared among control plane instances are provided.
To see the stack trace of this error execute with --v=5 or higher
Here the two options --control-plane and --certificate-key were added.
The join fails because the cluster was initialized on the first node without the --control-plane-endpoint option, which is meant for an HA api-server setup (it needs an extra load balancer that provides a single external/internal entry point; configuring it incorrectly causes problems).
So for now, no additional control plane node is added this way.
Reference: https://www.reddit.com/r/kubernetes/comments/18xddaj/how_do_i_add_another_control_plane_node_using/
For podman + CRI-O nodes
Reference: https://github.com/cri-o/cri-o/blob/main/tutorials/kubeadm.md
Reference: https://github.com/cri-o/cri-o/blob/main/install.md
Reference: https://github.com/cri-o/packaging/blob/main/README.md
Reference: https://kubernetes.io/blog/2023/10/10/cri-o-community-package-infrastructure/
Reference: https://kubernetes.io/blog/2023/08/15/pkgs-k8s-io-introduction/
With podman + CRI-O, you also need to add --cri-socket unix:///var/run/crio/crio.sock:
root@wangjm-B550M-K-3:~# kubeadm join 192.168.1.8:6443 --token k24o0h.n7ewbmkrr07tuvvu --discovery-token-ca-cert-hash sha256:109777566281a9d7222ddb1c4f6ff151b818996d27bb74e549741623d047326c --control-plane --certificate-key 15e8796e94f4106f6ba01dcd78068104f9240d9e5b38ed5c1285141118958dba --cri-socket unix:///var/run/crio/crio.sock
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[download-certs] Saving the certificates to the folder: "/etc/kubernetes/pki"
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local wangjm-b550m-k-3] and IPs [172.31.0.1 192.168.1.10 192.168.1.8]
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost wangjm-b550m-k-3] and IPs [192.168.1.10 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost wangjm-b550m-k-3] and IPs [192.168.1.10 127.0.0.1 ::1]
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-check] Waiting for a healthy kubelet. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 501.675331ms
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap
[etcd] Announced new etcd member joining to the existing etcd cluster
[etcd] Creating static Pod manifest for "etcd"
{"level":"warn","ts":"2024-05-06T20:11:43.46408+0800","logger":"etcd-client","caller":"v3@v3.5.10/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000503880/192.168.1.8:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: can only promote a learner member which is in sync with leader"}
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
The 'update-status' phase is deprecated and will be removed in a future release. Currently it performs no operation
[mark-control-plane] Marking the node wangjm-b550m-k-3 as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node wangjm-b550m-k-3 as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
This node has joined the cluster and a new control plane instance was created:
* Certificate signing request was sent to apiserver and approval was received.
* The Kubelet was informed of the new secure connection details.
* Control plane label and taint were applied to the new node.
* The Kubernetes control plane instances scaled up.
* A new etcd member was added to the local/stacked etcd cluster.
To start administering your cluster from this node, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Run 'kubectl get nodes' to see this node join the cluster.
Configure kubectl
Following the hint above, run this on the newly joined node:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
After this, kubectl works and you can inspect and manage the cluster's resources.
Scheduling pods on control plane nodes
The version found around the web:
By default the master node does not schedule ordinary pods; to let it run pods, execute:
kubectl taint nodes --all node-role.kubernetes.io/master-
The trailing minus sign removes the corresponding taint.
Official reference: https://kubernetes.io/zh-cn/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#control-plane-node-isolation
If you want to be able to schedule pods on control plane nodes, for example for a single-machine cluster, run:
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
Finally, check the current nodes:
root@wangjm-B550M-K-3:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
wangjm-b550m-k-1 Ready control-plane 3h13m v1.30.0
wangjm-b550m-k-2 Ready control-plane 153m v1.30.0
wangjm-b550m-k-3 Ready control-plane 5m17s v1.30.0
Container storage configuration
Kubernetes pools and manages the CPU and memory of all nodes, but node-local disk storage is outside its scheduling and management.
With the master and worker nodes set up as above, pods can use each node's local disk via hostPath volumes. This has a real limitation: when a pod is rescheduled to another node, the data it wrote to hostPath storage on the previous node does not follow it.
hostPath is therefore only suitable for temporary data.
Data that must persist needs NFS network storage, a distributed store such as Ceph, or a cloud vendor's storage service.
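To illustrate the hostPath limitation, a minimal (hypothetical) pod like the one below only writes to whichever node it is scheduled on; after a reschedule the data stays behind on the old node:
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-demo
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "echo hello > /data/hello.txt && sleep 3600"]
    volumeMounts:
    - name: local-data
      mountPath: /data
  volumes:
  - name: local-data
    hostPath:
      path: /tmp/hostpath-demo    # exists only on the node that runs this pod
      type: DirectoryOrCreate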
Configure Ceph network storage // TODO: see the separate article on using an external Ceph cluster with K8s
Reference: https://github.com/ceph/ceph-csi?tab=readme-ov-file#overview
Reference: https://github.com/ceph/ceph-csi/blob/devel/docs/deploy-rbd.md
Reference: https://github.com/ceph/ceph-csi/blob/devel/docs/deploy-rbd.md#deployment-with-kubernetes
Reference: https://github.com/ceph/ceph-csi/tree/devel/deploy/rbd/kubernetes
Reference: https://www.cnblogs.com/hukey/p/17828946.html
Reference: https://juejin.cn/post/7296756504912330767
Reference: https://github.com/ceph/ceph-csi/blob/devel/examples/rbd/storageclass.yaml
wget https://raw.githubusercontent.com/ceph/ceph-csi/devel/deploy/rbd/kubernetes/csidriver.yaml
wget https://raw.githubusercontent.com/ceph/ceph-csi/devel/deploy/rbd/kubernetes/csi-rbdplugin.yaml
wget https://raw.githubusercontent.com/ceph/ceph-csi/devel/deploy/rbd/kubernetes/csi-rbdplugin-provisioner.yaml
wget https://raw.githubusercontent.com/ceph/ceph-csi/devel/deploy/rbd/kubernetes/csi-provisioner-rbac.yaml
wget https://raw.githubusercontent.com/ceph/ceph-csi/devel/deploy/rbd/kubernetes/csi-nodeplugin-rbac.yaml
wget https://github.com/ceph/ceph-csi/blob/devel/deploy/rbd/kubernetes/csi-config-map.yaml
Create a new namespace (not sure whether it is needed; create it just in case):
root@wangjm-B550M-K-1:~/k8s/csi/ceph/rbd# kubectl create ns ceph-rbd
namespace/ceph-rbd created
root@wangjm-B550M-K-1:~/k8s/csi/ceph/rbd# kubectl config set-context --current --namespace ceph-rbd
Context "kubernetes-admin@kubernetes" modified.
Create CSIDriver object:
kubectl create -f csidriver.yaml
Deploy RBACs for sidecar containers and node plugins:
vim csi-provisioner-rbac.yaml
:%s/namespace\: default/namespace\: ceph-rbd/g
vim csi-nodeplugin-rbac.yaml
:%s/namespace\: default/namespace\: ceph-rbd/g
kubectl create -f csi-provisioner-rbac.yaml
kubectl create -f csi-nodeplugin-rbac.yaml
Those manifests deploy service accounts, cluster roles and cluster role bindings. These are shared for both RBD and CephFS CSI plugins, as they require the same permissions.
//todo https://github.com/ceph/ceph-csi/tree/devel/deploy/rbd/kubernetes
//todo https://juejin.cn/post/7296756504912330767
Configure NFS network storage
NFS server
Pick a host on the LAN and install the NFS service so it can provide NFS storage for Kubernetes to use.
This machine runs Arch Linux; the installation command is:
pacman -Syy nfs-utils
NFS comes mainly in two versions, v3 and v4. Note that v4 is not necessarily better; the protocols and mechanisms differ somewhat, so both versions are still in common use.
The server-wide NFS configuration lives in /etc/nfs.conf; the filesystems to export are declared in /etc/exports; run exportfs -arv afterwards to apply the configuration.
# Start the service
systemctl enable nfsv4-server
systemctl start nfsv4-server
# Edit the exports so the directories under /ole/data/nfs are exported for hosts on the LAN to mount
vim /etc/exports
cat /etc/exports
# /etc/exports - exports(5) - directories exported to NFS clients
#
# Example for NFSv3:
# /srv/home hostname1(rw,sync) hostname2(ro,sync)
# Example for NFSv4:
# /srv/nfs4 hostname1(rw,sync,fsid=0)
# /srv/nfs4/home hostname1(rw,sync,nohide)
# Using Kerberos and integrity checking:
# /srv/nfs4 *(rw,sync,sec=krb5i,fsid=0)
# /srv/nfs4/home *(rw,sync,sec=krb5i,nohide)
#
# Use `exportfs -arv` to reload.
/ole/data/nfs/public 192.168.0.0/16(rw) 172.16.0.0/12(rw) 10.0.0.0/8(rw)
/ole/data/nfs/sync 192.168.0.0/16(rw,sync) 172.16.0.0/12(rw,sync) 10.0.0.0/8(rw,sync)
/ole/data/nfs/async 192.168.0.0/16(rw,async) 172.16.0.0/12(rw,async) 10.0.0.0/8(rw,async)
/ole/data/nfs/no_root_squash 192.168.0.0/16(rw,sync,no_root_squash) 172.16.0.0/12(rw,sync,no_root_squash) 10.0.0.0/8(rw,sync,no_root_squash)
Kubernetes will later use the /ole/data/nfs/no_root_squash export configured here. Note that this configuration is extremely insecure; since I am the only user for now, it gets this permission anyway...
What is root_squash?
root_squash will allow the root user on the client to both access and create files on the NFS server as root. Technically speaking, this option will force NFS to change the client's root to an anonymous ID and, in effect, this will increase security by preventing ownership of the root account on one system migrating to the other system. This is needed if you are hosting root filesystems on the NFS server (especially for diskless clients); with this in mind, it can be used (sparingly) for selected hosts, but you should not use no_root_squash unless you are aware of the consequences.
Reference: https://www.thegeekdiary.com/basic-nfs-security-nfs-no_root_squash-and-suid/
no_root_squash: if the user mounting the exported directory is root, they keep full root privileges on the export. This is extremely unsafe and not recommended. root_squash: if the user accessing the export is root, their privileges are squashed down to an anonymous user, usually the nobody UID/GID on the server.
I also relaxed the directory permissions (apparently they were insufficient):
chmod 777 /ole/data/nfs/no_root_squash
Note that all of the above is thoroughly irresponsible and insecure... to be revisited later.
Then apply the exports:
exportfs -arv
List the NFS exports visible on the network:
showmount -e
The exports are now available for mounting:
[root@jingmin-kube-archlinux k8s]# showmount -e
Export list for jingmin-kube-archlinux:
/ole/data/nfs/no_root_squash 10.0.0.0/8,172.16.0.0/12,192.168.0.0/16
/ole/data/nfs/async 10.0.0.0/8,172.16.0.0/12,192.168.0.0/16
/ole/data/nfs/sync 10.0.0.0/8,172.16.0.0/12,192.168.0.0/16
/ole/data/nfs/public 10.0.0.0/8,172.16.0.0/12,192.168.0.0/16
Configure the NFS StorageClass and provisioner in Kubernetes
Create the NFS provisioner and StorageClass, plus the related RBAC objects.
Reference: https://www.cnblogs.com/devopsyyds/p/16246116.html
Reference: https://jimmysong.io/kubernetes-handbook/practice/using-nfs-for-persistent-storage.html
Reference: https://medium.com/@myte/kubernetes-nfs-and-dynamic-nfs-provisioning-97e2afb8b4a9
Note that it is made the default storage class here via storageclass.kubernetes.io/is-default-class: "true".
[root@jingmin-kube-archlinux k8s]# cat ./nfs-storageclass.yaml
## Create a StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: nfs-storage
annotations:
storageclass.kubernetes.io/is-default-class: "true"
provisioner: fuseim.pri/ifs
parameters:
archiveOnDelete: "true" ## whether to archive the PV contents when the PV is deleted
---
apiVersion: v1
kind: Namespace
metadata:
labels:
# replace with namespace where provisioner is deployed
kubernetes.io/metadata.name: nfs
# replace with namespace where provisioner is deployed
name: nfs
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: nfs-client-provisioner
labels:
app: nfs-client-provisioner
# replace with namespace where provisioner is deployed
namespace: nfs
spec:
replicas: 1
strategy:
type: Recreate
selector:
matchLabels:
app: nfs-client-provisioner
template:
metadata:
labels:
app: nfs-client-provisioner
spec:
serviceAccountName: nfs-client-provisioner
containers:
- name: nfs-client-provisioner
image: registry.cn-hangzhou.aliyuncs.com/lfy_k8s_images/nfs-subdir-external-provisioner:v4.0.2
# resources:
# limits:
# cpu: 10m
# requests:
# cpu: 10m
volumeMounts:
- name: nfs-client-root
mountPath: /persistentvolumes
env:
- name: PROVISIONER_NAME
value: fuseim.pri/ifs
- name: NFS_SERVER
value: 192.168.1.7 ## your NFS server address
- name: NFS_PATH
value: /ole/data/nfs/no_root_squash ## the directory exported by the NFS server
volumes:
- name: nfs-client-root
nfs:
server: 192.168.1.7
path: /ole/data/nfs/no_root_squash
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: nfs-client-provisioner
# replace with namespace where provisioner is deployed
namespace: nfs
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: nfs-client-provisioner-runner
rules:
- apiGroups: [""]
resources: ["nodes"]
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources: ["persistentvolumes"]
verbs: ["get", "list", "watch", "create", "delete"]
- apiGroups: [""]
resources: ["persistentvolumeclaims"]
verbs: ["get", "list", "watch", "update"]
- apiGroups: ["storage.k8s.io"]
resources: ["storageclasses"]
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources: ["events"]
verbs: ["create", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: run-nfs-client-provisioner
subjects:
- kind: ServiceAccount
name: nfs-client-provisioner
# replace with namespace where provisioner is deployed
namespace: nfs
roleRef:
kind: ClusterRole
name: nfs-client-provisioner-runner
apiGroup: rbac.authorization.k8s.io
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: leader-locking-nfs-client-provisioner
# replace with namespace where provisioner is deployed
namespace: nfs
rules:
- apiGroups: [""]
resources: ["endpoints"]
verbs: ["get", "list", "watch", "create", "update", "patch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: leader-locking-nfs-client-provisioner
# replace with namespace where provisioner is deployed
namespace: nfs
subjects:
- kind: ServiceAccount
name: nfs-client-provisioner
# replace with namespace where provisioner is deployed
namespace: nfs
roleRef:
kind: Role
name: leader-locking-nfs-client-provisioner
apiGroup: rbac.authorization.k8s.io
Confirm the StorageClass has been created:
[root@jingmin-kube-archlinux k8s]# kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
nfs-storage (default) fuseim.pri/ifs Delete Immediate false 15m
Confirm the corresponding resources exist in the namespace:
[root@jingmin-kube-archlinux k8s]# kubectl get all -n nfs
NAME READY STATUS RESTARTS AGE
pod/nfs-client-provisioner-5c7b77d69d-26498 1/1 Running 0 15m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/nfs-client-provisioner 1/1 1 1 15m
NAME DESIRED CURRENT READY AGE
replicaset.apps/nfs-client-provisioner-5c7b77d69d 1 1 1 15m
Test that the NFS storage class works
Create a test PVC (PersistentVolumeClaim, i.e. a request for a piece of persistent storage):
[root@jingmin-kube-archlinux k8s]# cat ./test-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: test-pvc
namespace: default
spec:
#storageClassName: nfs-storage
accessModes:
- ReadWriteMany
resources:
requests:
storage: 200Mi
Apply it and confirm the storage was provisioned:
[root@jingmin-kube-archlinux k8s]# kubectl apply -f ./test-pvc.yaml
persistentvolumeclaim/test-pvc created
[root@jingmin-kube-archlinux k8s]# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
test-pvc Bound pvc-c559dceb-0965-4fbc-ba2d-2d11b76e0751 200Mi RWX nfs-storage 6s
[root@jingmin-kube-archlinux k8s]# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-c559dceb-0965-4fbc-ba2d-2d11b76e0751 200Mi RWX Delete Bound default/test-pvc nfs-storage 9s
Delete the test PVC:
[root@jingmin-kube-archlinux k8s]# kubectl delete pvc test-pvc
Node-pressure eviction
Reference: https://kubernetes.io/zh-cn/docs/setup/production-environment/tools/kubeadm/kubelet-integration/
Reference: https://kubernetes.io/zh-cn/docs/concepts/scheduling-eviction/node-pressure-eviction/
Reference: https://cloud.tencent.com/developer/article/1664414
Reference: https://www.cnblogs.com/lianngkyle/p/16582132.html
The kubelet set up by kubeadm runs fine with no extra flags; here we add node-pressure eviction parameters so the system does not hang when memory or disk runs low.
The following was the wrong way to do it (note the drop-in directory next to the unit file):
[root@jingmin-kube-master1 system]# cd /usr/lib/systemd/system
[root@jingmin-kube-master1 system]# ls
...
kubelet.service
kubelet.service.d
...
There is a drop-in file; it almost certainly overrides the default configuration.
[root@jingmin-kube-master1 system]# cd kubelet.service.d/
[root@jingmin-kube-master1 kubelet.service.d]# ls
10-kubeadm.conf
[root@jingmin-kube-master1 kubelet.service.d]# cat 10-kubeadm.conf
# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/sysconfig/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
Sure enough, the last two lines override the original command.
Also note that user-defined overrides are supposed to go in /etc/sysconfig/kubelet.
OK, restore the original /usr/lib/systemd/system/kubelet.service file. (This path applies to Red Hat-family distributions; other distributions set the same environment variable in a similar place.)
Edit /etc/sysconfig/kubelet:
[root@jingmin-kube-master1 kubelet.service.d]# vim /etc/sysconfig/kubelet
[root@jingmin-kube-master1 kubelet.service.d]# cat /etc/sysconfig/kubelet
KUBELET_EXTRA_ARGS="--node-ip=192.168.1.1 \
--kube-reserved=cpu=200m,memory=250Mi \
--eviction-hard=memory.available<5%,nodefs.available<10%,imagefs.available<10% \
--eviction-soft=memory.available<10%,nodefs.available<15%,imagefs.available<15% \
--eviction-soft-grace-period=memory.available=2m,nodefs.available=2m,imagefs.available=2m \
--eviction-max-pod-grace-period=30 \
--eviction-minimum-reclaim=memory.available=0Mi,nodefs.available=500Mi,imagefs.available=500Mi"
Reload the unit configuration and restart the service:
[root@jingmin-kube-master1 kubelet.service.d]# systemctl daemon-reload
[root@jingmin-kube-master1 kubelet.service.d]# systemctl restart kubelet.service
[root@jingmin-kube-master1 kubelet.service.d]# systemctl status kubelet.service
If the service is in a bad state, check the logs:
[root@jingmin-kube-master1 kubelet.service.d]# journalctl -exu kubelet
// TODO: CoreDNS fails to start; a Flannel issue
Reference: https://stackoverflow.com/questions/61373366/networkplugin-cni-failed-to-set-up-pod-xxxxx-network-failed-to-set-bridge-add
Reference: https://github.com/kubernetes/kubernetes/issues/39557
Not solved yet; will redeploy later.
Troubleshooting a failed cluster re-initialization
root@wangjm-B550M-K-1:~# kubeadm init \
--apiserver-advertise-address=192.168.1.8 \
--control-plane-endpoint=control.ole12138.top \
--kubernetes-version v1.30.0 \
--service-cidr=172.31.0.0/20 \
--pod-network-cidr=172.30.0.0/16
[init] Using Kubernetes version: v1.30.0
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [control.ole12138.top kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local wangjm-b550m-k-1] and IPs [172.31.0.1 192.168.1.8]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost wangjm-b550m-k-1] and IPs [192.168.1.8 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost wangjm-b550m-k-1] and IPs [192.168.1.8 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "super-admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests"
[kubelet-check] Waiting for a healthy kubelet. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 501.668546ms
[api-check] Waiting for a healthy API server. This can take up to 4m0s
[api-check] The API server is not healthy after 4m0.000489035s
Unfortunately, an error has occurred:
context deadline exceeded
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
Reference: https://github.com/kubernetes/kubeadm/blob/main/docs/ha-considerations.md#options-for-software-load-balancing
Reference: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/
It appears the --control-plane-endpoint option is meant to be a load-balanced, unified entry point for the api-server, which I do not have here (or rather, it would be complicated, requiring keepalived + haproxy).
Try again without this option.
root@wangjm-B550M-K-1:~# kubeadm init \
--apiserver-advertise-address=192.168.1.8 \
--kubernetes-version v1.30.0 \
--service-cidr=172.31.0.0/20 \
--pod-network-cidr=172.30.0.0/16
[init] Using Kubernetes version: v1.30.0
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local wangjm-b550m-k-1] and IPs [172.31.0.1 192.168.1.8]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost wangjm-b550m-k-1] and IPs [192.168.1.8 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost wangjm-b550m-k-1] and IPs [192.168.1.8 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "super-admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests"
[kubelet-check] Waiting for a healthy kubelet. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 501.532835ms
[api-check] Waiting for a healthy API server. This can take up to 4m0s
[api-check] The API server is healthy after 3.501293034s
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node wangjm-b550m-k-1 as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node wangjm-b550m-k-1 as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
[bootstrap-token] Using token: o6g6nc.p444h6gg0rzu0kzl
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.1.8:6443 --token o6g6nc.p444h6gg0rzu0kzl \
--discovery-token-ca-cert-hash sha256:4de7f3170388d48445b64c4d0c10529cf901523a8a27e1d0b1b202f11f697673
Success...