In a cloud environment, high availability for the Kubernetes control plane is usually delegated to the cloud provider's LoadBalancer. But in the following scenarios that option simply does not exist:

  • Private IDCs / bare-metal clusters
  • Edge computing and air-gapped environments
  • Moving from public cloud back to a self-hosted data center (no cloud LB / SLB available)
  • Wanting to keep the component count down and avoid introducing keepalived / VRRP

I had previously looked into sealos, which takes a rather unconventional yet elegant approach:
no VIP and no VRRP; instead, IPVS + LVScare are used on every node to locally maintain highly available access to kube-apiserver.
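
Conceptually, what ends up on each node is just a local IPVS virtual server whose real servers are all of the kube-apiservers. A hand-rolled sketch with plain ipvsadm (illustrative only; the addresses are the ones used later in this article, and in practice LVScare creates and health-checks these rules for you):

# local virtual server for the apiserver endpoint, round-robin scheduling
ipvsadm -A -t 100.100.100.100:6443 -s rr
# every master's kube-apiserver becomes a real server
ipvsadm -a -t 100.100.100.100:6443 -r 10.0.0.1:6443 -m
ipvsadm -a -t 100.100.100.100:6443 -r 10.0.0.2:6443 -m
ipvsadm -a -t 100.100.100.100:6443 -r 10.0.0.3:6443 -m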

Inspired by that idea, this article is a write-up and hands-on summary of using LVScare as a Kubernetes high-availability component.

A Review of Traditional Kubernetes API Server HA Approaches

Option 1: Cloud LoadBalancer

Pros:

  • Simple and stable
  • Transparent to users

Cons:

  • Strong dependency on the cloud provider
  • Unavailable in private / IDC deployments

Option 2: VRRP + keepalived (VIP)

           VIP:6443
                 |
      +----------+------------+
      |                       |
    Nginx   <-keepalived->   Nginx(Backup)
      |
  +---+--------------------------+
  |               |              |
apiserver      apiserver     apiserver

The problems:

  • Extra LB nodes are required
  • VRRP / VIP has to be maintained
  • The architecture carries more moving parts

The sealos Approach: Decentralized API Server HA

The core idea proposed by sealos is:

Instead of exposing a single central entry point, give every node the ability to reach all API Servers on its own.

hosts configuration

Assuming the hostnames and IPs below, configure hosts on every node.

10.0.0.1 apiserver.cluster.local # temporarily resolves to the first master's IP; it will be changed once the cluster is up
10.0.0.1 master1
10.0.0.2 master2
10.0.0.3 master3
10.0.0.11 node1
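
On each node this can be appended to /etc/hosts in one go (a sketch; substitute your own addresses):

cat >> /etc/hosts << "EOF"
10.0.0.1 apiserver.cluster.local
10.0.0.1 master1
10.0.0.2 master2
10.0.0.3 master3
10.0.0.11 node1
EOF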

Install the packages

Configure the Kubernetes package repository:

cat <<EOF | tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.33/rpm/
enabled=1
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.33/rpm/repodata/repomd.xml.key
exclude=kubelet kubeadm kubectl cri-tools kubernetes-cni
EOF

Install the required packages:

yum install -y runc ipset ipvsadm
yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
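
Both kube-proxy in ipvs mode and lvscare rely on the kernel's IPVS support, so it is worth making sure the modules are loaded and persist across reboots (a sketch, assuming your distribution does not load them automatically):

# load the IPVS modules now and on every boot
cat > /etc/modules-load.d/ipvs.conf << "EOF"
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
EOF

modprobe -a ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack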

Install containerd:

tar xf containerd-2.x.x-linux-amd64.tar.gz -C /usr/local

  • Configure the systemd service
cat > /usr/local/lib/systemd/system/containerd.service << "EOF"
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target

[Service]
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/containerd
Type=notify
Delegate=yes
KillMode=process
Restart=always
RestartSec=5
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
OOMScoreAdjust=-999

[Install]
WantedBy=multi-user.target
EOF

  • Enable the services at boot
systemctl enable containerd kubelet

  • Configure containerd
mkdir -p /etc/containerd

cat > /etc/containerd/config.toml << "EOF"
version = 3
root = "/var/lib/containerd"
state = "/run/containerd"

[grpc]
  address = "/run/containerd/containerd.sock"

[plugins.'io.containerd.internal.v1.opt']
  path = "/var/lib/containerd"

[plugins.'io.containerd.grpc.v1.cri']
  stream_server_address = "127.0.0.1"
  stream_server_port = "10010"

[plugins.'io.containerd.cri.v1.runtime']
  enable_selinux = false
  enable_unprivileged_ports = true
  enable_unprivileged_icmp = true
  device_ownership_from_security_context = false

[plugins.'io.containerd.cri.v1.images']
  snapshotter = "overlayfs"
  disable_snapshot_annotations = true

[plugins.'io.containerd.cri.v1.images'.pinned_images]
  sandbox = "registry.k8s.io/pause:3.10"

[plugins.'io.containerd.cri.v1.runtime'.cni]
  conf_dir = "/etc/cni/net.d"

[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"

[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.runc.options]
  SystemdCgroup = true

[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.runhcs-wcow-process]
  runtime_type = "io.containerd.runhcs.v1"

[plugins.'io.containerd.cri.v1.images'.registry]
  config_path = "/etc/containerd/certs.d"
EOF

Configure the private registry:

  • You need a private registry that is configured as a pull-through image proxy:

    • hub.yourdomain.com/docker.io -> transparently proxies pulls from docker.io
    • hub.yourdomain.com/registry.k8s.io -> transparently proxies pulls from registry.k8s.io

If your private registry does not have this capability yet, you will need to copy the required images over manually and adapt the containerd configuration files below accordingly.

mkdir -p /etc/containerd/certs.d/{docker.io,registry.k8s.io}

cat > /etc/containerd/certs.d/docker.io/hosts.toml << "EOF"
[host]

[host."https://hub.yourdomain.com/v2/docker.io"]
  capabilities = ["pull", "resolve"]
  override_path = true
EOF

cat > /etc/containerd/certs.d/registry.k8s.io/hosts.toml << "EOF"
[host]

[host."https://hub.yourdomain.com/v2/registry.k8s.io"]
  capabilities = ["pull", "resolve"]
  override_path = true
EOF

Start containerd:

systemctl start containerd
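
Once containerd is up, a quick sanity check of the CRI endpoint and the registry proxy can save trouble later (a sketch; the busybox image is only an example):

# the CRI endpoint responds
crictl --runtime-endpoint unix:///run/containerd/containerd.sock info

# a pull goes through the mirrors configured under /etc/containerd/certs.d
ctr images pull --hosts-dir /etc/containerd/certs.d docker.io/library/busybox:latest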

All of the steps above must be performed on every node.

Initialize the first control-plane node

Generate a kubeadm configuration file:

kubeadm config print init-defaults --component-configs KubeletConfiguration > kubeadm.yaml

Then modify it as follows:

apiVersion: kubeadm.k8s.io/v1beta4
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 10.0.0.1 # change: this node's IP
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  imagePullSerial: true
  name: master1             # change: this node's hostname
  taints: null
timeouts:
  controlPlaneComponentHealthCheck: 4m0s
  discovery: 5m0s
  etcdAPICall: 2m0s
  kubeletHealthCheck: 4m0s
  kubernetesAPICall: 1m0s
  tlsBootstrap: 5m0s
  upgradeManifests: 5m0s
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1   # new section: run kube-proxy in ipvs mode
kind: KubeProxyConfiguration
mode: ipvs
---
apiServer:
  certSANs:  # new section: add the other master nodes' hostnames and IPs
  - master1
  - master2
  - master3
  - 10.0.0.1
  - 10.0.0.2
  - 10.0.0.3
  - 100.100.100.100  # fixed at 100.100.100.100; used later as the load-balancing IP
  - apiserver.cluster.local
apiVersion: kubeadm.k8s.io/v1beta4
caCertificateValidityPeriod: 87600h0m0s # extend certificate validity to 10 years
certificateValidityPeriod: 87600h0m0s   # extend certificate validity to 10 years
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: apiserver.cluster.local:6443  # add this setting
controllerManager: {}
dns: {}
encryptionAlgorithm: RSA-2048
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: hub.yourdomain.com/docker.io/kubesphere  # change to your private registry; the kubesphere repository is used because it carries a complete set of Kubernetes images
kind: ClusterConfiguration
kubernetesVersion: 1.33.3    # set the version; usually the latest patch release of this minor version
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12  # adjust as needed
  podSubnet: 10.244.0.0/16     # adjust as needed
proxy: {}
scheduler: {}
---
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
cgroupDriver: systemd
clusterDNS:
- 10.96.0.10   # if serviceSubnet was changed, change this accordingly
clusterDomain: cluster.local
containerRuntimeEndpoint: ""
cpuManagerReconcilePeriod: 0s
crashLoopBackOff: {}
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMaximumGCAge: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
logging:
  flushFrequency: 0
  options:
    json:
      infoBufferSize: "0"
    text:
      infoBufferSize: "0"
  verbosity: 0
memorySwap: {}
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
rotateCertificates: true
runtimeRequestTimeout: 0s
shutdownGracePeriod: 0s
shutdownGracePeriodCriticalPods: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s
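
Recent kubeadm releases can check the edited file for obvious mistakes before anything is pulled (a sketch):

kubeadm config validate --config kubeadm.yaml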

Pull the images:

kubeadm config images pull --config kubeadm.yaml

Initialize the node:

kubeadm init --upload-certs --config kubeadm.yaml

After init completes, two kubeadm join commands are printed: one for adding a control-plane node and one for adding a worker node.
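
If the token expires or the output is lost, both commands can be regenerated later (a sketch using standard kubeadm subcommands):

# print a fresh worker join command
kubeadm token create --print-join-command

# re-upload the control-plane certificates and print a new --certificate-key
kubeadm init phase upload-certs --upload-certs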

Add the remaining control-plane nodes

Simply run the control-plane kubeadm join command printed by the first control-plane node.

kubeadm join apiserver.cluster.local:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:xxx \
--control-plane --certificate-key xxx

Add worker nodes

To give worker nodes highly available access to kube-apiserver, a load-balancing tool, lvscare, is introduced. It works as follows:

  +----------+                     +---------------+  virtual server: 100.100.100.100:6443
  | master1  |<--------------------|  nodes(ipvs)  |    real servers:
  +----------+                     |---------------+          10.0.0.1:6443
                                   |                          10.0.0.2:6443
  +----------+                     |                          10.0.0.3:6443
  | master2  |<--------------------+
  +----------+                     |
                                   |
  +----------+                     |
  | master3  |<--------------------+
  +----------+

lvscare runs as a static Pod, so it is started by the kubelet.

Create the static Pod manifest:

Note: change the --rs addresses below to match your own control-plane nodes.

mkdir -p /etc/kubernetes/manifests

cat > /etc/kubernetes/manifests/lvscare.yaml << "EOF"
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  name: kube-lvscare
  namespace: kube-system
spec:
  containers:
  - args:
    - care
    - --vs
    - 100.100.100.100:6443
    - --health-path
    - /healthz
    - --health-schem
    - https
    - --rs
    - 10.0.0.1:6443    # apiserver on the first control-plane node
    - --rs
    - 10.0.0.2:6443    # apiserver on the second control-plane node
    - --rs
    - 10.0.0.3:6443    # apiserver on the third control-plane node
    command:
    - /usr/bin/lvscare
    image: hub.yourdomain.com/docker.io/labring/lvscare:v5.0.1
    imagePullPolicy: IfNotPresent
    name: kube-lvscare
    resources: {}
    securityContext:
      privileged: true
    volumeMounts:
    - mountPath: /lib/modules
      name: lib-modules
      readOnly: true
  hostNetwork: true
  priorityClassName: system-node-critical
  volumes:
  - hostPath:
      path: /lib/modules
      type: ""
    name: lib-modules
status: {}
EOF

Then run the worker kubeadm join command printed earlier.

kubeadm join apiserver.cluster.local:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:xxx
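
Once the node has joined and the kubelet has started the lvscare static Pod, the virtual server and its real servers should be visible on the worker (a quick check):

ipvsadm -Ln -t 100.100.100.100:6443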

Install the canal network plugin

Download the canal manifest:

https://raw.githubusercontent.com/projectcalico/calico/v3.30.2/manifests/canal.yaml

The default pod network is 10.244.0.0/16.
If you changed this CIDR when deploying the cluster above, the canal YAML must be updated to the same network (see the sed sketch after the snippet).

  # Flannel network configuration. Mounted into the flannel container.
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
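
If your podSubnet differs, a one-line substitution is enough (a sketch; 10.210.0.0/16 stands in for your actual podSubnet):

sed -i 's|10.244.0.0/16|10.210.0.0/16|' canal.yaml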

Deploy canal:

kubectl apply -f canal.yaml
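
After a few minutes the canal Pods should be Running and the nodes should report Ready (a quick check):

kubectl -n kube-system get pods -o wide
kubectl get nodes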

Update the hosts entry for apiserver.cluster.local

  • Worker nodes
100.100.100.100 apiserver.cluster.local
  • Control-plane nodes

On control-plane nodes, simply resolve apiserver.cluster.local to the node's own IP address (or 127.0.0.1).
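
On the nodes this is a simple in-place edit of /etc/hosts (a sketch, assuming the entry added at the beginning is still present):

# workers: point apiserver.cluster.local at the lvscare virtual IP
sed -i 's/^.* apiserver.cluster.local/100.100.100.100 apiserver.cluster.local/' /etc/hosts

# control-plane nodes: point it at the local apiserver instead
# sed -i 's/^.* apiserver.cluster.local/127.0.0.1 apiserver.cluster.local/' /etc/hosts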
