Contents

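1 Network topology
2 Question&Solution
- 2.1 Q: Requests to services behind the ingress fail after ppp reconnects
  - 2.1.1 QQ: The nginx error log shows it still connecting to the old IP, not the new public IP assigned after the ppp reconnect
  - 2.1.2 QQ: Running curl --resolve xxx.domain:443:ingress_ip https://xxx.domain on each k8s node succeeds on some nodes and fails on others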

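Network topology

Example: nacos.ole12138.cn is resolved (or forwarded) to the cluster's ingress address (a LoadBalancer service backed by MetalLB).

Forwarding configuration on the Alibaba Cloud ECS instance (this server has a public IP; its DNS name is ole12138.cn)

nginx configuration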

[root@iZbp10a4jqmwddltilchn0Z ~]# cat /etc/nginx/nginx.conf
# For more information on configuration, see:
#   * Official English Documentation: http://nginx.org/en/docs/
#   * Official Russian Documentation: http://nginx.org/ru/docs/

user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log notice;
pid /run/nginx.pid;

# Load dynamic modules. See /usr/share/doc/nginx/README.dynamic.
include /usr/share/nginx/modules/*.conf;

events {
    worker_connections 1024;
}

http {
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  /var/log/nginx/access.log  main;

    sendfile            on;
    tcp_nopush          on;
    keepalive_timeout   65;
    types_hash_max_size 4096;

    include             /etc/nginx/mime.types;
    default_type        application/octet-stream;

    include /etc/nginx/conf.d/*.conf;

}

stream {
    include /etc/nginx/tcp.d/*.conf;
}

Then the configuration in the subdirectory:

[root@iZbp10a4jqmwddltilchn0Z ~]# cat /etc/nginx/tcp.d/80_443.conf 
upstream ole12138_top_10080 {
    hash $remote_addr consistent;
    server ole12138.top:10080 max_fails=3 fail_timeout=10s;
}

server {
    listen 80;
    proxy_connect_timeout 20s;
    #proxy_timeout 5m;
    proxy_pass ole12138_top_10080;
}


upstream ole12138_top_10443 {
    hash $remote_addr consistent;
    server ole12138.top:10443 max_fails=3 fail_timeout=10s;
}

server {
    listen 443;
    proxy_connect_timeout 20s;
    #proxy_timeout 5m;
    proxy_pass ole12138_top_10443;
}

Home network gateway (Linux software router) configuration

nginx configuration

(Only the configuration in the subdirectory is listed here.)

[root@jingmin-kube-master1 ~]# cat /etc/nginx/tcp.d/10080_10443.conf 
upstream 192.168.1.100_80 {
    hash $remote_addr consistent;
    server 192.168.1.100:80 max_fails=3 fail_timeout=10s;
}

server {
    listen 10080;
    proxy_connect_timeout 20s;
    #proxy_timeout 5m;
    proxy_pass 192.168.1.100_80;
}


upstream 192.168.1.100_443 {
    hash $remote_addr consistent;
    server 192.168.1.100:443 max_fails=3 fail_timeout=10s;
}

server {
    listen 10443;
    proxy_connect_timeout 20s;
    #proxy_timeout 5m;
    proxy_pass 192.168.1.100_443;
}

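With the configuration above, a public request for nacos.ole12138.cn resolves via DNS to the Alibaba Cloud ECS server. The ECS server forwards it to the home gateway at ole12138.top:10080/10443, and the home gateway (Linux) forwards it on to 192.168.1.100:80/443, which is the ingress service address of the k8s cluster inside the home network.

The traffic is relayed through the Alibaba Cloud server because, although the home connection has a public IP that DDNS can bind to a domain name, China Telecom blocks ports 80/443 on residential broadband by default.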

Question&Solution

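Q: Requests to services behind the ingress fail after ppp reconnects

After ppp reconnects, the public IP changes, and paths that worked before are not necessarily reachable any more. Several points along the chain break.

QQ: The nginx error log shows it still connecting to the old IP, not the new public IP assigned after the ppp reconnect

Reference: https://serverfault.com/questions/1010342/nginx-using-resolver-in-a-stream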

SS:

First make sure DDNS has already updated the DNS record. You can check with dig on each host.
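The check can be scripted. A minimal sketch, assuming the gateway's public address sits on ppp0 and that dig is installed; the helper names first_ipv4 and ddns_is_current are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helpers for comparing the WAN address with the DDNS record.

# Read `ip -o -4 addr show` output on stdin and print the first IPv4
# address, with any /prefix stripped.
first_ipv4() {
    awk '$3 == "inet" { sub(/\/.*/, "", $4); print $4; exit }'
}

# Succeed only if both addresses are non-empty and identical.
# $1: address on the interface, $2: address returned by DNS
ddns_is_current() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Example use on the home gateway (ppp0 and the domain are assumptions):
#   wan_ip=$(ip -o -4 addr show dev ppp0 | first_ipv4)
#   dns_ip=$(dig +short ole12138.top | tail -n 1)
#   ddns_is_current "$wan_ip" "$dns_ip" || echo "DDNS record is stale"
```

Running the same dig on the ECS host shows whether the relay would see the new address after a reload.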

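nginx caches the DNS result when it starts. By default, picking up a changed record requires restarting (or reloading) nginx.

I added a resolver directive with the valid option, but it had no effect: open-source nginx resolves the names in an upstream server line only once, when the configuration is loaded, and does not apply resolver/valid to them.

Alternatives are to put the name in a variable (which rules out using an upstream block), or to use the commercial version of nginx.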
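The variable approach could look like the sketch below for the 443 listener. It is untested here and rests on assumptions: the resolver address 223.5.5.5 (a public DNS server), the 30 s validity, the variable name $backend_443, the file name, and the helper name write_stream_conf are all illustrative. Because proxy_pass takes a variable, nginx resolves the name at run time through resolver, but the hash and max_fails settings of the upstream block are lost.

```shell
#!/bin/sh
# Hypothetical helper: write a stream config that resolves the backend at
# run time instead of once at startup. $1 is the file to write.
write_stream_conf() {
    cat > "$1" <<'EOF'
# Re-query DNS through this resolver; cache answers for 30 s.
resolver 223.5.5.5 valid=30s;

# Holding the target in a variable makes proxy_pass resolve it at run time.
map $remote_addr $backend_443 {
    default ole12138.top:10443;
}

server {
    listen 443;
    proxy_connect_timeout 20s;
    proxy_pass $backend_443;
}
EOF
}

# Example use on the ECS host, then validate before reloading:
#   write_stream_conf /etc/nginx/tcp.d/443_dynamic.conf
#   nginx -t && nginx -s reload
```

The existing listen 443 server in 80_443.conf would have to be removed first, otherwise the two servers conflict on the same port.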

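Another option: when ppp reconnects, trigger the script that updates the DDNS record and also a script that reloads nginx on each host. (Or have crond reload nginx periodically.)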

Generate a key pair

ssh-keygen

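Copy the public key to the remote server you want to log in to

# Copy the public key to the remote server (enter the password when prompted).
# Note: this overwrites any existing /root/.ssh/id_rsa.pub on the remote side.
scp ./id_rsa.pub root@ole12138.cn:/root/.ssh/

# Log in to the remote server
ssh root@ole12138.cn

# Append the uploaded public key to the end of /root/.ssh/authorized_keys
cd /root/.ssh
cat id_rsa.pub >> authorized_keys

# Log out of the remote server
exit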

On the local machine, try logging in to the remote server with the private key from the pair just generated

ssh -i /root/.ssh/id_rsa root@ole12138.cn

Reference: https://developer.aliyun.com/article/269047

Once that works, run the command remotely

sleep 60 && ssh -i /root/.ssh/id_rsa root@ole12138.cn "nginx -s reload"

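This waits 60 s, then logs in to the remote server and reloads nginx. (In open-source nginx the valid setting of resolver in stream has no effect on names in an upstream block; a changed record only takes effect after a configuration reload or a restart.)

Append this line to /etc/ppp/if-up.local so that, 60 s after ppp reconnects and the DNS record is updated, nginx on the remote server is reloaded. (pppd's own hook is /etc/ppp/ip-up, which on RHEL-family systems runs /etc/ppp/ip-up.local; use whichever hook your system actually executes.)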
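Placed directly in the hook, sleep 60 holds the hook script for a minute. A possible variant runs the delayed reload in the background. It is a sketch: the function names, the ssh options, and the nginx -t check are additions that are not part of the original command.

```shell
#!/bin/sh
# Sketch for the ppp "up" hook: reload nginx on the remote relay after a
# delay, without blocking the hook itself.
REMOTE="${REMOTE:-root@ole12138.cn}"
KEY="${KEY:-/root/.ssh/id_rsa}"
DELAY="${DELAY:-60}"   # seconds to wait for the DDNS update to propagate

# Validate the remote config, then reload. BatchMode makes ssh fail
# instead of prompting for a password when the key is not accepted.
reload_remote_nginx() {
    ssh -i "$KEY" -o BatchMode=yes -o ConnectTimeout=10 "$REMOTE" \
        "nginx -t && nginx -s reload"
}

# Run the delayed reload in a background subshell and return at once.
schedule_reload() {
    ( sleep "$DELAY" && reload_remote_nginx ) >/dev/null 2>&1 &
}

# In the hook script, after the DDNS update:
#   schedule_reload
```

Checking the config with nginx -t first avoids sending a reload signal for a configuration that no longer parses.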

It is also worth adding a scheduled job on the remote server itself, to make sure the DNS change is eventually picked up.
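One way to write such a job is to reload only when the resolved address actually changes, rather than reloading blindly on a timer. The following is a sketch under assumptions: dig is installed on the relay, and the state file path and function names are hypothetical.

```shell
#!/bin/sh
# Sketch of a cron job for the relay host: reload nginx only when the
# address behind the upstream name has changed since the last run.
DOMAIN="${DOMAIN:-ole12138.top}"
STATE_FILE="${STATE_FILE:-/var/tmp/upstream_ip.last}"   # hypothetical path

# Print the current IPv4 address of $1 (last IPv4 line dig returns).
resolve_ip() {
    dig +short "$1" | grep -E '^[0-9]+(\.[0-9]+){3}$' | tail -n 1
}

reload_nginx() {
    nginx -t && nginx -s reload
}

check_and_reload() {
    new_ip=$(resolve_ip "$DOMAIN")
    # Keep the old state if DNS gave no usable answer.
    [ -n "$new_ip" ] || return 1
    old_ip=$(cat "$STATE_FILE" 2>/dev/null)
    if [ "$new_ip" != "$old_ip" ]; then
        # Record the new address only after a successful reload.
        reload_nginx && printf '%s\n' "$new_ip" > "$STATE_FILE"
    fi
}

# Run only when invoked with --run, so the file can also be sourced.
if [ "${1:-}" = "--run" ]; then
    check_and_reload
fi
```

Installed as, say, /usr/local/bin/nginx-dns-watch.sh (a made-up path), it could be run every minute with the crontab entry `* * * * * /usr/local/bin/nginx-dns-watch.sh --run`.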

QQ: Running `curl --resolve xxx.domain:443:ingress_ip https://xxx.domain` on each k8s node succeeds on some nodes and fails on others

SS:

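The CNI plugin (flannel in my case) binds by default to the device that holds the default route.

On the host whose default route goes through ppp0 (the gateway host that makes the PPPoE connection), the routing table contained routes marked linkdown after ppp reconnected.

So the flannel configuration needs to be changed to specify which interface to bind to (by IP or by name, optionally as a regular expression).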

Reference: https://github.com/flannel-io/flannel/blob/master/Documentation/configuration.md#key-command-line-options

Reference: https://github.com/containernetworking/cni/issues/486

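My home gateway router also runs k8s. Its default route goes through ppp0, the interface created by the PPPoE connection, while it talks to the rest of the cluster over a different network interface. A default flannel install therefore probably picks ppp0, and when ppp0 reconnects and the public IP changes, the k8s network on that node breaks.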

In the flannel manifest, append the two lines for --iface-can-reach to the container args, as follows:

      containers:
      - args:
        - --ip-masq
        - --kube-subnet-mgr
        - --iface-can-reach
        - 192.168.1.1

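Looking at the flannel logs with kubectl logs, flannel on the gateway router seems to have picked the lo interface with this setting. It works, so I left it at that. If it stops working, the --iface-regex option can be used to match the interface by a regular expression on its name or IP (not tried yet).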
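To see which interface flannel picked on a node, its startup log can be filtered. A sketch, assuming the log contains a line of the form "Using interface with name <if> and address <ip>" (worth checking against your flannel version); the helper name flannel_iface and the kube-flannel namespace are also assumptions:

```shell
#!/bin/sh
# Read flannel log lines on stdin and print "<interface> <address>" for
# each "Using interface with name ... and address ..." line.
flannel_iface() {
    sed -n 's/.*Using interface with name \([^ ]*\) and address \([^ ]*\).*/\1 \2/p'
}

# Example (namespace is an assumption; substitute the real pod name):
#   kubectl -n kube-flannel logs <flannel-pod-on-the-gateway> | flannel_iface
```

If the printed interface is ppp0 or lo rather than the LAN interface, that node is a candidate for the --iface-regex setting.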
