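The Adventures of Kingbase's Zhao Jinmai: Scaling a KES RWC Cluster Out and In
Previously: First Steps into the RWC Realm
Last time, under her master's guidance, Kingbase's Zhao Jinmai successfully built a three-node KES RWC cluster. Watching the data streams pulse across the monitoring dashboard, she felt as if she could see the lifeblood of the digital world coursing endlessly among the three nodes. But a few words from her master left her lost in thought: "A cluster is like a living creature; it must master the art of breathing in and out. Today I will teach you the cluster's 'art of growth' and its 'bone-shrinking technique'."
Back to reality. The Kingbase database ships with a graphical cluster management tool by default. However, in environments with strict access controls, or where the operating system boots into command-line mode, the cluster can only be managed and maintained through commands.
To keep up with business expansion and data growth, or to build out multi-datacenter and off-site disaster recovery, we frequently need to scale out the database cluster.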
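Part 1. The Secret Ritual of Cluster Growth
The Summons of the Mystic Altar
Zhao Jinmai tapped at the terminal, and the status of the three nodes unfolded like a star chart: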
[kingbase@kes1 ~]$ repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+---------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | | host=kes1 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=kes2 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
3 | node3 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=kes3 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
[kingbase@kes1 ~]$ repmgr service status
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+------+---------+--------------------
1 | node1 | primary | * running | | running | 2970 | no | n/a
2 | node2 | standby | running | node1 | running | 2167 | no | 1 second(s) ago
3 | node3 | standby | running | node1 | running | 2163 | no | 0 second(s) ago
[kingbase@kes1 ~]$
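Reading the status table by eye works for three nodes, but the same check can be scripted. Below is a minimal sketch, not a Kingbase tool: a hypothetical shell function `check_cluster_running` that reads the output of `repmgr cluster show` on stdin, assumes the `|`-separated column layout shown above (Status in the fourth column, the primary marked `* running`), and prints every node that is not running.

```bash
# check_cluster_running: hypothetical helper, not part of KES or repmgr.
# Reads `repmgr cluster show` output on stdin (column layout as above)
# and prints each node whose Status is not "running".
# Exit status: 0 if every node is running, 1 otherwise (or if no rows).
check_cluster_running() {
    awk -F'|' '
        # Strip leading and trailing blanks from a field.
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        # Data rows start with a numeric ID; header and separator do not.
        trim($1) ~ /^[0-9]+$/ {
            rows++
            status = trim($4)
            sub(/^\* */, "", status)   # the primary is shown as "* running"
            if (status != "running") {
                printf "node %s (%s) status: %s\n", trim($2), trim($3), status
                bad++
            }
        }
        END { if (rows == 0 || bad > 0) exit 1; exit 0 }
    '
}
```

Typical use on kes1: `repmgr cluster show | check_cluster_running && echo "all nodes running"`.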
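"It is like grafting a new branch onto a great tree," her master's voice echoed in her ear. "First you must find fertile soil: prepare kes4, a server born of the same roots."
The Blood-Bond Ritual
As she configured passwordless SSH trust, Zhao Jinmai felt as though she were building invisible bridges between the nodes: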
[kingbase@kes1 zip]$ sudo ./trust_cluster.sh
...
connect to "kes4" from current node by 'ssh' root:0 kingbase:0..... OK
check ssh connection success!
[kingbase@kes1 zip]$
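It reminded her of the meridian-opening feats in wuxia novels: the trust channels between the nodes are the cluster's Ren and Du meridians.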
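Before moving on, it is worth confirming that the trust really works from the `kingbase` account. A minimal sketch, assuming only standard OpenSSH options: the hypothetical function `check_ssh_trust` runs `ssh -o BatchMode=yes` against each host, so a host that would still prompt for a password fails instead of hanging. `SSH_CMD` is an illustrative override hook for testing, not a Kingbase setting.

```bash
# check_ssh_trust: hypothetical helper. For each host argument, try a
# non-interactive ssh login and print "<host>: OK" or "<host>: FAILED".
# BatchMode=yes makes ssh fail rather than prompt for a password.
# SSH_CMD (default "ssh") exists only so the function can be tested.
check_ssh_trust() {
    local host failed=0
    for host in "$@"; do
        if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$host" true \
            >/dev/null 2>&1; then
            echo "$host: OK"
        else
            echo "$host: FAILED"
            failed=1
        fi
    done
    return "$failed"
}
```

For example, run `check_ssh_trust kes1 kes2 kes3 kes4` as `kingbase` on each node; any `FAILED` line means that path still needs a password.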
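The Magic of Life Replication
Through the crystal ball, we saw that the new soil was nourishing enough for a transplanted tree.
# Copy the client tools and license file from kes1 to the kes4 server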
[kingbase@kes1 V009R004C010]$ pwd
/opt/Kingbase/ES/V9/KESRealPro/V009R004C010
[kingbase@kes1 V009R004C010]$ scp -r ClientTools license.dat kes4:/opt/Kingbase/ES/V9/
...
# Edit install.conf; the diff against the backup copy shows the changed settings
[kingbase@kes4 zip]$ diff install.conf install.conf.bak | grep '^<'
< all_ip=(kes1 kes2 kes3 kes4)
< net_device=(ens160 ens160 ens160 ens160)
< net_device_ip=(192.168.43.91 192.168.43.92 192.168.43.93 192.168.43.94)
< expand_type="0"
< primary_ip="kes1"
< expand_ip="192.168.43.94"
< node_id="4"
< sync_type="0"
< install_dir="/home/kingbase/cluster/install"
< zip_package="/opt/Kingbase/ES/V9/ClientTools/guitools/DeployTools/zip/db.zip"
< net_device=(ens160)
< net_device_ip=(192.168.43.94)
[kingbase@kes4 zip]$
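Typos in these settings only surface once the expand run fails, so a quick pre-flight check can help. This is a sketch under two assumptions that the output above does not prove: install.conf marks its sections with `[name]` lines (the text refers to a `[shrink]` section, and `[expand]` is assumed by analogy), and entries take the `key="value"` or `key=(a b c)` form shown in the diff. `conf_get` and `check_expand_conf` are hypothetical helpers, and the checks (node_id numeric, expand_ip listed in net_device_ip, equal list lengths) are illustrative sanity checks, not rules documented by Kingbase.

```bash
# conf_get FILE SECTION KEY: hypothetical helper. Prints the value of KEY
# inside [SECTION], with surrounding quotes or parentheses removed.
conf_get() {
    awk -v want="[$2]" -v key="$3" '
        /^\[.*\]$/ { in_sec = ($0 == want); next }      # section header
        in_sec && index($0, key "=") == 1 {
            val = substr($0, length(key) + 2)            # text after "key="
            gsub(/^[("]+|[)"]+$/, "", val)
            print val
            exit
        }
    ' "$1"
}

# check_expand_conf FILE: illustrative sanity checks on the [expand] section.
check_expand_conf() {
    local conf="$1" primary ip id devs ips
    primary=$(conf_get "$conf" expand primary_ip)
    ip=$(conf_get "$conf" expand expand_ip)
    id=$(conf_get "$conf" expand node_id)
    devs=$(conf_get "$conf" expand net_device)
    ips=$(conf_get "$conf" expand net_device_ip)

    [ -n "$primary" ] || { echo "primary_ip is empty"; return 1; }
    [ -n "$ip" ] || { echo "expand_ip is empty"; return 1; }
    case $id in
        ''|*[!0-9]*) echo "node_id must be numeric, got '$id'"; return 1 ;;
    esac
    case " $ips " in
        *" $ip "*) ;;
        *) echo "expand_ip $ip is not in net_device_ip ($ips)"; return 1 ;;
    esac
    if [ "$(echo $devs | wc -w)" -ne "$(echo $ips | wc -w)" ]; then
        echo "net_device and net_device_ip have different lengths"
        return 1
    fi
    echo "[expand] looks consistent: node $id ($ip), primary $primary"
}
```

On kes4 this could gate the real run: `check_expand_conf install.conf && ./cluster_install.sh expand`.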
As she ran the scale-out, the output flickering across the screen looked like pulsing runes:
[kingbase@kes4 zip]$ pwd
/opt/Kingbase/ES/V9/ClientTools/guitools/DeployTools/zip
[kingbase@kes4 zip]$ ./cluster_install.sh expand
[CONFIG_CHECK] will deploy the cluster of
[RUNNING] success connect to the target "192.168.43.94" ..... OK
[INSTALL] load config from cluster.....OK
[CONFIG_CHECK] file format is correct ... OK
[NOTICE] starting backup (using sys_basebackup)...
[INFO] executing:
/home/kingbase/cluster/install/kingbase/bin/sys_basebackup -l "repmgr base backup" -D /home/kingbase/cluster/install/kingbase/data -h kes1 -p 54321 -U esrep -c fast -X stream -S repmgr_slot_4
[NOTICE] standby clone (using sys_basebackup) complete
[NOTICE] you can now start your Kingbase server
[NOTICE] standby node "node4" (ID: 4) successfully registered
[2025-03-07 23:56:01] [NOTICE] redirecting logging output to "/home/kingbase/cluster/install/kingbase/log/kbha.log"
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | | host=kes1 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=kes2 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
3 | node3 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=kes3 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
4 | node4 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=192.168.43.94 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
[kingbase@kes4 zip]$
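Zhao Jinmai watched with bated breath until the new node emerged in the cluster like a butterfly breaking out of its cocoon.
On the monitoring dashboard, the torrent of data now branched off a stream into the new node as naturally as a river carving a new channel: node4 has joined as a standby replicating from the primary node1, with an LSN lag of 0 bytes.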
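The same table can tell a script when the new standby has caught up. A minimal sketch with a hypothetical function name: `node_caught_up NAME` reads `repmgr cluster show` on stdin and succeeds only if NAME is listed as a running standby whose LSN_Lag column reads `0 bytes`, assuming the column order shown above.

```bash
# node_caught_up NAME: hypothetical helper. Succeeds only if NAME appears
# in `repmgr cluster show` output (stdin) as a running standby whose
# LSN_Lag column (9th field) reads "0 bytes".
node_caught_up() {
    awk -F'|' -v want="$1" '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        trim($2) == want {
            found = 1
            ok = (trim($3) == "standby" && trim($4) == "running" && trim($9) == "0 bytes")
        }
        END { if (found && ok) exit 0; exit 1 }
    '
}
```

A simple wait loop after an expand could then be `until repmgr cluster show | node_caught_up node4; do sleep 5; done`.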
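Part 2. Scale-In as Precision Surgery
Diagnosing the Patient Node
When node3 needed to be taken offline, Zhao Jinmai first examined its condition with the four classic diagnostic methods of looking, listening, asking, and feeling the pulse: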
[kingbase@kes3 ~]$ repmgr node check
Node "node3":
Server role: OK (node is standby)
Replication lag: OK (0 seconds)
WAL archiving: OK (0 pending archive ready files)
Upstream connection: OK (node "node3" (ID: 3) is attached to expected upstream node "node1" (ID: 1))
Downstream servers: OK (this node has no downstream nodes)
Replication slots: OK (node has no physical replication slots)
Missing physical replication slots: OK (node has no missing physical replication slots)
Configured data directory: OK (configured "data_directory" is "/home/kingbase/cluster/install/kingbase/data")
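Every check reported OK, and with no downstream nodes and no replication slots, nothing else depends on node3. Having confirmed the node was healthy, she began preparing for this "painless excision".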
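The same "all OK" judgment can be automated. A sketch assuming the `<check>: <STATUS> (<detail>)` line format shown above; `node_check_all_ok` is a hypothetical name, and it simply treats any status word other than `OK` (repmgr also reports levels such as WARNING and CRITICAL) as a problem.

```bash
# node_check_all_ok: hypothetical helper. Reads `repmgr node check` output
# on stdin, prints every check line whose status is not OK, and exits 1 if
# any such line (or no check line at all) is found.
node_check_all_ok() {
    awk '
        /^Node / { next }                 # header line: Node "name":
        {
            pos = index($0, ": ")         # "<check name>: <STATUS> (...)"
            if (pos == 0) next
            checked++
            split(substr($0, pos + 2), word, " ")
            if (word[1] != "OK") { print; bad++ }
        }
        END { if (checked == 0 || bad > 0) exit 1; exit 0 }
    '
}
```

On kes3: `repmgr node check | node_check_all_ok && echo "node3 is ready to be removed"`.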
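Precision Cutting
Editing the configuration file is like finalizing the surgical plan; make sure to edit the [shrink] section of the file (here on the primary, kes1):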
[kingbase@kes1 zip]$ diff install.conf install.conf.bak | grep '^<'
< shrink_type="0"
< primary_ip="kes1"
< shrink_ip="kes3"
< node_id="3"
< install_dir="/home/kingbase/cluster/install"
[kingbase@kes1 zip]$
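Because the shrink settings name the target both by host (`shrink_ip`) and by `node_id`, a mismatch between the two is an easy mistake to make. A minimal sketch with a hypothetical function: `check_shrink_target HOST ID` reads `repmgr cluster show` on stdin, finds the row whose connection string contains `host=HOST`, and checks that its ID matches and that it is not the primary. It assumes the column layout shown earlier and does not parse install.conf, so the two values are passed by hand.

```bash
# check_shrink_target HOST ID: hypothetical helper. Reads
# `repmgr cluster show` output on stdin and checks that the node whose
# connection string contains "host=HOST" has node ID ID and is not the
# primary. Exit status 0 means the pair looks consistent.
check_shrink_target() {
    awk -F'|' -v host="$1" -v id="$2" '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        # Match "host=HOST " inside the connection string (10th field).
        trim($1) ~ /^[0-9]+$/ && index($10 " ", "host=" host " ") {
            found = 1
            if (trim($1) != id) {
                printf "host %s is node ID %s, not %s\n", host, trim($1), id
                bad = 1
            }
            if (trim($3) == "primary") {
                printf "host %s is the primary node\n", host
                bad = 1
            }
        }
        END {
            if (!found) printf "host %s not found in the cluster\n", host
            if (!found || bad) exit 1
            exit 0
        }
    '
}
```

With the values above: `repmgr cluster show | check_shrink_target kes3 3`.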
As she ran the scale-in command, she watched node3 being gracefully unregistered and stopped:
[kingbase@kes1 zip]$ ./cluster_install.sh shrink
[CONFIG_CHECK] will deploy the cluster of
[RUNNING] success connect to the target "kes3" ..... OK
[RUNNING] The /home/kingbase/cluster/install/kingbase/bin dir exist on "kes3" ... OK
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | | host=kes1 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=kes2 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
3 | node3 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=kes3 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
4 | node4 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=192.168.43.94 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
[INFO] node:kes3 can be deleted ... OK
[NOTICE] unregistering node 3
[INFO] standby unregistration complete
2025-03-08 00:06:44 DB on "[localhost]" stop success.
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | | host=kes1 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=kes2 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
4 | node4 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=192.168.43.94 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
[kingbase@kes1 zip]$
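The whole operation went as smoothly as Cook Ding carving up the ox: node3 was taken offline cleanly, without leaving any hidden risks.
Post-Op Observation
Checking the cluster status confirms that, after shrinking from four nodes to three, the cluster has settled into a new, stable triangle, with node1 still the primary and node2 and node4 as its standbys: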
[kingbase@kes1 zip]$ repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | | host=kes1 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=kes2 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
4 | node4 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=192.168.43.94 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
[kingbase@kes1 zip]$
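To make that final confirmation repeatable, a sketch with a hypothetical function name: `assert_node_gone NAME COUNT` reads `repmgr cluster show` on stdin and fails if NAME is still registered or if the number of node rows differs from COUNT. It assumes the same table layout as above.

```bash
# assert_node_gone NAME COUNT: hypothetical helper. Reads
# `repmgr cluster show` output on stdin; fails if NAME is still listed or
# if the number of node rows is not COUNT.
assert_node_gone() {
    awk -F'|' -v gone="$1" -v want="$2" '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        trim($1) ~ /^[0-9]+$/ {
            n++
            if (trim($2) == gone) { print gone " is still registered"; bad = 1 }
        }
        END {
            if (n != want) { printf "expected %d nodes, found %d\n", want, n; bad = 1 }
            if (bad) exit 1
            exit 0
        }
    '
}
```

After the shrink above: `repmgr cluster show | assert_node_gone node3 3`.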
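Epilogue: An Epiphany on the Cluster Lifecycle
"This is the breathing of digital life," her master said slowly. "A good DBA is not a machine operator but a physician attuned to the rhythm of the cluster's life. Scaling KES out and in is not simple addition and subtraction; it is the art of keeping the system at its 'golden balance point'."
In the moonlit server room, Zhao Jinmai gazed at the status lights blinking in a steady rhythm.
Suddenly, it dawned on her:
"The essence of scaling is teaching the database to live like a living organism: growing in spring, settling in autumn. As the Zhuangzi says, 'Life is a floating, death a rest.' Every node that joins brings new vitality to the cluster, and a graceful departure is simply part of the cycle of digital life."
But as she looked at the cluster installation script, another thought occurred to her:
"If only I could preview an operation's outcome with a dry run, like gazing into a crystal ball that shows the future!"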
[kingbase@kes4 zip]$ ./cluster_install.sh help
Do not choose any method, install/expand/shrink!
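The reply above suggests `help` is not a recognized option. Short of a real dry run, one stopgap is to print what a run would read before launching it. A sketch only: `plan_cluster_change` is a hypothetical helper that assumes install.conf uses `[expand]`/`[shrink]` section headers, as the `[shrink]` section mentioned earlier suggests; it shows the command and that section's settings and executes nothing.

```bash
# plan_cluster_change CONF METHOD: hypothetical "poor man's dry run".
# Prints the command that would be run and the key=value lines of the
# matching [METHOD] section of CONF. It never calls cluster_install.sh.
plan_cluster_change() {
    local conf="$1" method="$2"
    case $method in
        expand|shrink) ;;
        *) echo "usage: plan_cluster_change CONF expand|shrink" >&2; return 2 ;;
    esac
    echo "would run: ./cluster_install.sh $method"
    echo "using [$method] settings from $conf:"
    awk -v want="[$method]" '
        /^\[.*\]$/ { in_sec = ($0 == want); next }   # section header
        in_sec && /^[A-Za-z_]+=/ { print "  " $0 }    # key=value lines only
    ' "$conf"
}
```

For example, `plan_cluster_change install.conf shrink` on kes1 would list `shrink_ip="kes3"` and `node_id="3"` before anything is touched.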
Outside the server room, dawn was breaking.
Zhao Jinmai knew that this exploration into the mysteries of database life had only just begun...
This article is shared from the WeChat official account 少安事务所 (mysqloffice).