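In production you may run into cluster split-brain, a state in which two nodes both act as primary. When that happens, an operator has to step in and decide manually which node holds the most recent data, so that as little data as possible is lost.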
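One way to support that manual decision is to compare the WAL positions (LSNs) reported by the two nodes. The following is a minimal sketch only: the function names `lsn_to_int` and `lsn_compare` are hypothetical, and how you collect the LSNs from each node is left out. Keep in mind that after a split-brain both nodes may have accepted writes, so a higher LSN is a hint for the operator, not proof that a node holds the data you want to keep.

```shell
# lsn_to_int LSN: convert an LSN of the form "HI/LO" (two hex numbers)
# into a single integer byte position: HI * 2^32 + LO.
# Shell arithmetic is signed 64-bit, so this assumes HI < 0x80000000.
lsn_to_int() {
    echo $(( (0x${1%%/*} << 32) + 0x${1##*/} ))
}

# lsn_compare A B: print "first", "second" or "equal" depending on
# which of the two LSNs is further ahead in the WAL stream.
lsn_compare() {
    _a=$(lsn_to_int "$1")
    _b=$(lsn_to_int "$2")
    if [ "$_a" -gt "$_b" ]; then
        echo first
    elif [ "$_a" -lt "$_b" ]; then
        echo second
    else
        echo equal
    fi
}
```

For example, `lsn_compare 0/A2B4940 0/E0000A0` prints `second`, because the second position lies further along the WAL.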
1. Test environment
Operating system:
[kingbase@node1 bin]$ cat /etc/centos-release
CentOS Linux release 7.2.1511 (Core)
Database:
[kingbase@node1 bin]$ ./ksql -U system test
ksql (V8.0)
Type "help" for help.
test=# select version();
version
----------------------------------------------------------------------------------------
KingbaseES V008R006C003B0010 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
(1 row)
./repmgr node rejoin -h 192.168.4.51 -U esrep -d esrep --force-rewind
[kingbase@node2 bin]$ ./sys_ctl -D /Kingbase/ES/V9/cluster/data stop
[kingbase@node2 bin]$ ./repmgr node rejoin -h 192.168.4.51 -U esrep -d esrep --force-rewind
[NOTICE] rejoin target is node "node1" (ID: 1)
[NOTICE] executing sys_rewind
[DETAIL] sys_rewind command is "/Kingbase/ES/V9/cluster/kingbase/bin/sys_rewind -D '/Kingbase/ES/V9/cluster/data' --source-server='host=192.168.4.51 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000'"
sys_rewind: 服务器在时间线1上的WAL位置0/A2B4940处发生了分歧
sys_rewind: 从时间线1上0/A2B4898处的最后一个普通检查点倒带
sys_rewind: 查找从 2024-02-04 17:51:06.234116 CST 到 2024-02-04 17:51:06.348186 CST 的最后一个公共检查点开始时间,以 "0.114070" 秒为单位.
sys_rewind: collect the number of WAL files to be processed:5, start time from 2024-02-04 17:51:06.349546 CST to 2024-02-04 17:51:06.407383 CST, cost "0.057837" seconds.
sys_rewind: 从目标服务器 0/A2B4898 读取WAL到 0/E0000A0 (端点 0/E0000A0)
sys_rewind: read the local Wal file information, start time from 2024-02-04 17:51:06.349546 CST to 2024-02-04 17:51:06.408433 CST, cost "0.000988" seconds.
sys_rewind: file replication start time from 2024-02-04 17:51:06.408545 CST to 2024-02-04 17:51:07.139666 CST, cost "0.731121" seconds.
sys_rewind: 更新控制文件:最小恢复点为 '0/E00FD98',最小恢复点TLI 为 '1',数据库状态为 'in archive recovery'
sys_rewind: 我们将删除 dir '/Kingbase/ES/V9/cluster/data/sys_replslot/repmgr_slot_5.rewind' 及其中的所有 file/dir.
sys_rewind: 我们将删除 dir '/Kingbase/ES/V9/cluster/data/sys_replslot/repmgr_slot_4.rewind' 及其中的所有 file/dir.
sys_rewind: 我们将删除 dir '/Kingbase/ES/V9/cluster/data/sys_replslot/repmgr_slot_3.rewind' 及其中的所有 file/dir.
sys_rewind: rewind start wal location 0/A2B4868 (file 00000001000000000000000A), end wal location 0/E00FD98 (file 00000001000000000000000E). wal data increment:62829(kB). time from 2024-02-04 17:51:06.408545 CST to 2024-02-04 17:51:07.202757 CST, in "0.968641" seconds.
sys_rewind: 完成!
[NOTICE] 0 files copied to /Kingbase/ES/V9/cluster/data
[INFO] creating replication slot as user "esrep"
[NOTICE] setting node 2's upstream to node 1
[WARNING] unable to ping "host=192.168.4.52 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000"
[DETAIL] KCIping() returned "KCIPING_NO_RESPONSE"
[NOTICE] begin to start server at 2024-02-04 17:51:07.218904
[NOTICE] starting server using "/Kingbase/ES/V9/cluster/kingbase/bin/sys_ctl -w -t 90 -D '/Kingbase/ES/V9/cluster/data' -l /Kingbase/ES/V9/cluster/kingbase/bin/logfile start"
[NOTICE] start server finish at 2024-02-04 17:51:07.538016
[NOTICE] NODE REJOIN successful
[DETAIL] node 2 is now attached to node 1
[kingbase@node2 bin]$
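The session above comes down to two commands run on the node being demoted: stop the local instance, then rejoin it to the surviving primary with --force-rewind. A minimal sketch of that sequence follows. The function names `rejoin_steps` and `rejoin_ok` and the `BINDIR`, `DATADIR` and `RUN` variables are hypothetical; the default paths, the user and database name `esrep`, and the success message are copied from the transcript and will differ on other installations. By default the helper only prints the commands, so they can be reviewed before anything is executed.

```shell
# Defaults copied from the transcript; override them for your installation.
BINDIR=${BINDIR:-/Kingbase/ES/V9/cluster/kingbase/bin}
DATADIR=${DATADIR:-/Kingbase/ES/V9/cluster/data}

# rejoin_steps PRIMARY_HOST: print the stop and rejoin commands.
# With RUN=1 the commands are executed instead of printed.
rejoin_steps() {
    primary_host=$1
    stop_cmd="$BINDIR/sys_ctl -D $DATADIR stop"
    rejoin_cmd="$BINDIR/repmgr node rejoin -h $primary_host -U esrep -d esrep --force-rewind"
    if [ "${RUN:-0}" = 1 ]; then
        # Word splitting is intentional here; assumes paths without spaces.
        $stop_cmd && $rejoin_cmd
    else
        echo "$stop_cmd"
        echo "$rejoin_cmd"
    fi
}

# rejoin_ok: read repmgr output on stdin and succeed only if it
# contains the success notice seen in the transcript.
rejoin_ok() {
    grep -q 'NODE REJOIN successful'
}
```

A dry run with `rejoin_steps 192.168.4.51` prints the two commands. With `RUN=1`, the output can be piped through `rejoin_ok` (for example `RUN=1 rejoin_steps 192.168.4.51 2>&1 | rejoin_ok`) to check for the success notice.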