
RabbitMQ Cluster Crash in Mirrored-Queue Mode

Yu_- 2026-07-04 00:01:04

1. The Problem

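The cluster was deployed without any problems and ran normally for a long time. Later, while I was setting up other services, it suddenly broke down. I am not sure whether too many message queues were the cause.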

[root@controller01 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@controller01
[{nodes,[{disc,[rabbit@controller01]},
         {ram,[rabbit@controller03,rabbit@controller02]}]},
 {running_nodes,[rabbit@controller01]},
 {cluster_name,<<"rabbit@controller01">>},
 {partitions,[{rabbit@controller01,[rabbit@controller02,
                                    rabbit@controller03]}]},
 {alarms,[{rabbit@controller01,[]}]}]

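The cluster status shows a network partition (split-brain): from controller01's point of view, controller01 is the only running node, and both controller02 and controller03 are listed under partitions. So what happened to the other two nodes?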
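
The partitions entry in the output above can also be checked mechanically. The following is a minimal sketch, assuming the Erlang-term output format that rabbitmqctl 3.6.x prints in this post; has_partitions is a hypothetical helper name, not part of RabbitMQ.

```shell
# Reads `rabbitmqctl cluster_status` output (3.6.x Erlang-term format) on stdin.
# Returns 0 when a non-empty partitions list is present, 1 otherwise.
has_partitions() {
    # Strip spaces, tabs and newlines so a list wrapped over
    # several lines can be matched as a single string.
    case "$(tr -d ' \n\t')" in
        *'{partitions,[]}'*) return 1 ;;   # empty list: no partition recorded
        *'{partitions,['*)   return 0 ;;   # non-empty list: partition detected
        *)                   return 1 ;;   # key absent (e.g. application stopped)
    esac
}

# Hypothetical usage on a cluster node:
#   rabbitmqctl cluster_status | has_partitions && echo "partition detected"
```

The function only inspects text, so it can be tried against saved output before being pointed at a live node.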

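Running systemctl status rabbitmq-server on the nodes, including the two that dropped out, shows nothing wrong: the service is active on all three. The logs did not provide any useful information either.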

[root@controller01 ~]# systemctl status rabbitmq-server
● rabbitmq-server.service - RabbitMQ broker
   Loaded: loaded (/usr/lib/systemd/system/rabbitmq-server.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2025-01-17 22:37:09 EST; 3 weeks 5 days ago
 Main PID: 21937 (beam.smp)
   Status: "Initialized"
   CGroup: /system.slice/rabbitmq-server.service
           ├─21937 /usr/lib64/erlang/erts-8.3.5.3/bin/beam.smp -W w -A 96 -P 1048576 -t 5000000 -stbt db -zdbbl ...
           ├─22170 erl_child_setup 1024
           ├─22186 inet_gethost 4
           └─22187 inet_gethost 4

Jan 17 22:37:08 controller01 systemd[1]: Starting RabbitMQ broker...
Jan 17 22:37:08 controller01 rabbitmq-server[21937]: RabbitMQ 3.6.16. Copyright (C) 2007-2018 Pivotal Softwa...Inc.
Jan 17 22:37:08 controller01 rabbitmq-server[21937]: ##  ##      Licensed under the MPL.  See http://www.rab...com/
Jan 17 22:37:08 controller01 rabbitmq-server[21937]: ##  ##
Jan 17 22:37:08 controller01 rabbitmq-server[21937]: ##########  Logs: /var/log/rabbitmq/rabbit@controller01.log
Jan 17 22:37:08 controller01 rabbitmq-server[21937]: ######  ##        /var/log/rabbitmq/rabbit@controller01....log
Jan 17 22:37:08 controller01 rabbitmq-server[21937]: ##########
Jan 17 22:37:08 controller01 rabbitmq-server[21937]: Starting broker...
Jan 17 22:37:09 controller01 systemd[1]: Started RabbitMQ broker.
Jan 17 22:37:09 controller01 rabbitmq-server[21937]: completed with 0 plugins.
Hint: Some lines were ellipsized, use -l to show in full.

[root@controller02 ~]# systemctl status rabbitmq-server
● rabbitmq-server.service - RabbitMQ broker
   Loaded: loaded (/usr/lib/systemd/system/rabbitmq-server.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2025-01-17 22:49:18 EST; 3 weeks 5 days ago
 Main PID: 21800 (beam.smp)
   Status: "Initialized"
   CGroup: /system.slice/rabbitmq-server.service
           ├─21800 /usr/lib64/erlang/erts-8.3.5.3/bin/beam.smp -W w -A 96 -P 1048576 -t 5000000 -stbt db -zdbbl ...
           ├─22033 erl_child_setup 1024
           ├─22049 inet_gethost 4
           └─22050 inet_gethost 4

Jan 17 22:49:18 controller02 systemd[1]: Started RabbitMQ broker.
Jan 17 22:49:19 controller02 rabbitmq-server[21800]: completed with 0 plugins.
Jan 17 22:51:43 controller02 rabbitmq-server[21800]: RabbitMQ 3.6.16. Copyright (C) 2007-2018 Pivotal Softwa...Inc.
Jan 17 22:51:43 controller02 rabbitmq-server[21800]: ##  ##      Licensed under the MPL.  See http://www.rab...com/
Jan 17 22:51:43 controller02 rabbitmq-server[21800]: ##  ##
Jan 17 22:51:43 controller02 rabbitmq-server[21800]: ##########  Logs: /var/log/rabbitmq/rabbit@controller02.log
Jan 17 22:51:43 controller02 rabbitmq-server[21800]: ######  ##        /var/log/rabbitmq/rabbit@controller02....log
Jan 17 22:51:43 controller02 rabbitmq-server[21800]: ##########
Jan 17 22:51:43 controller02 rabbitmq-server[21800]: Starting broker...
Jan 17 22:51:44 controller02 rabbitmq-server[21800]: completed with 0 plugins.
Hint: Some lines were ellipsized, use -l to show in full.

[root@controller03 ~]# systemctl status rabbitmq-server
● rabbitmq-server.service - RabbitMQ broker
   Loaded: loaded (/usr/lib/systemd/system/rabbitmq-server.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2025-01-17 22:49:21 EST; 3 weeks 5 days ago
 Main PID: 21384 (beam.smp)
   Status: "Initialized"
   CGroup: /system.slice/rabbitmq-server.service
           ├─21384 /usr/lib64/erlang/erts-8.3.5.3/bin/beam.smp -W w -A 96 -P ...
           ├─21617 erl_child_setup 1024
           ├─21633 inet_gethost 4
           └─21634 inet_gethost 4

Jan 17 22:49:21 controller03 systemd[1]: Started RabbitMQ broker.
Jan 17 22:49:21 controller03 rabbitmq-server[21384]: completed with 0 plugins.
Jan 17 22:52:09 controller03 rabbitmq-server[21384]: RabbitMQ 3.6.16. Copyrig...
Jan 17 22:52:09 controller03 rabbitmq-server[21384]: ##  ##      Licensed und...
Jan 17 22:52:09 controller03 rabbitmq-server[21384]: ##  ##
Jan 17 22:52:09 controller03 rabbitmq-server[21384]: ##########  Logs: /var/l...
Jan 17 22:52:09 controller03 rabbitmq-server[21384]: ######  ##        /var/l...
Jan 17 22:52:09 controller03 rabbitmq-server[21384]: ##########
Jan 17 22:52:09 controller03 rabbitmq-server[21384]: Starting broker...
Jan 17 22:52:09 controller03 rabbitmq-server[21384]: completed with 0 plugins.
Hint: Some lines were ellipsized, use -l to show in full.

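Because the cluster uses mirrored queues, I did not dare to simply shut the service down for fear of leaving the data inconsistent, so this needs to be handled with care.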

2. The Fix

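The cluster status shows that the cluster definition itself still exists, so the first step is to stop the RabbitMQ application on every node.

The rabbitmqctl stop_app command stops the RabbitMQ application, so the node no longer accepts or processes messages, but the Erlang VM keeps running and the rabbitmq-server service is not shut down.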

[root@controller01 ~]# rabbitmqctl stop_app
Stopping rabbit application on node rabbit@controller01

[root@controller02 ~]# rabbitmqctl stop_app
Stopping rabbit application on node rabbit@controller02

[root@controller03 ~]# rabbitmqctl stop_app
Stopping rabbit application on node rabbit@controller03
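
Running the same command by hand on each node is easy to get wrong when more nodes are involved. The following is a rough sketch of a loop that does it over SSH, assuming passwordless root SSH to the three hosts (an assumption, not something this post sets up). NODES, DRY_RUN and on_all_nodes are hypothetical names; by default the helper only prints what it would run.

```shell
# Hostnames taken from this post; adjust for your own cluster.
NODES="controller01 controller02 controller03"

# Run (or, with DRY_RUN=1, just print) the given command on every node.
# Assumption: passwordless root SSH to each host is already configured.
on_all_nodes() {
    for host in $NODES; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Dry run: show the command instead of executing it.
            printf 'ssh root@%s %s\n' "$host" "$*"
        else
            # Stop at the first node where the command fails.
            ssh "root@$host" "$@" || return 1
        fi
    done
}

# Hypothetical usage:
#   on_all_nodes rabbitmqctl stop_app             # preview only
#   DRY_RUN=0 on_all_nodes rabbitmqctl stop_app   # actually run it
```

Defaulting to a dry run is a deliberate choice here, since stopping the application on every node interrupts all message traffic.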

Now check the cluster status again:

[root@controller01 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@controller01
[{nodes,[{disc,[rabbit@controller01]},
         {ram,[rabbit@controller02,rabbit@controller03]}]},
 {alarms,[]}]

Good, the cluster membership is still intact: all three nodes are still listed.

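Since the cluster had already been set up successfully, no cluster configuration needs to be redone at this point. Just start the RabbitMQ application on every node so that it accepts and processes messages again.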

[root@controller01 ~]# rabbitmqctl start_app
Starting node rabbit@controller01

[root@controller02 ~]# rabbitmqctl start_app
Starting node rabbit@controller02
Error: unable to connect to node rabbit@controller02: nodedown

DIAGNOSTICS
===========

attempted to contact: [rabbit@controller02]

rabbit@controller02:
  * connected to epmd (port 4369) on controller02
  * epmd reports: node 'rabbit' not running at all
                  no other nodes on controller02
  * suggestion: start the node

current node details:
- node name: 'rabbitmq-cli-85@controller02'
- home dir: /var/lib/rabbitmq
- cookie hash: V+zquSQpuK8W6GX64HNaIQ==

[root@controller03 ~]# rabbitmqctl start_app
Starting node rabbit@controller03

Hmm, node 2 is acting up: its output does not look right.

Before doing anything else, check the cluster status to confirm whether node 2 is the only one misbehaving.

[root@controller01 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@controller01
[{nodes,[{disc,[rabbit@controller01]},
         {ram,[rabbit@controller03,rabbit@controller02]}]},
 {running_nodes,[rabbit@controller03,rabbit@controller01]},
 {cluster_name,<<"rabbit@controller01">>},
 {partitions,[]},
 {alarms,[{rabbit@controller03,[]},{rabbit@controller01,[]}]}]

It is. Node 3 has now rejoined the running cluster (previously only node 1 was running), and the partitions list is empty.
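
Comparing nodes against running_nodes by eye works for three nodes but gets tedious for more. The following is a small sketch that prints the cluster members missing from running_nodes, again assuming the 3.6.x Erlang-term output shown in this post and a grep that supports -o (GNU and BSD grep do). missing_nodes is a hypothetical helper name.

```shell
# Reads `rabbitmqctl cluster_status` output on stdin and prints every
# cluster member (disc or ram) that is not listed under running_nodes.
missing_nodes() {
    # Flatten the output so lists wrapped across lines match as one string.
    flat=$(tr -d ' \n\t')
    # Node names look like name@host.
    name_re='[A-Za-z0-9_-]+@[A-Za-z0-9_.-]+'
    # All members: names inside the {disc,[...]} and {ram,[...]} lists.
    members=$(printf '%s\n' "$flat" \
        | grep -oE '\{(disc|ram),\[[^]]*\]' | grep -oE "$name_re" || true)
    # Running nodes: names inside the {running_nodes,[...]} list.
    running=$(printf '%s\n' "$flat" \
        | grep -oE '\{running_nodes,\[[^]]*\]' | grep -oE "$name_re" || true)
    for node in $members; do
        # Print the member only when it has no exact match among running nodes.
        printf '%s\n' "$running" | grep -qxF "$node" || printf '%s\n' "$node"
    done
}

# Hypothetical usage on a cluster node:
#   rabbitmqctl cluster_status | missing_nodes
```

Fed the status output above, it would report rabbit@controller02 as the only member that is not running.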

Looking back at node 2's output: it says the rabbit@controller02 node is not running at all, and the suggestion is to start the node.

First, check the service status:

[root@controller02 ~]# systemctl status rabbitmq-server
● rabbitmq-server.service - RabbitMQ broker
   Loaded: loaded (/usr/lib/systemd/system/rabbitmq-server.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2025-02-13 01:47:46 EST; 1min 21s ago
  Process: 21800 ExecStart=/usr/lib/rabbitmq/bin/rabbitmq-server (code=exited, status=1/FAILURE)
 Main PID: 21800 (code=exited, status=1/FAILURE)
   Status: "Initialized"

Feb 13 01:47:43 controller02 rabbitmq-server[21800]: ##  ##
Feb 13 01:47:43 controller02 rabbitmq-server[21800]: ##########  Logs: /var/log/rabbitmq/rabbit@controller02.log
Feb 13 01:47:43 controller02 rabbitmq-server[21800]: ######  ##        /var/log/rabbitmq/rabbit@controller02-sasl.log
Feb 13 01:47:43 controller02 rabbitmq-server[21800]: ##########
Feb 13 01:47:43 controller02 rabbitmq-server[21800]: Starting broker...
Feb 13 01:47:45 controller02 rabbitmq-server[21800]: {"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{{failed_to_cluster_with,[rabbit@controller01,rabbit@...rmal,[]]}}}"}
Feb 13 01:47:45 controller02 rabbitmq-server[21800]: Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{{failed_to_cluster_with,[rabbit@controller01,rabbit@c...."},{rabbit,s
Feb 13 01:47:46 controller02 systemd[1]: rabbitmq-server.service: main process exited, code=exited, status=1/FAILURE
Feb 13 01:47:46 controller02 systemd[1]: Unit rabbitmq-server.service entered failed state.
Feb 13 01:47:46 controller02 systemd[1]: rabbitmq-server.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
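
Since systemctl status ellipsizes long lines, the interesting part of the failure is easy to miss. The following is a minimal sketch of a filter that keeps only the boot-failure lines from the service journal; boot_errors is a hypothetical helper name, and the patterns are simply the error strings visible in the output above.

```shell
# Reads log text on stdin and keeps only lines that mention
# the boot failure strings seen in this post.
boot_errors() {
    grep -E 'Kernel pid terminated|application_start_failure|failed_to_cluster_with'
}

# Hypothetical usage on the failed node (journalctl prints full lines with -l):
#   journalctl -u rabbitmq-server --no-pager -l | boot_errors
```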

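The service has failed, and the log shows failed_to_cluster_with, meaning the node could not cluster with its peers when it last tried to start. With controller01 and controller03 now running, simply restart the service: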

[root@controller02 ~]# systemctl restart rabbitmq-server

Check the cluster status once more:

[root@controller01 ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@controller01
[{nodes,[{disc,[rabbit@controller01]},
         {ram,[rabbit@controller03,rabbit@controller02]}]},
 {running_nodes,[rabbit@controller02,rabbit@controller03,rabbit@controller01]},
 {cluster_name,<<"rabbit@controller01">>},
 {partitions,[]},
 {alarms,[{rabbit@controller02,[]},
          {rabbit@controller03,[]},
          {rabbit@controller01,[]}]}]
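
A restarted node can take a moment to rejoin, so a single status check may run too early. The following is a rough sketch of a retry loop around the status check, assuming the same 3.6.x output format and the three-node cluster from this post. running_count, wait_until and all_nodes_up are hypothetical helper names.

```shell
# Reads `rabbitmqctl cluster_status` output on stdin and prints how many
# nodes are listed under running_nodes (0 when the key is absent).
running_count() {
    tr -d ' \n\t' | grep -oE '\{running_nodes,\[[^]]*\]' \
        | grep -o '@' | wc -l | tr -d ' '
}

# wait_until ATTEMPTS DELAY CMD...: retry CMD until it succeeds,
# sleeping DELAY seconds between attempts; returns 1 if it never does.
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Succeeds once all three nodes of this post's cluster are running.
all_nodes_up() {
    [ "$(rabbitmqctl cluster_status | running_count)" -eq 3 ]
}

# Hypothetical usage: up to 10 checks, 3 seconds apart.
#   wait_until 10 3 all_nodes_up && echo "cluster is back"
```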

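Fixed. All three nodes are running again with no partitions, and the other services that rely on RabbitMQ for messaging are back to normal and no longer report errors.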

 
