本文最后更新于 319 天前,其中的信息可能已经过时,如有错误请发送邮件到wuxianglongblog@163.com
MySQL的MHA高可用架构部署实战案例
一.Master High Availability(简称"MHA")概述
1.什么是MHA
我们知道传统的异步主从复制最擅长的是为我们解决物理损坏(比如服务器宕机,磁盘损坏,磁盘坏道,文件被误删除等)。
Master High Availability(简称"MHA")目前在MySQL高可用方面是一个相对成熟的解决方案,它由日本DeNA公司的youshimaton(现就职于Facebook公司)开发,是一套优秀的作为MySQL高可用性环境下故障切换和主从提升的高可用软件。
在MySQL故障切换过程中,MHA能做到在0~30秒之内自动完成数据库的故障切换操作,并且在进行故障切换的过程中,MHA能在最大程度上保证数据的一致性,以达到真正意义上的高可用。
该软件由两部分组成:MHA Manager(管理节点)和MHA Node(数据节点)。
MHA Manager:
可以单独部署在一台独立的机器上管理多个master-slave集群,也可以部署在一台slave节点上。
MHA Node:
运行在每台MySQL服务器上。
MHA Manager会定时探测集群中的master节点,当master出现故障时,它可以自动将最新数据的slave提升为新的master,然后将所有其他的slave重新指向新的master。整个故障转移过程对应用程序完全透明。
在MHA自动故障切换过程中,MHA试图从宕机的主服务器上保存二进制日志,最大程度的保证数据的不丢失,但这并不总是可行的。例如,如果主服务器硬件故障或无法通过ssh访问,MHA没法保存二进制日志,只进行故障转移而丢失了最新的数据。
使用MySQL 5.5的半同步复制,可以大大降低数据丢失的风险。MHA可以与半同步复制结合起来。如果只有一个slave已经收到了最新的二进制日志,MHA可以将最新的二进制日志应用于其他所有的slave服务器上,因此可以保证所有节点的数据一致性。
目前MHA主要支持一主多从的架构,要搭建MHA,要求一个复制集群中必须最少有三台数据库服务器,一主二从,即一台充当master,一台充当备用master,另外一台充当从库,因为至少需要三台服务器,出于机器成本的考虑,淘宝也在该基础上进行了改造,目前淘宝TMHA已经支持一主一从。
2.MHA软件的角色组件概述
MHA Manager:(管理软件)
masterha_manager:
启动MHA。
masterha_check_ssh:
检查MHA的SSH配置状况。
masterha_check_repl:
检查主从的状态。
masterha_master_monitor:
检查master是否宕机。
masterha_check_status:
检查当前MHA运行状态。
masterha_master_switch:
控制故障转移。
masterha_conf_host:
添加或删除配置的server信息。
MHA Node:(被管理端软件,这些工具通常由MHA Manager的脚本触发,无需人为操作,了解即可)
save_binary_logs:
保存和复制master的二进制日志。
apply_diff_relay_logs:
识别差异的中继日志事件并将其差异的事件应用于其它的。
purge_relay_logs:
清除中继日志(不会阻塞SQL线程)。
3.下载MHA软件
需要"科学上网":(版本最低,最高版本是0.55)
https://code.google.com/archive/p/mysql-master-ha/
GitHub下载地址:(版本较低,属于0.56版本)
https://github.com/yoshinorim/mha4mysql-manager/wiki/Downloads
推荐下载0.58的版本github地址:(针对于MySQL 8.0建议使用该版本)
https://github.com/yoshinorim/mha4mysql-manager/releases
温馨提示:
(1)尽管我们实验使用的是MySQL 5.7版本,但实验我们还是可以基于MHA 0.56版本来安装哟,该版本在生产环境中已经广泛使用,因此无需担心其稳定性;
(2)如果使用MySQL 8.0版本请修改其默认的用户密码认证策略,改用Native的密码策略,推荐使用的MHA版本为0.58及以上版本;
(3)截至2021年2月17日,我能下载MHA最新的版本为"mha4mysql-manager-0.58-0.el7.centos.noarch.rpm",我已经放在软件包了;
3.MHA工作原理
高可用:
最擅长的是为我们解决物理损坏。
启动MHA Manager:
调用"masterha_manager"脚本启动MHA manager程序。
监控:
通过"masterha_master_monitor"心跳检测脚本监控所有数据库节点,尤其是监控主库。
默认探测4次,每次的探测间隔时间可以使用"ping_interval"来指定哟~如果"ping_interval=2",则表示间隔时间为"4 x 2 = 8"秒,在8秒内主库依旧没有响应,就认为主库宕机了,就得进入故障转移(Fail over)过程。
MHA故障转移(Fail over)流程如下所示:
(1)主库宕机后,故障转移(Fail over)过程从库选主策略:
1)可以主观的设置优先级,只需要在节点加入"candidate_master=1"参数,需要注意的是,如果备选主日志量落后master太多(默认中继日志为100MB),也不会被选择为新主,但可以通过"check_repl_delay=0",表示不检查日志落后的情景。
2)根据日志量最接近主库的数据;
3)日志量也一样的话,就得看文件的顺序;
(2)日志补偿阶段:
1)主库ssh能连上(说明只是MySQL服务挂掉,操作系统并没有挂掉),立即保存缺失部分的二进制日志到各个从库(默认是"/var/tmp"目录)并恢复;
2)主库ssh不能连上,两个从库的中继日志进行对比(apply_diff_relay_logs)自动进行差异补偿;
(3)主从身份切换:
所有从库取消和原有主库的复制关系(会自动执行"STOP SLAVE; RESET SLAVE ALL;"命令),而后新竹库和剩余从库重新构建主从关系。
(4)故障库自动被剔除集群:
通过"masterha_conf_host"工具将故障点的配置信息从配置文件中移除。
(5)MHA是一次性的高可用:
当上述故障转移完成后,MHA Manager会自动退出。
综上所述,是MHA的基础环境所有具备的功能,不得不说MHA是一个很优秀的开源组件,但该组件也有一些不足之处,如下所示:
(1)应用透明:
所谓的应用透明指的是尽管后端数据库主库发生故障转移,应用程序无需修改IP地址,这样应用程序是感知不到后端存储是否损坏(唯一的感知可能就是有短暂时间连接不到后端数据库)。
MHA组件已经实现了应用透明vip功能,但只能同机房使用,无法跨机房或者跨网络。如果有必要,可以借助类似于keepalive工具实现vip功能。
(2)"日志补偿阶段"存在丢失风险:
在上面提到的"日志补偿阶段"是有可能存在数据丢失的风险。我们可以借助于MySQL支持的半同步复制策略来尽可能的避免主库数据不一致。
但我们之前已经分析过"半同步复制"策略如果从库超时则会自动切换到传统异步复制策略,因此MHA并没有采取半同步复制,而是使用了"binlog server"组件来实时备份主库的二进制日志文件。
(3)告警功能:
当主从发生故障后,无法通知运维人员,因此需要我们借助一些工具来实现告警功能。
(4)故障自愈:(待二次开发)
当集群出现故障后,无法自己定位问题,并修复问题,而是将故障节点踢出集群,需要运维人员手动修复。这样需要小伙伴二次开发MHA,也可以用现有的解决方案,如:"MHA + K8S + Operator","MGR + mysqlsh"等架构实现方式。
二.MHA基础架构规划和实施
1.集群规划
mysql201.oldboyedu.com:
主库,MHA Node
mysql202.oldboyedu.com:
从库,MHA Node
mysql203.oldboyedu.com:
从库,MHA Node,MHA Manager
2.准备环境(基于GTID搭建的一主两从架构)
基于GTID部署一主两从架构,此步骤我在上一章节已经做过了,此处省略。
如下所示,我们在主库(mysql201.oldboyedu.com)可以查看从库的链接信息:
[root@mysql201.oldboyedu.com ~]# mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 18
Server version: 5.7.31-log MySQL Community Server (GPL)
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql201 [(none)]>
mysql201 [(none)]>SELECT @@server_id;
+-------------+
| @@server_id |
+-------------+
| 201 |
+-------------+
1 row in set (0.00 sec)
mysql201 [(none)]>
mysql201 [(none)]>SHOW SLAVE HOSTS;
+-----------+------+------+-----------+--------------------------------------+
| Server_id | Host | Port | Master_id | Slave_UUID |
+-----------+------+------+-----------+--------------------------------------+
| 202 | | 3306 | 201 | 4ffdeb3d-70c2-11eb-a18a-000c2958d75b |
| 203 | | 3306 | 201 | 332ccee4-50fb-11eb-b153-000c29820c67 |
+-----------+------+------+-----------+--------------------------------------+
2 rows in set (0.00 sec)
mysql201 [(none)]>
mysql201 [(none)]>SHOW PROCESSLIST;
+----+------+---------------------+------+------------------+-------+---------------------------------------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+------+---------------------+------+------------------+-------+---------------------------------------------------------------+------------------+
| 7 | copy | 172.200.1.202:55872 | NULL | Binlog Dump GTID | 29417 | Master has sent all binlog to slave; waiting for more updates | NULL |
| 8 | copy | 172.200.1.203:58786 | NULL | Binlog Dump GTID | 29301 | Master has sent all binlog to slave; waiting for more updates | NULL |
| 19 | root | localhost | NULL | Query | 0 | starting | SHOW PROCESSLIST |
+----+------+---------------------+------+------------------+-------+---------------------------------------------------------------+------------------+
3 rows in set (0.00 sec)
mysql201 [(none)]>
mysql201 [(none)]>QUIT
Bye
[root@mysql201.oldboyedu.com ~]#
如下所示,我们在从库(mysql202.oldboyedu.com)可以查看从库的链接信息:
[root@mysql202.oldboyedu.com ~]# mysql -e "SHOW SLAVE STATUS\G" | grep Running:
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
[root@mysql202.oldboyedu.com ~]#
如下所示,我们在从库(mysql203.oldboyedu.com)可以查看从库的链接信息:
[root@mysql203.oldboyedu.com ~]# mysql -uroot -p -e "SHOW SLAVE STATUS\G" | grep Running:
Enter password:
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
[root@mysql203.oldboyedu.com ~]#
3.配置关键程序软连接(所有节点都需操作,我只演示了在"mysql203.oldboyedu.com"节点的操作)
[root@mysql203.oldboyedu.com ~]# which mysqlbinlog
/oldboyedu/softwares/mysql/mysql/bin/mysqlbinlog
[root@mysql203.oldboyedu.com ~]#
[root@mysql203.oldboyedu.com ~]# ln -sv /oldboyedu/softwares/mysql/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
"/usr/bin/mysqlbinlog" -> "/oldboyedu/softwares/mysql/mysql/bin/mysqlbinlog"
[root@mysql203.oldboyedu.com ~]#
[root@mysql203.oldboyedu.com ~]# which mysql
/oldboyedu/softwares/mysql/mysql/bin/mysql
[root@mysql203.oldboyedu.com ~]#
[root@mysql203.oldboyedu.com ~]# ln -sv /oldboyedu/softwares/mysql/mysql/bin/mysql /usr/bin/mysql
"/usr/bin/mysql" -> "/oldboyedu/softwares/mysql/mysql/bin/mysql"
[root@mysql203.oldboyedu.com ~]#
[root@mysql203.oldboyedu.com ~]#
[root@mysql203.oldboyedu.com ~]# ll /usr/bin/mysql*
lrwxrwxrwx 1 root root 44 2月 17 18:26 /usr/bin/mysql -> /oldboyedu/softwares/mysql/mysql/bin/mysql
lrwxrwxrwx 1 root root 50 2月 17 18:26 /usr/bin/mysqlbinlog -> /oldboyedu/softwares/mysql/mysql/bin/mysqlbinlog
[root@mysql203.oldboyedu.com ~]#
4.各个节点配置互信(我只用"mysql201.oldboyedu.com"节点作为演示)
(1)mysql201.oldboyedu.com节点生成密钥
[root@mysql201.oldboyedu.com ~]# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:J058Cm5zBjUGXOU/fqsRTDRqhuR5MSs3VWNAHx3ETsA root@mysql201.oldboyedu.com
The key's randomart image is:
+---[RSA 2048]----+
| ...o.+.B+O+.|
| .+ + B E =.|
| O X . + |
| + B = . |
| o S o = |
| . = = . o |
| + = o . |
| . + o . |
| ... |
+----[SHA256]-----+
[root@mysql201.oldboyedu.com ~]#
(2)mysql201.oldboyedu.com与mysql201.oldboyedu.com节点建立互信,即需要和当前主机进行通信
[root@mysql201.oldboyedu.com ~]# ssh-copy-id root@172.200.1.201
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host '172.200.1.201 (172.200.1.201)' can't be established.
ECDSA key fingerprint is SHA256:rOEM9ZBK1HJkhTVdewL0mkRWObE3ZprBZnaIabP2lZY.
ECDSA key fingerprint is MD5:ff:b3:df:b3:6d:41:87:55:65:8c:e8:87:00:10:46:14.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@172.200.1.201's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'root@172.200.1.201'"
and check to make sure that only the key(s) you wanted were added.
[root@mysql201.oldboyedu.com ~]#
[root@mysql201.oldboyedu.com ~]# who
root pts/2 2021-02-17 17:53 (172.200.1.19)
root pts/3 2021-02-17 18:16 (172.200.1.19)
[root@mysql201.oldboyedu.com ~]#
[root@mysql201.oldboyedu.com ~]# ssh 172.200.1.201
Last login: Wed Feb 17 18:16:25 2021 from 172.200.1.19
[root@mysql201.oldboyedu.com ~]#
[root@mysql201.oldboyedu.com ~]# who
root pts/0 2021-02-17 20:55 (172.200.1.201)
root pts/2 2021-02-17 17:53 (172.200.1.19)
root pts/3 2021-02-17 18:16 (172.200.1.19)
[root@mysql201.oldboyedu.com ~]#
[root@mysql201.oldboyedu.com ~]# exit
登出
Connection to 172.200.1.201 closed.
[root@mysql201.oldboyedu.com ~]#
(3)mysql201.oldboyedu.com与mysql202.oldboyedu.com节点建立互信
[root@mysql201.oldboyedu.com ~]# ssh-copy-id root@172.200.1.202
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host '172.200.1.202 (172.200.1.202)' can't be established.
ECDSA key fingerprint is SHA256:rOEM9ZBK1HJkhTVdewL0mkRWObE3ZprBZnaIabP2lZY.
ECDSA key fingerprint is MD5:ff:b3:df:b3:6d:41:87:55:65:8c:e8:87:00:10:46:14.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@172.200.1.202's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'root@172.200.1.202'"
and check to make sure that only the key(s) you wanted were added.
[root@mysql201.oldboyedu.com ~]#
[root@mysql201.oldboyedu.com ~]# ssh 172.200.1.202
Last login: Wed Feb 17 18:17:11 2021 from 172.200.1.19
[root@mysql202.oldboyedu.com ~]#
[root@mysql202.oldboyedu.com ~]# who
root pts/0 2021-02-17 08:05 (172.200.1.19)
root pts/1 2021-02-17 09:19 (172.200.1.19)
root pts/2 2021-02-17 18:17 (172.200.1.19)
root pts/3 2021-02-17 18:32 (172.200.1.201)
[root@mysql202.oldboyedu.com ~]#
[root@mysql202.oldboyedu.com ~]# exit
登出
Connection to 172.200.1.202 closed.
[root@mysql201.oldboyedu.com ~]#
(4)mysql201.oldboyedu.com与mysql203.oldboyedu.com节点建立互信
[root@mysql201.oldboyedu.com ~]# ssh-copy-id root@172.200.1.203
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host '172.200.1.203 (172.200.1.203)' can't be established.
ECDSA key fingerprint is SHA256:rOEM9ZBK1HJkhTVdewL0mkRWObE3ZprBZnaIabP2lZY.
ECDSA key fingerprint is MD5:ff:b3:df:b3:6d:41:87:55:65:8c:e8:87:00:10:46:14.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@172.200.1.203's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'root@172.200.1.203'"
and check to make sure that only the key(s) you wanted were added.
[root@mysql201.oldboyedu.com ~]#
[root@mysql201.oldboyedu.com ~]# ssh 172.200.1.203
Last login: Wed Feb 17 18:17:36 2021 from 172.200.1.19
[root@mysql203.oldboyedu.com ~]#
[root@mysql203.oldboyedu.com ~]# who
root pts/0 2021-02-17 08:05 (172.200.1.19)
root pts/1 2021-02-17 08:39 (172.200.1.19)
root pts/2 2021-02-17 18:17 (172.200.1.19)
root pts/3 2021-02-17 18:32 (172.200.1.201)
[root@mysql203.oldboyedu.com ~]#
[root@mysql203.oldboyedu.com ~]# exit
登出
Connection to 172.200.1.203 closed.
[root@mysql201.oldboyedu.com ~]#
温馨提示:
MHA高可用节点需要配置互信,可能很多小伙伴会觉得这很不安全,从某种角度来讲,这种担心有点杞人忧天。如果你的Linux系统被破译了,实际上就没有安全之说了。而我们还是比较相信Linux操作系统的安全性的,不会被轻易攻破。
5.所有节点安装MHA Node软件包(我只用"mysql201.oldboyedu.com"节点作为演示)
安装依赖包:
[root@mysql201.oldboyedu.com ~]# yum -y install perl-DBD-MySQL
安装MHA Node软件包:(尽管是CentOS 6的软件包,但是在CentOS 7依旧是可以正常使用哟~)
[root@mysql201.oldboyedu.com ~]# rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm
准备中... ################################# [100%]
正在升级/安装...
1:mha4mysql-node-0.56-0.el6 ################################# [100%]
[root@mysql201.oldboyedu.com ~]#
6.在主库创建mha需要的用户
主库创建用户:
[root@mysql201.oldboyedu.com ~]# mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 20
Server version: 5.7.31-log MySQL Community Server (GPL)
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql201 [(none)]>
mysql201 [(none)]>CREATE USER mha@'172.200.1.%' IDENTIFIED BY 'oldboyedu';
Query OK, 0 rows affected (0.00 sec)
mysql201 [(none)]>
mysql201 [(none)]>GRANT ALL PRIVILEGES ON *.* TO mha@'172.200.1.%';
Query OK, 0 rows affected (0.00 sec)
mysql201 [(none)]>
mysql201 [(none)]>SELECT user,host FROM mysql.user;
+---------------+-------------+
| user | host |
+---------------+-------------+
| copy | 172.200.1.% |
| mha | 172.200.1.% |
| mysql.session | localhost |
| mysql.sys | localhost |
| root | localhost |
+---------------+-------------+
5 rows in set (0.00 sec)
mysql201 [(none)]>
mysql201 [(none)]>SHOW GRANTS FOR mha@'172.200.1.%';
+----------------------------------------------------+
| Grants for mha@172.200.1.% |
+----------------------------------------------------+
| GRANT ALL PRIVILEGES ON *.* TO 'mha'@'172.200.1.%' |
+----------------------------------------------------+
1 row in set (0.00 sec)
mysql201 [(none)]>
mysql201 [(none)]>QUIT
Bye
[root@mysql201.oldboyedu.com ~]#
"mysql202.oldboyedu.com"从库会同步主库创建的用户:
[root@mysql202.oldboyedu.com ~]# mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 14
Server version: 5.7.31-log MySQL Community Server (GPL)
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql202 [(none)]>
mysql202 [(none)]>SELECT user,host FROM mysql.user;
+---------------+-------------+
| user | host |
+---------------+-------------+
| copy | 172.200.1.% |
| mha | 172.200.1.% |
| mysql.session | localhost |
| mysql.sys | localhost |
| root | localhost |
+---------------+-------------+
5 rows in set (0.00 sec)
mysql202 [(none)]>
mysql202 [(none)]>QUIT
Bye
[root@mysql202.oldboyedu.com ~]#
"mysql203.oldboyedu.com"从库会同步主库创建的用户:
[root@mysql203.oldboyedu.com ~]# mysql -u root -p -e "SELECT user,host FROM mysql.user;"
Enter password:
+---------------+---------------------------+
| user | host |
+---------------+---------------------------+
| copy | 172.200.1.% |
| mha | 172.200.1.% |
| oldboyedu | 172.200.1.% |
| hive | 172.200.3.% |
| admin | docker201.oldboyedu.com |
| mysql.session | localhost |
| mysql.sys | localhost |
| root | localhost |
+---------------+---------------------------+
[root@mysql203.oldboyedu.com ~]#
7.在"mysql203.oldboyedu.com"节点安装"MHA Manager"软件
(1)先安装依赖包
[root@mysql203.oldboyedu.com ~]# yum -y install perl-Config-Tiny epel-release perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes
(2)安装"MHA Manager"软件
[root@mysql203.oldboyedu.com ~]# ll
总用量 88
-rw-r--r-- 1 root root 87119 2月 17 20:26 mha4mysql-manager-0.56-0.el6.noarch.rpm
[root@mysql203.oldboyedu.com ~]#
[root@mysql203.oldboyedu.com ~]# rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm
准备中... ################################# [100%]
正在升级/安装...
1:mha4mysql-manager-0.56-0.el6 ################################# [100%]
[root@mysql203.oldboyedu.com ~]#
温馨提示:
如果提示"perl-Log-Dispatch,perl-Parallel-ForkManager"没有安装,请在多执行几次,最好清空本地缓存哈。
8."MHA Manager"配置文件准备
(1)创建配置文件目录:
[root@mysql203.oldboyedu.com ~]# mkdir -pv /oldboyedu/softwares/mha/etc/
mkdir: 已创建目录 "/oldboyedu/softwares/mha"
mkdir: 已创建目录 "/oldboyedu/softwares/mha/etc/"
[root@mysql203.oldboyedu.com ~]#
(2)创建日志目录:
[root@mysql203.oldboyedu.com ~]# mkdir -pv /oldboyedu/logs/mha/app1
mkdir: 已创建目录 "/oldboyedu/logs/mha"
mkdir: 已创建目录 "/oldboyedu/logs/mha/app1"
[root@mysql203.oldboyedu.com ~]#
(3)编辑MHA配置文件
[root@mysql203.oldboyedu.com ~]# cat > /oldboyedu/softwares/mha/etc/app1.cnf <<EOF
> [server default]
> manager_log=/oldboyedu/logs/mha/app1/manager
> manager_workdir=/oldboyedu/logs/mha/app1
> master_binlog_dir=/oldboyedu/logs/mysql/binlog
> user=mha
> password=oldboyedu
> ping_interval=2
> repl_user=copy
> repl_password=oldboyedu
> ssh_user=root
>
> [server1]
> hostname=172.200.1.201
> port=3306
>
> [server2]
> hostname=172.200.1.202
> port=3306
>
> [server3]
> hostname=172.200.1.203
> port=3306
> EOF
[root@mysql203.oldboyedu.com ~]#
[root@mysql203.oldboyedu.com ~]# cat /oldboyedu/softwares/mha/etc/app1.cnf
[server default]
manager_log=/oldboyedu/logs/mha/app1/manager
manager_workdir=/oldboyedu/logs/mha/app1
master_binlog_dir=/oldboyedu/logs/mysql/binlog
user=mha
password=oldboyedu
ping_interval=2
repl_user=copy
repl_password=oldboyedu
ssh_user=root
[server1]
hostname=172.200.1.201
port=3306
[server2]
hostname=172.200.1.202
port=3306
[server3]
hostname=172.200.1.203
port=3306
[root@mysql203.oldboyedu.com ~]#
MHA配置文件说明如下:
[server default]: 服务器默认的配置文件,适用于所有节点的配置。
manager_log: 指定日志目录位置点。
manager_workdir: 指定数据工作目录。
master_binlog_dir: 指定主库的binlog日志位置点。
user: 指定MHA工作的用户名,已经在所有MySQL实例创建。
password: 指定MHA工作的密码,已经在所有MySQL实例创建。
ping_interval: 指定ping的间隔时间。
repl_user:主从复制专用的用户名,已经在主库MySQL实例创建。
repl_password:主从复制专用的密码,已经在主库MySQL实例创建。
ssh_user: 配置各节点互信的用户名。
[server1]: server1节点信息
hostname: 指定MySQL实例主机信息。
port: 指定MySQL实例主机对应的端口。
温馨提示:(直接将下面的命令粘贴到命令行就可以完成配置文件)
cat > /oldboyedu/softwares/mha/etc/app1.cnf <<EOF
[server default]
manager_log=/oldboyedu/logs/mha/app1/manager
manager_workdir=/oldboyedu/logs/mha/app1
master_binlog_dir=/oldboyedu/logs/mysql/binlog
user=mha
password=oldboyedu
ping_interval=2
repl_user=copy
repl_password=oldboyedu
ssh_user=root
[server1]
hostname=172.200.1.201
port=3306
[server2]
hostname=172.200.1.202
port=3306
[server3]
hostname=172.200.1.203
port=3306
EOF
9."MHA Manager"检查"MHA Node"的状态
(1)检查各个节点的互信状态,如下所示,注意观察是否有"error"信息,如果有需要解决后再执行之后的操作,下面3个节点有6个"ok"说明是正常状态。
[root@mysql203.oldboyedu.com ~]# masterha_check_ssh --conf=/oldboyedu/softwares/mha/etc/app1.cnf
Wed Feb 17 20:54:18 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Feb 17 20:54:18 2021 - [info] Reading application default configuration from /oldboyedu/softwares/mha/etc/app1.cnf..
Wed Feb 17 20:54:18 2021 - [info] Reading server configuration from /oldboyedu/softwares/mha/etc/app1.cnf..
Wed Feb 17 20:54:18 2021 - [info] Starting SSH connection tests..
Wed Feb 17 20:54:20 2021 - [debug]
Wed Feb 17 20:54:19 2021 - [debug] Connecting via SSH from root@172.200.1.203(172.200.1.203:22) to root@172.200.1.201(172.200.1.201:22)..
Wed Feb 17 20:54:19 2021 - [debug] ok.
Wed Feb 17 20:54:19 2021 - [debug] Connecting via SSH from root@172.200.1.203(172.200.1.203:22) to root@172.200.1.202(172.200.1.202:22)..
Wed Feb 17 20:54:20 2021 - [debug] ok.
Wed Feb 17 20:54:20 2021 - [debug]
Wed Feb 17 20:54:18 2021 - [debug] Connecting via SSH from root@172.200.1.201(172.200.1.201:22) to root@172.200.1.202(172.200.1.202:22)..
Wed Feb 17 20:54:18 2021 - [debug] ok.
Wed Feb 17 20:54:18 2021 - [debug] Connecting via SSH from root@172.200.1.201(172.200.1.201:22) to root@172.200.1.203(172.200.1.203:22)..
Wed Feb 17 20:54:20 2021 - [debug] ok.
Wed Feb 17 20:54:20 2021 - [debug]
Wed Feb 17 20:54:18 2021 - [debug] Connecting via SSH from root@172.200.1.202(172.200.1.202:22) to root@172.200.1.201(172.200.1.201:22)..
Wed Feb 17 20:54:19 2021 - [debug] ok.
Wed Feb 17 20:54:19 2021 - [debug] Connecting via SSH from root@172.200.1.202(172.200.1.202:22) to root@172.200.1.203(172.200.1.203:22)..
Wed Feb 17 20:54:20 2021 - [debug] ok.
Wed Feb 17 20:54:20 2021 - [info] All SSH connection tests passed successfully.
[root@mysql203.oldboyedu.com ~]#
(2)检查主从的状态,如下所示,注意观察是否有"error"信息,如果有需要解决后再执行之后的操作,下面3个节点都是"Alive Servers",其中"Alive Slaves","Dead Servers"都可以一目了然的看清楚,当然还有一些参数监控等。
[root@mysql203.oldboyedu.com ~]# masterha_check_repl --conf=/oldboyedu/softwares/mha/etc/app1.cnf
Wed Feb 17 21:01:26 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Feb 17 21:01:26 2021 - [info] Reading application default configuration from /oldboyedu/softwares/mha/etc/app1.cnf..
Wed Feb 17 21:01:26 2021 - [info] Reading server configuration from /oldboyedu/softwares/mha/etc/app1.cnf..
Wed Feb 17 21:01:26 2021 - [info] MHA::MasterMonitor version 0.56.
Wed Feb 17 21:01:27 2021 - [info] GTID failover mode = 1
Wed Feb 17 21:01:27 2021 - [info] Dead Servers:
Wed Feb 17 21:01:27 2021 - [info] Alive Servers:
Wed Feb 17 21:01:27 2021 - [info] 172.200.1.201(172.200.1.201:3306)
Wed Feb 17 21:01:27 2021 - [info] 172.200.1.202(172.200.1.202:3306)
Wed Feb 17 21:01:27 2021 - [info] 172.200.1.203(172.200.1.203:3306)
Wed Feb 17 21:01:27 2021 - [info] Alive Slaves:
Wed Feb 17 21:01:27 2021 - [info] 172.200.1.202(172.200.1.202:3306) Version=5.7.31-log (oldest major version between slaves) log-bin:enabled
Wed Feb 17 21:01:27 2021 - [info] GTID ON
Wed Feb 17 21:01:27 2021 - [info] Replicating from 172.200.1.201(172.200.1.201:3306)
Wed Feb 17 21:01:27 2021 - [info] 172.200.1.203(172.200.1.203:3306) Version=5.7.31-log (oldest major version between slaves) log-bin:enabled
Wed Feb 17 21:01:27 2021 - [info] GTID ON
Wed Feb 17 21:01:27 2021 - [info] Replicating from 172.200.1.201(172.200.1.201:3306)
Wed Feb 17 21:01:27 2021 - [info] Current Alive Master: 172.200.1.201(172.200.1.201:3306)
Wed Feb 17 21:01:27 2021 - [info] Checking slave configurations..
Wed Feb 17 21:01:27 2021 - [info] read_only=1 is not set on slave 172.200.1.202(172.200.1.202:3306).
Wed Feb 17 21:01:27 2021 - [info] read_only=1 is not set on slave 172.200.1.203(172.200.1.203:3306).
Wed Feb 17 21:01:27 2021 - [info] Checking replication filtering settings..
Wed Feb 17 21:01:27 2021 - [info] binlog_do_db= , binlog_ignore_db=
Wed Feb 17 21:01:27 2021 - [info] Replication filtering check ok.
Wed Feb 17 21:01:27 2021 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Wed Feb 17 21:01:27 2021 - [info] Checking SSH publickey authentication settings on the current master..
Wed Feb 17 21:01:27 2021 - [info] HealthCheck: SSH to 172.200.1.201 is reachable.
Wed Feb 17 21:01:27 2021 - [info]
172.200.1.201(172.200.1.201:3306) (current master)
+--172.200.1.202(172.200.1.202:3306)
+--172.200.1.203(172.200.1.203:3306)
Wed Feb 17 21:01:27 2021 - [info] Checking replication health on 172.200.1.202..
Wed Feb 17 21:01:27 2021 - [info] ok.
Wed Feb 17 21:01:27 2021 - [info] Checking replication health on 172.200.1.203..
Wed Feb 17 21:01:27 2021 - [info] ok.
Wed Feb 17 21:01:27 2021 - [warning] master_ip_failover_script is not defined.
Wed Feb 17 21:01:27 2021 - [warning] shutdown_script is not defined.
Wed Feb 17 21:01:27 2021 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
[root@mysql203.oldboyedu.com ~]#
温馨提示:
如果出现"error"字样的报错信息先不要慌,可能是网络波动导致,可以再次尝试运行脚本观察结果。
10.启动"MHA Manager"
[root@mysql203.oldboyedu.com ~]# masterha_check_status --conf=/oldboyedu/softwares/mha/etc/app1.cnf # 未启动服务之前,检查MHA状态
app1 is stopped(2:NOT_RUNNING).
[root@mysql203.oldboyedu.com ~]#
[root@mysql203.oldboyedu.com ~]# nohup masterha_manager --conf=/oldboyedu/softwares/mha/etc/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /oldboyedu/logs/mha/app1/manager.log 2>&1 &
[1] 34776
[root@mysql203.oldboyedu.com ~]#
[root@mysql203.oldboyedu.com ~]# ll /oldboyedu/logs/mha/app1/manager.log
-rw-r--r-- 1 root root 343 2月 17 21:11 /oldboyedu/logs/mha/app1/manager.log
[root@mysql203.oldboyedu.com ~]#
[root@mysql203.oldboyedu.com ~]# cat /oldboyedu/logs/mha/app1/manager.log # 启动MHA后,注意查看日志内容哟~
Wed Feb 17 21:11:57 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Feb 17 21:11:57 2021 - [info] Reading application default configuration from /oldboyedu/softwares/mha/etc/app1.cnf..
Wed Feb 17 21:11:57 2021 - [info] Reading server configuration from /oldboyedu/softwares/mha/etc/app1.cnf..
[root@mysql203.oldboyedu.com ~]#
[root@mysql203.oldboyedu.com ~]# masterha_check_status --conf=/oldboyedu/softwares/mha/etc/app1.cnf # 启动服务之后,检查MHA状态
app1 (pid:34776) is running(0:PING_OK), master:172.200.1.201
[root@mysql203.oldboyedu.com ~]#
11.重启MHA管理脚本
停止MHA:
[root@mysql203.oldboyedu.com ~]# masterha_stop --conf=/oldboyedu/softwares/mha/etc/app1.cnf
启动MHA:
[root@mysql203.oldboyedu.com ~]# nohup masterha_manager --conf=/oldboyedu/softwares/mha/etc/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /oldboyedu/logs/mha/app1/manager.log 2>&1 &
检查MHA状态:
[root@mysql203.oldboyedu.com ~]# masterha_check_status --conf=/oldboyedu/softwares/mha/etc/app1.cnf
温馨提示:
如果报错了就查看"/oldboyedu/logs/mha/app1/manager"日志信息。当然,该日志信息由"app1.cnf"配置文件"manager_log"指定的哟~
三.可能遇到的错误
1. Error happened on monitoring servers
错误原因:
通常情况下有两种常见的原因,要么是配置文件写错了,要么是网络不通。
解决方案:
查看配置文件是否配置正确,检查防火墙是否关闭。
2.Access denied for user 'mha'@'mysql106.oldboyedu.com'
错误原因:
检查主从状态的mha用户无权限。
解决方案:
配置mha相关用户。
温馨提示:
在启动MHA之前,建议使用"masterha_check_repl --conf=/oldboyedu/softwares/mha/etc/app1.cnf"去查看主从状态。