021、MySQL的MHA高可用架构部署实战案例
本文最后更新于 257 天前,其中的信息可能已经过时,如有错误请发送邮件到wuxianglongblog@163.com

MySQL的MHA高可用架构部署实战案例

一.Master High Availability(简称"MHA")概述

1.什么是MHA

    我们知道传统的异步主从复制最擅长的是为我们解决物理损坏(比如服务器宕机,磁盘损坏,磁盘坏道,文件被误删除等)。

    Master High Availability(简称"MHA")目前在MySQL高可用方面是一个相对成熟的解决方案,它由日本DeNA公司的youshimaton(现就职于Facebook公司)开发,是一套优秀的作为MySQL高可用性环境下故障切换和主从提升的高可用软件。

    在MySQL故障切换过程中,MHA能做到在0~30秒之内自动完成数据库的故障切换操作,并且在进行故障切换的过程中,MHA能在最大程度上保证数据的一致性,以达到真正意义上的高可用。

    该软件由两部分组成:MHA Manager(管理节点)和MHA Node(数据节点)。
        MHA Manager:
            可以单独部署在一台独立的机器上管理多个master-slave集群,也可以部署在一台slave节点上。
        MHA Node:
            运行在每台MySQL服务器上。

    MHA Manager会定时探测集群中的master节点,当master出现故障时,它可以自动将最新数据的slave提升为新的master,然后将所有其他的slave重新指向新的master。整个故障转移过程对应用程序完全透明。

    在MHA自动故障切换过程中,MHA试图从宕机的主服务器上保存二进制日志,最大程度的保证数据的不丢失,但这并不总是可行的。例如,如果主服务器硬件故障或无法通过ssh访问,MHA没法保存二进制日志,只进行故障转移而丢失了最新的数据。

    使用MySQL 5.5的半同步复制,可以大大降低数据丢失的风险。MHA可以与半同步复制结合起来。如果只有一个slave已经收到了最新的二进制日志,MHA可以将最新的二进制日志应用于其他所有的slave服务器上,因此可以保证所有节点的数据一致性。

    目前MHA主要支持一主多从的架构,要搭建MHA,要求一个复制集群中必须最少有三台数据库服务器,一主二从,即一台充当master,一台充当备用master,另外一台充当从库,因为至少需要三台服务器,出于机器成本的考虑,淘宝也在该基础上进行了改造,目前淘宝TMHA已经支持一主一从。

2.MHA软件的角色组件概述

    MHA Manager:(管理软件)
        masterha_manager:
            启动MHA。
        masterha_check_ssh:
            检查MHA的SSH配置状况。
        masterha_check_repl:
            检查主从的状态。
        masterha_master_monitor:
            检查master是否宕机。
        masterha_check_status:
            检查当前MHA运行状态。
        masterha_master_switch:
            控制故障转移。
        masterha_conf_host:
            添加或删除配置的server信息。

    MHA Node:(被管理端软件,这些工具通常由MHA Manager的脚本触发,无需人为操作,了解即可)
        save_binary_logs:
            保存和复制master的二进制日志。
        apply_diff_relay_logs:
            识别差异的中继日志事件并将其差异的事件应用于其它的。
        purge_relay_logs:
            清除中继日志(不会阻塞SQL线程)。

3.下载MHA软件

    需要"科学上网":(版本最低,最高版本是0.55)
        https://code.google.com/archive/p/mysql-master-ha/

    GitHub下载地址:(版本较低,属于0.56版本)
        https://github.com/yoshinorim/mha4mysql-manager/wiki/Downloads

    推荐下载0.58的版本github地址:(针对于MySQL 8.0建议使用该版本)
        https://github.com/yoshinorim/mha4mysql-manager/releases

    温馨提示:
        (1)尽管我们实验使用的是MySQL 5.7版本,但实验我们还是可以基于MHA 0.56版本来安装哟,该版本在生产环境中已经广泛使用,因此无需担心其稳定性;
        (2)如果使用MySQL 8.0版本请修改其默认的用户密码认证策略,改用Native的密码策略,推荐使用的MHA版本为0.58及以上版本;
        (3)截至2021年2月17日,我能下载MHA最新的版本为"mha4mysql-manager-0.58-0.el7.centos.noarch.rpm",我已经放在软件包了;

3.MHA工作原理

    高可用:
        最擅长的是为我们解决物理损坏。

    启动MHA Manager:
        调用"masterha_manager"脚本启动MHA manager程序。

    监控:
        通过"masterha_master_monitor"心跳检测脚本监控所有数据库节点,尤其是监控主库。
        默认探测4次,每次的探测间隔时间可以使用"ping_interval"来指定哟~如果"ping_interval=2",则表示间隔时间为"4 x 2 = 8"秒,在8秒内主库依旧没有响应,就认为主库宕机了,就得进入故障转移(Fail over)过程。

    MHA故障转移(Fail over)流程如下所示:
        (1)主库宕机后,故障转移(Fail over)过程从库选主策略:
            1)可以主观的设置优先级,只需要在节点加入"candidate_master=1"参数,需要注意的是,如果备选主日志量落后master太多(默认中继日志为100MB),也不会被选择为新主,但可以通过"check_repl_delay=0",表示不检查日志落后的情景。
            2)根据日志量最接近主库的数据;
            3)日志量也一样的话,就得看文件的顺序;

        (2)日志补偿阶段:
            1)主库ssh能连上(说明只是MySQL服务挂掉,操作系统并没有挂掉),立即保存缺失部分的二进制日志到各个从库(默认是"/var/tmp"目录)并恢复;
            2)主库ssh不能连上,两个从库的中继日志进行对比(apply_diff_relay_logs)自动进行差异补偿;

        (3)主从身份切换:
            所有从库取消和原有主库的复制关系(会自动执行"STOP SLAVE; RESET SLAVE ALL;"命令),而后新竹库和剩余从库重新构建主从关系。

        (4)故障库自动被剔除集群:
            通过"masterha_conf_host"工具将故障点的配置信息从配置文件中移除。

        (5)MHA是一次性的高可用:
            当上述故障转移完成后,MHA Manager会自动退出。

    综上所述,是MHA的基础环境所有具备的功能,不得不说MHA是一个很优秀的开源组件,但该组件也有一些不足之处,如下所示:
        (1)应用透明:
            所谓的应用透明指的是尽管后端数据库主库发生故障转移,应用程序无需修改IP地址,这样应用程序是感知不到后端存储是否损坏(唯一的感知可能就是有短暂时间连接不到后端数据库)。
            MHA组件已经实现了应用透明vip功能,但只能同机房使用,无法跨机房或者跨网络。如果有必要,可以借助类似于keepalive工具实现vip功能。
        (2)"日志补偿阶段"存在丢失风险:
            在上面提到的"日志补偿阶段"是有可能存在数据丢失的风险。我们可以借助于MySQL支持的半同步复制策略来尽可能的避免主库数据不一致。
            但我们之前已经分析过"半同步复制"策略如果从库超时则会自动切换到传统异步复制策略,因此MHA并没有采取半同步复制,而是使用了"binlog server"组件来实时备份主库的二进制日志文件。
        (3)告警功能:
            当主从发生故障后,无法通知运维人员,因此需要我们借助一些工具来实现告警功能。
        (4)故障自愈:(待二次开发)
            当集群出现故障后,无法自己定位问题,并修复问题,而是将故障节点踢出集群,需要运维人员手动修复。这样需要小伙伴二次开发MHA,也可以用现有的解决方案,如:"MHA + K8S + Operator","MGR + mysqlsh"等架构实现方式。

image-20210727094030332

二.MHA基础架构规划和实施

1.集群规划

    mysql201.oldboyedu.com:
        主库,MHA Node
    mysql202.oldboyedu.com:
        从库,MHA Node
    mysql203.oldboyedu.com:
        从库,MHA Node,MHA Manager

2.准备环境(基于GTID搭建的一主两从架构)

    基于GTID部署一主两从架构,此步骤我在上一章节已经做过了,此处省略。

    如下所示,我们在主库(mysql201.oldboyedu.com)可以查看从库的链接信息:
        [root@mysql201.oldboyedu.com ~]# mysql
        Welcome to the MySQL monitor.  Commands end with ; or \g.
        Your MySQL connection id is 18
        Server version: 5.7.31-log MySQL Community Server (GPL)

        Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.

        Oracle is a registered trademark of Oracle Corporation and/or its
        affiliates. Other names may be trademarks of their respective
        owners.

        Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

        mysql201 [(none)]>
        mysql201 [(none)]>SELECT @@server_id;
        +-------------+
        | @@server_id |
        +-------------+
        |         201 |
        +-------------+
        1 row in set (0.00 sec)

        mysql201 [(none)]>
        mysql201 [(none)]>SHOW SLAVE HOSTS;
        +-----------+------+------+-----------+--------------------------------------+
        | Server_id | Host | Port | Master_id | Slave_UUID                           |
        +-----------+------+------+-----------+--------------------------------------+
        |       202 |      | 3306 |       201 | 4ffdeb3d-70c2-11eb-a18a-000c2958d75b |
        |       203 |      | 3306 |       201 | 332ccee4-50fb-11eb-b153-000c29820c67 |
        +-----------+------+------+-----------+--------------------------------------+
        2 rows in set (0.00 sec)

        mysql201 [(none)]>
        mysql201 [(none)]>SHOW PROCESSLIST;
        +----+------+---------------------+------+------------------+-------+---------------------------------------------------------------+------------------+
        | Id | User | Host                | db   | Command          | Time  | State                                                         | Info             |
        +----+------+---------------------+------+------------------+-------+---------------------------------------------------------------+------------------+
        |  7 | copy | 172.200.1.202:55872 | NULL | Binlog Dump GTID | 29417 | Master has sent all binlog to slave; waiting for more updates | NULL             |
        |  8 | copy | 172.200.1.203:58786 | NULL | Binlog Dump GTID | 29301 | Master has sent all binlog to slave; waiting for more updates | NULL             |
        | 19 | root | localhost           | NULL | Query            |     0 | starting                                                      | SHOW PROCESSLIST |
        +----+------+---------------------+------+------------------+-------+---------------------------------------------------------------+------------------+
        3 rows in set (0.00 sec)

        mysql201 [(none)]>
        mysql201 [(none)]>QUIT
        Bye
        [root@mysql201.oldboyedu.com ~]# 

    如下所示,我们在从库(mysql202.oldboyedu.com)可以查看从库的链接信息:
        [root@mysql202.oldboyedu.com ~]# mysql -e "SHOW SLAVE STATUS\G" | grep Running:
                     Slave_IO_Running: Yes
                    Slave_SQL_Running: Yes
        [root@mysql202.oldboyedu.com ~]# 

    如下所示,我们在从库(mysql203.oldboyedu.com)可以查看从库的链接信息:
        [root@mysql203.oldboyedu.com ~]# mysql -uroot -p -e "SHOW SLAVE STATUS\G" | grep Running:
        Enter password: 
                     Slave_IO_Running: Yes
                    Slave_SQL_Running: Yes
        [root@mysql203.oldboyedu.com ~]# 

3.配置关键程序软连接(所有节点都需操作,我只演示了在"mysql203.oldboyedu.com"节点的操作)

[root@mysql203.oldboyedu.com ~]# which mysqlbinlog 
/oldboyedu/softwares/mysql/mysql/bin/mysqlbinlog
[root@mysql203.oldboyedu.com ~]# 
[root@mysql203.oldboyedu.com ~]# ln -sv /oldboyedu/softwares/mysql/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
"/usr/bin/mysqlbinlog" -> "/oldboyedu/softwares/mysql/mysql/bin/mysqlbinlog"
[root@mysql203.oldboyedu.com ~]# 
[root@mysql203.oldboyedu.com ~]# which mysql
/oldboyedu/softwares/mysql/mysql/bin/mysql
[root@mysql203.oldboyedu.com ~]# 
[root@mysql203.oldboyedu.com ~]# ln -sv /oldboyedu/softwares/mysql/mysql/bin/mysql /usr/bin/mysql
"/usr/bin/mysql" -> "/oldboyedu/softwares/mysql/mysql/bin/mysql"
[root@mysql203.oldboyedu.com ~]# 
[root@mysql203.oldboyedu.com ~]# 
[root@mysql203.oldboyedu.com ~]# ll /usr/bin/mysql*
lrwxrwxrwx 1 root root 44 2月  17 18:26 /usr/bin/mysql -> /oldboyedu/softwares/mysql/mysql/bin/mysql
lrwxrwxrwx 1 root root 50 2月  17 18:26 /usr/bin/mysqlbinlog -> /oldboyedu/softwares/mysql/mysql/bin/mysqlbinlog
[root@mysql203.oldboyedu.com ~]# 

4.各个节点配置互信(我只用"mysql201.oldboyedu.com"节点作为演示)

    (1)mysql201.oldboyedu.com节点生成密钥
        [root@mysql201.oldboyedu.com ~]# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
        Generating public/private rsa key pair.
        Your identification has been saved in /root/.ssh/id_rsa.
        Your public key has been saved in /root/.ssh/id_rsa.pub.
        The key fingerprint is:
        SHA256:J058Cm5zBjUGXOU/fqsRTDRqhuR5MSs3VWNAHx3ETsA root@mysql201.oldboyedu.com
        The key's randomart image is:
        +---[RSA 2048]----+
        |     ...o.+.B+O+.|
        |      .+ + B E =.|
        |        O X . +  |
        |       + B =   . |
        |      o S o =    |
        |     . = = . o   |
        |      + =   o .  |
        |     . +     o . |
        |            ...  |
        +----[SHA256]-----+
        [root@mysql201.oldboyedu.com ~]# 

    (2)mysql201.oldboyedu.com与mysql201.oldboyedu.com节点建立互信,即需要和当前主机进行通信
        [root@mysql201.oldboyedu.com ~]# ssh-copy-id root@172.200.1.201
        /usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
        The authenticity of host '172.200.1.201 (172.200.1.201)' can't be established.
        ECDSA key fingerprint is SHA256:rOEM9ZBK1HJkhTVdewL0mkRWObE3ZprBZnaIabP2lZY.
        ECDSA key fingerprint is MD5:ff:b3:df:b3:6d:41:87:55:65:8c:e8:87:00:10:46:14.
        Are you sure you want to continue connecting (yes/no)? yes
        /usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
        /usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
        root@172.200.1.201's password: 

        Number of key(s) added: 1

        Now try logging into the machine, with:   "ssh 'root@172.200.1.201'"
        and check to make sure that only the key(s) you wanted were added.

        [root@mysql201.oldboyedu.com ~]# 
        [root@mysql201.oldboyedu.com ~]# who
        root     pts/2        2021-02-17 17:53 (172.200.1.19)
        root     pts/3        2021-02-17 18:16 (172.200.1.19)
        [root@mysql201.oldboyedu.com ~]# 
        [root@mysql201.oldboyedu.com ~]# ssh 172.200.1.201
        Last login: Wed Feb 17 18:16:25 2021 from 172.200.1.19
        [root@mysql201.oldboyedu.com ~]# 
        [root@mysql201.oldboyedu.com ~]# who
        root     pts/0        2021-02-17 20:55 (172.200.1.201)
        root     pts/2        2021-02-17 17:53 (172.200.1.19)
        root     pts/3        2021-02-17 18:16 (172.200.1.19)
        [root@mysql201.oldboyedu.com ~]# 
        [root@mysql201.oldboyedu.com ~]# exit 
        登出
        Connection to 172.200.1.201 closed.
        [root@mysql201.oldboyedu.com ~]# 

    (3)mysql201.oldboyedu.com与mysql202.oldboyedu.com节点建立互信
        [root@mysql201.oldboyedu.com ~]# ssh-copy-id root@172.200.1.202
        /usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
        The authenticity of host '172.200.1.202 (172.200.1.202)' can't be established.
        ECDSA key fingerprint is SHA256:rOEM9ZBK1HJkhTVdewL0mkRWObE3ZprBZnaIabP2lZY.
        ECDSA key fingerprint is MD5:ff:b3:df:b3:6d:41:87:55:65:8c:e8:87:00:10:46:14.
        Are you sure you want to continue connecting (yes/no)? yes
        /usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
        /usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
        root@172.200.1.202's password: 

        Number of key(s) added: 1

        Now try logging into the machine, with:   "ssh 'root@172.200.1.202'"
        and check to make sure that only the key(s) you wanted were added.

        [root@mysql201.oldboyedu.com ~]# 
        [root@mysql201.oldboyedu.com ~]# ssh 172.200.1.202
        Last login: Wed Feb 17 18:17:11 2021 from 172.200.1.19
        [root@mysql202.oldboyedu.com ~]# 
        [root@mysql202.oldboyedu.com ~]# who
        root     pts/0        2021-02-17 08:05 (172.200.1.19)
        root     pts/1        2021-02-17 09:19 (172.200.1.19)
        root     pts/2        2021-02-17 18:17 (172.200.1.19)
        root     pts/3        2021-02-17 18:32 (172.200.1.201)
        [root@mysql202.oldboyedu.com ~]# 
        [root@mysql202.oldboyedu.com ~]# exit 
        登出
        Connection to 172.200.1.202 closed.
        [root@mysql201.oldboyedu.com ~]# 

    (4)mysql201.oldboyedu.com与mysql203.oldboyedu.com节点建立互信
        [root@mysql201.oldboyedu.com ~]# ssh-copy-id root@172.200.1.203
        /usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
        The authenticity of host '172.200.1.203 (172.200.1.203)' can't be established.
        ECDSA key fingerprint is SHA256:rOEM9ZBK1HJkhTVdewL0mkRWObE3ZprBZnaIabP2lZY.
        ECDSA key fingerprint is MD5:ff:b3:df:b3:6d:41:87:55:65:8c:e8:87:00:10:46:14.
        Are you sure you want to continue connecting (yes/no)? yes
        /usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
        /usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
        root@172.200.1.203's password: 

        Number of key(s) added: 1

        Now try logging into the machine, with:   "ssh 'root@172.200.1.203'"
        and check to make sure that only the key(s) you wanted were added.

        [root@mysql201.oldboyedu.com ~]# 
        [root@mysql201.oldboyedu.com ~]# ssh 172.200.1.203
        Last login: Wed Feb 17 18:17:36 2021 from 172.200.1.19
        [root@mysql203.oldboyedu.com ~]# 
        [root@mysql203.oldboyedu.com ~]# who
        root     pts/0        2021-02-17 08:05 (172.200.1.19)
        root     pts/1        2021-02-17 08:39 (172.200.1.19)
        root     pts/2        2021-02-17 18:17 (172.200.1.19)
        root     pts/3        2021-02-17 18:32 (172.200.1.201)
        [root@mysql203.oldboyedu.com ~]# 
        [root@mysql203.oldboyedu.com ~]# exit 
        登出
        Connection to 172.200.1.203 closed.
        [root@mysql201.oldboyedu.com ~]# 

    温馨提示:
        MHA高可用节点需要配置互信,可能很多小伙伴会觉得这很不安全,从某种角度来讲,这种担心有点杞人忧天。如果你的Linux系统被破译了,实际上就没有安全之说了。而我们还是比较相信Linux操作系统的安全性的,不会被轻易攻破。

5.所有节点安装MHA Node软件包(我只用"mysql201.oldboyedu.com"节点作为演示)

    安装依赖包:
        [root@mysql201.oldboyedu.com ~]# yum -y install perl-DBD-MySQL

    安装MHA Node软件包:(尽管是CentOS 6的软件包,但是在CentOS 7依旧是可以正常使用哟~)
        [root@mysql201.oldboyedu.com ~]# rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm 
        准备中...                          ################################# [100%]
        正在升级/安装...
           1:mha4mysql-node-0.56-0.el6        ################################# [100%]
        [root@mysql201.oldboyedu.com ~]# 

6.在主库创建mha需要的用户

    主库创建用户:
        [root@mysql201.oldboyedu.com ~]# mysql
        Welcome to the MySQL monitor.  Commands end with ; or \g.
        Your MySQL connection id is 20
        Server version: 5.7.31-log MySQL Community Server (GPL)

        Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.

        Oracle is a registered trademark of Oracle Corporation and/or its
        affiliates. Other names may be trademarks of their respective
        owners.

        Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

        mysql201 [(none)]>
        mysql201 [(none)]>CREATE USER mha@'172.200.1.%' IDENTIFIED BY 'oldboyedu';
        Query OK, 0 rows affected (0.00 sec)

        mysql201 [(none)]>
        mysql201 [(none)]>GRANT ALL PRIVILEGES ON *.* TO mha@'172.200.1.%';
        Query OK, 0 rows affected (0.00 sec)

        mysql201 [(none)]>
        mysql201 [(none)]>SELECT user,host FROM mysql.user;
        +---------------+-------------+
        | user          | host        |
        +---------------+-------------+
        | copy          | 172.200.1.% |
        | mha           | 172.200.1.% |
        | mysql.session | localhost   |
        | mysql.sys     | localhost   |
        | root          | localhost   |
        +---------------+-------------+
        5 rows in set (0.00 sec)

        mysql201 [(none)]>
        mysql201 [(none)]>SHOW GRANTS FOR mha@'172.200.1.%';
        +----------------------------------------------------+
        | Grants for mha@172.200.1.%                         |
        +----------------------------------------------------+
        | GRANT ALL PRIVILEGES ON *.* TO 'mha'@'172.200.1.%' |
        +----------------------------------------------------+
        1 row in set (0.00 sec)

        mysql201 [(none)]>
        mysql201 [(none)]>QUIT
        Bye
        [root@mysql201.oldboyedu.com ~]# 

    "mysql202.oldboyedu.com"从库会同步主库创建的用户:
        [root@mysql202.oldboyedu.com ~]# mysql 
        Welcome to the MySQL monitor.  Commands end with ; or \g.
        Your MySQL connection id is 14
        Server version: 5.7.31-log MySQL Community Server (GPL)

        Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.

        Oracle is a registered trademark of Oracle Corporation and/or its
        affiliates. Other names may be trademarks of their respective
        owners.

        Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

        mysql202 [(none)]>
        mysql202 [(none)]>SELECT user,host FROM mysql.user;
        +---------------+-------------+
        | user          | host        |
        +---------------+-------------+
        | copy          | 172.200.1.% |
        | mha           | 172.200.1.% |
        | mysql.session | localhost   |
        | mysql.sys     | localhost   |
        | root          | localhost   |
        +---------------+-------------+
        5 rows in set (0.00 sec)

        mysql202 [(none)]>
        mysql202 [(none)]>QUIT
        Bye
        [root@mysql202.oldboyedu.com ~]# 

    "mysql203.oldboyedu.com"从库会同步主库创建的用户:
        [root@mysql203.oldboyedu.com ~]# mysql -u root -p -e "SELECT user,host FROM mysql.user;"
        Enter password: 
        +---------------+---------------------------+
        | user          | host                      |
        +---------------+---------------------------+
        | copy          | 172.200.1.%               |
        | mha           | 172.200.1.%               |
        | oldboyedu   | 172.200.1.%               |
        | hive          | 172.200.3.%               |
        | admin         | docker201.oldboyedu.com |
        | mysql.session | localhost                 |
        | mysql.sys     | localhost                 |
        | root          | localhost                 |
        +---------------+---------------------------+
        [root@mysql203.oldboyedu.com ~]# 

7.在"mysql203.oldboyedu.com"节点安装"MHA Manager"软件

    (1)先安装依赖包
        [root@mysql203.oldboyedu.com ~]# yum -y install perl-Config-Tiny epel-release perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes

    (2)安装"MHA Manager"软件
        [root@mysql203.oldboyedu.com ~]# ll
        总用量 88
        -rw-r--r-- 1 root root 87119 2月  17 20:26 mha4mysql-manager-0.56-0.el6.noarch.rpm
        [root@mysql203.oldboyedu.com ~]# 
        [root@mysql203.oldboyedu.com ~]# rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm 
        准备中...                          ################################# [100%]
        正在升级/安装...
           1:mha4mysql-manager-0.56-0.el6     ################################# [100%]
        [root@mysql203.oldboyedu.com ~]# 

    温馨提示:
        如果提示"perl-Log-Dispatch,perl-Parallel-ForkManager"没有安装,请在多执行几次,最好清空本地缓存哈。

8."MHA Manager"配置文件准备

    (1)创建配置文件目录:
        [root@mysql203.oldboyedu.com ~]# mkdir -pv /oldboyedu/softwares/mha/etc/
        mkdir: 已创建目录 "/oldboyedu/softwares/mha"
        mkdir: 已创建目录 "/oldboyedu/softwares/mha/etc/"
        [root@mysql203.oldboyedu.com ~]# 

    (2)创建日志目录:
        [root@mysql203.oldboyedu.com ~]# mkdir -pv /oldboyedu/logs/mha/app1
        mkdir: 已创建目录 "/oldboyedu/logs/mha"
        mkdir: 已创建目录 "/oldboyedu/logs/mha/app1"
        [root@mysql203.oldboyedu.com ~]# 

    (3)编辑MHA配置文件
        [root@mysql203.oldboyedu.com ~]# cat > /oldboyedu/softwares/mha/etc/app1.cnf <<EOF
        > [server default]
        > manager_log=/oldboyedu/logs/mha/app1/manager
        > manager_workdir=/oldboyedu/logs/mha/app1
        > master_binlog_dir=/oldboyedu/logs/mysql/binlog
        > user=mha
        > password=oldboyedu
        > ping_interval=2
        > repl_user=copy
        > repl_password=oldboyedu
        > ssh_user=root
        > 
        > [server1]
        > hostname=172.200.1.201
        > port=3306
        > 
        > [server2]
        > hostname=172.200.1.202
        > port=3306
        > 
        > [server3]
        > hostname=172.200.1.203
        > port=3306
        > EOF
        [root@mysql203.oldboyedu.com ~]# 
        [root@mysql203.oldboyedu.com ~]# cat /oldboyedu/softwares/mha/etc/app1.cnf 
        [server default]
        manager_log=/oldboyedu/logs/mha/app1/manager
        manager_workdir=/oldboyedu/logs/mha/app1
        master_binlog_dir=/oldboyedu/logs/mysql/binlog
        user=mha
        password=oldboyedu
        ping_interval=2
        repl_user=copy
        repl_password=oldboyedu
        ssh_user=root

        [server1]
        hostname=172.200.1.201
        port=3306

        [server2]
        hostname=172.200.1.202
        port=3306

        [server3]
        hostname=172.200.1.203
        port=3306
        [root@mysql203.oldboyedu.com ~]# 

    MHA配置文件说明如下:
        [server default]: 服务器默认的配置文件,适用于所有节点的配置。
            manager_log: 指定日志目录位置点。
            manager_workdir: 指定数据工作目录。
            master_binlog_dir: 指定主库的binlog日志位置点。
            user: 指定MHA工作的用户名,已经在所有MySQL实例创建。
            password: 指定MHA工作的密码,已经在所有MySQL实例创建。
            ping_interval: 指定ping的间隔时间。
            repl_user:主从复制专用的用户名,已经在主库MySQL实例创建。
            repl_password:主从复制专用的密码,已经在主库MySQL实例创建。
            ssh_user: 配置各节点互信的用户名。
        [server1]: server1节点信息
            hostname: 指定MySQL实例主机信息。
            port: 指定MySQL实例主机对应的端口。

    温馨提示:(直接将下面的命令粘贴到命令行就可以完成配置文件)
cat > /oldboyedu/softwares/mha/etc/app1.cnf <<EOF
[server default]
manager_log=/oldboyedu/logs/mha/app1/manager
manager_workdir=/oldboyedu/logs/mha/app1
master_binlog_dir=/oldboyedu/logs/mysql/binlog
user=mha
password=oldboyedu
ping_interval=2
repl_user=copy
repl_password=oldboyedu
ssh_user=root

[server1]
hostname=172.200.1.201
port=3306

[server2]
hostname=172.200.1.202
port=3306

[server3]
hostname=172.200.1.203
port=3306
EOF

9."MHA Manager"检查"MHA Node"的状态

    (1)检查各个节点的互信状态,如下所示,注意观察是否有"error"信息,如果有需要解决后再执行之后的操作,下面3个节点有6个"ok"说明是正常状态。
        [root@mysql203.oldboyedu.com ~]# masterha_check_ssh --conf=/oldboyedu/softwares/mha/etc/app1.cnf
        Wed Feb 17 20:54:18 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
        Wed Feb 17 20:54:18 2021 - [info] Reading application default configuration from /oldboyedu/softwares/mha/etc/app1.cnf..
        Wed Feb 17 20:54:18 2021 - [info] Reading server configuration from /oldboyedu/softwares/mha/etc/app1.cnf..
        Wed Feb 17 20:54:18 2021 - [info] Starting SSH connection tests..
        Wed Feb 17 20:54:20 2021 - [debug] 
        Wed Feb 17 20:54:19 2021 - [debug]  Connecting via SSH from root@172.200.1.203(172.200.1.203:22) to root@172.200.1.201(172.200.1.201:22)..
        Wed Feb 17 20:54:19 2021 - [debug]   ok.
        Wed Feb 17 20:54:19 2021 - [debug]  Connecting via SSH from root@172.200.1.203(172.200.1.203:22) to root@172.200.1.202(172.200.1.202:22)..
        Wed Feb 17 20:54:20 2021 - [debug]   ok.
        Wed Feb 17 20:54:20 2021 - [debug] 
        Wed Feb 17 20:54:18 2021 - [debug]  Connecting via SSH from root@172.200.1.201(172.200.1.201:22) to root@172.200.1.202(172.200.1.202:22)..
        Wed Feb 17 20:54:18 2021 - [debug]   ok.
        Wed Feb 17 20:54:18 2021 - [debug]  Connecting via SSH from root@172.200.1.201(172.200.1.201:22) to root@172.200.1.203(172.200.1.203:22)..
        Wed Feb 17 20:54:20 2021 - [debug]   ok.
        Wed Feb 17 20:54:20 2021 - [debug] 
        Wed Feb 17 20:54:18 2021 - [debug]  Connecting via SSH from root@172.200.1.202(172.200.1.202:22) to root@172.200.1.201(172.200.1.201:22)..
        Wed Feb 17 20:54:19 2021 - [debug]   ok.
        Wed Feb 17 20:54:19 2021 - [debug]  Connecting via SSH from root@172.200.1.202(172.200.1.202:22) to root@172.200.1.203(172.200.1.203:22)..
        Wed Feb 17 20:54:20 2021 - [debug]   ok.
        Wed Feb 17 20:54:20 2021 - [info] All SSH connection tests passed successfully.
        [root@mysql203.oldboyedu.com ~]# 

    (2)检查主从的状态,如下所示,注意观察是否有"error"信息,如果有需要解决后再执行之后的操作,下面3个节点都是"Alive Servers",其中"Alive Slaves","Dead Servers"都可以一目了然的看清楚,当然还有一些参数监控等。
        [root@mysql203.oldboyedu.com ~]# masterha_check_repl --conf=/oldboyedu/softwares/mha/etc/app1.cnf
        Wed Feb 17 21:01:26 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
        Wed Feb 17 21:01:26 2021 - [info] Reading application default configuration from /oldboyedu/softwares/mha/etc/app1.cnf..
        Wed Feb 17 21:01:26 2021 - [info] Reading server configuration from /oldboyedu/softwares/mha/etc/app1.cnf..
        Wed Feb 17 21:01:26 2021 - [info] MHA::MasterMonitor version 0.56.
        Wed Feb 17 21:01:27 2021 - [info] GTID failover mode = 1
        Wed Feb 17 21:01:27 2021 - [info] Dead Servers:
        Wed Feb 17 21:01:27 2021 - [info] Alive Servers:
        Wed Feb 17 21:01:27 2021 - [info]   172.200.1.201(172.200.1.201:3306)
        Wed Feb 17 21:01:27 2021 - [info]   172.200.1.202(172.200.1.202:3306)
        Wed Feb 17 21:01:27 2021 - [info]   172.200.1.203(172.200.1.203:3306)
        Wed Feb 17 21:01:27 2021 - [info] Alive Slaves:
        Wed Feb 17 21:01:27 2021 - [info]   172.200.1.202(172.200.1.202:3306)  Version=5.7.31-log (oldest major version between slaves) log-bin:enabled
        Wed Feb 17 21:01:27 2021 - [info]     GTID ON
        Wed Feb 17 21:01:27 2021 - [info]     Replicating from 172.200.1.201(172.200.1.201:3306)
        Wed Feb 17 21:01:27 2021 - [info]   172.200.1.203(172.200.1.203:3306)  Version=5.7.31-log (oldest major version between slaves) log-bin:enabled
        Wed Feb 17 21:01:27 2021 - [info]     GTID ON
        Wed Feb 17 21:01:27 2021 - [info]     Replicating from 172.200.1.201(172.200.1.201:3306)
        Wed Feb 17 21:01:27 2021 - [info] Current Alive Master: 172.200.1.201(172.200.1.201:3306)
        Wed Feb 17 21:01:27 2021 - [info] Checking slave configurations..
        Wed Feb 17 21:01:27 2021 - [info]  read_only=1 is not set on slave 172.200.1.202(172.200.1.202:3306).
        Wed Feb 17 21:01:27 2021 - [info]  read_only=1 is not set on slave 172.200.1.203(172.200.1.203:3306).
        Wed Feb 17 21:01:27 2021 - [info] Checking replication filtering settings..
        Wed Feb 17 21:01:27 2021 - [info]  binlog_do_db= , binlog_ignore_db= 
        Wed Feb 17 21:01:27 2021 - [info]  Replication filtering check ok.
        Wed Feb 17 21:01:27 2021 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
        Wed Feb 17 21:01:27 2021 - [info] Checking SSH publickey authentication settings on the current master..
        Wed Feb 17 21:01:27 2021 - [info] HealthCheck: SSH to 172.200.1.201 is reachable.
        Wed Feb 17 21:01:27 2021 - [info] 
        172.200.1.201(172.200.1.201:3306) (current master)
         +--172.200.1.202(172.200.1.202:3306)
         +--172.200.1.203(172.200.1.203:3306)

        Wed Feb 17 21:01:27 2021 - [info] Checking replication health on 172.200.1.202..
        Wed Feb 17 21:01:27 2021 - [info]  ok.
        Wed Feb 17 21:01:27 2021 - [info] Checking replication health on 172.200.1.203..
        Wed Feb 17 21:01:27 2021 - [info]  ok.
        Wed Feb 17 21:01:27 2021 - [warning] master_ip_failover_script is not defined.
        Wed Feb 17 21:01:27 2021 - [warning] shutdown_script is not defined.
        Wed Feb 17 21:01:27 2021 - [info] Got exit code 0 (Not master dead).

        MySQL Replication Health is OK.
        [root@mysql203.oldboyedu.com ~]#  

    温馨提示:
        如果出现"error"字样的报错信息先不要慌,可能是网络波动导致,可以再次尝试运行脚本观察结果。

10.启动"MHA Manager"

[root@mysql203.oldboyedu.com ~]# masterha_check_status --conf=/oldboyedu/softwares/mha/etc/app1.cnf  # 未启动服务之前,检查MHA状态
app1 is stopped(2:NOT_RUNNING).
[root@mysql203.oldboyedu.com ~]# 
[root@mysql203.oldboyedu.com ~]# nohup masterha_manager --conf=/oldboyedu/softwares/mha/etc/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /oldboyedu/logs/mha/app1/manager.log 2>&1 &
[1] 34776
[root@mysql203.oldboyedu.com ~]# 
[root@mysql203.oldboyedu.com ~]# ll /oldboyedu/logs/mha/app1/manager.log 
-rw-r--r-- 1 root root 343 2月  17 21:11 /oldboyedu/logs/mha/app1/manager.log
[root@mysql203.oldboyedu.com ~]# 
[root@mysql203.oldboyedu.com ~]# cat /oldboyedu/logs/mha/app1/manager.log  # 启动MHA后,注意查看日志内容哟~
Wed Feb 17 21:11:57 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Feb 17 21:11:57 2021 - [info] Reading application default configuration from /oldboyedu/softwares/mha/etc/app1.cnf..
Wed Feb 17 21:11:57 2021 - [info] Reading server configuration from /oldboyedu/softwares/mha/etc/app1.cnf..
[root@mysql203.oldboyedu.com ~]# 
[root@mysql203.oldboyedu.com ~]# masterha_check_status --conf=/oldboyedu/softwares/mha/etc/app1.cnf  # 启动服务之后,检查MHA状态
app1 (pid:34776) is running(0:PING_OK), master:172.200.1.201
[root@mysql203.oldboyedu.com ~]# 

11.重启MHA管理脚本

    停止MHA:
        [root@mysql203.oldboyedu.com ~]# masterha_stop --conf=/oldboyedu/softwares/mha/etc/app1.cnf

    启动MHA:
        [root@mysql203.oldboyedu.com ~]# nohup masterha_manager --conf=/oldboyedu/softwares/mha/etc/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /oldboyedu/logs/mha/app1/manager.log 2>&1 &

    检查MHA状态:
        [root@mysql203.oldboyedu.com ~]# masterha_check_status --conf=/oldboyedu/softwares/mha/etc/app1.cnf

    温馨提示:
        如果报错了就查看"/oldboyedu/logs/mha/app1/manager"日志信息。当然,该日志信息由"app1.cnf"配置文件"manager_log"指定的哟~

三.可能遇到的错误

1. Error happened on monitoring servers

错误原因:
    通常情况下有两种常见的原因,要么是配置文件写错了,要么是网络不通。

解决方案:
    查看配置文件是否配置正确,检查防火墙是否关闭。

image-20210727113046542

2.Access denied for user 'mha'@'mysql106.oldboyedu.com'

错误原因:   
    检查主从状态的mha用户无权限。

解决方案:
    配置mha相关用户。

温馨提示:
    在启动MHA之前,建议使用"masterha_check_repl --conf=/oldboyedu/softwares/mha/etc/app1.cnf"去查看主从状态。

image-20210727124754378

谨此笔记,记录过往。凭君阅览,如能收益,莫大奢望。
暂无评论

发送评论 编辑评论


				
|´・ω・)ノ
ヾ(≧∇≦*)ゝ
(☆ω☆)
(╯‵□′)╯︵┴─┴
 ̄﹃ ̄
(/ω\)
∠( ᐛ 」∠)_
(๑•̀ㅁ•́ฅ)
→_→
୧(๑•̀⌄•́๑)૭
٩(ˊᗜˋ*)و
(ノ°ο°)ノ
(´இ皿இ`)
⌇●﹏●⌇
(ฅ´ω`ฅ)
(╯°A°)╯︵○○○
φ( ̄∇ ̄o)
ヾ(´・ ・`。)ノ"
( ง ᵒ̌皿ᵒ̌)ง⁼³₌₃
(ó﹏ò。)
Σ(っ °Д °;)っ
( ,,´・ω・)ノ"(´っω・`。)
╮(╯▽╰)╭
o(*////▽////*)q
>﹏<
( ๑´•ω•) "(ㆆᴗㆆ)
😂
😀
😅
😊
🙂
🙃
😌
😍
😘
😜
😝
😏
😒
🙄
😳
😡
😔
😫
😱
😭
💩
👻
🙌
🖕
👍
👫
👬
👭
🌚
🌝
🙈
💊
😶
🙏
🍦
🍉
😣
Source: github.com/k4yt3x/flowerhd
颜文字
Emoji
小恐龙
花!
上一篇
下一篇