本文最后更新于 319 天前,其中的信息可能已经过时,如有错误请发送邮件到wuxianglongblog@163.com
MySQL主从复制(Replication)架构从库线程故障分析及处理
一.主从复制故障时关心的参数
通过"SHOW SLAVE STATUS\G"命令查看主从复制监控方法。
我们通常关心以下几个参数信息:
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
1.IO线程问题
I/O线程作用:
从库根据"master.info"记录的主库链接信息,从主库拉取二进制日志并落地到本地的中继日志(relay-log)。
正常状态:
Slave_IO_Running: Yes
非正常状态:
Slave_IO_Running: No
Slave_IO_Running: Connecting
故障原因:
连接主库失败:
(1)网络,端口,防火墙等问题;
(2)用户名权限问题,必须得拥有"REPLICATION SLAVE"权限;
(3)主库的连接数("max_connections")上限,默认并发用户数量是151个;
(4)数据库实例版本不统一,很多公司为了升级数据库版本,从而构建了主从复制进行数据迁移,但需要注意的是,MySQL 8.0有关用户的密码策略的验证方式有所不同。
请求日志,接受日志出现故障:
(1)主库二进制日志不完整,如被删除,损坏,不连续日志等问题;
(2)从库请求的起点问题,如请求的起点较实际起点位置偏小,则可能导致执行重复的SQL语句,而从导致数据不完整;
(3)主从库的server_id相同等问题,如果基于GTID的方式对应的server_uuid也是不能重复的哟;
(4)中继日志问题;
2.SQL线程问题
SQL线程的作用:
SQL线程用来在后台回放中继日志(relay-log),可以理解为执行中继日志(relay-log)中的SQL语句,与此同时,它还会更新relay-log.info文件。
正常状态:
Slave_SQL_Running: Yes
非正常状态:
Slave_SQL_Running: No
如果SQL线程故障了,可以理解为SQL线程执行不了SQL语句,可能的原因如下:
(1)创建的对象已经存在了,比如要创建一个数据库,而数据库已经存在了(可能是人为创建了该数据库);
(2)需要操作的对象不存在,比如往表中插入数据,而表并不存在(可能是人为删除了表);
(3)因约束冲突导致数据SQL执行失败,比如主键,唯一键,外键,非空约束等特性;
(4)如果主库和从库的SQL_MODE不同,也有可能导致从库执行SQL失败;
(5)主从数据库实例配置参数不同,比如主库可读写可写,而从库是只读的);
(6)主从数据库实例MySQL版本不同,我们以GRANT命令为例,在MySQL 8.0版本只能授权给已存在的用户,而MySQL 5.7及之前的版本是可以授权不存在的用户,并且再授权后会自动创建不存在的用户;
二.IO线程故障模拟
1.用户名权限问题故障模拟及解决方案
模拟故障
查看主库的binlog位置信息:
[root@docker201.oldboyedu.com ~]# mysql -S /tmp/mysql3307.sock
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 16
Server version: 5.7.31-log MySQL Community Server (GPL)
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000005 | 350 | | | |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
mysql>
在实验之前,我从库数据和主库数据已经完全一致了,我在从库执行以下命令目的是为了重新配置从库来模拟故障:
mysql> RESET SLAVE ALL; # 解除从库的所有配置
Query OK, 0 rows affected (0.29 sec)
mysql>
mysql> SHOW SLAVE STATUS\G # 解除后再次查看从库的状态,不难发现,从库的状态信息为空了。
Empty set (0.00 sec)
mysql>
(1)主库查看二进制日志位置信息
mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000005 | 350 | | | |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
mysql>
(2)从库重新配置链接主库的相关信息(该步骤会多出来"relay-log.info"和"master.info"这两个文件)
mysql> SHOW SLAVE STATUS\G
Empty set (0.00 sec)
mysql>
mysql> CHANGE MASTER TO
-> MASTER_HOST='172.200.1.201',
-> MASTER_USER='copy',
-> MASTER_PASSWORD='oldboyedu2020', # 注意哈,此处我故意写错了密码,正确的用户名密码为"oldboyedu"。我就是为了模拟故障。
-> MASTER_PORT=3307,
-> MASTER_LOG_FILE='mysql-bin.000005',
-> MASTER_LOG_POS=350,
-> MASTER_CONNECT_RETRY=10;
Query OK, 0 rows affected, 2 warnings (0.29 sec)
mysql>
mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: 172.200.1.201
Master_User: copy
Master_Port: 3307
Connect_Retry: 10
Master_Log_File: mysql-bin.000005
Read_Master_Log_Pos: 350
Relay_Log_File: docker201-relay-bin.000001
Relay_Log_Pos: 4
Relay_Master_Log_File: mysql-bin.000005
Slave_IO_Running: No
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 350
Relay_Log_Space: 154
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 0
Master_UUID:
Master_Info_File: /oldboyedu/data/mysql3309/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
mysql>
(3)当我们启动从库的IO线程和SQL线程时,不难发现,IO线程的状态始终是"Connecting",这就得需要咱们来解决。
mysql> START SLAVE;
Query OK, 0 rows affected (0.01 sec)
mysql>
mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
...
Slave_IO_Running: Connecting
Slave_SQL_Running: Yes
...
Last_IO_Errno: 1045
Last_IO_Error: error connecting to master 'copy@172.200.1.201:3307' - retry-time: 10 retries: 1
Last_SQL_Errno: 0
Last_SQL_Error:
...
1 row in set (0.00 sec)
mysql>
解决故障
验证主库链接信息:如果出现上述故障,我们就得拿着测试用户去尝试连接,发现果真使用配置的信息是无法连接到主库的,当然,我们还做了其它的几种案例测试:
(1)只有密码写错,报错信息如下:
[root@docker201.oldboyedu.com ~]# mysql -u copy -poldboyedu2020 -h 172.200.1.201 -P 3307
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1045 (28000): Unknown error 1045
[root@docker201.oldboyedu.com ~]#
(2)用户名和密码都写错,报错信息如下:
[root@docker201.oldboyedu.com ~]# mysql -u copy2 -poldboyedu2020 -h 172.200.1.201 -P 3307
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1045 (28000): Unknown error 1045
[root@docker201.oldboyedu.com ~]#
(3)用户名和端口写错,报错信息如下:
[root@docker201.oldboyedu.com ~]# mysql -u copy2 -poldboyedu -h 172.200.1.201 -P 330
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 2003 (HY000): Can't connect to MySQL server on '172.200.1.201' (111)
[root@docker201.oldboyedu.com ~]#
(4)只写错用户名:
[root@docker201.oldboyedu.com ~]# mysql -u copy2 -poldboyedu -h 172.200.1.201 -P 3307
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1045 (28000): Unknown error 1045
[root@docker201.oldboyedu.com ~]#
(5)写错用户名和端口:
[root@docker201.oldboyedu.com ~]# mysql -u copy2 -poldboyedu -h 172.200.1.201 -P 3306
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 2003 (HY000): Can't connect to MySQL server on '172.200.1.201' (111)
[root@docker201.oldboyedu.com ~]#
报错分析:
很明显,从上面的测试结果不难发现,是由于从库在配置主库的用户连接信息时导致的错误。
解决方案:
(1)使用正确的用户名和密码进行登录即可,如下所示,使用正确的密码就可以正常登录。
[root@docker201.oldboyedu.com ~]# mysql -u copy -poldboyedu -h 172.200.1.201 -P 3307
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 107
Server version: 5.7.31-log MySQL Community Server (GPL)
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> SHOW DATABASES;
+--------------------+
| Database |
+--------------------+
| information_schema |
+--------------------+
1 row in set (0.01 sec)
mysql>
mysql> QUIT
Bye
[root@docker201.oldboyedu.com ~]#
(2)重新配置MySQL主从复制的配置信息即可
mysql> RESET SLAVE ALL;
Query OK, 0 rows affected (0.02 sec)
mysql>
mysql> SHOW SLAVE STATUS\G\
Empty set (0.00 sec)
mysql>
mysql> CHANGE MASTER TO
-> MASTER_HOST='172.200.1.201',
-> MASTER_USER='copy',
-> MASTER_PASSWORD='oldboyedu', # 本次使用正确的密码哟~
-> MASTER_PORT=3307,
-> MASTER_LOG_FILE='mysql-bin.000005',
-> MASTER_LOG_POS=350,
-> MASTER_CONNECT_RETRY=10;
Query OK, 0 rows affected, 2 warnings (0.31 sec)
mysql>
mysql> START SLAVE;
Query OK, 0 rows affected (0.00 sec)
mysql>
mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.200.1.201
Master_User: copy
Master_Port: 3307
Connect_Retry: 10
Master_Log_File: mysql-bin.000005
Read_Master_Log_Pos: 350
Relay_Log_File: docker201-relay-bin.000002
Relay_Log_Pos: 320
Relay_Master_Log_File: mysql-bin.000005
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 350
Relay_Log_Space: 531
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 7
Master_UUID: 7c5bfd7b-52db-11eb-b842-000c29820c67
Master_Info_File: /oldboyedu/data/mysql3309/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
mysql>
2.最大连接数上限问题故障模拟及解决方案
模拟故障
如下所示,为了达到实验效果,我们将主库的最大连接数设置为5,"max_connections"的值调小,该值默认值为"151",生产环境中可以适当调大。
mysql> SELECT @@max_connections;
+-------------------+
| @@max_connections |
+-------------------+
| 151 |
+-------------------+
1 row in set (0.00 sec)
mysql>
mysql> SET GLOBAL max_connections=5;
Query OK, 0 rows affected (0.00 sec)
mysql>
mysql> SELECT @@max_connections;
+-------------------+
| @@max_connections |
+-------------------+
| 5 |
+-------------------+
1 row in set (0.00 sec)
mysql>
如下图所示,当链接的用户达到5个时,就会抛出如下所示的异常:
[root@docker201.oldboyedu.com ~]# mysql -S /tmp/mysql3307.sock
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 151
Server version: 5.7.31-log MySQL Community Server (GPL)
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
mysql> SHOW PROCESSLIST;
+-----+------+---------------------------------+------+-------------+------+---------------------------------------------------------------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+-----+------+---------------------------------+------+-------------+------+---------------------------------------------------------------+------------------+
| 17 | copy | docker201.oldboyedu.com:49594 | NULL | Binlog Dump | 4148 | Master has sent all binlog to slave; waiting for more updates | NULL |
| 117 | copy | docker201.oldboyedu.com:49798 | NULL | Binlog Dump | 1612 | Master has sent all binlog to slave; waiting for more updates | NULL |
| 148 | repl | docker201.oldboyedu.com:49852 | NULL | Sleep | 19 | | NULL |
| 149 | repl | docker201.oldboyedu.com:49854 | NULL | Sleep | 17 | | NULL |
| 151 | root | localhost | NULL | Query | 0 | starting | SHOW PROCESSLIST |
+-----+------+---------------------------------+------+-------------+------+---------------------------------------------------------------+------------------+
5 rows in set (0.00 sec)
mysql>
mysql> SELECT @@max_connections;
+-------------------+
| @@max_connections |
+-------------------+
| 5 |
+-------------------+
1 row in set (0.00 sec)
mysql>
正常连接:
[root@docker201.oldboyedu.com ~]# mysql -u repl -poldboyedu -h 172.200.1.201 -P 3307
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 146
Server version: 5.7.31-log MySQL Community Server (GPL)
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
异常链接:
[root@docker201.oldboyedu.com ~]# mysql -u repl -poldboyedu -h 172.200.1.201 -P 3307
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1040 (08004): Unknown error 1040
[root@docker201.oldboyedu.com ~]#
解决方案
既然知道是由于"max_connections"参数导致的,因此在生产环境中可以适当将它调大,比如设置为1024甚至更大,以增大用户的并发量。建议在配置文件中定义。
3.主库二进制日志不完整等问题故障模拟及解决方案
模拟故障
(1)主库删除清空所有二进制日志,并重新开始记录二进制日志
[root@docker201.oldboyedu.com ~]# mysql -S /tmp/mysql3307.sock
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 152
Server version: 5.7.31-log MySQL Community Server (GPL)
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
mysql> RESET MASTER;
Query OK, 0 rows affected (0.01 sec)
mysql>
mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000001 | 154 | | | |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
mysql>
(2)从库的IO线程自动停止,而Last_IO_Errno返回的错误代码为"1236"
[root@docker201.oldboyedu.com ~]# mysql -S /tmp/mysql3308.sock
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 17
Server version: 5.7.31-log MySQL Community Server (GPL)
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: 172.200.1.201
Master_User: copy
Master_Port: 3307
Connect_Retry: 10
Master_Log_File: mysql-bin.000005
Read_Master_Log_Pos: 350
Relay_Log_File: docker201-relay-bin.000004
Relay_Log_Pos: 320
Relay_Master_Log_File: mysql-bin.000005
Slave_IO_Running: No
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 350
Relay_Log_Space: 940
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 1236
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 7
Master_UUID: 7c5bfd7b-52db-11eb-b842-000c29820c67
Master_Info_File: /oldboyedu/data/mysql3308/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp: 210214 15:40:49
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
mysql>
解决方案
方案一:
重新搭建主从复制,问题肯定是可以解决的,但效率较低。
方案二:
(1)生产环境中,需要在业务不繁忙期间,停止业务5-10分钟左右进行维护是可行的,当然,要执行该操作时可以和你的领导要提前打好招呼哟;
(2)等待从库重放完所有主库日志(必要条件);
(3)主库执行"RESET MASTER"命令;
[root@docker201.oldboyedu.com ~]# mysql -S /tmp/mysql3307.sock
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 152
Server version: 5.7.31-log MySQL Community Server (GPL)
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
mysql> RESET MASTER;
Query OK, 0 rows affected (0.01 sec)
mysql>
(4)从库重新同步主库日志
[root@docker201.oldboyedu.com ~]# mysql -S /tmp/mysql3308.sock
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 5.7.31-log MySQL Community Server (GPL)
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
mysql> STOP SLAVE;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql>
mysql> RESET SLAVE ALL;
Query OK, 0 rows affected (0.02 sec)
mysql>
mysql> SHOW SLAVE STATUS\G
Empty set (0.00 sec)
mysql>
mysql> CHANGE MASTER TO
-> MASTER_HOST='172.200.1.201',
-> MASTER_USER='copy',
-> MASTER_PASSWORD='oldboyedu',
-> MASTER_PORT=3307,
-> MASTER_LOG_FILE='mysql-bin.000001',
-> MASTER_LOG_POS=154,
-> MASTER_CONNECT_RETRY=10;
Query OK, 0 rows affected, 2 warnings (0.05 sec)
mysql>
mysql> START SLAVE;
Query OK, 0 rows affected (0.01 sec)
mysql>
mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.200.1.201
Master_User: copy
Master_Port: 3307
Connect_Retry: 10
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 154
Relay_Log_File: docker201-relay-bin.000002
Relay_Log_Pos: 320
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 154
Relay_Log_Space: 531
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 7
Master_UUID: 7c5bfd7b-52db-11eb-b842-000c29820c67
Master_Info_File: /oldboyedu/data/mysql3308/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
mysql>
温馨提示:
如果业务繁忙期间做"RESET MASTER"操作,很有可能导致数据库阻塞一段时间,因为在此期间MySQL是需要删除以前存在的大量日志并重新启动新的日志。
如果无法满足方案二的必要条件,还是想要恢复主从,那就得采用方案一,即重新搭建主从,这样会很浪费时间,尤其是在数据量较大时,在从库做数据恢复时时间可能较长。
CHANGE MASTER TO
MASTER_HOST='172.200.1.201',
MASTER_USER='copy',
MASTER_PASSWORD='oldboyedu',
MASTER_PORT=3307,
MASTER_LOG_FILE='mysql-bin.000002',
MASTER_LOG_POS=154,
MASTER_CONNECT_RETRY=10;
三.SQL线程故障模拟
主从复制时创建的对象已经存在案例模拟
模拟故障
(1)在从库创建"bigdata"数据库
[root@docker201.oldboyedu.com ~]# mysql -S /tmp/mysql3308.sock
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 5.7.31-log MySQL Community Server (GPL)
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
mysql> SELECT @@server_id;
+-------------+
| @@server_id |
+-------------+
| 8 |
+-------------+
1 row in set (0.00 sec)
mysql>
mysql> SHOW DATABASES;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| sys |
| oldboyedu |
+--------------------+
5 rows in set (0.00 sec)
mysql>
mysql> CREATE DATABASE bigdata DEFAULT CHARSET utf8mb4;
Query OK, 1 row affected (0.00 sec)
mysql>
mysql> SHOW DATABASES;
+--------------------+
| Database |
+--------------------+
| information_schema |
| bigdata |
| mysql |
| performance_schema |
| sys |
| oldboyedu |
+--------------------+
6 rows in set (0.00 sec)
mysql>
(2)在主库创建"bigdata"数据库,并在该库下创建一张"hadoop"表
[root@docker201.oldboyedu.com ~]# mysql -S /tmp/mysql3307.sock
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 12
Server version: 5.7.31-log MySQL Community Server (GPL)
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
mysql> SELECT @@server_id;
+-------------+
| @@server_id |
+-------------+
| 7 |
+-------------+
1 row in set (0.00 sec)
mysql>
mysql> CREATE DATABASE bigdata DEFAULT CHARSET utf8mb4;
Query OK, 1 row affected (0.00 sec)
mysql>
mysql> SHOW DATABASES;
+--------------------+
| Database |
+--------------------+
| information_schema |
| bigdata |
| mysql |
| performance_schema |
| sys |
| oldboyedu |
+--------------------+
6 rows in set (0.00 sec)
mysql>
mysql> CREATE TABLE bigdata.hadoop (id int, name CHAR(10));
Query OK, 0 rows affected (0.02 sec)
mysql>
mysql> SHOW TABLES FROM bigdata;
+-------------------+
| Tables_in_bigdata |
+-------------------+
| hadoop |
+-------------------+
1 row in set (0.00 sec)
mysql>
(3)不难发现,从库竟然竟然没有hadoop这张表,只有bigdata数据库,而且该数据库是之前我们手动创建的!
[root@docker201.oldboyedu.com ~]# mysql -S /tmp/mysql3308.sock
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 9
Server version: 5.7.31-log MySQL Community Server (GPL)
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
mysql> SELECT @@server_id;
+-------------+
| @@server_id |
+-------------+
| 8 |
+-------------+
1 row in set (0.00 sec)
mysql>
mysql> SHOW TABLES FROM bigdata;
Empty set (0.00 sec)
mysql>
mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.200.1.201
Master_User: copy
Master_Port: 3307
Connect_Retry: 10
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 535
Relay_Log_File: docker201-relay-bin.000002
Relay_Log_Pos: 320
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1007
Last_Error: Error 'Unknown error 1007' on query. Default database: 'bigdata'. Query: 'CREATE DATABASE bigdata DEFAULT CHARSET utf8mb4'
Skip_Counter: 0
Exec_Master_Log_Pos: 154
Relay_Log_Space: 912
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1007
Last_SQL_Error: Error 'Unknown error 1007' on query. Default database: 'bigdata'. Query: 'CREATE DATABASE bigdata DEFAULT CHARSET utf8mb4'
Replicate_Ignore_Server_Ids:
Master_Server_Id: 7
Master_UUID: 7c5bfd7b-52db-11eb-b842-000c29820c67
Master_Info_File: /oldboyedu/data/mysql3308/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp: 210214 21:41:04
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
mysql>
解决方案
方案一:
一切以主库为准,将从库进行反操作,重启线程。对应本案例就是直接将从库的数据库删除掉,而后启动SQL线程即可解决问题,因为IO线程是启动着的。
mysql> SELECT @@server_id;
+-------------+
| @@server_id |
+-------------+
| 8 |
+-------------+
1 row in set (0.00 sec)
mysql>
mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.200.1.201
Master_User: copy
Master_Port: 3307
Connect_Retry: 10
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 535
Relay_Log_File: docker201-relay-bin.000002
Relay_Log_Pos: 320
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1007
Last_Error: Error 'Unknown error 1007' on query. Default database: 'bigdata'. Query: 'CREATE DATABASE bigdata DEFAULT CHARSET utf8mb4'
Skip_Counter: 0
Exec_Master_Log_Pos: 154
Relay_Log_Space: 912
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1007
Last_SQL_Error: Error 'Unknown error 1007' on query. Default database: 'bigdata'. Query: 'CREATE DATABASE bigdata DEFAULT CHARSET utf8mb4'
Replicate_Ignore_Server_Ids:
Master_Server_Id: 7
Master_UUID: 7c5bfd7b-52db-11eb-b842-000c29820c67
Master_Info_File: /oldboyedu/data/mysql3308/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp: 210214 21:41:04
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.01 sec)
mysql>
mysql> DROP DATABASE bigdata;
Query OK, 0 rows affected (0.00 sec)
mysql>
mysql> START SLAVE SQL_THREAD;
Query OK, 0 rows affected (0.01 sec)
mysql>
mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.200.1.201
Master_User: copy
Master_Port: 3307
Connect_Retry: 10
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 535
Relay_Log_File: docker201-relay-bin.000002
Relay_Log_Pos: 701
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 535
Relay_Log_Space: 912
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 7
Master_UUID: 7c5bfd7b-52db-11eb-b842-000c29820c67
Master_Info_File: /oldboyedu/data/mysql3308/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
mysql>
mysql> SHOW DATABASES;
+--------------------+
| Database |
+--------------------+
| information_schema |
| bigdata |
| mysql |
| performance_schema |
| sys |
| oldboyedu |
+--------------------+
6 rows in set (0.00 sec)
mysql>
mysql> SHOW TABLES FROM bigdata;
+-------------------+
| Tables_in_bigdata |
+-------------------+
| hadoop |
+-------------------+
1 row in set (0.01 sec)
mysql>
方案二:
以从库为准,临时跳过此次复制错误。如果非要用该方法,一定要保证此次故障数据是完全一直时才能采用。
mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.200.1.201
Master_User: copy
Master_Port: 3307
Connect_Retry: 10
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 694
Relay_Log_File: docker201-relay-bin.000002
Relay_Log_Pos: 701
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1007
Last_Error: Error 'Unknown error 1007' on query. Default database: 'db01'. Query: 'CREATE DATABASE db01'
Skip_Counter: 0
Exec_Master_Log_Pos: 535
Relay_Log_Space: 1071
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1007
Last_SQL_Error: Error 'Unknown error 1007' on query. Default database: 'db01'. Query: 'CREATE DATABASE db01'
Replicate_Ignore_Server_Ids:
Master_Server_Id: 7
Master_UUID: 7c5bfd7b-52db-11eb-b842-000c29820c67
Master_Info_File: /oldboyedu/data/mysql3308/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp: 210214 21:56:52
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
mysql>
mysql> STOP SLAVE;
Query OK, 0 rows affected (0.01 sec)
mysql>
mysql> SET GLOBAL sql_slave_skip_counter=1; # 将同步指针向下移动一个,如果多次不同步,可以重复操作。
Query OK, 0 rows affected (0.01 sec)
mysql>
mysql> START SLAVE;
Query OK, 0 rows affected (0.00 sec)
mysql>
mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.200.1.201
Master_User: copy
Master_Port: 3307
Connect_Retry: 10
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 694
Relay_Log_File: docker201-relay-bin.000003
Relay_Log_Pos: 320
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 694
Relay_Log_Space: 1237
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 7
Master_UUID: 7c5bfd7b-52db-11eb-b842-000c29820c67
Master_Info_File: /oldboyedu/data/mysql3308/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
mysql>
方案三:(该方法比较暴力,遇到指定的错误代码直接跳过,生产环境中并不推荐使用!)
在配置文件"my.cnf"中定义,让从库忽略复制时的错误代码,如下所示:
slave-skip-errors = 1032,1062,1007
跳过的错误代码说明如下:
1032:
无法执行DML。
1062:
主键冲突,或约束冲突。
1007:
创建的对象已经存在。
方案四:
重新搭建主从架构,对主库进行备份,而后在从库进行恢复,重新构建主从复制架构。
方案五:
我们可以将从库设置为只读状态,这样就没法手动在从库创建数据库啦~
(1)如下图所示,我们可以基于参数控制从库只读:
mysql> SELECT @@read_only; # 让普通用户只读
+-------------+
| @@read_only |
+-------------+
| 0 |
+-------------+
1 row in set (0.00 sec)
mysql>
mysql> SELECT @@super_read_only; # 让管理员只读
+-------------------+
| @@super_read_only |
+-------------------+
| 0 |
+-------------------+
1 row in set (0.00 sec)
mysql>
(2)利用中间件控制从库只读:
用户的所有读写操作发给中间件,而后由中间件来实现读写分离。用户无需直接和数据库打交道,而是直接和中间件打交道,比如mycat工具。
四.MySQL自愈能力
我们后续的章节会讲解MHA架构,但该架构并不能实现自愈功能。
如果想要实现自愈功能,建议使用:"MySQL 8.x版本 + MGR + Mysqlsh + K8S"。
五.MySQL常见的报错故障代码
1.Last_IO_Errno: 1593
故障原因:
3307是主库,而从库也是3307实例,"CHANGE MASTER TO ..."指向3307实例自己。
解决方案:
(1)这种情况下大多数是配置错误导致的,我们环境架构中主库是3307,从库是3308和3309。
(2)如果在生产环境中,多实例部署也出现这种错误也不要慌,可以修改数据目录下"auto.cnf"的"server-uuid"值并重启服务即可。
在DB架构之初要有严谨的思路,避免问题的发生:
同时采用Binlog_Do_DB,Replicate_Do_DB指定要复制的数据库,用Binlog_Ignore_DB,Replicate_Ignore_DB禁用不允许复制的数据库(比如mysql自带的mysql数据库等)避免异常bug发生...
2.Last_IO_Errno: 1045
故障原因:
通常是验证用户名和密码错误。
解决方案:
手动验证"CHANGE MASTER TO"配置的用户名和密码,验证是否可以正常登陆。
3.Last_IO_Errno: 2003
故障原因:
通常是连接异常,即无法和MySQL数据库建立连接。
解决方案:
(1)先检查指定MySQL连接的主机和端口是否正确;
(2)而后检查是否能与指定的主机和端口建立TCP连接,有可能防火墙等因素拦截了请求建立;
4.Last_Errno: 1007 或者 Last_SQL_Errno: 1007
故障原因:
从库做了写入操作,而且写入的数据和从主库同步的数据有所冲突。
解决方案:
方案一:
一切以主库为准,将从库进行反操作,重启线程。对应本案例就是直接将从库的数据库删除掉,而后启动SQL线程即可解决问题,因为IO线程是启动着的。
方案二:
以从库为准,临时跳过此次复制错误。如果非要用该方法,一定要保证此次故障数据是完全一直时才能采用。
方案三:(该方法比较暴力,遇到指定的错误代码直接跳过,生产环境中并不推荐使用!)
在配置文件"my.cnf"中定义,让从库忽略复制时的错误代码,如下所示:
slave-skip-errors = 1032,1062,1007
跳过的错误代码说明如下:
1032:
无法执行DML。
1062:
主键冲突,或约束冲突。
1007:
创建的对象已经存在。
方案四:
重新搭建主从架构,对主库进行备份,而后在从库进行恢复,重新构建主从复制架构。
方案五:(该方法不能解决已发生1007故障的事情,但可以预防1007故障的发生哟~)
我们可以将从库设置为只读状态,这样就没法手动在从库创建数据库啦~
(1)如下图所示,我们可以基于参数控制从库只读:
mysql> SELECT @@read_only; # 让普通用户只读
+-------------+
| @@read_only |
+-------------+
| 0 |
+-------------+
1 row in set (0.00 sec)
mysql>
mysql> SELECT @@super_read_only; # 让管理员只读
+-------------------+
| @@super_read_only |
+-------------------+
| 0 |
+-------------------+
1 row in set (0.00 sec)
mysql>
(2)利用中间件控制从库只读:
用户的所有读写操作发给中间件,而后由中间件来实现读写分离。用户无需直接和数据库打交道,而是直接和中间件打交道,比如mycat工具。
5.Last_SQL_Errno: 1146
故障原因:
我理解的是MySQL从库对于类似于"CREATE TABLE oldboyedu.student (id int);"无法识别,导致student表无法创建。
解决方案:
我们可以将原语句换成"USE oldboyedu;CREATE TABLE student (id int);"即可解决问题,否则默认情况下从库是不会创建对应的表哟~
其他解决方案:
待补充....