MySQL Case-MySQL5.7无效的并行复制
MySQL5.7主从延迟
客户环境主从延迟1500多秒,并且还在持续增加,执行show slave status延迟信息如下
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 1468
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 57
Master_UUID: 59965767-90d3-11ea-8f35-c2d49fafca8c
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
查询从库IO负载,IO磁盘性能较差,每秒写15M,使用繁忙度几乎100%
[root@s2 root]# iostat -xm 2 22
avg-cpu: %user %nice %system %iowait %steal %idle
1.34 0.00 3.42 11.53 16.72 66.99
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda 0.00 50.25 181.09 552.74 8.67 16.22 69.47 4.91 6.68 8.59 6.05 1.29 94.93
scd0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 185.07 499.50 8.99 15.88 74.41 4.47 6.49 8.35 5.80 1.39 94.83
dm-1 0.00 0.00 0.00 86.57 0.00 0.34 8.00 1.07 12.31 0.00 12.31 0.37 3.18
avg-cpu: %user %nice %system %iowait %steal %idle
2.06 0.00 3.31 11.06 11.88 71.69
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda 0.00 57.50 196.50 518.00 11.94 15.53 78.72 4.32 6.06 8.02 5.32 1.31 93.65
scd0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 192.50 457.00 11.62 15.19 84.54 3.92 6.05 8.16 5.16 1.44 93.35
dm-1 0.00 0.00 0.00 101.50 0.00 0.40 8.00 0.92 9.03 0.00 9.03 0.18 1.85
avg-cpu: %user %nice %system %iowait %steal %idle
1.69 0.00 2.77 10.86 18.28 66.41
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda 0.00 0.00 160.00 435.00 6.63 13.60 69.63 3.73 6.27 8.35 5.50 1.47 87.25
scd0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 160.00 412.50 6.63 13.53 72.14 3.72 6.49 8.32 5.78 1.52 87.05
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
那么究竟是什么进程占用这么大IO呢?
[root@s2 fd]# strace -frtT -p 23372 -o /tmp/23372.txt
过滤处理分别生成write和read文件
cat /tmp/23372.txt | grep write > write
cat /tmp/23372.txt | grep read > read
[root@s2 fd]# cat /tmp/write | grep "write(71" | wc -l
2128
[root@s2 fd]# cat /tmp/read | grep "read(69" | wc -l
4228
从上面内容得知,69文件和71文件,分别正在进行大量的读写操作
[root@s2 fd]# ls -l | grep 69
lr-x------ 1 root root 64 Aug 24 10:19 69 -> /mysql/product/mysql57/data/s2-relay-bin.000027
[root@s2 fd]# ls -l | grep 71
lrwx------ 1 root root 64 Aug 24 10:19 71 -> /mysql/product/mysql57/data/s2-relay-bin.000033
从上面看得知,正在写s2-relay-bin.000033文件,读s2-relay-bin.000027文件,相差6个relay文件,说明SQL Thread追不上IO Thread了,到这里你会想到什么呢?是不是要开启并行复制~~,对的!
开启并行复制
stop slave sql_thread;
set global slave_parallel_type='LOGICAL_CLOCK';
set global slave_parallel_workers=16;
start slave sql_thread;
开启并行复制后,继续观察延迟情况,如下并没有得到解决
mysql> show slave status\\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.239.51
Master_User: slave
Master_Port: 3307
Connect_Retry: 60
Master_Log_File: mysql-bin.000075
Read_Master_Log_Pos: 94907731
Relay_Log_File: s2-relay-bin.000030
##SQL线程当前正在读取和执行的中继日志文件的名称
Relay_Log_Pos: 108332883
Relay_Master_Log_File: mysql-bin.000073
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 108332670
Relay_Log_Space: 2334806964
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 2894
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 57
Master_UUID: 59965767-90d3-11ea-8f35-c2d49fafca8c
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Waiting for dependent transaction to commit
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set:
Auto_Position: 0
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
MySQL异步复制流程
同样的方式分析,再次查看relay读写情况IO Thread长在写36号realy log,SQL Thread长在读30号relay log,仍然相差6个relay log
706 -> /mysql/product/mysql57/data/s2-relay-bin.000036
69 -> /mysql/product/mysql57/data/s2-relay-bin.000030
思考与结论
上面主从延迟是IO问题造成的,还是SQL Thread线程效率太慢导致的呢?看到iostat输出磁盘繁忙度90%多,你可能直观的认为IO达到了瓶颈,但后面从库读取和写入relay的进度可以看出,SQL Thread已经跟不上复制进度了。如果主机上的负载不大,那么组提交的效率就不高,很有可能发生每组提交的事务数量仅有1个,那么在从机的回放时, 虽然开启了并行复制,但是性能并没有明显提升,更有可能反而比原先的单线程还要差的现象(因为有协同线程参与)。MySQL 8.0最新的基于writeset的MTS才是最终的解决之道。你认为8.0的并行复制可以解决上述问题吗?
更多文章欢迎关注本人公众号,搜dbachongzi或扫二维码
作者:姚崇 Oracle OCM、MySQL OCP、Oceanbase OBCA、PingCAP PCTA认证,擅长基于Oracle、MySQL Performance Turning及多种关系型 NoSQL数据库。
