您现在的位置是:首页 > 文章详情

加速无索引表引起的主从延迟数据回放

日期:2025-02-10点击:73

加速无索引表引起的主从延迟数据回放

一、场景

由于某些原因,客户现场存在一张 8千万 的大表,而且该表上无任何索引(也无主键),平时该表上 UPDATEDELETE 只操作几条数据。忽然有一天业务进行了某种操作,DELETE 2万 条数据,悲剧发生了,当在主库上执行了之后,传到从库上之后一直回放,当时评估了下可能会回放10天,后来在经过业务同意之后,对表进行操作,用于加速回放日志,处理该问题。

二、处理思路

停掉复制线程,关闭 Binlog 日志记录,添加索引,重启复制线程。

三、复现步骤

1、准备数据:

创建库:

greatsql> CREATE DATABASE qj; Query OK, 1 row affected (0.01 sec) 

准备数据,准备一张8千万数据的表:

$ sysbench --db-driver=mysql --mysql-host=192.168.139.230 --mysql-port=3307 --mysql-user=greatsql --mysql-password=GreatSQL@2024 --mysql-db=qj --table_size=80000000 --tables=1 oltp_write_only prepare sysbench 1.1.0-df89d34 (using bundled LuaJIT 2.1.0-beta3) Creating table 'sbtest1'... Inserting 80000000 records into 'sbtest1' Creating a secondary index on 'sbtest1'... 

主节点修改表结构,将表的索引去掉

greatsql> USE qj Database changed greatsqll> ALTER TABLE sbtest1 MODIFY id int not null; Query OK, 80000000 rows affected (36 min 29.13 sec) Records: 80000000 Duplicates: 0 Warnings: 0 greatsql> ALTER TABLE sbtest1 DROP key `k_1`; Query OK, 0 rows affected (0.96 sec) Records: 0 Duplicates: 0 Warnings: 0 greatsql> ALTER TABLE sbtest1 DROP PRIMARY KEY; Query OK, 80000000 rows affected (14 min 29.31 sec) Records: 80000000 Duplicates: 0 Warnings: 0 

2、模拟延迟:

01、主节点:

模拟删除数据

greatsql> DELETE FROM qj.sbtest1 WHERE k<20000; Query OK, 19894 rows affected (4 min 20.40 sec) 

02、从节点:

等待时间超过DELETE语句的时间,观察延迟情况,确认从节点可读可写

greatsql> SELECT sleep(300); +------------+ | sleep(300) | +------------+ | 0 | +------------+ 1 row in set (5 min 0.01 sec) greatsql> SHOW SLAVE STATUS \G *************************** 1. row *************************** Slave_IO_State: Waiting for source to send event Master_Host: 172.17.139.230 Master_User: greatsql Master_Port: 3307 Connect_Retry: 60 Master_Log_File: binlog.000078 Read_Master_Log_Pos: 953750186 Relay_Log_File: relaylog.000231 Relay_Log_Pos: 949954084 Relay_Master_Log_File: binlog.000078 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 949953874 Relay_Log_Space: 953750683 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 1345 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 1000403307 Master_UUID: 1d0963e9-85d9-11ef-80e1-00163e28e06a Master_Info_File: mysql.slave_master_info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Replica has read all relay log; waiting for more updates Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: 1d0963e9-85d9-11ef-80e1-00163e28e06a:1-163235 Executed_Gtid_Set: 1d0963e9-85d9-11ef-80e1-00163e28e06a:1-163234 Auto_Position: 1 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: Master_public_key_path: Get_master_public_key: 0 Network_Namespace: 1 row in set, 1 warning (0.00 sec) greatsql> SHOW GLOBAL VARIABLES LIKE 'read_only'; +---------------+-------+ | Variable_name | Value | +---------------+-------+ | read_only | OFF | +---------------+-------+ 1 row in set (0.01 sec) 

03、通过操作从节点加速日志回放

找到回放线程,KILL掉回放线程,关闭当前会话binlog日志记录,添加索引,重新启动复制回放线程

greatsql> SHOW PROCESSLIST; +--------+-----------------+----------------------+------+---------+---------+----------------------------------------------------------+------------------+------------+-----------+---------------+ | Id | User | Host | db | Command | Time | State | Info | Time_ms | Rows_sent | Rows_examined | +--------+-----------------+----------------------+------+---------+---------+----------------------------------------------------------+------------------+------------+-----------+---------------+ | 5 | event_scheduler | localhost | NULL | Daemon | 2369454 | Waiting on empty queue | NULL | 2369454039 | 0 | 0 | | 17 | system user | connecting host | NULL | Connect | 2369447 | Waiting for source to send event | NULL | 2369446555 | 0 | 0 | | 543448 | greatsql | 172.17.136.93:48298 | NULL | Sleep | 110 | | NULL | 109832 | 0 | 0 | | 543500 | greatsql | 172.17.139.230:35198 | NULL | Sleep | 586 | | NULL | 585885 | 0 | 0 | | 543588 | greatsql | 172.17.136.93:57948 | NULL | Sleep | 50 | | NULL | 49878 | 0 | 0 | | 543663 | greatsql | 172.17.139.230:35726 | NULL | Query | 0 | init | show processlist | 0 | 0 | 0 | | 543666 | greatsql | 172.17.136.93:58908 | NULL | Sleep | 290 | | NULL | 289861 | 0 | 0 | | 543708 | system user | | NULL | Query | 477 | Replica has read all relay log; waiting for more updates | NULL | 476520 | 0 | 0 | | 543709 | system user | | qj | Query | 1383 | Applying batch of row changes (delete) | NULL | 26102 | 0 | 0 | | 543710 | system user | | NULL | Connect | 477 | Waiting for an event from Coordinator | NULL | 476570 | 0 | 0 | | 543711 | system user | | NULL | Connect | 477 | Waiting for an event from Coordinator | NULL | 476569 | 0 | 0 | | 543712 | system user | | NULL | Connect | 477 | Waiting for an event from Coordinator | NULL | 476568 | 0 | 0 | | 543714 | greatsql | 172.17.136.93:33582 | NULL | Sleep | 470 | | NULL | 469798 | 0 | 0 | +--------+-----------------+----------------------+------+---------+---------+----------------------------------------------------------+------------------+------------+-----------+---------------+ 13 rows in set (0.00 sec) greatsql> KILL 543709; Query OK, 0 rows affected (0.00 sec) greatsql> SHOW PROCESSLIST; +--------+-----------------+----------------------+------+---------+---------+----------------------------------+------------------+------------+-----------+---------------+ | Id | User | Host | db | Command | Time | State | Info | Time_ms | Rows_sent | Rows_examined | +--------+-----------------+----------------------+------+---------+---------+----------------------------------+------------------+------------+-----------+---------------+ | 5 | event_scheduler | localhost | NULL | Daemon | 2369473 | Waiting on empty queue | NULL | 2369473241 | 0 | 0 | | 17 | system user | connecting host | NULL | Connect | 2369466 | Waiting for source to send event | NULL | 2369465757 | 0 | 0 | | 543448 | greatsql | 172.17.136.93:48298 | NULL | Sleep | 129 | | NULL | 129034 | 0 | 0 | | 543500 | greatsql | 172.17.139.230:35198 | NULL | Sleep | 605 | | NULL | 605087 | 0 | 0 | | 543588 | greatsql | 172.17.136.93:57948 | NULL | Sleep | 9 | | NULL | 9132 | 0 | 0 | | 543663 | greatsql | 172.17.139.230:35726 | NULL | Query | 0 | init | show processlist | 0 | 0 | 0 | | 543666 | greatsql | 172.17.136.93:58908 | NULL | Sleep | 309 | | NULL | 309064 | 0 | 0 | | 543714 | greatsql | 172.17.136.93:33582 | NULL | Sleep | 489 | | NULL | 489000 | 0 | 0 | +--------+-----------------+----------------------+------+---------+---------+----------------------------------+------------------+------------+-----------+---------------+ 8 rows in set (0.00 sec) greatsql> SHOW SLAVE STATUS \G *************************** 1. row *************************** Slave_IO_State: Waiting for source to send event Master_Host: 172.17.139.230 Master_User: greatsql Master_Port: 3307 Connect_Retry: 60 Master_Log_File: binlog.000078 Read_Master_Log_Pos: 953750186 Relay_Log_File: relaylog.000231 Relay_Log_Pos: 949954084 Relay_Master_Log_File: binlog.000078 Slave_IO_Running: Yes Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 1317 Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '1d0963e9-85d9-11ef-80e1-00163e28e06a:163235' at master log binlog.000078, end_log_pos 950241265. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any. Skip_Counter: 0 Exec_Master_Log_Pos: 949953874 Relay_Log_Space: 953750683 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 1317 Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '1d0963e9-85d9-11ef-80e1-00163e28e06a:163235' at master log binlog.000078, end_log_pos 950241265. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any. Replicate_Ignore_Server_Ids: Master_Server_Id: 1000403307 Master_UUID: 1d0963e9-85d9-11ef-80e1-00163e28e06a Master_Info_File: mysql.slave_master_info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: 241105 19:06:59 Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: 1d0963e9-85d9-11ef-80e1-00163e28e06a:1-163235 Executed_Gtid_Set: 1d0963e9-85d9-11ef-80e1-00163e28e06a:1-163234 Auto_Position: 1 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: Master_public_key_path: Get_master_public_key: 0 Network_Namespace: 1 row in set, 1 warning (0.00 sec) greatsql> SET sql_log_bin=0; Query OK, 0 rows affected (0.00 sec) greatsql> ALTER TABLE qj.sbtest1 ADD index tmp_k(k); Query OK, 0 rows affected (4 min 51.92 sec) Records: 0 Duplicates: 0 Warnings: 0 greatsql> START SLAVE; Query OK, 0 rows affected, 1 warning (0.02 sec) greatsql> SHOW SLAVE STATUS \G *************************** 1. row *************************** Slave_IO_State: Waiting for source to send event Master_Host: 172.17.139.230 Master_User: greatsql Master_Port: 3307 Connect_Retry: 60 Master_Log_File: binlog.000078 Read_Master_Log_Pos: 953750186 Relay_Log_File: relaylog.000231 Relay_Log_Pos: 953750396 Relay_Master_Log_File: binlog.000078 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 953750186 Relay_Log_Space: 953750683 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 1000403307 Master_UUID: 1d0963e9-85d9-11ef-80e1-00163e28e06a Master_Info_File: mysql.slave_master_info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Replica has read all relay log; waiting for more updates Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: 1d0963e9-85d9-11ef-80e1-00163e28e06a:1-163235 Executed_Gtid_Set: 1d0963e9-85d9-11ef-80e1-00163e28e06a:1-163235 Auto_Position: 1 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: Master_public_key_path: Get_master_public_key: 0 Network_Namespace: 1 row in set, 1 warning (0.00 sec) 

四、思考

01、如何保持主从一致性

02、正在回放数据时,STOP SLAVE 是否会有问题

03、如果主从结构作为数据节点,上层还有计算节点该如何保持元数据的一致性即计算节点记录的表的结构和数据节点的表结构要一致

关于以上三个问题我的处理方案如下:

1、如何保持主从一致性

#为保持主从表结构一致,主节点添加索引 greatsql> SET sql_log_bin=0; greatsql> ALTER TABLE qj.sbtest1 ADD index tmp_k(k); 

2、正在回放数据时,STOP SLAVE是否会有问题

greatsql> STOP SLAVE --当该GTID回放很长时间时,stop可能会很长时间,我们可以采用 kill 线程方式结束回放 greatsql> SHOW PROCESSLIST; --找到会话ID greatsql> KILL 会话号 greatsql> SHOW SLAVE STATUS \G -- 添加索引即可 

3、如果主从结构作为数据节点,上层还有计算节点该如何保持元数据的一致性即计算节点记录的表的结构和数据节点的表结构要一致

-- 当从节点添加上索引之后,在计算节点即代理层给表添加索引,当然该索引名不要与临时添加索引名重复 -- 添加上索引之后,在将从节点上临时添加的索引删除即可 greatsql> STOP SLAVE; greatsql> SHOW SLAVE STATUS \G greatsql> SET sql_log_bin=0; greatsql> ALTER table qj.sbtest1 DROP index tmp_k; greatsql> START SLAVE; greatsql> SHOW SLAVE STATUS \G 

五、总结

加速无索引表的回放主要是针对该表上进行 DELETEUPDATE 操作时有效,而且我们操作的时候要注意是否能记录binlog日志,保证主从数据一致性,当从节点作为数据节点即存储节点的角色时,要关注计算节点即代理层元数据与存储节点元数据的一致性或者是符合计算节点关于该表的定义规则。

Enjoy GreatSQL :)

关于 GreatSQL

GreatSQL是适用于金融级应用的国内自主开源数据库,具备高性能、高可靠、高易用性、高安全等多个核心特性,可以作为MySQL或Percona Server的可选替换,用于线上生产环境,且完全免费并兼容MySQL或Percona Server。

相关链接: GreatSQL社区 Gitee GitHub Bilibili

GreatSQL社区:

image

社区有奖建议反馈: https://greatsql.cn/thread-54-1-1.html

社区博客有奖征稿详情: https://greatsql.cn/thread-100-1-1.html

(对文章有疑问或者有独到见解都可以去社区官网提出或分享哦~)

技术交流群:

微信&QQ群:

QQ群:533341697

微信群:添加GreatSQL社区助手(微信号:wanlidbc )好友,待社区助手拉您进群。

原文链接:https://my.oschina.net/GreatSQL/blog/17553859
关注公众号

低调大师中文资讯倾力打造互联网数据资讯、行业资源、电子商务、移动互联网、网络营销平台。

持续更新报道IT业界、互联网、市场资讯、驱动更新,是最及时权威的产业资讯及硬件资讯报道平台。

转载内容版权归作者及来源网站所有,本站原创内容转载请注明来源。

文章评论

共有0条评论来说两句吧...

文章二维码

扫描即可查看该文章

点击排行

推荐阅读

最新文章