How to Handle pt-table-checksum Errors
pt-table-checksum is a robust tool that validates data between master/slaves in a replication environment. However, in some scenarios, the task can be quite challenging.
Join the DZone community and get the full member experience.
Join For FreeIn this post, we'll look at how to approach pt-table-checksum errors.
pt-table-checksum is one of the most popular tools in the Percona Toolkit, and it is widely used to identify data differences between masters and slaves. Therefore, as Percona Support Engineers, we have customers often asking questions related to the pt-table-checksum errors and warnings produced. Below are the most common issues raised with pt-table-checksum, and we decided to address those issues to help with how to mitigate related warnings or errors.
Unable to Detect Slaves
Cannot connect to h=127.0.0.1,p=...,u=percona
Diffs cannot be detected because no slaves were found. Please read the --recursion-method documentation for information.
It's possible that the tool cannot connect to the slaves due to not-specific-enough information found on the master. By default, it is looking for slaves based on the replica threads visible in master's processlist. This could be the problem if, for example, the slave's MySQL runs with a different TCP port, the hostname is not resolved correctly, both the master and slave are on the same host, or it is Galera-based replication. In this case, there is the -recursion-method option to try with different discovery methods: hosts or cluster. And if all of them fail, you can specify each slave details manually using the dsn method.
An example using this option for the cluster looks like this:
# pt-table-checksum --user=root --password=*** --databases="db1" --recursion-method=cluster 192.168.88.82
Checking if all tables can be checksummed ...
Starting checksum ...
Not checking replica lag on pxc02 because it is a cluster node.
Not checking replica lag on pxc03 because it is a cluster node.
TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
03-03T00:24:13 0 0 12 1 0 0.033 db1.t1
03-03T00:24:13 0 0 4 1 0 0.031 db1.t2
And when a DSN is needed (like for mysqlsandbox instances), we have to add the slave details to the table, similar to below:
master [localhost] {msandbox} ((none)) > create table percona.dsns (id int(11) NOT NULL AUTO_INCREMENT,parent_id int(11) DEFAULT NULL,dsn varchar(255) NOT NULL,PRIMARY KEY (id));
Query OK, 0 rows affected (0.08 sec)
master [localhost] {msandbox} ((none)) > insert into percona.dsns values (null,null,"h=localhost,S=/tmp/mysql_sandbox20997.sock");
Query OK, 1 row affected (0.03 sec)
$ pt-table-checksum --databases="test" --tables="s1" --recursion-method=dsn=localhost,D=percona,t=dsns u=root,p=msandbox,h=localhost,S=/tmp/mysql_sandbox20996.sock
Checking if all tables can be checksummed ...
Starting checksum ...
TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
03-19T14:16:05 0 1 0 1 0 0.344 test.s1
ROW Format on Slave
Replica slave1.myorg.com has binlog_format ROW which could cause pt-table-checksum to break replication. Please read "Replicas using row-based replication" in the LIMITATIONS section of the tool's documentation. If you understand the risks, specify --no-check-binlog-format to disable this check.
The problem is that second- and next-level replicas (in chain replication topology) will not calculate the diffs as expected. So, this message warns that the slave is using binlog_format=ROW, as the tool needs STATEMENT format to calculate the diffs separately on the slave and master. This is done by replicating the command (e.g., INSERT INTO percona.checksum SELECT CRC32 .... WHERE ... ) as the original statement, not as a row copy of CRC values already computed on the master. And that is possible, as the tool sets the binlog_format=STATEMENT in its session. This session setting does not propagate further into the slave's own binary log, though. This is not a problem when all the slaves are replicating directly from the master, and in such cases, we can ignore that message and use the -no-check-binlog-format option.
By the way, the warning message is misleading regarding breaking replication claim, hence the bug reported.
Unable to Switch Session binlog_format to STATEMENT
# pt-table-checksum --user=root --password=cmon --databases="db1" --recursion-method=cluster 192.168.88.82
03-02T23:54:50 Failed to /*!50108 SET @@binlog_format := 'STATEMENT'*/: DBD::mysql::db do failed: Percona-XtraDB-Cluster prohibits setting binlog_format to STATEMENT or MIXED with pxc_strict_mode = ENFORCING or MASTER [for Statement "/*!50108 SET @@binlog_format := 'STATEMENT'*/"] at /bin/pt-table-checksum line 10064.
This tool requires binlog_format=STATEMENT, but the current binlog_format is set to ROW and an error occurred while attempting to change it. If running MySQL 5.1.29 or newer, setting binlog_format requires the SUPER privilege. You will need to manually set binlog_format to 'STATEMENT' before running this tool.
Or:
$ pt-table-checksum -h przemek-aurora57.xxx.rds.amazonaws.com -u przemek -p xxx --databases="test"
02-19T12:51:01 Failed to /!50108 SET @@binlog_format := 'STATEMENT'/: DBD::mysql::db do failed: Access denied; you need (at least one of) the SUPER privilege(s) for this operation for Statement "/*!50108 SET @@binlog_format := 'STATEMENT'*/" at /usr/bin/pt-table-checksum line 10023.
This tool requires binlog_format=STATEMENT, but the current binlog_format is set to ROW and an error occurred while attempting to change it. If running MySQL 5.1.29 or newer, setting binlog_format requires the SUPER privilege. You will need to manually set binlog_format to 'STATEMENT' before running this tool.
This can be an issue if STATEMENT mode is unsupported in the MySQL variant or special edition of it — Amazon RDS, for example — or when switching is prohibited either by lack of SUPER privilege (limitation for Amazon Aurora) or the Percona XtraDB Cluster Strict Mode safety precaution, as seen in the example above. To work around it in Percona XtraDB Cluster, temporarily relaxing the strict mode (be careful, as this may be dangerous) will work:
pxc01 > set global pxc_strict_mode="permissive";
Query OK, 0 rows affected (0.00 sec)
For Aurora, though (only if asynchronous replication is used between Aurora clusters or from Aurora to non-Aurora MySQL), you will have to change the binlog_format globally to STATEMENT using the option groups.
Too Large Chunk Size or No Good Index
Cannot checksum table db_name.table_name: There is no good index and the table is oversized. at /usr/bin/pt-table-checksum line 6662.
Or:
Skipping table because on the master it would be checksummed in one chunk but on these replicas it has too many rows:
xxxxx rows on db_name.table_name
The current chunk size limit is xxxxx rows (chunk size=xxxx * chunk size limit=5).
Instead of examining each table with a single big query, the pt-table-checksum splits tables into chunks to ensure that the checksum is non-intrusive and doesn't cause too much replication lag or load on the server. To create these chunks, it needs an index of some sort (preferably a primary key or unique index). If there is no index and the table contains a suitably small number of rows, the tool tries to checksum the table in a single chunk.
Skipping the table, as in the second message example, is a common issue with pt-table-checksum and can be caused by different/outdated table statistics on the master or slave side. To alleviate this issue, make sure all your tables contain a primary or unique key. pt-table-checksum requires that to divide a table into chunks effectively. We also suggest that you make sure that these messages are not related to real differences in this table (maybe a row count is significantly different). Also, executing pt-table-checksum with PTDEBUG is a good idea, as it captures a lot of debugging info and provides better insight into what is causing the issue.
There can be some random skipping of tables across many tool runs, and it's probably because of a mix of two variables. One is innodb_stats_on_metadata. Turn it off, at least during the checksum running, so that InnoDB index stats won't change so often. We remind you it's a dynamic variable, which means that you can change it without a MySQL Server restart. On the other hand, if constant statistics change for a table (even though the innodb_stats_on_metadata=0, statistics change with each significant amount of writes) is a problem, you may want to disable it for the duration of checksum. Check the innodb_stats_auto_update option in Percona Server for MySQL for details.
pt-table-checksum uses an EXPLAIN query to determine the number of rows in the chunk, so ever-changing table statistics are most likely the reason for skipped tables. This is where pt-table-checksum decides whether to skip a chunk. This avoids a scenario in which a table has fewer rows on the master but many on a replica, and is checksummed in a single large query, which causes a very long delay in replication. This is also affected by -chunk-size-limit, which defaults to 2. Try setting up higher chunk-size-limit or chunk-time so that pt-table-checksum allows larger chunks, but do it during off-peak periods. Of course, allowing too big of a chunk makes the server suffer for heavy selects, and slave lag may also be a problem while -chunk-time adjusts the chunk size dynamically so that the checksum query executes in a defined amount of time.
For tables that can't be chunked and must be checksummed in a single run, the chunk size should be sufficiently large and sometimes is not enough. That's where the chunk-size-limit comes into play. The -chunk-size-limit modifier is a multiplier for chunk-size and allows larger chunks. To make sure that your server is not heavily loaded, you can set a threshold at which pt-table-checksum pauses itself. This can be done by using the --max-load parameter of pt-table-checksum. So, in this way -chunk-time and -chunk-size-limit won't noticeably impact your server. We would suggest to start with default value -chunk-size-limit and increase it gradually until it succeeds. High values of -chunk-size-limit guarantee higher rates of successful runs, but there's no way to tell if it will always be successful because the number of rows processed is only an estimate. It's worth mentioning that you can also try running ANALYZE TABLE on "skipped tables" before running pt-table-checksum to make sure statistics are up-to-date. This may help or may not help, as statistics are estimated and it still might not be inaccurate.
Also, scripting retries of skipped chunks can be a good approach. You can redirect the pt-table-checksum output to a log file and parse that log to find out which tables need to be re-tried separately. You can do many re-tries for a single table if necessary, and the checksum result for a particular table in the checksums table gets overwritten without affecting other results.
All the problems described above will not take place when a table has a primary key on auto_increment int column.
Suboptimal Query Plan
Skipping chunk 1 of db_name.table_name because MySQL used only 3 bytes of the PRIMARY index instead of 9. See the --[no]check-plan documentation for more information.
The tool uses several heuristics to determine whether an execution plan is good or bad. The first is whether EXPLAIN reports that MySQL intends to use the desired index to access the rows. If MySQL chooses a different index, the tool considers the query unsafe. The tool also checks how much of the index MySQL reports that it uses for the query. The EXPLAIN output shows this in the key_len column. The tool remembers the largest key_len and skips chunks where MySQL reports that it uses a smaller prefix of the index. However, it stretches the overall time to run checksum as it runs several heuristics to decide whether execution path is good or bad. This helps to decide the chunk. By default, -check-plan is on. It can bring a little bit of additional load to the server, but if that's the case, you can always monitor the checksum progress during execution and cancel pt-table-checksum at any moment if necessary. In general, it's good to keep it enabled. Further, it's best to run pt-table-checksum during low database traffic time.
To deal with the above error, disable the feature by using -no-check-plan when you get one "Skipping chunk" error. The only drawback of using it is leaving the door open for possible (costly) table scans.
Missing or Filtered Tables on the Slave
Error checksumming table test.dummy: Error getting row count estimate of table test.dummy on replica centos1.bm.int.percona.com: DBD::mysql::db selectrow_hashref failed: Table 'test.dummy' doesn't exist [for Statement "EXPLAIN SELECT * FROM `test`.`dummy` WHERE 1=1"] at pt-table-checksum line 6607.
The above error is clear that table test.dummy exists on the master but is missing on the slave server. This usually occurs with replication filters. pt-table-checksum failed because test.dummy checksummed on the master while failed on replica to checksum. This can be easily reproduced as per the below example:
mysql> show slave statusG
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.0.3.164
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: master-bin.000001
Read_Master_Log_Pos: 704
Relay_Log_File: centos1-relay-bin.000002
Relay_Log_Pos: 684
Relay_Master_Log_File: master-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB: test
[root@slave1]# perl pt-table-checksum --empty-replicate-table --no-check-replication-filters --replicate=percona.checksums --ignore-databases mysql h=localhost,u=checksum_user,p=checksum_password
02-04T03:14:07 Skipping table test.dummy because it has problems on these replicas:
Table test.dummy does not exist on replica slave1
This can break replication. If you understand the risks, specify --no-check-slave-tables to disable this check.
02-04T03:14:07 Error checksumming table test.dummy: Error getting row count estimate of table test.dummy on replica slave1: DBD::mysql::db selectrow_hashref failed:
Table 'test.dummy' doesn't exist [for Statement "EXPLAIN SELECT * FROM `test`.`dummy` WHERE 1=1"] at pt-table-checksum line 6607.
As per the above example, the 'test' database is ignored to replicate via replication filter Replicate_Ignore_DB, which means any updates on that database will not fall to slave.
Waiting to check replicas for differences: 0% 00:00 remain
Waiting to check replicas for differences: 0% 00:00 remain
.
Waiting to check replicas for differences: 0% 00:00 remain
Waiting to check replicas for differences: 0% 00:00 remain
.
That is actually not an error, but it means that pt-table-checksum is waiting on replicas to run checksum queries. We have customers reporting that the tool runs forever and never came out from "Waiting to check replicas for differences." We noticed this problem occurs when database tables exist on replicas but are ignored by replication filters — because pt-table-checksum checksums each chunk with an INSERT/REPLACE...SELECT query and those queries from the master never fall to replicas via replication because the tables in question are blocked by replication filters. So, the tool waits forever to check the checksum result on replicas, which will never happen. To remedy this issue, use the -ignore-databases or -ignore-tables option to ignore filtered tables from the checksum process.
Replication filters can bring unexpected issues as the last two warnings/errors demonstrated.
Conclusion
pt-table-checksum is a robust tool that validates data between master/slaves in a replication environment. However, in some scenarios, the task can be quite challenging. Fortunately, there are options to deal with these obstacles. Some, however, involve not only using specific options for the tool but also properly (re-)designing your schema. A proper primary key may not only allow the tool to work much faster, less expensive, but sometimes to work at all.
Published at DZone with permission of Przemyslaw Malkowski, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments