Percona Backup for MongoDB supports physical backup

Version 1.7.0 of Percona Backup for MongoDB (PBM), released in April 2022, added support for physical backups.

PBM physical backups are built on the backupCursors feature of Percona Server for MongoDB (PSMDB), so you must run Percona Server for MongoDB to use them.

Backup

On each replica set, PBM uses $backupCursor to retrieve the list of files that must be copied to archive the backup. With that list in hand, the next step is to ensure cluster-wide consistency. Each replica set posts the cluster time of the latest operation it has observed, and the backup leader picks the most recent of these. This becomes the common backup timestamp, saved as last_write_ts in the backup metadata. Once the backup time is agreed on, the pbm-agent on each replica set opens a $backupCursorExtend. This cursor returns its result only after the node reaches the given timestamp, so the returned list of journal files is guaranteed to contain the common backup timestamp. At that point PBM has the full list of files that belong in the backup: each node copies them to the storage, saves the metadata, and closes its cursors, and that completes the backup. For more on backup cursors, see https://www.percona.com/blog/2021/06/07/experimental-feature-backupcursorextend-in-percona-server-for-mongodb/
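The timestamp agreement can be modeled in a few lines. This is a simplified sketch, not PBM's code: ClusterTime, elect_backup_timestamp, and extend_cursor_ready are hypothetical names, and a real cluster time is a BSON Timestamp, which compares by seconds and then by increment.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class ClusterTime:
    """Hypothetical stand-in for a BSON Timestamp: seconds, then increment."""
    t: int
    i: int


def elect_backup_timestamp(latest_per_rs: dict[str, ClusterTime]) -> ClusterTime:
    """The backup leader picks the most recent cluster time posted by any replica set."""
    if not latest_per_rs:
        raise ValueError("no replica set reported a cluster time")
    return max(latest_per_rs.values())


def extend_cursor_ready(node_last_applied: ClusterTime, target: ClusterTime) -> bool:
    """Models the $backupCursorExtend rule: results come back only once the
    node has reached the target timestamp."""
    return node_last_applied >= target


if __name__ == "__main__":
    # Hypothetical values for a two-shard cluster.
    posted = {"rs0": ClusterTime(1650458046, 3), "rs1": ClusterTime(1650458047, 1)}
    common = elect_backup_timestamp(posted)
    print(common)  # ClusterTime(t=1650458047, i=1)
    # rs0 reported an earlier time, so its extend cursor must wait until it catches up.
    print(extend_cursor_ready(posted["rs0"], common))  # False
```

Taking the maximum is what makes the result safe for every replica set: no replica set can be asked to stop at a time it has already passed, only to wait until it gets there.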

Internally, PBM does more than this: it selects appropriate nodes for the backup, coordinates operations across the cluster, and handles logging, errors, and more.

The restore timestamp of the backup

Restoring any backup returns the cluster to a specific point in time. This time is not the wall clock but the logical clock of the MongoDB cluster, so the point in time is consistent across all nodes. For both logical and physical backups, this time is shown in the complete field of the pbm list and pbm status output. For example:

2022-04-19T15:36:14Z 22.29GB <physical> [complete: 2022-04-19T15:36:16]
2022-04-19T14:48:40Z 10.03GB <logical> [complete: 2022-04-19T14:58:38]

This time is not when the backup finishes but the moment the cluster state was captured, which is the state the cluster returns to after a restore. In PBM's logical backups, the restore timestamp tends to be close to the backup's finish time. To define it, PBM has to wait until the snapshots on all replica sets have finished, and it then captures the oplog from the backup start up to that time.

For physical backups, PBM picks the restore timestamp right after the backup starts. Keeping the backup cursor open guarantees that the checkpoint data does not change during the backup, so PBM can define the completion time in advance.
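The two snapshot lines shown earlier make the difference concrete. The sketch below parses them and measures the gap between each backup's name and its complete time. It treats the backup name as the backup's start time (it is the timestamp PBM prints when the backup starts), and the regular expression is a hypothetical helper for this one line format, not a PBM API.

```python
import re
from datetime import datetime, timezone

# Hypothetical pattern for "<name> <size> <kind> [complete: <time>]" lines.
LINE = re.compile(
    r"(?P<name>\S+Z) (?P<size>\S+) <(?P<kind>\w+)> \[complete: (?P<complete>[^\]]+)\]"
)


def parse_ts(s: str) -> datetime:
    # Times are UTC; the backup name has a trailing "Z", the complete field does not.
    return datetime.strptime(s.rstrip("Z"), "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)


def capture_delay(line: str) -> tuple[str, float]:
    """Return (backup kind, seconds from backup start to its restore timestamp)."""
    m = LINE.search(line)
    if m is None:
        raise ValueError(f"unrecognised snapshot line: {line!r}")
    return m["kind"], (parse_ts(m["complete"]) - parse_ts(m["name"])).total_seconds()


if __name__ == "__main__":
    for line in [
        "2022-04-19T15:36:14Z 22.29GB <physical> [complete: 2022-04-19T15:36:16]",
        "2022-04-19T14:48:40Z 10.03GB <logical> [complete: 2022-04-19T14:58:38]",
    ]:
        print(*capture_delay(line))
    # physical 2.0
    # logical 598.0
```

For these two backups, the physical restore point lands 2 seconds after the start, while the logical one lands almost 10 minutes later, after the snapshots on all replica sets had finished.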

Restore

A physical restore requires a few extra considerations.

First, the files in the backup may contain operations beyond the target time (commonBackupTimestamp). To deal with this, PBM uses a special function of the replication subsystem's startup process to limit how far the oplog is restored. It does so by setting the oplogTruncateAfterPoint value in the replset.oplogTruncateAfterPoint collection of the local database.
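Conceptually, the truncate point acts as a cut-off on the restored oplog. The sketch below is a hypothetical model of that effect, not mongod's or PBM's code: it assumes entries with a timestamp later than the point are discarded and the rest are kept, and it represents timestamps as (seconds, increment) tuples.

```python
from typing import NamedTuple


class OplogEntry(NamedTuple):
    """Hypothetical, minimal oplog entry: a (seconds, increment) timestamp and an op code."""
    ts: tuple[int, int]
    op: str


def truncate_after(oplog: list[OplogEntry], truncate_after_point: tuple[int, int]) -> list[OplogEntry]:
    """Keep entries at or before the truncate point; drop everything after it."""
    return [e for e in oplog if e.ts <= truncate_after_point]


if __name__ == "__main__":
    oplog = [OplogEntry((100, 1), "i"), OplogEntry((100, 2), "u"), OplogEntry((101, 1), "d")]
    # Pretend (100, 2) is the common backup timestamp: the later delete is cut off.
    print([e.op for e in truncate_after(oplog, (100, 2))])  # ['i', 'u']
```

Because every node applies the same cut-off, all of them start from data that ends exactly at the common backup timestamp.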

Besides setting oplogTruncateAfterPoint, the database needs some other changes and cleanup before startup, which requires restarting PSMDB in standalone mode several times.

This in turn complicates PBM's operation. PBM agents rely on PSMDB itself to communicate and coordinate their work, but once the cluster is shut down, they must switch to communicating through the storage. PBM also cannot store its logs in the database while PSMDB runs in standalone mode, so at some points during the restore the pbm-agent logs are available only in the agent's stderr, and the pbm logs command cannot access them. We plan to address this before physical backups reach general availability (GA).
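Coordinating through storage amounts to each agent publishing its progress as files and the others polling for them. The sketch below illustrates that pattern with a local directory standing in for the bucket. The directory layout, file names, stage names, and functions are all hypothetical; PBM's real restore metadata is structured differently.

```python
import json
import tempfile
import time
from pathlib import Path


def report(storage: Path, restore: str, node: str, stage: str) -> None:
    """An agent records its current stage as a small JSON file in shared storage."""
    path = storage / restore / f"{node}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"node": node, "stage": stage}))
    tmp.replace(path)  # atomic rename, so readers never see a half-written file


def all_reached(storage: Path, restore: str, nodes: list[str], stage: str) -> bool:
    """True once every node's status file reports the given stage."""
    for node in nodes:
        path = storage / restore / f"{node}.json"
        if not path.exists() or json.loads(path.read_text())["stage"] != stage:
            return False
    return True


def wait_for(storage: Path, restore: str, nodes: list[str], stage: str,
             timeout: float = 30.0, poll: float = 0.5) -> None:
    """Poll the storage until all nodes reach `stage`, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while not all_reached(storage, restore, nodes, stage):
        if time.monotonic() > deadline:
            raise TimeoutError(f"not all nodes reached stage {stage!r}")
        time.sleep(poll)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        storage = Path(d)  # stands in for the remote bucket
        nodes = ["mongo1", "mongo2", "mongo3"]
        for n in nodes:
            report(storage, "restore-1", n, "data-copied")  # hypothetical stage name
        wait_for(storage, "restore-1", nodes, "data-copied", timeout=1.0)
        print("all nodes reached data-copied")
```

Polling is slower than the database-backed messaging PBM normally uses, but it keeps working while every mongod is down or running standalone.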

We also had to decide on a restore strategy within a replica set. One option is to restore a single node, wipe the data on the remaining nodes, and let PSMDB replication do the rest. Although somewhat simpler, this leaves the cluster of little use until the initial sync completes. Moreover, logical replication at this stage would cancel out almost all of the speed benefits that a physical restore brings (see the performance review below). So we chose to restore every node in the replica set and to make sure that, once the cluster starts, no node notices any difference or starts a resync.

As with PBM's logical backups, a physical backup can currently be restored only to a cluster with the same topology, meaning the replica set names in the backup must match those in the target cluster. This restriction will be lifted for logical backups in the next PBM release and later extended to physical backups. The target cluster may also have more replica sets than the backup, but not fewer, because all data from the backup must be restored.
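The topology rule boils down to a subset check on replica set names. This is a hypothetical pre-flight helper, not part of PBM, and the replica set names in the example are made up.

```python
def check_topology(backup_rs: set[str], cluster_rs: set[str]) -> None:
    """Raise if the backup contains a replica set that the target cluster lacks.

    Extra replica sets in the target cluster are fine; missing ones are not,
    because every piece of data in the backup has to be restored somewhere.
    """
    missing = backup_rs - cluster_rs
    if missing:
        raise ValueError(f"target cluster is missing replica sets: {sorted(missing)}")


if __name__ == "__main__":
    check_topology({"cfg", "rs0"}, {"cfg", "rs0", "rs1"})  # OK: target has an extra set
    try:
        check_topology({"cfg", "rs0", "rs1"}, {"cfg", "rs0"})
    except ValueError as err:
        print(err)  # target cluster is missing replica sets: ['rs1']
```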

Performance review

We used the following test environment:

• Replica set: three nodes, each running mongod + pbm-agent, with 16 GB RAM and 8 vCPUs
• Storage: nyc3.digitaloceanspaces.com
• Data: randomly generated 1 MB documents

In general, logical backups pay off for small databases (a few hundred megabytes). At that scale, the extra overhead that physical files carry on top of the data itself still matters: a logical backup reads and writes only user data, so less data has to travel over the network. But as the database grows, the cost of logical reads (selects) and writes (inserts) becomes the bottleneck of logical backups, whereas the speed of physical backups is almost always limited only by the network bandwidth to and from remote storage. In our tests, restore times for physical backups grew linearly with dataset size, while logical restore times grew non-linearly: the more data you have, the longer it takes to replay it all and rebuild the indexes. For example, with a 600 GB dataset, a physical restore was five times faster than a logical one.

On smaller databases, however, the difference is negligible: a few minutes. So the main benefit of logical backups lies beyond performance, in flexibility. Logical backups allow partial backups and restores (on the roadmap for PBM), where you choose specific databases and/or collections. Physical backups work directly with the storage engine's files, so they are all-or-nothing.

In practice

PBM configuration

Starting with version 1.7.0, the user running the pbm-agent process must have read and write access to the PSMDB data directory. Accordingly, since version 1.7.0 the default user for pbm-agent has changed from pbm to mongod.
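A quick way to confirm the permission requirement is to test access to the data directory as the account that runs pbm-agent. This is a minimal sketch: the default path /var/lib/mongo is only an example, so pass the storage.dbPath from your own mongod.conf, and note that it checks the directory itself rather than every file inside it.

```python
import os
import sys


def can_use_dbpath(dbpath: str) -> bool:
    """True if the current user can list, read, and write the data directory itself.

    os.access() uses the real UID, so run this as the account that runs pbm-agent.
    It checks only the top-level directory, not every file inside it.
    """
    return os.path.isdir(dbpath) and os.access(dbpath, os.R_OK | os.W_OK | os.X_OK)


if __name__ == "__main__":
    # Example default only; pass the storage.dbPath from your mongod.conf instead.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/mongo"
    print(f"{path}: {'OK' if can_use_dbpath(path) else 'insufficient access'}")
```

For example, save it as check_dbpath.py (a hypothetical file name) and run sudo -u mongod python3 check_dbpath.py /var/lib/mongo on each node.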

Also keep in mind that physical backups require PSMDB 4.2.15-16, 4.4.6-8, or higher, since hot backups and backup cursors were introduced in these releases.
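The version floor can be expressed as a comparison of the four numeric parts of a PSMDB version string. This is a hypothetical helper that assumes the X.Y.Z-N format used in the versions above, and it also assumes that release series newer than 4.4 (for example 5.0) qualify, which "or higher" implies but does not spell out.

```python
import re

# Minimum PSMDB builds with backup-cursor support, per the versions listed above.
MINIMUMS = {(4, 2): (4, 2, 15, 16), (4, 4): (4, 4, 6, 8)}


def parse_psmdb_version(v: str) -> tuple[int, ...]:
    """Parse 'X.Y.Z-N' into a tuple of four ints."""
    m = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)-(\d+)", v.strip())
    if m is None:
        raise ValueError(f"unexpected version string: {v!r}")
    return tuple(int(x) for x in m.groups())


def supports_physical_backup(v: str) -> bool:
    ver = parse_psmdb_version(v)
    series = ver[:2]
    if series in MINIMUMS:
        return ver >= MINIMUMS[series]
    # Assumption: any series newer than 4.4 is supported; older ones are not.
    return series > (4, 4)


if __name__ == "__main__":
    for v in ["4.2.14-15", "4.2.15-16", "4.4.6-8", "5.0.7-6"]:
        print(v, supports_physical_backup(v))
```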

Create backup

In the new version of PBM, you can choose between a physical and a logical backup. The default is logical:

> pbm backup
Starting backup '2022-04-20T11:12:53Z'....
Backup '2022-04-20T11:12:53Z' to remote store 's3://https://storage.googleapis.com/pbm-bucket' has started

> pbm backup -t physical
Starting backup '2022-04-20T12:34:06Z'....
Backup '2022-04-20T12:34:06Z' to remote store 's3://https://storage.googleapis.com/pbm-bucket' has started

> pbm status -s cluster -s backups
Cluster:
========
rs0:
  - rs0/mongo1.perconatest.com:27017: pbm-agent v1.7.0 OK
  - rs0/mongo2.perconatest.com:27017: pbm-agent v1.7.0 OK
  - rs0/mongo3.perconatest.com:27017: pbm-agent v1.7.0 OK

Backups:
========
S3 us-east-1 s3://https://storage.googleapis.com/pbm-bucket
  Snapshots:
    2022-04-20T12:34:06Z 797.38KB <physical> [complete: 2022-04-20T12:34:09]
    2022-04-20T11:12:53Z 13.66KB <logical> [complete: 2022-04-20T11:12:58]

Point-in-time recovery

Currently, point-in-time recovery (PITR) is supported only for logical backups. This means pbm-agent needs a logical backup snapshot before it can start periodically saving consecutive oplog slices. You can still take physical backups while PITR is enabled; doing so neither breaks nor changes the oplog-saving process.

Restoring to a specific point in time likewise uses the corresponding logical backup snapshot, with the oplog slices replayed on top of it.
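To see why a logical snapshot is the starting point, here is a hypothetical planner: it picks the latest logical snapshot at or before the target, then the oplog slices that cover the range from that snapshot to the target without gaps. Snapshot, OplogSlice, and plan_pitr are illustrative names, times are plain integers, and PBM's own selection logic may differ in its details.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    name: str
    kind: str        # "logical" or "physical"
    restore_ts: int  # the backup's "complete" time, as plain seconds here


@dataclass(frozen=True)
class OplogSlice:
    start: int
    end: int


def plan_pitr(snapshots: list[Snapshot], slices: list[OplogSlice], target: int):
    """Pick the base snapshot and the oplog slices to replay up to `target` (hypothetical)."""
    # Only logical snapshots qualify as a PITR base, as described above.
    candidates = [s for s in snapshots if s.kind == "logical" and s.restore_ts <= target]
    if not candidates:
        raise ValueError("no logical snapshot at or before the target time")
    base = max(candidates, key=lambda s: s.restore_ts)

    needed = sorted(
        (sl for sl in slices if sl.end > base.restore_ts and sl.start < target),
        key=lambda sl: sl.start,
    )
    # The slices must cover [base.restore_ts, target] with no gaps.
    covered = base.restore_ts
    for sl in needed:
        if sl.start > covered:
            raise ValueError(f"oplog gap between {covered} and {sl.start}")
        covered = max(covered, sl.end)
    if covered < target:
        raise ValueError("saved oplog does not reach the target time")
    return base, needed


if __name__ == "__main__":
    snapshots = [Snapshot("b1", "logical", 100), Snapshot("b2", "physical", 150)]
    slices = [OplogSlice(100, 160), OplogSlice(160, 220)]
    base, replay = plan_pitr(snapshots, slices, target=180)
    print(base.name, [(s.start, s.end) for s in replay])  # b1 [(100, 160), (160, 220)]
```

Note that the physical snapshot b2 is skipped even though it is closer to the target, which mirrors the current limitation.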

Check logs

During a physical backup, the logs can be viewed with the pbm logs command:

> pbm logs -e backup/2022-04-20T12:34:06Z
2022-04-20T12:34:07Z I [rs0/mongo2.perconatest.com:27017] [backup/2022-04-20T12:34:06Z] backup started
2022-04-20T12:34:12Z I [rs0/mongo2.perconatest.com:27017] [backup/2022-04-20T12:34:06Z] uploading files
2022-04-20T12:34:54Z I [rs0/mongo2.perconatest.com:27017] [backup/2022-04-20T12:34:06Z] uploading done
2022-04-20T12:34:56Z I [rs0/mongo2.perconatest.com:27017] [backup/2022-04-20T12:34:06Z] backup finished

As for restores, the pbm logs command does not show information about restores from a physical backup. This is due to the peculiarities of the restore process described above and will be improved in an upcoming PBM version. However, pbm-agent still saves its logs locally, so you can check the restore progress on each node:

> sudo journalctl -u pbm-agent.service | grep restore
pbm-agent[12560]: 2022-04-20T19:37:56.000+0000 I [restore/2022-04-20T12:34:06Z] restore started
.......
pbm-agent[12560]: 2022-04-20T19:38:22.000+0000 I [restore/2022-04-20T12:34:06Z] copying backup data
.......
pbm-agent[12560]: 2022-04-20T19:38:39.000+0000 I [restore/2022-04-20T12:34:06Z] preparing data
.......
pbm-agent[12560]: 2022-04-20T19:39:12.000+0000 I [restore/2022-04-20T12:34:06Z] restore finished <nil>
pbm-agent[12560]: 2022-04-20T19:39:12.000+0000 I [restore/2022-04-20T12:34:06Z] restore finished successfully
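Because these journal lines share a fixed shape, a short script can turn them into a per-stage timeline. This sketch assumes the layout shown above (pbm-agent[PID]:, timestamp, level letter, [event], message) and skips the elided ....... lines; the regular expression and restore_timeline are hypothetical helpers, not part of PBM.

```python
import re
import sys
from datetime import datetime

# Hypothetical pattern for the pbm-agent journal lines shown above.
LINE = re.compile(
    r"pbm-agent\[\d+\]: (?P<ts>\S+) (?P<level>[A-Z]) \[(?P<event>[^\]]+)\] (?P<msg>.*)"
)


def restore_timeline(lines: list[str]) -> list[tuple[int, str]]:
    """Return (seconds since the first event, message) for each pbm-agent log line."""
    events = []
    for line in lines:
        m = LINE.search(line)
        if m is None:
            continue  # e.g. the elided "......." lines
        ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%f%z")
        events.append((ts, m["msg"]))
    if not events:
        return []
    start = events[0][0]
    return [(int((ts - start).total_seconds()), msg) for ts, msg in events]


if __name__ == "__main__":
    # e.g. sudo journalctl -u pbm-agent.service | grep restore | python3 timeline.py
    for offset, msg in restore_timeline(sys.stdin.read().splitlines()):
        print(f"+{offset:>4}s  {msg}")
```

For the output above, copying starts 26 seconds in, data preparation at 43 seconds, and the restore finishes after 76 seconds on this node.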

Restore from backup

Restoring from a physical backup works much like restoring from a logical one, but requires several additional steps after PBM has finished the restore.

> pbm restore 2022-04-20T12:34:06Z
Starting restore from '2022-04-20T12:34:06Z'.....
Restore of the snapshot from '2022-04-20T12:34:06Z' has started. Leader: mongo1.perconatest.com:27017/rs0

After the restore starts, the pbm CLI returns the ID of the leader node, so you can track the restore by checking the pbm-agent log on that node. The restore state is also written to a metadata file in the remote storage, created at the root of the storage with the following path format:

.pbm.restore/<restore_timestamp>.json

You can also pass the -w flag to pbm restore. It blocks the current shell session until the restore completes:

> pbm restore 2022-04-20T12:34:06Z -w
Starting restore from '2022-04-20T12:34:06Z'....
Started physical restore. Leader: mongo2.perconatest.com:27017/rs0
Waiting to finish...........................
Restore successfully finished!

After the restore completes, perform the following steps:

• Restart all mongod nodes
• Restart all pbm-agents
• Run the following command to resync the backup list with the storage:

$ pbm config --force-resync
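On a cluster with several nodes, these steps are easy to script. The sketch below only builds and prints the command sequence by default. It assumes that every node is reachable over ssh, that mongod and pbm-agent run as systemd units named mongod and pbm-agent, and that the machine running it has the pbm CLI configured to reach the cluster; adjust these to your setup.

```python
import shlex
import subprocess

# Assumptions: ssh access to every node, systemd units named "mongod" and
# "pbm-agent", and a locally configured pbm CLI for the final resync.


def post_restore_plan(hosts: list[str]) -> list[list[str]]:
    """Build the post-restore command sequence: mongod everywhere, then agents, then resync."""
    plan = [["ssh", h, "sudo", "systemctl", "restart", "mongod"] for h in hosts]
    plan += [["ssh", h, "sudo", "systemctl", "restart", "pbm-agent"] for h in hosts]
    plan.append(["pbm", "config", "--force-resync"])
    return plan


def run(plan: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print("+", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    hosts = ["mongo1.perconatest.com", "mongo2.perconatest.com", "mongo3.perconatest.com"]
    run(post_restore_plan(hosts))  # dry run: prints the commands only
```

Restarting every mongod before any pbm-agent follows the order of the steps listed above, and the resync runs once for the whole cluster rather than once per node.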

Tags: MongoDB

Posted by BruceRowe on Thu, 02 Jun 2022 11:09:11 +0530