While the specifics of how to implement data backups will depend on the project, this article describes the objectives that we want to achieve.
Every application environment that our clients or users can access is backed up at least once a day. This includes all databases and, ideally, all messaging systems, queues, and any other subsystems that hold data that cannot be recreated or recomputed if lost.
The frequency of backups is calibrated with the client as the project and data being stored evolve.
For each environment, the team documents which parts of the application are backed up, how often, and by what method. This documentation is easily accessible to everyone on the team and to the client.
The team also documents which parts are not backed up (if any), as well as the reasons for that.
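One way to keep this documentation honest is to store it in a machine-readable form that can be checked automatically. The sketch below is only an illustration: the `BackupEntry` fields and the `validate` helper are hypothetical names, and the once-a-day threshold is assumed to apply to every backed-up component listed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BackupEntry:
    """One row of the backup documentation (hypothetical structure)."""
    environment: str
    component: str
    backed_up: bool
    frequency_hours: Optional[int] = None
    method: Optional[str] = None
    reason_not_backed_up: Optional[str] = None


def validate(entry: BackupEntry) -> List[str]:
    """Return a list of documentation gaps for a single entry."""
    label = f"{entry.environment}/{entry.component}"
    problems: List[str] = []
    if entry.backed_up:
        if entry.frequency_hours is None:
            problems.append(f"{label}: backup frequency is not documented")
        elif entry.frequency_hours > 24:
            # Assumption: the once-a-day minimum applies to this component.
            problems.append(f"{label}: backed up less often than once a day")
        if not entry.method:
            problems.append(f"{label}: backup method is not documented")
    elif not entry.reason_not_backed_up:
        problems.append(f"{label}: not backed up, but no reason is documented")
    return problems
```

Running `validate` over every entry in a continuous integration job would flag components that were added to the project without a documented backup decision.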
Backups are regularly tested (at least once a month) to ensure that the recovery process works as expected.
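A recovery test can be as simple as restoring the latest backup into a scratch database and comparing it with the source. The sketch below uses SQLite purely as a stand-in for the project's real database, and the `restore_drill` and `row_counts` helpers are hypothetical. Comparing row counts is a deliberately weak check; a real drill would likely compare checksums or run application-level queries.

```python
import os
import sqlite3
import tempfile
from typing import Dict


def row_counts(conn: sqlite3.Connection) -> Dict[str, int]:
    """Count the rows of every table in the database."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    return {
        table: conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        for table in tables
    }


def restore_drill(source: sqlite3.Connection) -> bool:
    """Back up `source` to a file, restore it elsewhere, and compare."""
    with tempfile.TemporaryDirectory() as scratch:
        backup_path = os.path.join(scratch, "backup.sqlite3")

        # Step 1: take the backup.
        backup_file = sqlite3.connect(backup_path)
        try:
            source.backup(backup_file)
        finally:
            backup_file.close()

        # Step 2: restore the stored backup into a fresh database.
        stored = sqlite3.connect(backup_path)
        restored = sqlite3.connect(":memory:")
        try:
            stored.backup(restored)
            # Step 3: verify that the restored data matches the source.
            return row_counts(source) == row_counts(restored)
        finally:
            stored.close()
            restored.close()
```

Scheduling a drill like this monthly, and treating a `False` result as an incident, turns the testing objective into something that can be verified rather than assumed.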
All production-environment backups are stored in at least two separate places: one close to the application, so that it can be applied quickly when needed, and another outside of the project’s infrastructure, so that if something happens to the entire infrastructure, the team doesn’t lose all data and backups along with it.
A sensible recommendation when using AWS, for example, would be to have backups on S3 and another copy of them on Rackspace or DigitalOcean. If this is not possible, then at least replicating the backup in another S3 region would be acceptable.
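The placement rule can be expressed as a small check over the list of known backup copies. This is a minimal sketch: `BackupCopy`, `check_placement`, and the provider and region strings are hypothetical, and treating "same provider" as "same infrastructure" is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BackupCopy:
    provider: str  # e.g. "aws" or "digitalocean" (illustrative labels)
    region: str


def check_placement(copies: List[BackupCopy], primary_provider: str) -> List[str]:
    """Return the placement problems found for one backup."""
    problems: List[str] = []
    locations = {(copy.provider, copy.region) for copy in copies}
    if len(locations) < 2:
        problems.append("fewer than two separate locations")
    if not any(copy.provider == primary_provider for copy in copies):
        problems.append("no copy close to the application")
    if all(copy.provider == primary_provider for copy in copies):
        # A second region at the same provider is only the fallback option.
        problems.append("no copy outside the primary provider")
    return problems
```

For a project hosted on AWS, one copy at the primary provider plus one at a different provider yields no problems, while two regions at the same provider yields a single warning that marks the setup as the fallback rather than the preferred arrangement.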
All backups are encrypted. It’s recommended to use the industry-standard AES-256 encryption algorithm.
Team members are trained in, and familiar with, listing existing backups, triggering the data backup process, and triggering the data recovery process.
Backups are performed automatically without any intervention from the engineering team. Manual operations are reserved for extraordinary situations.
The processes of backing up data and restoring data from a backup are monitored throughout their entire lifecycle. The engineering team is automatically notified of any problems uncovered by the monitoring system.
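One simple monitoring signal is backup freshness: if the last successful backup of a component is older than the agreed interval, something has silently failed. The sketch below assumes the monitoring system can supply a timestamp of the last successful backup per component; `find_stale_backups`, `check_and_notify`, and the `notify` callback are hypothetical names, and the 24-hour default reflects the once-a-day objective.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List


def find_stale_backups(
    last_success: Dict[str, datetime],
    now: datetime,
    max_age: timedelta = timedelta(hours=24),
) -> List[str]:
    """Return the components whose last successful backup is too old."""
    return sorted(
        name for name, finished_at in last_success.items()
        if now - finished_at > max_age
    )


def check_and_notify(
    last_success: Dict[str, datetime],
    now: datetime,
    notify: Callable[[str], None],
) -> int:
    """Send one notification per stale component and return the count."""
    stale = find_stale_backups(last_success, now)
    for name in stale:
        notify(f"No successful backup of {name} in the last 24 hours")
    return len(stale)
```

In practice `notify` would post to whatever channel the engineering team already watches, and the check would run on a schedule independent of the backup job itself, so that a backup job that never starts is still detected.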
3rd-Party Operation Teams
Sometimes the team works on projects where it has no control over the infrastructure and someone else is responsible for all DevOps tasks. This is no excuse for not having backups. The team should explain why backups are crucial and work with the responsible parties and stakeholders until they are implemented. Anything that blocks this is escalated as a project risk.