Backup and Recovery Best Practices: Data Protection and Disaster Recovery

TL;DR

The #1 backup security best practice is following the 3-2-1 rule: 3 copies, 2 different media types, 1 offsite. Encrypt backups at rest and in transit. Test restores regularly (untested backups are not backups). Automate backup verification and set up alerts for failures. Document your recovery procedures.

"A backup you haven't tested is just a hope. A backup you've restored is a guarantee."

Best Practice 1: The 3-2-1 Backup Rule

A proven strategy for data protection:

  • 3 copies of your data (production + 2 backups)
  • 2 different media types (database + object storage)
  • 1 offsite location (different region or provider)
PostgreSQL backup strategy example
#!/bin/bash
# Automated backup script
set -e

DATE=$(date +%Y-%m-%d-%H%M)
BACKUP_FILE="backup-${DATE}.sql.gz"

# Create encrypted backup
pg_dump "$DATABASE_URL" | gzip | \
  gpg --symmetric --cipher-algo AES256 \
      --pinentry-mode loopback \
      --passphrase-file /secrets/backup-key \
      --batch -o "/backups/${BACKUP_FILE}.gpg"

# Upload to primary storage (same region)
aws s3 cp "/backups/${BACKUP_FILE}.gpg" \
  "s3://backups-primary/${BACKUP_FILE}.gpg" \
  --storage-class STANDARD_IA

# Upload to offsite storage (different region)
aws s3 cp "/backups/${BACKUP_FILE}.gpg" \
  "s3://backups-offsite/${BACKUP_FILE}.gpg" \
  --region eu-west-1 \
  --storage-class GLACIER

# Confirm the upload exists (head-object fails if the object is missing)
aws s3api head-object \
  --bucket backups-primary \
  --key "${BACKUP_FILE}.gpg"

echo "Backup completed: ${BACKUP_FILE}"
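The head-object check above only proves an object exists; it says nothing about the content. Recording a content hash at backup time gives later restore tests something concrete to verify against. A minimal Node sketch (the function name and manifest handling are illustrative, not part of the script above):

```javascript
import { createHash } from 'crypto';
import { createReadStream } from 'fs';

// Stream the backup file through SHA-256 so large dumps
// never have to fit in memory.
function sha256File(path) {
  return new Promise((resolve, reject) => {
    const hash = createHash('sha256');
    createReadStream(path)
      .on('data', (chunk) => hash.update(chunk))
      .on('end', () => resolve(hash.digest('hex')))
      .on('error', reject);
  });
}

// Example: record the digest alongside the backup so a restore
// test can recompute and compare it after download.
// const digest = await sha256File(`/backups/${BACKUP_FILE}.gpg`);
```

Store the digest next to the backup (or in a manifest object) and recompute it after download during restore tests.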

Best Practice 2: Encrypt All Backups

Backups are high-value targets for attackers:

Encryption Type        | When to Use        | Key Management
-----------------------|--------------------|------------------
Server-side (SSE-S3)   | Basic protection   | AWS managed
Server-side (SSE-KMS)  | Audit requirements | KMS with rotation
Client-side            | Maximum security   | You manage keys
Client-side encryption before upload
import { createCipheriv, randomBytes } from 'crypto';
import { createWriteStream } from 'fs';
import { pipeline } from 'stream/promises';
import { createGzip } from 'zlib';

async function encryptBackup(inputStream, outputPath) {
  // Generate unique key for this backup
  const key = randomBytes(32);
  const iv = randomBytes(16);

  // Store key securely (e.g., in Secrets Manager)
  await storeBackupKey(outputPath, { key, iv });

  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const gzip = createGzip();

  await pipeline(
    inputStream,
    gzip,
    cipher,
    createWriteStream(outputPath)
  );

  // Return auth tag for verification
  return cipher.getAuthTag();
}
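The decrypt path deserves the same care as the encrypt path: if the key, IV, or auth tag cannot be retrieved, the backup is unrecoverable. A sketch of the counterpart to encryptBackup, assuming the same aes-256-gcm parameters (how the key material is fetched back, the inverse of storeBackupKey, is not shown):

```javascript
import { createDecipheriv } from 'crypto';
import { createWriteStream } from 'fs';
import { pipeline } from 'stream/promises';
import { createGunzip } from 'zlib';

// Reverse of encryptBackup: decrypt first, then gunzip.
// `key`, `iv`, and `authTag` come from wherever storeBackupKey
// put them (the retrieval helper is assumed, not shown).
async function decryptBackup(inputStream, outputPath, { key, iv, authTag }) {
  const decipher = createDecipheriv('aes-256-gcm', key, iv);
  // GCM refuses to finalize if the tag does not match,
  // so tampering surfaces as a pipeline error.
  decipher.setAuthTag(authTag);

  await pipeline(
    inputStream,
    decipher,
    createGunzip(),
    createWriteStream(outputPath)
  );
}
```

Exercising this function is exactly what the restore tests in Best Practice 3 are for: an encrypt path without a proven decrypt path is not a backup strategy.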

Best Practice 3: Test Restores Regularly

A backup you have not tested is not a backup:

Automated restore testing
#!/bin/bash
# Weekly restore test script
set -e

echo "Starting restore test..."

# Get latest backup
LATEST=$(aws s3 ls s3://backups-primary/ | sort | tail -1 | awk '{print $4}')

# Download and decrypt
aws s3 cp "s3://backups-primary/${LATEST}" /tmp/restore-test.gpg
gpg --decrypt --batch --pinentry-mode loopback \
    --passphrase-file /secrets/backup-key \
    /tmp/restore-test.gpg | gunzip > /tmp/restore-test.sql

# Restore to test database
createdb restore_test_db
psql restore_test_db < /tmp/restore-test.sql

# Run verification queries (-A strips psql's alignment whitespace)
USERS=$(psql restore_test_db -t -A -c "SELECT count(*) FROM users")
ORDERS=$(psql restore_test_db -t -A -c "SELECT count(*) FROM orders")

# Compare with production count (within 1% tolerance)
PROD_USERS=$(psql "$DATABASE_URL" -t -A -c "SELECT count(*) FROM users")
if [ "$(echo "$USERS < $PROD_USERS * 0.99" | bc)" -eq 1 ]; then
  echo "ALERT: User count mismatch"
  exit 1
fi

# Cleanup
dropdb restore_test_db
rm /tmp/restore-test.*

echo "Restore test passed!"
echo "Users: ${USERS}, Orders: ${ORDERS}"
  • Test full restores monthly
  • Test partial restores (single table) weekly
  • Measure restore time (RTO)
  • Verify data integrity after restore
  • Document any issues found
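The shell comparison in the script leans on bc string arithmetic; the same tolerance check can be expressed as a small, unit-testable helper. A sketch, with the 1% threshold from the script as an assumed default:

```javascript
// Returns true when the restored row count is within `tolerance`
// (a fraction) of the production count. Guards against a restored
// table silently coming back empty or truncated.
function restoreCountOk(restored, production, tolerance = 0.01) {
  if (production === 0) return restored === 0;
  return restored >= production * (1 - tolerance);
}
```

Keeping the check in one tested function also makes it easy to reuse the same tolerance across every table you verify.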

Best Practice 4: Define RTO and RPO

Know your recovery requirements:

Metric                          | Definition                   | Typical Values
--------------------------------|------------------------------|--------------------
RPO (Recovery Point Objective)  | Maximum acceptable data loss | 1 hour to 24 hours
RTO (Recovery Time Objective)   | Maximum acceptable downtime  | 15 min to 4 hours
Backup frequency based on RPO
# RPO: 1 hour = Backup every hour
0 * * * * /scripts/backup.sh

# RPO: 15 minutes = Use continuous replication
# PostgreSQL: streaming replication to standby
# AWS RDS: Enable automated backups with PITR

# RPO: Near-zero = Multi-region active-active
# Use database replication + application-level sync

# Terraform: RDS with point-in-time recovery
resource "aws_db_instance" "main" {
  # A non-zero retention period enables automated backups,
  # which is what provides PITR (RPO of roughly 5 minutes)
  backup_retention_period = 7
  backup_window           = "03:00-04:00"

  # Ship PostgreSQL logs to CloudWatch for audit visibility
  enabled_cloudwatch_logs_exports = ["postgresql"]

  # Multi-AZ for high availability
  multi_az = true
}
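One subtlety when mapping RPO to a schedule: if a failure hits just before the next backup completes, you lose a full interval plus the backup's own runtime. A quick sanity check of that arithmetic:

```javascript
// Worst-case data loss: the failure lands just before the next
// backup finishes, so the window is a full interval plus the time
// the in-flight backup takes to run.
function worstCaseRpoMinutes(intervalMinutes, backupDurationMinutes) {
  return intervalMinutes + backupDurationMinutes;
}
```

So hourly backups that each take 10 minutes to run give a worst-case RPO of about 70 minutes, not 60; size the interval accordingly.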

Best Practice 5: Secure Backup Access

Limit who can access or delete backups:

S3 bucket policy for backup protection
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyDeleteExceptBackupAdmin",
      "Effect": "Deny",
      "Principal": "*",
      "Action": [
        "s3:DeleteObject",
        "s3:DeleteObjectVersion"
      ],
      "Resource": "arn:aws:s3:::backups-primary/*",
      "Condition": {
        "StringNotEquals": {
          "aws:PrincipalArn": "arn:aws:iam::123456789:role/BackupAdmin"
        }
      }
    },
    {
      "Sid": "RequireMFAForDelete",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:DeleteObject",
      "Resource": "arn:aws:s3:::backups-primary/*",
      "Condition": {
        "Bool": {
          "aws:MultiFactorAuthPresent": "false"
        }
      }
    }
  ]
}

# Enable Object Lock for immutable backups
# (ransomware protection; requires bucket versioning)
aws s3api put-object-lock-configuration \
  --bucket backups-primary \
  --object-lock-configuration '{
    "ObjectLockEnabled": "Enabled",
    "Rule": {
      "DefaultRetention": {
        "Mode": "GOVERNANCE",
        "Days": 30
      }
    }
  }'

Best Practice 6: Monitor Backup Health

Set up alerts for backup failures:

Backup monitoring and alerting
// Check backup age (AWS SDK v2 client; .promise() converts the request to a Promise)
async function checkBackupHealth() {
  const backups = await s3.listObjectsV2({
    Bucket: 'backups-primary',
    Prefix: 'backup-',
  }).promise();

  const latest = backups.Contents
    .sort((a, b) => b.LastModified - a.LastModified)[0];

  const ageHours = (Date.now() - latest.LastModified) / (1000 * 60 * 60);

  if (ageHours > 25) {  // More than 1 day old
    await sendAlert({
      severity: 'critical',
      message: `Latest backup is ${ageHours.toFixed(1)} hours old`,
      runbook: 'https://wiki/runbooks/backup-failure',
    });
  }

  // Check backup size (detect empty or truncated backups)
  if (latest.Size < 1000000) {  // Less than 1MB
    await sendAlert({
      severity: 'critical',
      message: `Backup suspiciously small: ${latest.Size} bytes`,
    });
  }
}

// Run hourly
setInterval(checkBackupHealth, 60 * 60 * 1000);

Ransomware Protection: Use immutable backups (S3 Object Lock, Azure Immutable Blob) to prevent ransomware from encrypting or deleting your backups. Keep at least one backup copy completely air-gapped or on a different cloud provider.

Official Resources: For comprehensive backup and disaster recovery guidance, see AWS Backup and Recovery Prescriptive Guidance, Google Cloud Disaster Recovery Planning Guide, and Azure Backup Documentation.

How long should I retain backups?

Keep daily backups for 7-30 days, weekly backups for 3 months, and monthly backups for 1-7 years depending on compliance requirements. Use lifecycle policies to automatically transition to cheaper storage tiers.
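One illustrative shape for such a lifecycle policy (the prefix, day counts, and the roughly 7-year expiration are placeholders to adapt to your retention schedule):

```json
{
  "Rules": [
    {
      "ID": "backup-tiering",
      "Filter": { "Prefix": "backup-" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 2555 }
    }
  ]
}
```

Applied with `aws s3api put-bucket-lifecycle-configuration`, this moves backups to cheaper tiers automatically instead of relying on manual cleanup.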

Should I back up my Supabase/Firebase database?

Yes. While managed services have their own backups, you should maintain independent backups you control. Export data regularly and store it in your own cloud storage. This protects against account issues and vendor lock-in.

What about backing up file uploads?

Enable versioning on your storage bucket and replicate to a secondary region. For critical files, consider cross-cloud replication. Test that you can restore specific file versions, not just the latest.
