Backup & Recovery

Comprehensive backup and disaster recovery strategies for MegaVault to ensure data protection, business continuity, and rapid recovery from system failures.

Backup Overview

A robust backup and recovery strategy is essential for protecting MegaVault data and ensuring business continuity in case of hardware failures, data corruption, or disasters.

Data Protection

Multi-layer backup approach

  • ✅ Database backups
  • ✅ File storage backups
  • ✅ Configuration backups
  • ✅ Application state

Recovery Objectives

Business continuity goals

  • ✅ RTO: 4 hours max
  • ✅ RPO: 1 hour max
  • ✅ 99.9% data integrity
  • ✅ Point-in-time recovery

Backup Types

Comprehensive coverage

  • ✅ Full backups
  • ✅ Incremental backups
  • ✅ Differential backups
  • ✅ Continuous replication
💡 Backup Principles

Follow the 3-2-1 backup rule: 3 copies of data, 2 different storage types, 1 offsite backup for maximum protection.

Backup Strategy

Comprehensive backup strategy covering all MegaVault components and data types.

Backup Components

Component        | Frequency     | Retention | Method
Redis Database   | Every 4 hours | 30 days   | RDB snapshots + AOF
User Files       | Daily         | 90 days   | Cross-region sync
Configuration    | On change     | 1 year    | Git repository
Application Logs | Daily         | 30 days   | Log aggregation

Backup Storage Locations

  • Primary: Same region as production (for quick recovery)
  • Secondary: Different region (for disaster recovery)
  • Tertiary: Different cloud provider or on-premises
  • Archive: Long-term cold storage for compliance
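The archive tier can be automated with an S3 lifecycle rule rather than manual copies. A minimal sketch follows, assuming the `megavault-backups` bucket layout used below; the 30-day Glacier transition and 365-day expiration are illustrative thresholds, not MegaVault defaults:

```shell
#!/bin/bash
# lifecycle-sketch.sh - age Redis backups into cold storage (illustrative)

# Write the lifecycle policy for the redis/ prefix
cat > /tmp/backup-lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "archive-old-backups",
      "Status": "Enabled",
      "Filter": {"Prefix": "redis/"},
      "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
      "Expiration": {"Days": 365}
    }
  ]
}
EOF

# Apply with:
#   aws s3api put-bucket-lifecycle-configuration \
#       --bucket megavault-backups \
#       --lifecycle-configuration file:///tmp/backup-lifecycle.json
```

Adjust the expiration to match your compliance retention period before applying.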

Backup Schedule

Backup Schedule Configuration
# Crontab entries for automated backups

# Redis backup every 4 hours
0 */4 * * * /scripts/backup-redis.sh

# File storage backup daily at 2 AM
0 2 * * * /scripts/backup-files.sh

# Configuration backup on changes (via Git hooks)
# Handled by CI/CD pipeline

# Log backup daily at 1 AM
0 1 * * * /scripts/backup-logs.sh

# Weekly full system backup on Sundays at midnight
0 0 * * 0 /scripts/backup-full.sh
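Cron entries tend to disappear silently when hosts are rebuilt. One way to detect that drift is a small helper that reports which expected jobs are missing from a crontab listing; `missing_jobs` is an illustrative name, not an existing MegaVault script:

```shell
# missing_jobs CRONTAB_TEXT JOB...
# Print each expected job path that does not appear in CRONTAB_TEXT.
missing_jobs() {
    local crontab_text="$1"
    shift
    local job
    for job in "$@"; do
        grep -qF "$job" <<< "$crontab_text" || echo "$job"
    done
}

# Usage against the live crontab:
#   missing_jobs "$(crontab -l 2>/dev/null)" \
#       /scripts/backup-redis.sh /scripts/backup-files.sh \
#       /scripts/backup-logs.sh /scripts/backup-full.sh
```

Any path the helper prints needs to be re-added to the crontab.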

Automated Backups

Implement automated backup procedures for consistent and reliable data protection.

Redis Backup Script

Redis Backup Automation
#!/bin/bash
# backup-redis.sh

set -e

TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/backups/redis"
REDIS_DATA_DIR="/var/lib/redis"
S3_BUCKET="megavault-backups"
LOG_FILE="/var/log/megavault/backup.log"

echo "[$TIMESTAMP] Starting Redis backup" >> $LOG_FILE

# Create backup directory
mkdir -p $BACKUP_DIR

# Create Redis snapshot
redis-cli --rdb $BACKUP_DIR/dump_$TIMESTAMP.rdb

# Compress backup
gzip $BACKUP_DIR/dump_$TIMESTAMP.rdb

# Upload to S3
aws s3 cp $BACKUP_DIR/dump_$TIMESTAMP.rdb.gz s3://$S3_BUCKET/redis/

# Verify backup integrity
if aws s3 ls s3://$S3_BUCKET/redis/dump_$TIMESTAMP.rdb.gz; then
    echo "[$TIMESTAMP] ✓ Redis backup completed successfully" >> $LOG_FILE
    
    # Cleanup local backup (keep last 3)
    ls -t $BACKUP_DIR/dump_*.rdb.gz | tail -n +4 | xargs -r rm
else
    echo "[$TIMESTAMP] ✗ Redis backup failed" >> $LOG_FILE
    exit 1
fi
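The backup and recovery scripts both rely on the `dump_<timestamp>.rdb.gz` naming convention. A small helper that validates the timestamp and derives the matching S3 key keeps malformed dates from reaching the recovery path (a sketch; the function is not part of the shipped scripts):

```shell
# backup_key TIMESTAMP
# Print the S3 object key for a Redis snapshot taken at TIMESTAMP,
# which must be in the YYYYMMDD_HHMMSS format the backup scripts use.
backup_key() {
    local ts="$1"
    if [[ ! "$ts" =~ ^[0-9]{8}_[0-9]{6}$ ]]; then
        echo "invalid timestamp: $ts" >&2
        return 1
    fi
    echo "redis/dump_${ts}.rdb.gz"
}

# Usage:
#   aws s3 cp "s3://megavault-backups/$(backup_key 20240101_020000)" /tmp/
```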

File Storage Backup

File Storage Backup Script
#!/bin/bash
# backup-files.sh

set -e

TIMESTAMP=$(date +%Y%m%d_%H%M%S)
SOURCE_BUCKET="megavault-storage"
BACKUP_BUCKET="megavault-backup"
LOG_FILE="/var/log/megavault/backup.log"

echo "[$TIMESTAMP] Starting file storage backup" >> $LOG_FILE

# Sync files to backup bucket
if aws s3 sync s3://$SOURCE_BUCKET s3://$BACKUP_BUCKET/files_$TIMESTAMP/ \
    --exclude "*/temp/*" \
    --exclude "*/cache/*" \
    --storage-class STANDARD_IA; then
    echo "[$TIMESTAMP] ✓ File storage backup completed" >> $LOG_FILE
    
    # Create backup manifest
    aws s3 ls s3://$BACKUP_BUCKET/files_$TIMESTAMP/ --recursive > /tmp/backup_manifest.txt
    aws s3 cp /tmp/backup_manifest.txt s3://$BACKUP_BUCKET/manifests/files_$TIMESTAMP.txt
    
    # Update latest backup pointer
    echo "files_$TIMESTAMP" | aws s3 cp - s3://$BACKUP_BUCKET/latest_files.txt
else
    echo "[$TIMESTAMP] ✗ File storage backup failed" >> $LOG_FILE
    exit 1
fi

echo "[$TIMESTAMP] File storage backup process completed" >> $LOG_FILE
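The manifest written above also makes partial syncs detectable: diff the source listing against the backup manifest and count what is missing. A sketch, assuming both files are `aws s3 ls --recursive` output (object keys containing spaces would need stricter parsing):

```shell
# manifest_delta SOURCE_LISTING BACKUP_MANIFEST
# Print the number of objects present in the source listing but absent
# from the backup manifest. Both files are `aws s3 ls --recursive` output;
# the object key is the fourth whitespace-separated column.
manifest_delta() {
    comm -23 \
        <(awk '{print $4}' "$1" | sort) \
        <(awk '{print $4}' "$2" | sort) | wc -l
}

# Usage:
#   aws s3 ls s3://megavault-storage --recursive > /tmp/source.txt
#   manifest_delta /tmp/source.txt /tmp/backup_manifest.txt
```

A non-zero delta means the sync should be re-run before the backup is trusted.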

Database Backup Verification

Backup Verification Script
#!/bin/bash
# verify-backup.sh

BACKUP_FILE=$1
if [ -z "$BACKUP_FILE" ]; then
    echo "Usage: $0 <s3://bucket/path/to/dump.rdb.gz>"
    exit 1
fi
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
TEMP_DIR="/tmp/backup_verify_$TIMESTAMP"
LOG_FILE="/var/log/megavault/backup-verify.log"

echo "[$TIMESTAMP] Verifying backup: $BACKUP_FILE" >> $LOG_FILE

# Create temporary directory
mkdir -p $TEMP_DIR

# Download and extract backup
aws s3 cp $BACKUP_FILE $TEMP_DIR/backup.rdb.gz
gunzip $TEMP_DIR/backup.rdb.gz

# Start temporary Redis instance for verification
redis-server --port 6380 --dir $TEMP_DIR --dbfilename backup.rdb --daemonize yes
sleep 2  # give the server a moment to load the dump

# Check that the server is up and actually loaded keys from the dump
if redis-cli -p 6380 ping | grep -q PONG; then
    echo "[$TIMESTAMP] ✓ Backup verification successful ($(redis-cli -p 6380 dbsize) keys)" >> $LOG_FILE
    redis-cli -p 6380 shutdown
    rm -rf $TEMP_DIR
    exit 0
else
    echo "[$TIMESTAMP] ✗ Backup verification failed" >> $LOG_FILE
    redis-cli -p 6380 shutdown 2>/dev/null || true
    rm -rf $TEMP_DIR
    exit 1
fi
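In practice the newest backup is usually the one to verify. Selecting it from an `aws s3 ls` listing can be factored into a small filter; this works because the `YYYYMMDD_HHMMSS` timestamp embedded in the filename sorts lexicographically in time order:

```shell
# latest_key
# Read `aws s3 ls` output on stdin and print the newest object name
# (fourth column of the last line after sorting).
latest_key() {
    sort | tail -n 1 | awk '{print $4}'
}

# Usage:
#   LATEST=$(aws s3 ls s3://megavault-backups/redis/ | latest_key)
#   ./verify-backup.sh "s3://megavault-backups/redis/$LATEST"
```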

Data Recovery

Step-by-step procedures for recovering data from backups in various scenarios.

1. Stop Redis Service: Stop the Redis service to prevent data corruption during recovery.
2. Download Backup: Download the appropriate backup file from the backup storage location.
3. Restore Database File: Replace the current Redis database file with the backup file.
4. Set Correct Permissions: Ensure the restored file has the correct ownership and permissions.
5. Start Redis Service: Start the Redis service and verify the data has been restored correctly.
6. Validate Recovery: Run tests to ensure the application functions correctly with restored data.

Redis Recovery Commands

Redis Recovery Script
#!/bin/bash
# recover-redis.sh

BACKUP_DATE=$1
if [ -z "$BACKUP_DATE" ]; then
    echo "Usage: $0 <backup_date> (format: YYYYMMDD_HHMMSS)"
    exit 1
fi

BACKUP_FILE="dump_${BACKUP_DATE}.rdb.gz"
REDIS_DATA_DIR="/var/lib/redis"
S3_BUCKET="megavault-backups"

echo "Starting Redis recovery from backup: $BACKUP_FILE"

# Stop Redis service
sudo systemctl stop redis-server

# Backup current data (just in case)
[ -f $REDIS_DATA_DIR/dump.rdb ] && sudo cp $REDIS_DATA_DIR/dump.rdb $REDIS_DATA_DIR/dump.rdb.pre-recovery

# Download and restore backup
aws s3 cp s3://$S3_BUCKET/redis/$BACKUP_FILE /tmp/
gunzip /tmp/$BACKUP_FILE
sudo cp /tmp/dump_${BACKUP_DATE}.rdb $REDIS_DATA_DIR/dump.rdb

# Set correct permissions
sudo chown redis:redis $REDIS_DATA_DIR/dump.rdb
sudo chmod 660 $REDIS_DATA_DIR/dump.rdb

# Start Redis service
sudo systemctl start redis-server

# Verify recovery
if redis-cli ping | grep -q PONG; then
    echo "✓ Redis recovery completed successfully"
    
    # Test data integrity
    redis-cli info keyspace
else
    echo "✗ Redis recovery failed"
    exit 1
fi
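RDB snapshots alone cap the recovery point at the 4-hour backup cadence; the 1-hour RPO and point-in-time recovery targets above depend on Redis AOF persistence running alongside RDB. A redis.conf fragment enabling it (the values shown are common settings, not MegaVault-specific tuning):

```
# redis.conf fragment: enable AOF so recovery can replay writes
# made after the last RDB snapshot
appendonly yes
appendfsync everysec            # fsync once per second: ~1s worst-case loss
auto-aof-rewrite-percentage 100 # rewrite when the AOF doubles in size
auto-aof-rewrite-min-size 64mb
```

With AOF enabled, include the appendonly file (or appendonlydir in Redis 7+) in the backup alongside dump.rdb.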

File Recovery Process

File Recovery Script
#!/bin/bash
# recover-files.sh

BACKUP_DATE=$1
RECOVERY_PATH=$2

if [ -z "$BACKUP_DATE" ] || [ -z "$RECOVERY_PATH" ]; then
    echo "Usage: $0 <backup_date> <recovery_path>"
    exit 1
fi

BACKUP_BUCKET="megavault-backup"
SOURCE_PATH="files_${BACKUP_DATE}/"

echo "Starting file recovery from backup: $BACKUP_DATE"

# Create recovery directory
mkdir -p $RECOVERY_PATH

# Download files from backup
aws s3 sync s3://$BACKUP_BUCKET/$SOURCE_PATH $RECOVERY_PATH/

if [ $? -eq 0 ]; then
    echo "✓ File recovery completed successfully"
    echo "Files recovered to: $RECOVERY_PATH"
    
    # Display recovery statistics
    echo "Recovery statistics:"
    find $RECOVERY_PATH -type f | wc -l | xargs echo "Files recovered:"
    du -sh $RECOVERY_PATH | xargs echo "Total size:"
else
    echo "✗ File recovery failed"
    exit 1
fi
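A full `aws s3 sync` is overkill when only a single file was lost. A helper that builds the source URI for one object inside a dated backup can support targeted restores (the helper name and example key are illustrative):

```shell
# restore_uri BACKUP_DATE OBJECT_KEY
# Print the S3 source URI for one object inside a dated file backup,
# matching the files_<timestamp>/ layout used by backup-files.sh.
restore_uri() {
    local backup_date="$1" object_key="$2"
    echo "s3://megavault-backup/files_${backup_date}/${object_key}"
}

# Usage:
#   aws s3 cp "$(restore_uri 20240101_020000 users/alice/report.pdf)" ./report.pdf
```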

Disaster Recovery

Comprehensive disaster recovery plan for complete system failure scenarios.

Disaster Recovery Scenarios

  • Data Center Outage: Complete loss of primary infrastructure
  • Regional Disaster: Natural disaster affecting entire region
  • Cyber Attack: Ransomware or data breach incident
  • Data Corruption: Widespread data corruption or loss

Recovery Time Objectives (RTO)

Scenario                 | RTO Target | RPO Target | Action Required
Single Component Failure | 30 minutes | 15 minutes | Automated failover
Data Center Outage       | 4 hours    | 1 hour     | Manual intervention
Regional Disaster        | 24 hours   | 4 hours    | Full DR activation
Cyber Attack             | 48 hours   | 24 hours   | Security investigation

DR Activation Checklist

  • ☐ Assess the scope and impact of the disaster
  • ☐ Activate incident response team
  • ☐ Notify stakeholders and users
  • ☐ Activate backup infrastructure
  • ☐ Restore data from backups
  • ☐ Update DNS and routing
  • ☐ Verify system functionality
  • ☐ Monitor system performance
  • ☐ Document lessons learned
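The "Update DNS and routing" step can be pre-staged so it becomes a single command during an incident. A hedged sketch using a Route 53 change batch; the hosted zone ID, record name, and DR endpoint are placeholders, not real MegaVault values:

```shell
#!/bin/bash
# dns-failover-sketch.sh - repoint the app record at the DR load balancer

DR_ENDPOINT="dr-lb.us-west-2.example.com"

# Write the change batch: UPSERT a short-TTL CNAME to the DR endpoint
cat > /tmp/failover-change.json <<EOF
{
  "Changes": [{
    "Action": "UPSERT",
    "ResourceRecordSet": {
      "Name": "app.megavault.example.com",
      "Type": "CNAME",
      "TTL": 60,
      "ResourceRecords": [{"Value": "$DR_ENDPOINT"}]
    }
  }]
}
EOF

# Apply during DR activation with:
#   aws route53 change-resource-record-sets \
#       --hosted-zone-id Z0000000000000 \
#       --change-batch file:///tmp/failover-change.json
```

Keeping the TTL low (60s here) before an incident is what makes the cutover fast; a long-TTL record would pin clients to the dead endpoint.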

Backup Testing

Regular testing procedures to ensure backup integrity and recovery capabilities.

Testing Schedule

  • Daily: Automated backup verification
  • Weekly: Recovery test on non-production environment
  • Monthly: Full disaster recovery simulation
  • Quarterly: Cross-region recovery test

Backup Test Script

Automated Backup Testing
#!/bin/bash
# test-backup-recovery.sh

TIMESTAMP=$(date +%Y%m%d_%H%M%S)
TEST_ENV="backup-test-$TIMESTAMP"
LOG_FILE="/var/log/megavault/backup-test.log"

echo "[$TIMESTAMP] Starting backup recovery test" >> $LOG_FILE

# Create isolated test environment
docker-compose -f docker-compose.test.yml up -d --wait

# Get latest backup
LATEST_BACKUP=$(aws s3 ls s3://megavault-backups/redis/ | sort | tail -1 | awk '{print $4}')
BACKUP_DATE=$(echo $LATEST_BACKUP | sed 's/dump_\(.*\)\.rdb\.gz/\1/')

# Restore backup to test environment
./recover-redis.sh $BACKUP_DATE

# Run application tests against restored data
npm run test:integration

if [ $? -eq 0 ]; then
    echo "[$TIMESTAMP] ✓ Backup recovery test passed" >> $LOG_FILE
else
    echo "[$TIMESTAMP] ✗ Backup recovery test failed" >> $LOG_FILE
    exit 1
fi

# Cleanup test environment
docker-compose -f docker-compose.test.yml down -v

echo "[$TIMESTAMP] Backup recovery test completed" >> $LOG_FILE

Backup Monitoring

Monitor backup processes and ensure they complete successfully.

Backup Monitoring Metrics

  • Backup Success Rate: Percentage of successful backups
  • Backup Duration: Time taken to complete backups
  • Backup Size: Size of backup files and growth trends
  • Recovery Time: Time required for data recovery

Backup Monitoring Script

Backup Status Monitor
#!/bin/bash
# monitor-backups.sh

TIMESTAMP=$(date +%Y%m%d_%H%M%S)
SLACK_WEBHOOK="https://hooks.slack.com/your-webhook"

# Check Redis backup status
REDIS_BACKUP_AGE=$(aws s3 ls s3://megavault-backups/redis/ | tail -1 | awk '{print $1 " " $2}')
REDIS_BACKUP_TIME=$(date -d "$REDIS_BACKUP_AGE" +%s)
CURRENT_TIME=$(date +%s)
HOURS_SINCE_BACKUP=$(( (CURRENT_TIME - REDIS_BACKUP_TIME) / 3600 ))

if [ $HOURS_SINCE_BACKUP -gt 6 ]; then
    # Send alert
    curl -X POST "$SLACK_WEBHOOK" \
        -H 'Content-type: application/json' \
        --data "{\"text\":\"⚠️ Redis backup is $HOURS_SINCE_BACKUP hours old\"}"
fi

# Check file backup status
FILE_BACKUP_STATUS=$(aws s3 ls s3://megavault-backup/latest_files.txt)
if [ -z "$FILE_BACKUP_STATUS" ]; then
    curl -X POST "$SLACK_WEBHOOK" \
        -H 'Content-type: application/json' \
        --data '{"text":"🚨 File backup status unknown"}'
fi

echo "[$TIMESTAMP] Backup monitoring check completed"
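The monitor only helps if it runs unattended. A crontab entry matching the schedule conventions above (the half-hour offset is arbitrary, chosen to avoid colliding with the backup jobs themselves):

```
# Crontab entry: run the backup monitor hourly
30 * * * * /scripts/monitor-backups.sh
```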
💡 Backup Validation

Always validate backups by performing test recoveries. An untested backup is not a reliable backup.