Backup & Recovery
This guide covers backup and disaster recovery strategies for MegaVault, with the goal of protecting data, maintaining business continuity, and enabling rapid recovery from system failures.
Backup Overview
A robust backup and recovery strategy is essential for protecting MegaVault data and ensuring business continuity in case of hardware failures, data corruption, or disasters.
Data Protection
Multi-layer backup approach
- ✅ Database backups
- ✅ File storage backups
- ✅ Configuration backups
- ✅ Application state
Recovery Objectives
Business continuity goals
- ✅ RTO: 4 hours max
- ✅ RPO: 1 hour max
- ✅ 99.9% data integrity
- ✅ Point-in-time recovery
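An RPO target is only meaningful if it is checked. The timestamp embedded in a backup filename (the `dump_YYYYMMDD_HHMMSS.rdb.gz` convention used by the scripts later in this guide) makes that check mechanical. A minimal sketch; `backup_age_hours` is a hypothetical helper and assumes GNU `date`:

```shell
#!/bin/bash
# Hypothetical helper (not part of MegaVault): age, in hours, of a backup
# named dump_YYYYMMDD_HHMMSS.rdb.gz, the convention used later in this guide.
# Assumes GNU date.
backup_age_hours() {
    local name stamp epoch now
    name=$(basename "$1")
    # dump_20240101_020000.rdb.gz -> "20240101 02:00:00"
    stamp="${name:5:8} ${name:14:2}:${name:16:2}:${name:18:2}"
    epoch=$(date -d "$stamp" +%s)
    now=$(date +%s)
    echo $(( (now - epoch) / 3600 ))
}

# Example: alert when the newest backup exceeds a 1-hour RPO
age=$(backup_age_hours "dump_20240101_020000.rdb.gz")
if [ "$age" -gt 1 ]; then
    echo "RPO exceeded: newest backup is ${age}h old"
fi
```

In practice this would run against the newest object in the backup bucket rather than a literal filename.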
Backup Types
Comprehensive coverage
- ✅ Full backups
- ✅ Incremental backups
- ✅ Differential backups
- ✅ Continuous replication
Backup Principles
Backup Strategy
Comprehensive backup strategy covering all MegaVault components and data types.
Backup Components
| Component | Frequency | Retention | Method |
|---|---|---|---|
| Redis Database | Every 4 hours | 30 days | RDB snapshots + AOF |
| User Files | Daily | 90 days | Cross-region sync |
| Configuration | On change | 1 year | Git repository |
| Application Logs | Daily | 30 days | Log aggregation |
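Retention windows like those above have to be enforced somewhere. For local staging copies, a periodic mtime-based prune is usually enough; a sketch, where `prune_backups` is a hypothetical helper (the S3 side would typically use bucket lifecycle rules instead):

```shell
#!/bin/bash
# Hypothetical prune helper: delete local backup copies older than the
# retention window. find's -mtime +N matches files at least N+1 days old.
prune_backups() {
    local dir=$1 retention_days=$2
    find "$dir" -name 'dump_*.rdb.gz' -type f -mtime "+$(( retention_days - 1 ))" -delete
}

# Example: enforce the 30-day Redis window from the table above
# prune_backups /backups/redis 30
```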
Backup Storage Locations
- Primary: Same region as production (for quick recovery)
- Secondary: Different region (for disaster recovery)
- Tertiary: Different cloud provider or on-premises
- Archive: Long-term cold storage for compliance
Backup Schedule
# Crontab entries for automated backups
# Redis backup every 4 hours
0 */4 * * * /scripts/backup-redis.sh
# File storage backup daily at 2 AM
0 2 * * * /scripts/backup-files.sh
# Configuration backup on changes (via Git hooks)
# Handled by CI/CD pipeline
# Log backup daily at 1 AM
0 1 * * * /scripts/backup-logs.sh
# Weekly full system backup on Sundays at midnight
0 0 * * 0 /scripts/backup-full.sh
Automated Backups
Implement automated backup procedures for consistent and reliable data protection.
Redis Backup Script
#!/bin/bash
# backup-redis.sh
set -e
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/backups/redis"
REDIS_DATA_DIR="/var/lib/redis"
S3_BUCKET="megavault-backups"
LOG_FILE="/var/log/megavault/backup.log"
echo "[$TIMESTAMP] Starting Redis backup" >> $LOG_FILE
# Create backup directory
mkdir -p $BACKUP_DIR
# Create Redis snapshot
redis-cli --rdb $BACKUP_DIR/dump_$TIMESTAMP.rdb
# Compress backup
gzip $BACKUP_DIR/dump_$TIMESTAMP.rdb
# Upload to S3
aws s3 cp $BACKUP_DIR/dump_$TIMESTAMP.rdb.gz s3://$S3_BUCKET/redis/
# Verify backup integrity
if aws s3 ls s3://$S3_BUCKET/redis/dump_$TIMESTAMP.rdb.gz; then
echo "[$TIMESTAMP] ✓ Redis backup completed successfully" >> $LOG_FILE
# Cleanup local backup (keep last 3)
ls -t $BACKUP_DIR/dump_*.rdb.gz | tail -n +4 | xargs -r rm
else
echo "[$TIMESTAMP] ✗ Redis backup failed" >> $LOG_FILE
exit 1
fi
File Storage Backup
#!/bin/bash
# backup-files.sh
set -e
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
SOURCE_BUCKET="megavault-storage"
BACKUP_BUCKET="megavault-backup"
LOG_FILE="/var/log/megavault/backup.log"
echo "[$TIMESTAMP] Starting file storage backup" >> $LOG_FILE
# Sync files to backup bucket
# Running the sync inside the if keeps a failure from aborting the script via set -e
if aws s3 sync s3://$SOURCE_BUCKET s3://$BACKUP_BUCKET/files_$TIMESTAMP/ \
--exclude "*/temp/*" --exclude "*/cache/*" --storage-class STANDARD_IA; then
echo "[$TIMESTAMP] ✓ File storage backup completed" >> $LOG_FILE
# Create backup manifest
aws s3 ls s3://$BACKUP_BUCKET/files_$TIMESTAMP/ --recursive > /tmp/backup_manifest.txt
aws s3 cp /tmp/backup_manifest.txt s3://$BACKUP_BUCKET/manifests/files_$TIMESTAMP.txt
# Update latest backup pointer
echo "files_$TIMESTAMP" | aws s3 cp - s3://$BACKUP_BUCKET/latest_files.txt
else
echo "[$TIMESTAMP] ✗ File storage backup failed" >> $LOG_FILE
exit 1
fi
echo "[$TIMESTAMP] File storage backup process completed" >> $LOG_FILE
Database Backup Verification
#!/bin/bash
# verify-backup.sh
BACKUP_FILE=$1
if [ -z "$BACKUP_FILE" ]; then
echo "Usage: $0 <s3://bucket/path/to/backup.rdb.gz>"
exit 1
fi
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
TEMP_DIR="/tmp/backup_verify_$TIMESTAMP"
LOG_FILE="/var/log/megavault/backup-verify.log"
echo "[$TIMESTAMP] Verifying backup: $BACKUP_FILE" >> $LOG_FILE
# Create temporary directory
mkdir -p $TEMP_DIR
# Download and extract backup
aws s3 cp $BACKUP_FILE $TEMP_DIR/backup.rdb.gz
gunzip $TEMP_DIR/backup.rdb.gz
# Start temporary Redis instance for verification
redis-server --port 6380 --dir $TEMP_DIR --dbfilename backup.rdb --daemonize yes
# Test basic operations
if redis-cli -p 6380 ping | grep -q PONG; then
echo "[$TIMESTAMP] ✓ Backup verification successful" >> $LOG_FILE
redis-cli -p 6380 shutdown
rm -rf $TEMP_DIR
exit 0
else
echo "[$TIMESTAMP] ✗ Backup verification failed" >> $LOG_FILE
redis-cli -p 6380 shutdown 2>/dev/null || true
rm -rf $TEMP_DIR
exit 1
fi
Data Recovery
Step-by-step procedures for recovering data from backups in various scenarios.
Stop Redis Service
Stop the Redis service to prevent data corruption during recovery.
Download Backup
Download the appropriate backup file from the backup storage location.
Restore Database File
Replace the current Redis database file with the backup file.
Set Correct Permissions
Ensure the restored file has the correct ownership and permissions.
Start Redis Service
Start the Redis service and verify the data has been restored correctly.
Validate Recovery
Run tests to ensure the application functions correctly with restored data.
Redis Recovery Commands
#!/bin/bash
# recover-redis.sh
BACKUP_DATE=$1
if [ -z "$BACKUP_DATE" ]; then
echo "Usage: $0 <backup_date> (format: YYYYMMDD_HHMMSS)"
exit 1
fi
BACKUP_FILE="dump_${BACKUP_DATE}.rdb.gz"
REDIS_DATA_DIR="/var/lib/redis"
S3_BUCKET="megavault-backups"
echo "Starting Redis recovery from backup: $BACKUP_FILE"
# Stop Redis service
sudo systemctl stop redis-server
# Backup current data (just in case)
sudo cp $REDIS_DATA_DIR/dump.rdb $REDIS_DATA_DIR/dump.rdb.pre-recovery
# Download and restore backup
aws s3 cp s3://$S3_BUCKET/redis/$BACKUP_FILE /tmp/
gunzip /tmp/$BACKUP_FILE
sudo cp /tmp/dump_${BACKUP_DATE}.rdb $REDIS_DATA_DIR/dump.rdb
# Set correct permissions
sudo chown redis:redis $REDIS_DATA_DIR/dump.rdb
sudo chmod 660 $REDIS_DATA_DIR/dump.rdb
# Start Redis service
sudo systemctl start redis-server
# Verify recovery
if redis-cli ping | grep -q PONG; then
echo "✓ Redis recovery completed successfully"
# Test data integrity
redis-cli info keyspace
else
echo "✗ Redis recovery failed"
exit 1
fi
File Recovery Process
#!/bin/bash
# recover-files.sh
BACKUP_DATE=$1
RECOVERY_PATH=$2
if [ -z "$BACKUP_DATE" ] || [ -z "$RECOVERY_PATH" ]; then
echo "Usage: $0 <backup_date> <recovery_path>"
exit 1
fi
BACKUP_BUCKET="megavault-backup"
SOURCE_PATH="files_${BACKUP_DATE}/"
echo "Starting file recovery from backup: $BACKUP_DATE"
# Create recovery directory
mkdir -p $RECOVERY_PATH
# Download files from backup
aws s3 sync s3://$BACKUP_BUCKET/$SOURCE_PATH $RECOVERY_PATH/
if [ $? -eq 0 ]; then
echo "✓ File recovery completed successfully"
echo "Files recovered to: $RECOVERY_PATH"
# Display recovery statistics
echo "Recovery statistics:"
find $RECOVERY_PATH -type f | wc -l | xargs echo "Files recovered:"
du -sh $RECOVERY_PATH | xargs echo "Total size:"
else
echo "✗ File recovery failed"
exit 1
fi
Disaster Recovery
Comprehensive disaster recovery plan for complete system failure scenarios.
Disaster Recovery Scenarios
- Data Center Outage: Complete loss of primary infrastructure
- Regional Disaster: Natural disaster affecting entire region
- Cyber Attack: Ransomware or data breach incident
- Data Corruption: Widespread data corruption or loss
Recovery Time Objectives (RTO)
| Scenario | RTO Target | RPO Target | Action Required |
|---|---|---|---|
| Single Component Failure | 30 minutes | 15 minutes | Automated failover |
| Data Center Outage | 4 hours | 1 hour | Manual intervention |
| Regional Disaster | 24 hours | 4 hours | Full DR activation |
| Cyber Attack | 48 hours | 24 hours | Security investigation |
DR Activation Checklist
- ☐ Assess the scope and impact of the disaster
- ☐ Activate incident response team
- ☐ Notify stakeholders and users
- ☐ Activate backup infrastructure
- ☐ Restore data from backups
- ☐ Update DNS and routing
- ☐ Verify system functionality
- ☐ Monitor system performance
- ☐ Document lessons learned
Backup Testing
Regular testing procedures to ensure backup integrity and recovery capabilities.
Testing Schedule
- Daily: Automated backup verification
- Weekly: Recovery test on non-production environment
- Monthly: Full disaster recovery simulation
- Quarterly: Cross-region recovery test
Backup Test Script
#!/bin/bash
# test-backup-recovery.sh
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
TEST_ENV="backup-test-$TIMESTAMP"
LOG_FILE="/var/log/megavault/backup-test.log"
echo "[$TIMESTAMP] Starting backup recovery test" >> $LOG_FILE
# Create isolated test environment
docker-compose -f docker-compose.test.yml up -d --wait
# Get latest backup
LATEST_BACKUP=$(aws s3 ls s3://megavault-backups/redis/ | sort | tail -1 | awk '{print $4}')
BACKUP_DATE=$(echo $LATEST_BACKUP | sed 's/dump_\(.*\)\.rdb\.gz/\1/')
# Restore backup to test environment
./recover-redis.sh $BACKUP_DATE test
# Run application tests against restored data
npm run test:integration
if [ $? -eq 0 ]; then
echo "[$TIMESTAMP] ✓ Backup recovery test passed" >> $LOG_FILE
else
echo "[$TIMESTAMP] ✗ Backup recovery test failed" >> $LOG_FILE
exit 1
fi
# Cleanup test environment
docker-compose -f docker-compose.test.yml down -v
echo "[$TIMESTAMP] Backup recovery test completed" >> $LOG_FILE
Backup Monitoring
Monitor backup processes and ensure they complete successfully.
Backup Monitoring Metrics
- Backup Success Rate: Percentage of successful backups
- Backup Duration: Time taken to complete backups
- Backup Size: Size of backup files and growth trends
- Recovery Time: Time required for data recovery
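The success-rate metric can be derived directly from the ✓/✗ markers that the backup scripts in this guide write to their log file. A minimal sketch; `backup_success_rate` is a hypothetical helper:

```shell
#!/bin/bash
# Hypothetical metric helper: percentage of successful backups, computed
# from the ✓/✗ markers written to the backup log.
backup_success_rate() {
    local ok fail total
    ok=$(grep -c '✓' "$1" 2>/dev/null)
    fail=$(grep -c '✗' "$1" 2>/dev/null)
    ok=${ok:-0}
    fail=${fail:-0}
    total=$(( ok + fail ))
    if [ "$total" -eq 0 ]; then
        echo "n/a"
    else
        echo $(( 100 * ok / total ))
    fi
}

# Example: backup_success_rate /var/log/megavault/backup.log
```

The same parse could feed a Prometheus textfile exporter or a periodic Slack summary instead of printing to stdout.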
Backup Monitoring Script
#!/bin/bash
# monitor-backups.sh
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
SLACK_WEBHOOK="https://hooks.slack.com/your-webhook"
# Check Redis backup status
REDIS_BACKUP_AGE=$(aws s3 ls s3://megavault-backups/redis/ | tail -1 | awk '{print $1 " " $2}')
REDIS_BACKUP_TIME=$(date -d "$REDIS_BACKUP_AGE" +%s)
CURRENT_TIME=$(date +%s)
HOURS_SINCE_BACKUP=$(( (CURRENT_TIME - REDIS_BACKUP_TIME) / 3600 ))
if [ $HOURS_SINCE_BACKUP -gt 6 ]; then
# Send alert
curl -X POST "$SLACK_WEBHOOK" \
-H 'Content-type: application/json' \
--data "{\"text\":\"⚠️ Redis backup is $HOURS_SINCE_BACKUP hours old\"}"
fi
# Check file backup status
FILE_BACKUP_STATUS=$(aws s3 ls s3://megavault-backup/latest_files.txt)
if [ -z "$FILE_BACKUP_STATUS" ]; then
curl -X POST "$SLACK_WEBHOOK" \
-H 'Content-type: application/json' \
--data '{"text":"🚨 File backup status unknown"}'
fi
echo "[$TIMESTAMP] Backup monitoring check completed"