“Routine NVMe maintenance caused a RAID failure, then a destructive command on the wrong server wiped the secondary. 51TB PostgreSQL restore from S3.”
Matrix.org went down for 24 hours. Disk maintenance broke RAID. Then someone ran a destructive command on the wrong server. The secondary was gone. 51 terabytes restored from S3 backups. This is why you test your backups. This is why you have runbooks. This is why production access should terrify you.