Storage that scales, and backups that actually restore

Two things are true of almost every growing business's data. It grows faster than anyone budgeted for. And the backups quietly rot — until the day someone needs one and finds out it doesn't restore.

Both are solvable, and neither requires handing your data to a vendor and hoping. Here's how I think about the whole chain: resilient primary storage, backups you can actually trust, and cheap archives that let you keep everything forever.

Primary storage that grows with you

The trap is buying one big box. A single NAS or storage appliance is simple right up until it's full, or it dies and takes everything with it. Then you're doing a forklift upgrade under pressure.

Distributed storage solves this differently. Ceph is the open-source workhorse here: instead of one box, you run a cluster of commodity nodes that act as one pool of storage. A few things fall out of that design that matter to a business:

No single point of failure. Data is replicated (or erasure-coded) across nodes. Lose a disk — or a whole node — and the cluster keeps serving while it heals itself in the background.
Scale by adding, not replacing. Need more space? Add nodes. The cluster rebalances onto them. No migration weekend, no downtime.
One cluster, three shapes of storage. Ceph serves object storage (through an S3-compatible API, so most apps written for AWS S3 can point at it with little more than an endpoint change), block storage for virtual machines, and shared file systems, all from the same pool.
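Resilience isn't free: replication and erasure coding both spend raw disk to buy it. Here's a rough sketch of that arithmetic. The node count and disk sizes are made-up numbers, the function names are mine, and the figures ignore the free headroom a cluster needs in order to rebalance after a failure.

```python
def usable_replicated(raw_tb: float, replicas: int = 3) -> float:
    """Usable capacity when every object is stored `replicas` times."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return raw_tb / replicas


def usable_erasure_coded(raw_tb: float, k: int, m: int) -> float:
    """Usable capacity with k data chunks plus m coding chunks per object."""
    if k < 1 or m < 0:
        raise ValueError("need k >= 1 and m >= 0")
    return raw_tb * k / (k + m)


if __name__ == "__main__":
    # Hypothetical cluster: six nodes with 48 TB of raw disk each.
    raw = 6 * 48.0
    print(f"raw capacity:        {raw:.0f} TB")
    print(f"3x replication:      {usable_replicated(raw):.0f} TB usable")
    print(f"erasure coding 4+2:  {usable_erasure_coded(raw, k=4, m=2):.0f} TB usable")
```

Both layouts in this example survive two simultaneous failures, but erasure coding gives back twice the usable space, at the cost of more work on writes and during recovery. That trade-off is why it tends to suit colder data.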

The payoff is an on-premises storage tier you own outright, that survives hardware failure as a matter of routine, and that grows a node at a time instead of in painful lurches.

Backups that actually restore

Owning resilient storage is not a backup. Replication protects you from a disk dying; it does nothing against a bad deploy, a ransomware event, or someone deleting the wrong thing — those get faithfully replicated too. You still need real backups, and the bar is higher than most setups clear.

The old 3-2-1 rule still holds: 3 copies of your data, on 2 different kinds of media, with 1 copy offsite. But two things turn a checkbox into an actual safety net:

Automated and verified. A backup job that runs on a schedule is table stakes. A backup you have restored from — actually pulled data back out of, on a schedule, and confirmed it's intact — is the real thing. "We have backups" and "we have tested backups" are different sentences, and the gap between them is where businesses lose everything. Every backup run should be tracked and confirmed, with a summary landing somewhere a human will see it.
Immutable, for the ransomware case. If an attacker who's in your network can also delete your backups, they aren't backups. Object storage with a write-once, read-many lock (AWS calls it Object Lock) makes recent backups impossible to alter or delete for a set window. In its strictest setting (compliance mode, in AWS terms), that holds even against stolen admin keys. That single feature is the difference between "we restored" and "we paid."
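To make "verified" concrete, here's a minimal sketch of a restore test: record a checksum for every file at backup time, restore into a scratch directory, and compare. The function names are mine, and the directory copy in the demo is only a stand-in for whatever your backup tool's real restore step is.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(expected: dict[str, str], restored_root: Path) -> list[str]:
    """Return a list of problems; an empty list means the restore matches."""
    actual = manifest(restored_root)
    problems = []
    for name, digest in expected.items():
        if name not in actual:
            problems.append(f"missing: {name}")
        elif actual[name] != digest:
            problems.append(f"corrupt: {name}")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "source"
        source.mkdir()
        (source / "invoices.csv").write_text("id,amount\n1,100\n")
        expected = manifest(source)  # recorded at backup time

        restored = Path(tmp) / "restored"
        shutil.copytree(source, restored)  # stand-in for a real restore

        problems = verify_restore(expected, restored)
        print("restore OK" if not problems else problems)
```

The part worth copying is the shape: the manifest is captured when the backup is taken, and the check runs against data that has actually been pulled back out. Send the result somewhere a human reads it.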

Archives that cost almost nothing

Here's the part that changes how you think about deleting data: you probably shouldn't. Cold storage has gotten absurdly cheap.

With S3 lifecycle rules, data ages through tiers automatically. Hot, frequently touched data sits in standard storage. As it ages, it transitions to infrequent-access, then to archival classes; Glacier Deep Archive runs on the order of a dollar per terabyte per month. You set the policy once ("move anything older than 90 days to archive"), and it happens without anyone thinking about it. One caveat: lifecycle rules go by an object's age, not by when it was last read. If you want tiering that follows actual access patterns, that's the job of S3 Intelligent-Tiering.
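As a sketch of what such a policy looks like, here's an age-based lifecycle configuration built as a plain Python dictionary, in the shape that boto3's put_bucket_lifecycle_configuration call accepts. The rule ID, prefix, and day counts are illustrative assumptions, and nothing here talks to AWS; it only builds and prints the policy.

```python
import json


def lifecycle_config(prefix: str, ia_after_days: int = 30,
                     archive_after_days: int = 90) -> dict:
    """Build an S3 lifecycle policy: Standard -> Standard-IA -> Deep Archive."""
    # S3 documents a 30-day minimum object age before a move to STANDARD_IA.
    if ia_after_days < 30:
        raise ValueError("ia_after_days must be at least 30")
    if archive_after_days <= ia_after_days:
        raise ValueError("archive_after_days must be later than ia_after_days")
    return {
        "Rules": [
            {
                "ID": "age-out-to-archive",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": archive_after_days, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }


if __name__ == "__main__":
    config = lifecycle_config(prefix="projects/")
    print(json.dumps(config, indent=2))
    # Applying it would look roughly like this (bucket name is a placeholder):
    #   boto3.client("s3").put_bucket_lifecycle_configuration(
    #       Bucket="example-bucket", LifecycleConfiguration=config)
```

The Days values count from object creation, so this moves everything under the prefix on a fixed schedule regardless of how often it's read.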

The result: instead of deleting old records to save space, or paying hot-storage prices to hoard them, you keep everything (compliance records, old projects, years of history) for pennies, and pull it back when you need it. Just plan for the fact that a Deep Archive retrieval takes hours rather than seconds and carries a retrieval fee.
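A quick back-of-the-envelope comparison shows the gap. The per-terabyte prices below are assumptions based on commonly published list prices and will vary by region and over time; the 50 TB figure is invented. Request fees, retrieval fees, and minimum storage durations are left out.

```python
# Assumed storage prices in USD per TB per month; check current pricing
# for your region before relying on these numbers.
PRICE_PER_TB_MONTH = {
    "standard": 23.00,
    "deep_archive": 0.99,
}


def monthly_cost(terabytes: float, storage_class: str) -> float:
    """Storage-only monthly cost for a given amount of data in one class."""
    return terabytes * PRICE_PER_TB_MONTH[storage_class]


def yearly_savings(terabytes: float) -> float:
    """What moving data from standard to deep archive saves over 12 months."""
    hot = monthly_cost(terabytes, "standard")
    cold = monthly_cost(terabytes, "deep_archive")
    return 12 * (hot - cold)


if __name__ == "__main__":
    archive_tb = 50.0  # hypothetical amount of cold data
    print(f"standard:     ${monthly_cost(archive_tb, 'standard'):,.2f} per month")
    print(f"deep archive: ${monthly_cost(archive_tb, 'deep_archive'):,.2f} per month")
    print(f"saved over a year: ${yearly_savings(archive_tb):,.2f}")
```

Under those assumptions the cold copy costs a few percent of the hot one, which is why deleting data to save money rarely pays off.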

When you need this

Be honest about scale. If you're one server and a few hundred gigs, a good managed backup service and a sensible retention policy are plenty — don't build a Ceph cluster to feel serious. This chain earns its keep when your data is measured in terabytes and climbing, when downtime or data loss would genuinely hurt, or when you'd rather own the storage than rent it indefinitely.

But when you're there, the pieces fit together cleanly: resilient primary storage you own, automated backups you've actually tested, and archives cheap enough that "keep it forever" is the easy choice. Storage stops being the thing you worry about running out of, and backups stop being the thing you hope you'll never have to put to the test.

Outgrowing your storage, or not sure your backups would actually come back? Let's pressure-test it.