It's 2 AM on a Thursday. A pager pulls me awake. The database writer is under high CPU load, queries are choking, and some users are seeing the app fail to load. I check, and it's read queries choking the writer. They shouldn't be hitting the writer at all. Then it clicks: that's how Django routes queries by default. We'll need to fix that, but first, the immediate problem. Get the load down. Manually scale up the readers, then run a carefully orchestrated failover in production, mid-event, while sessions are live. Route the page to the owning dev team so they can ship a temporary fix. Load starts dropping. Things stabilize, and we can talk about fixing the root cause tomorrow. For now, let me spend some time collecting data to ground that discussion. It's 5:30 AM now. Let's go back to sleep.
You've probably heard some version of this on-call story. For me it was routine. And I'm not complaining. I signed up for exactly this. I joined Goldcast as a founding engineer knowing what came with it. I'm not someone who's comfortable doing a good-enough job. If I'm trusted with something, I own it end to end, to the best of my ability.
Living on the edge of every infrastructure and business problem paid off. It earned me the respect of my peers and sharpened my ability to reason from first principles and take problems head-on. I grew comfortable out there on the edge, and I grew a lot because of it. Going from solo frontend contributor to manager of the SRE function over six years is real growth, technically and functionally.
But extreme ownership has a cost. I was owning the infrastructure while quietly leaving my own growth behind: the deeper learning that would make me sharper and able to tackle harder problems than the ones already in front of me.
A couple of months ago I decided it was time to step back from daily firefighting. Goldcast now has a wider team at Cvent to keep the systems stable, so I can hand off the pager and take real time to go deeper. That's the job now: FinOps, infrastructure at scale, AI workloads, MLOps, and the craft of reliable systems. I'm investing in growth deliberately.
I'll admit it's a little scary, stepping out alone into a fast-changing landscape. But I've never seen failure come from being deliberate and thoughtful about learning. On the other side was the comfort of familiar systems, which, I feel, were giving me diminishing returns as time went on.
I'll use this space to share notes from the journey at Goldcast, what I learnt watching the company grow from failed initial events to an acquisition, and whatever I'm learning as I go.
And if you have interesting engineering problems where my experience might help, full-time or fractional, say hi ([email protected]).