One of the challenges with making changes in a database environment is that undoing those changes can be hard. What's often preferred is rolling forward with a new change to correct the issue, but that's often done with limited analysis and thought. In effect, we hope our staff can make a quick patch, and a better decision, under pressure than they made when they had more time to examine the problem. That works if a simple mistake was made in implementation, but not if we didn't design our solution well at the start.

I ran across an article about DoorDash that I thought was interesting. During the pandemic, their business exploded and they outgrew their Aurora PostgreSQL database. They migrated to CockroachDB, a distributed SQL database that is compatible with the PostgreSQL wire protocol and can (theoretically) scale much higher.

Read the rest of Try, Try Again, Until It's Right