Paul Zaich from Checkr tells us about a critical outage that occurred, what caused it, and how they tracked down and fixed the issue. The conversation ranges through troubleshooting complex systems, building team culture, blameless post-mortems, and monitoring the right things so that your applications don't fail, or alert you when they do.

Panel
Charles Max Wood
Dave Kimura
Luke Stutters

Guest
Paul Zaich

Links
Paul's Twitter
Paul's LinkedIn

Picks
Blood Pressure Monitor - Dave
eft - Luke
Ruby one-liners cookbook - Paul
Podcast Growth Summit - Chuck
Most Valuable Dev - Chuck
Most Valuable Dev Summit - Chuck
Mushroom Wars - Chuck
Gmelius - Chuck

Special Guest: Paul Zaich.

Advertising Inquiries: https://redcircle.com/brands

Privacy & Opt-Out: https://redcircle.com/privacy

Become a supporter of this podcast: https://www.spreaker.com/podcast/ruby-rogues--6102073/support.
