Software Misadventures

38 episodes - English - Latest episode: 11 days ago - ★★★★★ - 10 ratings

A show about not just the technologies, but the people and stories behind them. In every episode, Ronak and Guang sit down with engineers, founders, and investors to chat about their paths, lessons they’ve learned and of course, the misadventures along the way.

Technology sre softwareengineering devops distributedsystems technology

Episodes

Learning in public | Kelsey Hightower

April 16, 2024 11:11 - 57 minutes - 52.9 MB

We’re super excited to have Kelsey back on the show! Our last conversation was around his incredible career journey - from working at McDonald’s after school to starting his own computer store, to hacking on Python infrastructure with the core developers, to meeting Satya Nadella for an interview. In part two of this conversation, we dive deep into Kelsey’s experiences learning in public and writing “Kubernetes: Up and Running”: The biggest barrier to getting started with learning in publ...

Engineer's guide to startup advising | Kelsey Hightower

April 02, 2024 11:11 - 49 minutes - 45.9 MB

We’re super excited to have Kelsey back on the show! Our last conversation was around his incredible career journey - from working at McDonald’s after school to starting his own computer store, to hacking on Python infrastructure with the core developers, to meeting Satya Nadella for an interview. In part one of this conversation, we dive deep into Kelsey’s experiences and expertise as a startup advisor: How to break into advising when you don’t have a lot of connections How to influence...

The hard power of management and the soft power of senior ICs | Josh Wills

March 19, 2024 11:00 - 1 hour - 72.2 MB

As a self-described “gainfully unemployed data person”, Josh Wills is an angel investor and has worked on and led data teams at Slack, Cloudera, WeaveGrid and Google. We discuss: How to get started with angel investing without a ton of $$ Attributes that define great engineering managers What’s it like transitioning from management back to IC Challenges in Climate Tech from a software perspective And more   Segments: [0:01:35] Transitioning from management to individual contributor...

From High School Suspension to US Chief Data Scientist | DJ Patil

March 05, 2024 12:00 - 1 hour - 59.9 MB

Known for coining the term “Data Scientist”, DJ is a renowned technologist with a diverse background spanning academia, industry, and government. Having led product teams at companies like RelateIQ and LinkedIn, DJ was appointed by President Obama to be the first U.S. Chief Data Scientist where his efforts led to the establishment of nearly 40 Chief Data Officer roles across the Federal government, new health care programs as well as new criminal justice reforms. We discuss: “Dream in years...

Building Diverse Engineering Teams | Erica Lockheimer

February 20, 2024 12:11 - 1 hour - 73.9 MB

Erica is a former VP of Engineering at LinkedIn. Having almost dropped out of college, Erica’s journey in tech is a testament to her perseverance and dedication. In addition to leading engineering teams at LinkedIn, Erica founded WIT (Women In Tech) to empower women within the company as well as the broader tech community. We discuss: How to create incentives for diversity-building work. Building your personal “board of directors”. Balancing mentoring work vs sprint tickets. Structuring ...

Stories behind building HashiCorp | Mitchell Hashimoto

January 30, 2024 11:45 - 1 hour - 70.7 MB

Mitchell co-founded HashiCorp in 2012 and created many important infrastructure tools, such as Terraform, Vagrant, Packer, and Consul. In addition to being a prolific engineer, Mitchell grew HashiCorp into a multi-billion-dollar public company. We discuss: How to structure large projects to avoid demotivation or burnout The "A.P.P.L.E" framework for diffusing tense situations and handling trolls How to decide what to work on Mitchell's unconventional transitions from CEO to CTO and then ...

Practical Guide to More Effective Mentorship | Dave O'Connor (Google, Twilio, Elastic)

January 16, 2024 07:00 - 1 hour - 101 MB

After 17 years building SRE teams at Google and serving as the Site Lead for Engineering in Dublin, Dave joined Elastic as the Sr Director of Engineering and later VP of Engineering at Twilio. Following a recent career break, Dave now divides his time between coaching engineering leaders and consulting to help busy teams be more effective. In the heart of our conversation, Dave shares the frameworks and practical tips he's amassed for making the most of the mentorship experience.   Segme...

War stories from early days of engineering at LinkedIn | David Henke (LinkedIn, Yahoo)

January 04, 2024 14:06 - 57 minutes - 52.8 MB

At the personal request of Reid Hoffman to emerge from early retirement, David joined LinkedIn in 2009 during a period of rapid growth to help stabilize the chaos, cultivating a much-needed culture of “Site Up and Secure.” Before this, David served as SVP of Engineering and Operations at Yahoo!, overseeing their Search Marketing organization and the Production Operations infrastructure for the entire company. Throughout his career, David has held multiple leadership positions and is recogniz...

Automating away your job as a Data Scientist | Melissa Runfeldt (Salesforce, CueIn)

December 12, 2023 07:14 - 1 hour - 57.4 MB

Before joining CueIn last year as a Founding Data Scientist, Melissa was a Lead Data Scientist at Salesforce working on the Einstein Platform that focused on automating Data Science workflows. In this conversation we dive into Melissa’s unique journey, what to do in the face of increasing job automation and explore the latest developments in practical AI. Segments: [00:02:13] Melissa’s background in computational neuroscience [00:06:08] 7 years at Salesforce vs startup [00:11:31] Joi...

Open sourcing LinkedIn's Derived Data Platform | Felix GV (LinkedIn)

November 28, 2023 14:58 - 1 hour - 49.8 MB

What's it like to open source an internal project at a big tech company like LinkedIn? When should a company open source a project and what are the benefits and challenges that come along with it? If you want to open source an internal project, how should you go about advocating for it? Félix is a Principal Staff Engineer at LinkedIn where he works on the data infrastructure team that builds Venice. Venice is a distributed derived data store which LinkedIn open sourced in the fall of 2022....

When enough was enough - practical and emotional drivers for leaving big tech to bootstrap Metacast | Arnab Deka & Ilya Bezdelev (AWS, Google)

November 07, 2023 06:20 - 1 hour - 72.2 MB

Should engineers and product managers “stay in their lanes”? What big company habits should you keep vs unlearn when transitioning to working at a start-up? Could an ayahuasca retreat give you more clarity on your career goals? Ilya and Arnab join the show to share their journey quitting big tech to bootstrap a podcasting startup. Arnab and Ilya are the co-founders of Metacast. Before starting the company, Arnab was a Principal Engineer at AWS while Ilya was a Sr. Product Manager at Google...

Pete Warden - On launching "AI in a Box" and building a hardware edge AI company - #24

October 23, 2023 17:43 - 43 minutes - 40.2 MB

What's "AI in a Box"? Pete Warden joins the show to share a new project he recently launched that encapsulates Language Transcription/Translation and Question Answering capabilities into a wallet-sized board running locally without internet, as well as stories and learnings from building his new company, Useful Sensors, after 7 years of leading the TensorFlow mobile project at Google. Pete is the CEO of Useful Sensors. After founding his own company Jetpac and selling it to Google in 2014,...

Nathan Marz - On changing the economics of building large-scale software with Rama - #23

September 22, 2023 04:43 - 1 hour - 85.7 MB

What does it mean to change the economics of software development? Nathan Marz joins the show to share how they reduced the cost of building Mastodon at Twitter-scale by 100X and the 10-year journey to build Rama, a new programming platform that made this feat possible. Nathan is the founder of Red Planet Labs. Prior to RPL, he led engineering for BackType which was acquired by Twitter in 2011. Nathan created the Apache Storm project and wrote the book Big Data: Principles and best practi...

Kelsey Hightower - On retiring as Distinguished Engineer from Google at 42 (Part 2)

August 03, 2023 05:43 - 1 hour - 73.4 MB

Kelsey Hightower was a Distinguished Engineer at Google, where he worked on Google Cloud Platform. In this second part of the conversation, we focus on Kelsey’s retirement - the financial planning that enabled him to retire at 42, how he got started advising startups and his perspectives on compensation, turning down a substantial offer from Microsoft and meeting Satya Nadella in person. And, of course, plans for the future.

Kelsey Hightower - On retiring as Distinguished Engineer from Google at 42 (Part 1)

July 24, 2023 17:34 - 1 hour - 53.5 MB

Kelsey Hightower was a Distinguished Engineer at Google, where he worked on Google Cloud Platform. In this first part of the conversation, we delve into pivotal moments in Kelsey’s career journey ranging from buying his first car by working at McDonald’s after school, to starting his own computer store that turned into a music studio after 6pm, to hacking on Python infrastructure with the core developers. Through these stories, we learned a ton about how Kelsey thinks about acquiring new ski...

Julie Amundson - Career breaks, job search amidst hiring freezes, positioning yourself and much more - #20

June 27, 2023 02:26 - 57 minutes - 44 MB

Julie Amundson is a Sr Staff Software Engineer at Google working on Machine Learning Infrastructure. Prior to Google, she was the Director of Machine Learning Infrastructure at Netflix. Julie decided to take a career break last year when she was affected by mass layoffs. In this conversation, we talk to her about what it was like to find a job during hiring freezes, what it was like to position herself in this market, whether the interviewers cared about the career break she took and how t...

Chris Pruett - On deciding to leave LinkedIn and co-founding Jam, values based decision making and compassionate leadership - #19

June 03, 2022 05:53 - 1 hour - 63.9 MB

Chris Pruett is the CTO and Co-founder of Jam - a new way to share and listen to bite-sized audio. Prior to Jam, Chris spent 9+ years at LinkedIn growing from an engineering manager to VP of Engineering. During his tenure at LinkedIn, he worked on almost all aspects of the app and towards the end, led an org of 500+ engineers working on Feed, Messaging, Identity and Search. In this episode, we discuss how he made the decision to leave his leadership position at LinkedIn and co-found Jam. W...

Software Misadventures Update and Plans for 2022

March 25, 2022 18:10 - 8 minutes - 7.14 MB

Short episode about reflections on the past year and plans for 2022.

Kailash Nadh - On being an absurdist and building the tech team at Zerodha, India's largest stock broker - #18

February 25, 2022 08:04 - 1 hour - 79.7 MB

Kailash is the CTO at Zerodha, the largest stock broker in India. In this conversation, we speak with him about absurdism - a philosophy that guides his personal and professional worldview. We discuss how he built Zerodha’s tech team, their team culture and how the team operates so efficiently while being so lean. We also discuss why Zerodha self-hosts all of their tech stack, what they look for when hiring engineers and how their systems scaled when the user base grew from 2 to 8 million ...

Michael Lynch - On quitting google for indie hacking, bootstrapping to $450K+ ARR in public, writing personal retrospectives and more - #17

January 14, 2022 06:50 - 1 hour - 86.4 MB

Michael Lynch is the founder of TinyPilot. After doing software engineering at Microsoft and Google for 7 years, Michael decided in 2018 to quit and start working for himself by building small software businesses. From years of negative profit to now building a $450K+ ARR hardware business, Michael joins the show to chat about what made him quit his cushy job at Google, how he builds in public with monthly retrospectives, what he has learned over the 3 years indie hacking and much more.

Cory Watson - Leading observability teams at Twitter & Stripe, how to succeed in a new org, effective ways to advocate for your team and more - #16

November 12, 2021 06:37 - 1 hour - 68.3 MB

Cory is currently a Solutions Engineer at Jeli.io and very well known in the community for his work on Observability. His career in observability began at Twitter where he managed the observability team and then he joined Stripe, where he created and led the observability team, this time around as a Principal Engineer. We talk to him about how he got his start in customer support and the role it played in the later part of his career. We discuss his time at Twitter where there was a power ...

Ashwin Kumar - On learning new things by breaking them down, the secret to winning >$100k from hackathons, the art of storytelling, and much more - #15

October 12, 2021 04:00 - 1 hour - 60.3 MB

Ashwin is a Startup Partnership Lead at Stripe. From web development to co-founding a YC startup, to deep learning, Ashwin has a knack for picking up new skills extremely quickly. In this episode, we chat about the methods he employed to successfully make these transitions, learnings/tips from winning 30+ hackathons in a row, and what engineers can gain from better story-telling.

Bruno Connelly - Building and leading the global SRE org at LinkedIn - #14

September 12, 2021 06:28 - 1 hour - 54 MB

Bruno Connelly is a VP of Engineering at LinkedIn. He leads the Site Engineering org responsible for LinkedIn's production infrastructure. He joins the show to talk about his journey in tech - from teaching himself how to code at a young age, building, maintaining and reverse engineering software as a teenager, building ISPs in the early part of his career (there are some fun stories that involve sleeping in the data center) to leading the SRE org at LinkedIn over the last decade. He talks...

Lorin Hochstein - On how Netflix learns from incidents, software as socio-technical systems, writing persuasively and more - #13

August 14, 2021 08:14 - 1 hour - 68.2 MB

With 5+ years of experience building resilient systems at the Netflix scale, Lorin joins the show to chat about his favorite incident story, the path that led him to doing chaos engineering (and later away from it), and advocating for a dedicated analyst to talk to people after an incident. Throughout the conversation, Lorin shares his philosophy and tips on how to learn from incidents, what engineers can gain from writing better, and why some metrics may not be as useful as you think.

Spoons (Daniel Spoonhower) - On building Lightstep, being customer focused, developing systems at Google scale and much more - #12

July 09, 2021 07:30 - 1 hour - 60.2 MB

Spoons is the Co-founder and Chief Architect of Lightstep. He joins the show to talk about building systems at Google scale and various aspects that make Google a weirder place than other companies. We talked about Spoons's journey of leaving Google and deciding to join Lightstep as a co-founder. We dig into the challenges during the early days of Lightstep and discuss the importance of speaking to customers to build the right product. We talk about what it's like to start a family and run a...

Emmanuel Ameisen - On production ML at Stripe scale, leading 100+ ML projects, iterating fast, and much more - #11

June 11, 2021 09:00 - 1 hour - 59.3 MB

Having led 100+ ML projects at Insight and built ML systems at Stripe scale, Emmanuel joins the show to chat about how to build useful ML products and what happens next when the model is in production. Throughout the conversation, Manu shares stories and advice on topics like the common mistakes people make when starting a new ML project, what’s similar and different about the lifecycle of ML systems compared to traditional software, and writing a technical book.

Todd Underwood - On lessons from running ML systems at Google for a decade, what it takes to be a ML SRE, challenges with generalized ML platforms and much more - #10

May 07, 2021 03:41 - 1 hour - 55 MB

Todd is a Sr Director of Engineering at Google where he leads Site Reliability Engineering teams for Machine Learning. Having recently presented on how ML breaks in production, by examining more than a decade of outage postmortems at Google, Todd joins the show to chat about why many ways that ML systems break in production have nothing to do with ML, what’s different about engineering reliable systems for ML, vs traditional software (and the many ways that they are similar), what he looks f...

Evan Estola - On recommendation systems going bad, hiring ML engineers, giving constructive feedback, filter bubbles and much more - #9

April 23, 2021 05:21 - 1 hour - 59 MB

Evan Estola (https://twitter.com/estola) is a Director of Engineering at Flatiron Health where he's leading software engineering teams focused on building Machine Learning products. Throughout this episode, Evan shares various stories of times when recommendation systems didn’t work as expected, like the time members saw the mathematically worst recommendations for meetups near them. He also shares why Schenectady, NY pops up on some lists of most popular cities and the story behind the Wall St...

Uma Chingunde - On managing migrations, growing engineering teams and much more - #8

April 09, 2021 07:50 - 1 hour - 49.4 MB

Uma is a VP of Engineering at Render. In this episode, she shared with us her insights on how to successfully manage infrastructure migrations. We discussed the importance of communicating the "why" behind a migration, identifying success metrics, creating a culture where migrations are identified as highly impactful projects and much more. Uma also shared stories where parts of a migration didn’t go as planned, how the team fixed the issue and the kind of engineers she thinks would make goo...

Charity Majors - On database outages, journey as a co-founder, thriving under pressure and growing as an engineer - #7

March 20, 2021 05:37 - 1 hour - 54.3 MB

Charity Majors (https://twitter.com/mipsytipsy) is the co-founder and CTO of Honeycomb.io. Before this she worked at Facebook, Parse and Linden Lab on infrastructure and developer tools, and always seemed to wind up running the databases. She is the co-author of the book Database Reliability Engineering and also has an amazing blog at charity.wtf. We love the content on her blog and have learned a lot from it. We had a lot of fun speaking with Charity in this lively conversation! We learne...

Tammy Bryant Butow - On failure injection, chaos engineering, extreme sports and being curious - #6

March 07, 2021 08:51 - 1 hour - 47.8 MB

Tammy Bryant Butow is a Principal SRE at Gremlin where she works on Chaos Engineering. In this episode, we discuss how her curiosity led her to the world of infrastructure engineering, an outage from her early days where a core switch took down half the datacenter, her experience running a disaster recovery test and how it taught her about the importance of injecting failures into a system to make it more resilient. We also touch on advanced failure injection techniques, how chaos engineeri...

Oliver Leaver-Smith - On how "just a monitoring change" took down the entire site and resilience engineering - #5

February 19, 2021 12:57 - 1 hour - 50 MB

Oliver Leaver-Smith, better known as Ols, is a Senior Devops Engineer at Sky Betting and Gaming. In this episode, we discuss how a seemingly simple monitoring change ended up taking down the entire site. We also talk about chaos and resilience engineering. We discuss how the team at Sky Betting and Gaming conducts fire drills (chaos engineering exercises) where they not only test the resiliency of their software systems but also their people systems. We walk through a recent example of a f...

Ryan Underwood - On debugging the Linux kernel - #4

February 06, 2021 07:56 - 1 hour - 50.4 MB

Ryan Underwood is a Staff SRE and tech lead on the Helix and Zookeeper SRE team at LinkedIn. Prior to LinkedIn, he was an SRE at Machine Zone and Google. Apart from his regular responsibilities, Ryan’s interest and expertise include debugging production kernel, I/O and containerization issues. His opinion about not treating software as a black box and his persistent approach to debugging complex problems are truly inspiring.   On several occasions, Ryan’s colleagues have leaned on him to...

David Henke - On building a culture of "Site Up" at LinkedIn and Yahoo! - #3

January 23, 2021 08:55 - 58 minutes - 47.5 MB

David is LinkedIn’s former SVP of Engineering and Operations. He came out of retirement to join LinkedIn in 2009 during a time of rapid growth. After 4 years at LinkedIn, he retired in 2013.  Throughout his career, David has been in multiple leadership positions and has been recognized as one of the best Operations Executives. This was an extremely fascinating conversation. David shares insightful stories from early days at LinkedIn and what it took to develop the culture of “Site Up and S...

Julia Evans - On kubernetes scheduler bugs, TCP performance regressions and debugging tips - #2

January 06, 2021 06:49 - 46 minutes - 38.1 MB

In this episode, we speak with Julia Evans. Julia runs a programming zines business, called Wizard Zines (https://wizardzines.com/), where she creates comics about various programming concepts. She began creating zines while she was still a software engineer at Stripe. Her zines are extremely approachable and highly educational. In addition to creating zines, Julia is a prolific blogger and has around 500 posts on her blog at jvns.ca. Her blog posts are another great source to learn about fund...

Kelsey Hightower - On ways kubernetes can break, being an effective leader and much more - #1

December 04, 2020 08:08 - 1 hour - 60 MB

In this episode, we speak with Kelsey Hightower who is currently a Principal Developer Advocate at Google and one of the most influential individuals in the Kubernetes community. He is also an author and a keynote speaker, with a knack for demystifying complex topics, doing live demos and enabling others to succeed.   In this insightful conversation, we cover wide ranging topics from his role at Google to the art of storytelling. We get into some very interesting details of how Kubernete...

Introducing Software Misadventures Podcast - #0

November 28, 2020 22:25 - 4 minutes - 3.94 MB

In this episode, Ronak, Austin and Guang share the origin story - who they are, what this podcast is about and why they are doing this. They've seen firsthand how stressful it is when something breaks in production but also found it to be the best opportunity to learn about a system more deeply. They started this podcast to have in-depth conversations with software and devops experts and hear their stories from the trenches about how software breaks in production. In upcoming conversat...

Twitter Mentions

@kelseyhightower 4 Episodes
@nathanmarz 1 Episode
@dj44 1 Episode
@josh_wills 1 Episode
@mipsytipsy 1 Episode
@b0rk 1 Episode
@tammyxbryant 1 Episode
@estola 1 Episode