About the guest:

Rich Burroughs is a Staff Developer Advocate at Loft Labs where he's focused on improving the happiness of teams using Kubernetes. He's the creator and host of the Kube Cuddle podcast where he interviews members of the Kubernetes community. Rich was one of the founding organizers of DevOpsDays Portland, and he's helped organize other community events. He also has a strong interest in how working in tech impacts mental health. Rich has ADHD and has documented his journey on Twitter since being diagnosed.

Find our guest on:

Rich’s Twitter
Rich’s LinkedIn

Find us on:

On-Call Me Maybe Podcast Twitter
Adriana’s Twitter
Adriana’s LinkedIn
Adriana’s Instagram
Ana’s Twitter
Ana’s LinkedIn
Ana’s Instagram

Show Links:

Loft Labs
Lukas Gentele (Loft Labs CEO and co-founder)
Fabian Kramm (Loft Labs CTO and co-founder)
Kube Cuddle podcast
Kubernetes
vcluster
Joe Beda (Kubernetes Co-Creator)
Craig McLuckie (Kubernetes Co-Creator)
Ian Coldwater
Kelsey Hightower Tetris
Holly Cummins talks about zombie servers
Custom Resource Definition (CRD)
Postgres
SQLite
Elastic Kubernetes Service (EKS)
Codefresh
GitOps
Kubernetes ReplicaSet
REST API
Matt Lemay
Agile Manifesto
SAFe
ADHD

Transcript:

ADRIANA: Hey, y'all. Welcome to On-Call Me Maybe, the podcast about DevOps, SRE, observability principles, on-call, and everything in between. I am your host, Adriana Villela, with my awesome co-host...

ANA: Ana Margarita Medina. 

ADRIANA: And today, we are talking to Rich Burroughs, who is a Staff Developer Advocate at Loft Labs. Rich, welcome.

RICH: Thanks so much. I'm really excited to be here. People may not know this, but Ana and I actually worked together at a previous job on the same team. So it's a pleasure to be talking with you both.

ANA: We just happen to be bringing SRE friends together to talk about this amazing space and ways that we can make it better, the cool work that we're doing. So it's just an honor to bring you on board and share some of your learnings in the past few years. 

RICH: Well, wow. I thought everything was going perfect. I didn't know we needed to improve things.

[laughter]

ANA: You mean you don't constantly reconsider every choice you make in life, from what you're wearing to why you have such photos on your Twitter or why you retweet ADHD memes? [laughs]

RICH: Yeah, I do still carry a decent amount of impostor syndrome, so...

ANA: [laughs] It's fair. I think we all do, and it gets better, and it doesn't get better. Those are my words of advice.

RICH: [laughs]

ADRIANA: I think if you're good at your job, you do have some form of imposter syndrome. So I feel like it's a gut check that we're doing something right.

RICH: You know, it's funny; I was actually just tweeting about this a little bit ago. Somebody had mentioned the idea of people assuming you should know something that you don't know. And I work in the Kubernetes space, and I interact with a ton of people. And I bet that a lot of people would be shocked at how badly I would do if they were to give me a Kubernetes quiz. [laughter]

ADRIANA: I feel your pain. [laughs] 

ANA: To be fair, the ecosystem of Kubernetes is just constantly changing, and it's amazing to see such large projects move. But thinking back four years, when I was leading Kubernetes 101 classes and getting folks up and running from scratch on Ubuntu servers, and looking at it now, a lot of it is just managed Kubernetes. And then we were just trying to focus more on resource consumption, and networking, and security aspects of stuff. I'm like, yeah, [laughs] I can't be doing these workshops anymore. I don't feel like I have this extensive knowledge down in the stack.

RICH: I mean, it's so complex. And that's a little bit of almost a trope at this point, you know, how complex Kubernetes is. But the reality is that there are all those little niches like security, and storage, and networking. You end up following people on Twitter or seeing their content or whatever who are experts in one or more of those areas. And I know in my brain sometimes I compare my level of knowledge with those people. 

And it's just not fair to me to compare my level of Kubernetes security knowledge with Ian Coldwater. Or I follow Joe and Craig; I mean, they invented Kubernetes, so obviously [laughs] they're going to know more about Kubernetes than I do. So I think that's part of it, too, is that, to me, learning and growth is about your own personal progression. And the important part is to try not to compare yourself too much with other people. But that's a lot easier said than done sometimes. 

ADRIANA: That's so true. One thing that I was wondering: how did you get into Kubernetes?

RICH: Well, I guess, just let me back up a little bit from where I was going to start. I have a long background or had a long background in operations. So I started as a sysadmin in like 1995 or something. And so I had worked with Linux for many years. I did lots of different kinds of ops roles. And in 2015, I was at this small conference here in Portland that most people probably never heard was even happening. It was very much a local event. And this guy named Kelsey Hightower was there. And he was working at CoreOS at the time, and he gave this talk. 

And you can still find versions of this talk online if you Google Kelsey Hightower Tetris, where he was playing Tetris during the talk and using that as a metaphor for Kubernetes. The idea was that we no longer care about which nodes our apps are running on or things like that. Suddenly, these compute nodes are just a bunch of memory, and CPU, and storage. They're just resources that are getting consumed. And that was the way to think of them instead of thinking of them as the host that the front-end app runs on, you know, which was very much my background. 
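
A minimal sketch, not from the episode, of the bin-packing idea behind the Tetris metaphor: nodes are treated as pools of CPU and memory, and each workload drops into the first node that still has room. The Node and Pod classes and the first-fit-decreasing strategy are illustrative assumptions; the real kube-scheduler filters and scores nodes through plugins and does not work exactly like this.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    cpu_millicores: int  # allocatable CPU
    memory_mib: int      # allocatable memory


@dataclass
class Pod:
    name: str
    cpu_millicores: int  # requested CPU
    memory_mib: int      # requested memory


def first_fit_decreasing(pods: List[Pod], nodes: List[Node]) -> Dict[str, Optional[str]]:
    """Map each pod name to the node it lands on, or None if no node has room."""
    free = {n.name: (n.cpu_millicores, n.memory_mib) for n in nodes}
    placement: Dict[str, Optional[str]] = {}
    # Place the biggest pieces first, the way you'd fit awkward Tetris blocks early.
    for pod in sorted(pods, key=lambda p: (p.cpu_millicores, p.memory_mib), reverse=True):
        placement[pod.name] = None
        for node in nodes:
            cpu, mem = free[node.name]
            if pod.cpu_millicores <= cpu and pod.memory_mib <= mem:
                free[node.name] = (cpu - pod.cpu_millicores, mem - pod.memory_mib)
                placement[pod.name] = node.name
                break
    return placement


nodes = [Node("node-a", 2000, 4096), Node("node-b", 2000, 4096)]
pods = [
    Pod("frontend", 500, 512),
    Pod("api", 1500, 2048),
    Pod("worker", 1000, 1024),
    Pod("cache", 250, 3000),
    Pod("batch", 4000, 100),  # larger than any single node, so it stays unplaced
]
placement = first_fit_decreasing(pods, nodes)
print(placement)
```

Nothing in the placement logic cares which node is "the front-end host"; a node is just remaining CPU and memory, which is the shift in thinking the talk described.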

And I sort of fell into this niche fairly early on in my career where I was...I wouldn't call it a systems administrator. That was my title, but I don't think this position even exists anymore, but we used to call it more of like an application administrator. So I was doing manual deployments of applications, and managing their configurations, and troubleshooting problems at the app level of the stack. And so that's a lot of where my focus was. 

And so, so much of what Kelsey was talking about really spoke to me because I was that person in my shop who knew which services run on which hosts. I could tell you right away, oh, the front-end service runs on this host. And I dealt with so many of those things he was talking about and had felt a lot of pain. And so the idea that there was this platform that took a lot of these practices that we already were doing as ops people and just kind of built them into the platform, the scheduling and all of that, I just thought it was brilliant.

And I was hooked pretty much immediately but not really like a Kubernetes practitioner. Some people might be surprised to hear this, but I've actually not worked in a shop where we run Kubernetes as a main platform. So a lot of my experience with that over the years was just following along with the project and playing with it. But it was always something I was very, very interested in. 

And then, actually, I guess this would have been early 2020 or, no, late 2019. I had gone to a couple of KubeCons and really enjoyed that. And I went to the one in San Diego, and I was looking around and realized that I knew all these rad people in the community that I could have access to. And I had done some podcasting before and really enjoyed it. And I suddenly was like, oh, I could do a Kubernetes podcast. 

And a lot of podcasting or any kind of media stuff is getting access to people. And this is something that I think people don't think about necessarily. But if you're going to have a podcast, you have to have guests, right? [laughs] And so you've got to know some people or be able to get people to come on. And it just struck me that I knew a bunch of pretty influential people in the community and could probably get them to come on the show. So I started doing that. 

And then, about two months into the podcast, the lockdown happened. [laughs] And I was like, oh wow, this is the worst possible time in the world to launch a new project because now my mental health is in the garbage. And I'm just lucky to even be able to take a shower and get dressed, let alone try to sustain a project. But I managed to keep it going over time. 

And now I'm working for the first time at an actual Kubernetes vendor, Loft Labs. I've been there since April of 2021, and I've been really enjoying that. I just love this community. There are a lot of fantastic people in the Kubernetes community, a lot of super smart but also very, very generous people with good values. Yeah, it's really great to be working with it for a living now.

ANA: It's kind of awesome because it's that, it's like when we think about the work of Kubernetes specifically, we are seeing just in general, the way that we've been doing infrastructure is actually getting revamped. And there are these things that are like acknowledging that the systems are so complex and that we can't do things the same way. So we get to start this transformation. 

And going to that little piece where we're talking about impostor syndrome, a portion of it is also understanding that because it is a project that is so large, it's going to be 100% fair for you to only know one vertical or to just kind of be like, I contribute to this open-source project but in very different ways. Like, I just spent the last three months being part of the version 1.26 release team as a communications shadow. And it was very interesting to see all the portions that happen in order to get version 1.26 out the door. 

And I was like, never did I know that all these little things need to be actually checked in on every two or three days. And it takes that many people to get it done for people to then have 1.26 and then for providers to start adopting it. So sometimes you think that you're not making any contributions, but you really are, and sharing those stories is huge. It's what makes us learn a new topic, or find a new mentor, or even feel like we belong in a community.

RICH: This is another topic that came up not too long ago on Twitter. So I actually had my first pull request accepted into the Kubernetes project. And it's sort of a funny pull request because, in the Kubernetes project, there is a YAML file that controls the list of the channels that are in the Kubernetes community Slack. And we wanted to get a channel added for one of these open-source tools that I work with. And so I put in the pull request to do that, and the PR got accepted and merged finally. 

And so I was like, wow, I'm technically a Kubernetes contributor because I've got this PR merged. And I was talking about that on Twitter. And some people were like, "Well, no, you already were a contributor. [laughter] You're doing this podcast, and you're helping people who are using Kubernetes. And so you've already contributed a lot." And that's actually the point of view I have in general. And it's kind of funny because if I were talking to someone else, I would have said the exact same thing. But when it came to myself, I wouldn't give myself that credit which is interesting.

ANA: I can relate. [laughs] It's like we are so lenient and have empathy for others. But sometimes, we forget that empathy starts with ourselves, where we allow for failure to happen. And we also sit down and introspect and ask ourselves questions of what we want to do or why we're doing something, just like get all aligned.

RICH: Yeah, I can be very, very hard on myself.

ANA: I wanted to ask for folks listening; what exactly does Loft Labs do?

RICH: We're focused on Kubernetes multi-tenancy and self-service. So our commercial product, Loft, gives platform engineers a tool that they can use to give developers self-service access to Kubernetes environments. For me, I've worked in the past in roles where, like I said, I was working very closely with engineers deploying these apps and things. And there were times where the engineer had opened a ticket up for me, and I couldn't get to it for three days, and they were totally blocked. 

And I've been on the other side of that too, where I needed someone from...I needed a load balancer set up, or I needed a port open on the firewall or something like that. And I was sitting around waiting for another team to do that for me. And so I very much felt the pain of being both the person waiting for someone to do something for them and also the person who feels kind of guilty because you've got this open ticket and you know somebody's blocked, but you just have other higher priority things that you need to be working on, you know. 

So I'm very much a believer in self-service. And when I first saw the product, that was one of the things that I thought was so important. And then I think that, like in terms of the multi-tenancy stuff, the thing that the platform has built into it is this concept of virtual Kubernetes clusters, which is a new way to share a Kubernetes cluster. 

To talk about it on a very high level, the idea is that you've got this cluster, and people tend to go one of two ways when it comes to provisioning clusters for tenants, for teams, either they do namespace isolation where they take one cluster, and they carve it up between a bunch of tenants and give them all a namespace. And that can work well in some scenarios, but it's got some problems too. Say that this is a dev cluster, and I'm a developer, and I want to be able to make CRDs that go along with my app. 

Well, as a normal tenant in a namespace isolated cluster, I'm not going to have access to those global objects like CRDs. So that's a problem, along with the fact that things just get really complex when you have to put in all these exceptions and network policies. Say somebody needs to have three namespaces, and they need them to talk to each other. There are all these things that can come up that make it more complex. 

And because of that, because it's hard to do, a lot of people default to the other option, which is the Oprah thing, you know, look under your chair, and everybody gets a cluster, [laughter], and that's just a nightmare. Like, from a management perspective, if you've got thousands of Kubernetes clusters lying around, besides the fact that it's expensive, it's like, how do you know that they're secure? How do you know what's running on them? How do you know that they're even needed anymore? 

Holly Cummins did a really great talk at KubeCon a few years ago about this. And she used the phrase zombie clusters for these clusters that are out there, and they've got workloads running on them, but they're not needed anymore. Like, nobody's actively using them. And the reality is that that has an actual impact on our environment because of the power and resources that are being used to keep these workloads running that don't even need to be running in the first place.

So it's like that meme of the guy who's sweating, and he's got the two buttons to push. And it's like [laughter] one is the namespace isolation, and the other one is giving everybody a cluster. And I had heard about this for years from people in the community. I'd heard about the pain that people feel with multi-tenancy. And so when Lukas, our CEO, approached me about potentially working with him, I took a look at the tool, and I was like, you know, they're addressing the multi-tenancy pain, which is a big, big problem with Kubernetes. And they're doing self-service. And I was like, this is just brilliant. These are two things that I think are very, very important. 

And since I joined the company, we've actually open-sourced that virtual cluster technology; it's called vcluster. And if people want to check it out, they can just go to vcluster.com. It's really fun. It's a fun open-source tool. Basically, what it does is you take a cluster, and you divide it up into namespaces like you would with namespace isolation. But what you're doing in essence is putting a virtual cluster inside those namespaces. So you've got a Kubernetes control plane inside a namespace on a shared cluster. 

ADRIANA: Oh.

RICH: And then the developers who are using that cluster connect to the API server of the virtual cluster. So the view they get is whatever the virtual cluster thinks is real. And so you kind of get the best of both worlds because you've got this shared cluster, and so you're not dealing with the management pain of having a million clusters, and you're not wasting all these resources. 

But to the user, it feels like a full-blown Kubernetes cluster. They can create namespaces in it. They can manage those global objects like CRDs and things. And so it really, really kind of addresses that sort of gap between those two options, you know, the namespace isolation and giving everyone a cluster.

ADRIANA: So they basically get full rein over their little kingdom inside this namespace. 

RICH: Yeah, exactly.

ADRIANA: And to them, they see it as a cluster. They're not aware, so to speak, of the namespace; they see it as a cluster.

RICH: Yeah, absolutely. And the way that it works is kind of interesting. So we started off with using K3s, which is a very lightweight version of Kubernetes. And we would stick a K3s cluster in that namespace. But the thing that the virtual cluster doesn't have is it doesn't have a scheduler. And so the workloads all still get scheduled by the underlying host cluster. 

And so there's this process called the syncer, and what it does is it tells the scheduler, "Hey, I need you to schedule this new workload." Or it comes back and tells the virtual cluster, "Hey, that workload that I scheduled for you isn't running anymore." It's really interesting. So most of the state of the virtual cluster is kept in a database inside the virtual cluster. It's SQLite by default, but you can also point it at etcd or Postgres or something like that. 
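
A minimal sketch, not from the episode and not vcluster's actual code, of the two-way sync described above: pods created in the virtual cluster are copied down to the host cluster, where the host's scheduler places them, and the host's view of each pod's status is copied back up. Everything here is in-memory; the `host_name` naming scheme and the `sync_once` function are hypothetical, and a real syncer works through Kubernetes API watches and handles many more resource types.

```python
from typing import Dict, Set, Tuple

PodKey = Tuple[str, str]  # (virtual namespace, pod name)


def host_name(vcluster: str, namespace: str, pod: str) -> str:
    """Hypothetical naming scheme that keeps pods from different virtual
    namespaces and virtual clusters from colliding in one host namespace."""
    return f"{pod}-x-{namespace}-x-{vcluster}"


def sync_once(vcluster: str, virtual_pods: Set[PodKey],
              host_pods: Dict[str, str]) -> Dict[PodKey, str]:
    """One pass of a toy syncer.

    virtual_pods: pods the tenant has created in the virtual cluster.
    host_pods: host pod name -> phase for the pods this vcluster owns on the
               host (mutated in place; the host cluster also updates phases).
    Returns the phase to report back for each virtual pod.
    """
    wanted = {host_name(vcluster, ns, pod): (ns, pod) for ns, pod in virtual_pods}
    # Down: create a host copy of each new virtual pod; the host scheduler places it.
    for name in wanted:
        host_pods.setdefault(name, "Pending")
    # Down: remove host copies whose virtual pod has been deleted.
    for name in list(host_pods):
        if name not in wanted:
            del host_pods[name]
    # Up: copy the host's view of each pod back into the virtual cluster.
    return {key: host_pods[name] for name, key in wanted.items()}


host: Dict[str, str] = {}
print(sync_once("team-a", {("default", "web")}, host))  # web is reported Pending
host["web-x-default-x-team-a"] = "Failed"               # the host says it stopped
print(sync_once("team-a", {("default", "web")}, host))  # Failed flows back up
```

The tenant only ever sees the keys on the left (namespace and pod name inside the virtual cluster), while the host cluster only ever sees the translated names on the right, which is why the demo shows different views from inside and outside.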

The idea is that most of the state is kept inside the cluster. But then there are these lower-level objects like the pods that are actually dealt with by the host cluster still. It's pretty cool the way that it works. And it's a really fun thing to demo, actually, because you could give people that view where you can show them inside the virtual cluster; you only see your stuff, but inside the host cluster, you see the pods and the namespace for the virtual cluster, and all of those things. 

And I think that one thing I emphasize a lot when I talk about vcluster, the open-source version, is that I think it's just really fun to use. And I think that we don't talk about that enough, about whether tools are fun to use or not. Because, you know, maybe this matters more to me because I'm somebody with ADHD, and that's a big deal for folks with ADHD. If they find something fun and interesting, they're more likely to engage with it well. So I think that's part of it for me.

But I think, in general, there are so many painful ops tools out there. [laughs] And as a platform engineer, you work with so many tools that you're just kind of groaning about having to work with. And so if you get the idea to pick up a tool that is actually fun and interesting and work it into your platform, I think that that's a huge win.

ANA: It's like select all the internal tools that we've had to use, whether you work in like ops, or just in dev or SRE. [laughs]

RICH: Yeah, for sure. There's always some. There are always those things. And sometimes they're written in-house, and they weren't really built like a product. It's just something that somebody threw together to solve this one specific need. It doesn't have any kind of HA or reliability or any of those things.

ANA: You mean like a really nice front end with a whole bunch of bash scripts on the back or cron jobs waiting to be deployed? [laughter]

RICH: I have never seen a front end with a bunch of bash scripts behind it. I have no idea what you're talking about.

ANA: [laughs] One of the questions that came up as we were talking about the work that Loft Labs and vcluster covers is for the users and the customers that are using these technologies. Are we seeing use cases that are more like, I'm trying to get a proof of concept, and I just want to put stuff in a cluster, or I'm actually a developer that really needs a virtualized cluster like as a dev environment? Or is it more like getting into staging prod environments that folks are using?

RICH: It's a great question. We haven't seen a ton of what I would call production use necessarily. I think there's more of that. I think that we're still pretty early on. This stuff was only open-sourced a little over a year ago, and so a lot of people are still vetting it. And there are still use cases that are developing. One thing about it is that it is a certified Kubernetes distribution. So it does pass the CNCF conformance tests. So I think that's something that can give people a level of confidence in it. 

Beyond the multi-tenancy stuff that we talked about earlier, there's one other really big aspect to it, and that's that it's super fast because you're not building a whole cluster. You're not building nodes and installing the kubelet on them and doing all of that stuff. Ana and I, in a previous life, used to do these workshops where we would have to spin up like 50 or 100 EKS clusters. It would take us days to get that done because...I think it's gotten faster since then. But at that point in time, it was like maybe 20 minutes to spin up an EKS cluster, maybe even a little longer.

ANA: Yeah. Every cluster, I think we ranged it from like 15 to 22 minutes, and it would kind of very much vary. And a lot of it was like where we were doing hand rollout or where we're using a third-party cloud vendor of like a managed Kubernetes. But it was also like years where Kubernetes just wasn't stable, or cloud vendors weren't making it as stable as we see it now because there were fewer customers. There was less money to lose. 

So we kind of had to go through all of our own reliability testing for these clusters because we are building reliability workshops to put in front of our customers. So it's like, developer trust is such a [laughs] high up, 100% uptime.

RICH: That's actually a use case that somebody brought up to me. I was on another podcast, and it really hadn't occurred to me yet. But they were like, "Oh yeah, I do workshops where I teach people Kubernetes, and I'm going to start using it for that." The reason is that it takes a few seconds to deploy these virtual clusters. Like I said, I think that EKS provisioning times have gotten better.

ANA: [laughs]

RICH: But they can't be a few seconds. It's definitely not that fast. And so the idea that in that sort of workshop scenario, you could carve up a big cluster into like 50 namespaces and put a virtual cluster in every one, it would be so much faster than deploying that many clusters.

ANA: Fine, Rich. You win. You got one person that is going to go check out vcluster, and it's one of your podcast hosts.

ADRIANA: Make that two. I'm like super stoked.

RICH: This is actually one of my OKRs is to get at least one podcast host to use vcluster. [laughter] I'm done with work for the quarter, basically. I can just tick it off.

ANA: You might be done for two years. [laughter]

RICH: But seriously, it is. That is definitely a use case. There are a lot of other ones. I think that dev environments are a great one. With dev environments, it's such a pain in the butt to manage your own minikube or Docker Desktop or whatever. And a lot of people want an environment that's going to be closer to the one that they're using in production. And so I think a lot of teams have moved towards this model where there is an actual EKS cluster or something that they're using for their development. That is a fantastic use case for it. 

I think CI/CD is a great one because a lot of times in a test run, you might want to just throw stuff away. Like, you do a series of tests, and you want to just wipe the slate clean and start over from scratch. And if that means provisioning a new cluster again, that speed means that you get faster feedback from your tests. And there's a bunch of other ones too. 

One thing that we're seeing now, which has been really exciting, is people using vcluster as part of their own tooling that they're building. So Codefresh actually put out a really cool blog post about how they had this new GitOps-related feature where it was like, I think it's like feature branches for PRs. And they were using vcluster for that. And actually, when we were at the KubeCon in Valencia, there was one very big Kubernetes vendor that was actually using vcluster for the demos in their booth. 

I think there are a ton of use cases. I think that anytime you're in that situation where you're trying to share a big cluster among a bunch of people, I think that those are pretty obvious use cases. But again, I think the speed thing is just so important. And anytime you're in a situation where you want to be able to build clusters quickly and throw them away and start over... There are so many application engineers nowadays who have to write stuff that runs in clusters. And do you expect them all to be Kubernetes experts? Probably not; I hope not. Some of them want to be. Some of them are drawn to it, and they're interested in it. 

But really, in the end, you're paying these people to develop an application or a service that's part of your product, and that's what is delivering the business value, not them learning how to troubleshoot problems with Kubernetes. And so, honestly, I think that it's a pretty cool thing to be able to say, all right, my cluster has just hosed up. I could spend half a day troubleshooting it, or I could just delete it and have a new one in three seconds.

ANA: We're not even having the larger conversation around just the cost of it, from the engineering perspective of how long it took to actually spin up a cluster to the infrastructure cost of using it and then figuring out how to bring it down. Because we know that doesn't always end up being an easy task unless you've done it before. [laughs]

ADRIANA: Or cleaning up, because then you always end up with these dangling things that you're like, why am I still being charged for this? [laughs]

ANA: Please stop looking at my AWS bill, Adriana. That's like confidential information. [laughter]

RICH: It's all very true. And I think that the other thing to consider too is developer happiness. And that's something that I think people talk about a lot more nowadays. But it's like, every time you're sitting around waiting 15 minutes for a cluster to get provisioned or, even worse, opening up a ticket for somebody to provision you a namespace or something like that, that's time engineers would rather spend doing their job. And they probably feel bad about the fact that they're not able to, and they're going to be more productive and deliver more if they don't have to do that. 

But absolutely, the cost is a big thing. In Loft, we actually have this thing that's pretty cool that's called sleep mode. And you can do this manually in vcluster too. You can put a vcluster to sleep. But basically, what the sleep mode does in Loft is you can say if this cluster has been idle for X amount of hours...or you can even schedule it like during the hours of like 7:00 p.m. to 1:00 p.m., I mean, 1:00 a.m. or whatever, a time that you know it's not going to be used, you can put them to sleep. 
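
A minimal sketch, not from the episode, of the two sleep triggers Rich mentions: an idle timeout and a scheduled window such as 7:00 p.m. to 1:00 a.m., which wraps past midnight. The function names, the half-open window, and the "either trigger puts it to sleep" policy are assumptions for illustration, not Loft's actual configuration or behavior.

```python
from datetime import datetime, time, timedelta


def in_sleep_window(now: time, start: time, end: time) -> bool:
    """True if `now` is in [start, end), including windows that wrap past midnight."""
    if start <= end:
        return start <= now < end  # same-day window; start == end means empty
    # Overnight window such as 19:00 -> 01:00: match late evening OR early morning.
    return now >= start or now < end


def should_sleep(now: datetime, last_activity: datetime, idle_limit: timedelta,
                 window_start: time, window_end: time) -> bool:
    """Sleep when idle longer than idle_limit, or when inside the scheduled window."""
    idle_too_long = now - last_activity >= idle_limit
    return idle_too_long or in_sleep_window(now.time(), window_start, window_end)
```

The wrap-around branch is the easy part to get wrong: a naive `start <= now < end` check with 19:00 and 01:00 never matches anything.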

And the way the sleep mode works, it's pretty cool, is it actually works on ReplicaSets. What it does is it makes an annotation of the number of replicas that were there. It then sets the number of replicas to zero, so all of the pods get deleted. Then it wakes back up later on, and that can either be through activity, like somebody making an API request or something, or it can be, again, because the scheduled time ends. Once it wakes back up, that original number of replicas goes back into place. 
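
A minimal sketch, not from the episode, of the annotate, scale-to-zero, and restore cycle just described. Workloads are plain dicts standing in for ReplicaSet-like objects; the annotation key and the function names are hypothetical, and a real implementation would patch objects through the Kubernetes API rather than edit dicts.

```python
from typing import Dict, List

# Hypothetical annotation key; the real key Loft uses is not stated in the episode.
ORIGINAL_REPLICAS = "example.com/original-replicas"


def put_to_sleep(workloads: List[Dict]) -> None:
    """Record each workload's replica count in an annotation, then scale it to zero."""
    for w in workloads:
        annotations = w.setdefault("annotations", {})
        if ORIGINAL_REPLICAS in annotations:
            continue  # already asleep: don't overwrite the saved count with 0
        annotations[ORIGINAL_REPLICAS] = str(w["replicas"])  # annotation values are strings
        w["replicas"] = 0  # the controller would then delete every pod


def wake_up(workloads: List[Dict]) -> None:
    """Restore the recorded replica count and drop the annotation."""
    for w in workloads:
        saved = w.get("annotations", {}).pop(ORIGINAL_REPLICAS, None)
        if saved is not None:
            w["replicas"] = int(saved)


workloads = [
    {"name": "web", "replicas": 3},
    {"name": "worker", "replicas": 1, "annotations": {"team": "payments"}},
]
put_to_sleep(workloads)
put_to_sleep(workloads)  # a second sleep is a no-op
wake_up(workloads)
```

Nothing but the replica count changes, so all the configuration stays in place and waking up is just asking Kubernetes for the pods back.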

And so all the workloads have gone away, and so the compute that's used to power those workloads is no longer needed during that time. But everything is still there. All the configuration is still there. You don't have to re-provision the cluster afterward. It's just literally just telling Kubernetes, hey, I want these pods back.

ADRIANA: So is it seamless then to wake up your cluster after? 

RICH: Yeah, depending on how you configure it. So like I said, it can be configured so that it's going to happen just based on traffic and whether it's idle or not, or you can schedule it. It's pretty flexible. And that's something where you can just do the math. And if this thing is up and running and these workloads are running 24 hours a day, and I can suddenly cut that down to eight hours really easily, then obviously, you're going to save some money. So that's one of the really big attractive things about Loft as well is that it's got this really easy way to save some money on your Kubernetes spend built into it. 
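
A worked version of "just do the math," not from the episode: running workloads 8 hours a day instead of 24 cuts their compute hours by two thirds. The $0.50/hour rate is a made-up figure, and the sketch assumes workload cost scales linearly with awake hours, ignoring control-plane, storage, and node-autoscaling effects.

```python
def monthly_cost(hourly_rate: float, awake_hours_per_day: float, days: int = 30) -> float:
    """Compute cost for workloads that only run while the cluster is awake."""
    return hourly_rate * awake_hours_per_day * days


always_on = monthly_cost(0.50, 24)  # hypothetical $0.50/hour of workload compute
with_sleep = monthly_cost(0.50, 8)
savings = 1 - with_sleep / always_on
print(f"${always_on:.2f} -> ${with_sleep:.2f} ({savings:.0%} saved)")
```

This prints `$360.00 -> $120.00 (67% saved)`; the fraction saved is 1 - 8/24 regardless of the hourly rate you plug in.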

ADRIANA: This is so cool. I'm dying to try this out. Honestly, I was doing some Kubernetes prototyping work a few jobs back, and I was managing my own Kubernetes cluster. And I was constantly bringing it up and destroying it. And the wait times, you're like, okay, I guess I'll grab a coffee while this thing provisions, and oh, by the way, it crapped out during provisioning. What did I mess up? So it's so nice to have this as an option. I wish I'd had that a few years back.

RICH: Yeah, same. [laughs]

ANA: The amount of times where it literally was like, I am just waiting. Like literally having to relive my code is compiling in Java, or C++-type moments where it's like, I'm just chilling. I'm getting paid to chill. This feels really weird. [laughs]

RICH: And you can pause vclusters, and that works the same way. So it actually spins the whole vcluster down; literally, even the pods that are part of vcluster get suspended. But that's something more where you can't do it on a big level without building some tooling yourself. To me, the cost stuff is more important now than ever with the situation we're in with the economy and people getting laid off and stuff. I imagine that most shops are probably thinking more about what their Kubernetes spend is than they were two years ago.

ANA: I hope so. I mean, even also on the climate, just looking at how much footprint we're using with our infrastructure, our data centers, and every other tooling that we use. And the hot summer that we just had, like fires out in Europe, like, places that weren't like that just like a year ago. There's a lot of stuff happening that hopefully is bringing more awareness but highly urgent. Please look at me because I'm going to go away soon. 

RICH: Yeah. When I saw the sleep mode feature, actually, that was one of the first things that kind of came into my head. And that talk of Holly's that I enjoyed so much, I'd actually seen that before joining the company. And it was kind of funny because I think that the folks who built it were very much focused on the cost thing. And when I was like, "Hey, this is really good for the environment too," they were like, "Oh yeah, that's awesome." [laughs] 

ANA: A bonus.

RICH: Yeah, yeah. And, again, like I said, it was kind of funny. So I'd been doing the podcast Kube Cuddle. I'd been talking to all these folks in the Kubernetes community and really enjoying it. I parted from a job and was just looking around and said on Twitter, "Hey, I need a job." And I got this DM from Lukas, who's the CEO and Co-Founder there at Loft Labs, and I'd never heard of him, never heard of the company. I just thought this was going to be some little startup that I'm not interested in. 

And in fact, at that point, I had been so burned by working at early-stage companies that I was like, I want to work somewhere really big. I was applying at places like VMware. [laughter] And I was like, I just want to work at a giant company. But when I took a look at this stuff that they were building, what I really saw is that it was addressing real pain that people feel and real problems that are there with using Kubernetes and also doing it in really smart ways. Like, Loft is super Kubernetes native under the hood. It's like a bunch of operators and CRDs and stuff. 

It would have been easy for somebody who didn't have the Kubernetes knowledge that our co-founders have. Lukas and Fabian, our CTO, are both just super, super sharp, and really, really smart when it comes to Kubernetes. I think that somebody who didn't have that kind of background easily could have designed this thing in a whole different way that had some sort of REST API that people interacted with or something like that. But this is like, it feels like Kubernetes.

ANA: I mean, that developer experience is so important because you also don't want to be a big abstraction layer from what's under the hood, especially when we're thinking about new tooling and giving control back to the developers. In a sense, that's part of what it does fix. 

RICH: Yeah. And I think that, again, like we were talking about earlier, you know, and I felt this way for a long time, is that it's not fair to expect product engineers to be Kubernetes experts. 

ANA: [chuckles]

RICH: And I think that it's arguably really counterproductive to even think that way. So being able to give them some sort of an interface where they can really easily spin up an environment for themselves really quickly and just get their job done, I think that that's big on multiple levels.

ADRIANA: I feel like this caters really well also to the control freaks who want to get their hands on all the things. Like, I personally love doing things myself. So having to say, hey, put in a request for someone to create a cluster, or namespace, or whatever, for me, it's like, God, just shoot me now. [laughs] It's torture.

RICH: Yeah, absolutely. So I think that that aspect of it is really big. And again, with vcluster, you still can use it on your laptop. You can use it with Docker Desktop or with minikube or whatever else. And I think there are reasons to do that, too, which is that you don't have to reset your whole Docker Desktop VM. [laughs] I'm sure we've all done that at some point and had to wait for it to reset. Oh, and that's a cool thing, too, that happened recently. So Docker Desktop now has these extensions. And so we built a Docker Desktop extension for vcluster where you can literally go into Docker Desktop.

ANA: Cool.

ADRIANA: Awesome.

RICH: And there's an extensions interface now. And you can just click a button to deploy a vcluster instead of even having to use the CLI. 

ADRIANA: That's super awesome.

ANA: Meeting developers where they're at. Do you ever spend any time thinking about what's coming after Kubernetes?

RICH: I don't. I mean, I can't say that I'm necessarily great at predicting the future like that. There are people, I think, who are. It's so hard because it's like, who's ever going back and checking how often they're right? [laughs] Like, the people predicting the future now, like, what's going to happen in 5 or 10 years, a lot of them are probably wrong. 

But I don't expect Kubernetes to go away anytime soon. I don't think that it will be around forever, for sure. I also don't think that it's the best tool for every use case. And that's something that I've always felt very strongly. There are tools that I enjoy using, and that if I get an excuse to use them like vcluster, I will. But these are tools to accomplish some sort of task that we have. To get religious about that, to get super dogmatic about which one is better, to me, is, in general, a pretty big waste of time. So if something comes along that's better than Kubernetes, then I'm there. Let me know when it happens. [laughter]

ADRIANA: Totally, totally agree. On a similar note, talking about people using Kubernetes for the sake of using Kubernetes: so many organizations have started to ditch the monolith for microservices. What's your take on that?

RICH: That has been a super interesting thing to think about recently because I think that actually, in the last few years, what we've seen is more people popping up and saying, "Hey, monoliths are pretty rad. [laughter] Maybe we all don't need to use microservices for everything." And it's been funny to me. I've been in the industry for a long time. I started in 1995. I don't give people credit based on the length of time they've done something. You could do something for 25 years and still be terrible at it. 

One thing that I think it does give you is you start to get a view of the thing that's different than if you've done it for a year. And you start to see patterns that repeat themselves. And it's kind of funny because a lot of what I'm seeing nowadays sort of feels like ten years ago. Like I said, I worked in that kind of role for a long time where I was the one deploying the applications and the one configuring which application talks to which database and what load balancers are they using, and all of that stuff. 

And so what I saw on my job, I was in a shop for many years where we had a bunch of Java services. And we started off with three Java services. We had like a front end and a back end. And there was an audit service that just wrote down all the requests. And we went from those three services over the course of a few years to like 30. And everything got so much more complex, and suddenly, we had so many more firewall rules. 

And sometimes the firewall team would like...one of the rules would go away that we needed. [laughs] And they were doing many things. And it just all got harder, so much harder. And that's one reason why I think Kubernetes really took off is because, at that point in time when it came along, a lot of people were in that situation where they were using these microservices, and things were getting harder to do manually. It's like anything, you've got to look at your use case, and you've got to decide what makes the most sense for what you're doing. 

And if you're a little startup and you're just starting out thinking that we're going to build a microservice for everything because five years from now, we're going to have independent teams who need to interact through API boundaries, maybe that's not the way to go. Maybe the way to go is to just build exactly what you need to make your stuff run.

ANA: It does feel kind of nice to see folks being outspoken about monoliths again. Coming from a shop that had way too many microservices, having to figure out the critical path and not being able to understand it, that's kind of hard. And at the same time, it's just like, maybe you thought this was the right way. Technology is a type of industry that is able to reflect and correct. So it's like we've been able to see, like, oh no, we started running to microservices. 

But at the end of the day, the overhead or the benefits are not even that great. And then it's this complexity that makes it even harder to operate. I remember I've seen those conversations happen a lot more than, I would say, maybe even just two years ago because two years ago, we were really pushing forward more of a microservices world. And now we're like, wait, no, actually, have you read these five blog posts that explain the pros and cons of both?

RICH: Yeah, I think that, you know, I was very much in the trenches of this stuff being in the kind of roles that I was in. And what I saw when we started using more microservices at the job I was at was that it made it easier for the product engineers. It made their life easier the fact that they were only responsible for this one section of code and that they could rely on the APIs to talk to their dependencies and things. But my life got way more complex. [laughs] And so the complexity doesn't just disappear. You're just moving it around. And maybe that's the right thing to do. 

Maybe if you're Netflix and you've got a structure where you've got all these independent teams who you want to empower to work in their own way, maybe the fact that there are these clear API boundaries that they use to interact is a critical thing for you, but for a lot of companies, it's not. Yeah, it was hard in the old days, too, when you wanted to release a new thing, and you're waiting on the new version of your dependency to get released because you need that new API. And so you've got to wait and deploy them both at the same time.

ANA: How about just managing dependencies in general, like having to have versions of microservices? [laughs]

RICH: Yeah, so it's never easy. But I think that it's really about looking at your use case very honestly, trying not to get caught up in the hype, and really looking and seeing what it is that's going to work for you the most. And there's nothing wrong with thinking a little bit forward. There's nothing wrong with saying, "We're going to make a certain choice now architecturally because that is going to help us down the road in a year or two." I think that you've got to really think about the trade-offs. 

And I feel like in our kinds of roles, in engineering roles and ops roles, DevOps, all of that stuff, there's that joke that if you ask a senior person their opinion, it's going to be "it depends," and that's very much the case with me. It's all about trade-offs. And it's all about trying to find out what's going to work best for you and your team.

ADRIANA: Yeah, rather than jumping on the bandwagon of ooh, new and shiny, and I read a paper about this, and therefore we must all do it. Let's pivot the whole company. 

ANA: [laughs]

ADRIANA: Because I think that's the trap that a lot of organizations fall into where they're like, well, everyone's doing microservices. Uber does microservices, and Netflix does microservices, and therefore, we must do it. And I feel like they're not necessarily aware of what can of worms [laughs] they've opened. I feel like a little more thought needs to go into some of these decisions that don't necessarily get made because of the allure of the shiny object.

RICH: Yeah. Another thing we were kind of joking about on Twitter recently was, you know, I asked how many people had tried to copy the Spotify org structure. I don't know if y'all remember this. This would have been probably in the mid to late 2010s. There was all this hype around Spotify. They had these cross-functional teams that they called squads, and it was supposed to be so much of a better way to develop software. And so many people jumped on board and were like, yeah, and it just didn't work out for a lot of them. And I think it didn't even work out for Spotify. [laughter] I don't remember all the details. 

There's a thing that makes me laugh whenever I think about agile. There's this guy named Matt LeMay, who is super smart. And he wrote a really great book about product management that I read a few years ago. And he talked about the Agile Manifesto. And if you read the actual Agile Manifesto, it basically says, "Hey, figure out what's right for you and do it." Don't be prescriptive. Look at what your team really needs and develop a process based on that. And now you've got these people who have like 20 letters after their name with certifications and stuff.  

ADRIANA: Oh my God. That's so true. 

RICH: And it's like, it's super dogmatic. It's like a religion. And it's completely the opposite of that view of the manifesto. And so it's funny how that can happen and how things can just get away from us. And this is kind of a form of self-care for teams, too, right?

ADRIANA: Mm-hmm.

RICH: To not get sucked into the hype but to really be honest about what it is that you as a team need.

ADRIANA: Yeah, like, how are you as a team productive? Rather than, well, this team did that; therefore, we must do it. I totally agree with you on the agile thing. I always talk about how people are focusing too much on capital-A Agile when they need to focus on lowercase-a agile. It's about agility, [laughter] not following some prescription of Scrum or SAFe or whatever the hell, like, just do what works for you and makes you productive.

RICH: Oh my God, SAFe. [laughter] You have to blame SAFe because that's a real danger. 

[laughter]

ADRIANA: I am in agreement.

RICH: For those of you listening who may not be familiar with it, SAFe is basically like, we're going to do agile. We're going to do two-week sprints, but we're going to plan like eight weeks of them or more.

ADRIANA: [laughs]

RICH: We're going to plan like a whole quarter out instead of the two-week sprints. It's just so dumb, and it's caused many people tears, I'm sure.

ANA: [laughs]

ADRIANA: Yeah, yeah. I feel ya.

RICH: I think they ended up doing that at that shop that I was at as well, the shop that was excited about the Spotify squads. [laughter] They ended up doing the same too.

ADRIANA: Do all the things. [laughs]

ANA: As we're getting your hot takes and we're getting towards the end of the podcast, [laughter] what do you think people are doing wrong in the SRE space?

RICH: That's a good question. I guess I would say in general, and this goes along with what we've been talking about, but I think that getting too focused on tools, feeling like they need to use whatever newest, hottest cool tool comes up. There was a point in time where...was it Etsy who had the famous blog post about using boring technology? It was like we're using Apache and MySQL, and we're using boring stuff on purpose because we know it's super well-vetted. And you can get information about problems when they come up. And I think that there's a lot to be said for that. 

I think at the time that I saw that post, at first, I probably even mocked it, which is kind of funny because my feelings about this stuff have really changed a lot. I feel like to me, SRE is really about enabling the engineers building the products. And so the way to do it...I guess the other thing I'd say is don't view the Google books as like a Bible, right? 

ADRIANA: Yes.

RICH: Like even that initial Google SRE book. Years ago, I went to SREcon. I haven't been there in a few years, and I really miss it because it's such a fun conference. But I was at one years ago where I had the opportunity...I ended up meeting Google SREs, who were on three different teams. And one of them was on one of the newer teams; I think it was like the Cloud Spanner team. And so they were very much the sort of like the Google...at least at that point in time, they were following the cutting-edge Google SRE practices. 

And then I met this other guy who was on this team. And I don't know for sure what product it was, but he mentioned it was one of the stateful services. So who knows? It might have been Gmail. I don't know. 

ANA: [laughs]

RICH: But they were kind of the opposite. They were people...they were super cautious, and they didn't want to break anything and moved really slow. And so it was really interesting to me as an outsider who had this sort of romanticized picture in my head about what Google SRE was to realize that because these SREs were embedded on these different product teams, their experiences differed really greatly just depending on what team they were embedded with. 

So it wasn't, at least at that point in time, just this big monolith. And it's like, we're Google SRE, and we all do things exactly the same way. This was several years ago. This was maybe five years ago. So it's possible that it's much more that way now. But yeah, so many of us don't have the problems that Google has. And we don't necessarily need to do things exactly the way that they do. 

I do think that some really core SRE principles like error budgets and SLOs are super valuable and that there aren't many teams that are not going to benefit from using those things. So I think definitely focusing on those things is important and really helpful. On my own podcast, Kube Cuddle, I just interviewed Justin Garrison from AWS the other day. We were actually talking about that idea of, like, in the old days, it was like the CPU on this host is really high, or the disk is filling up on this host.

And it's like, for people who've been around for a while, they're more likely to have that kind of point of view. And you really need to shift away from that and think a lot more about service health. Is the service operating? Is it healthy? Now, that doesn't mean you can let all the disks fill up. But the focus really should be on the service health. And that's really the big thing that you should be measuring and looking at.

ANA: Yeah, services, users, customers, like, every shop is going to call it a little differently. But it's that, like, find something that matters for your business. And SLOs are a perfect way to bring those two together; we've definitely been talking about those a lot more. We also had Liz Fong-Jones, whose hot take was also that, like, you can't use a Google SRE model for every single type of job, like, it's not one size fits all. And I personally 100% agree with that statement. 

And we're starting to see more of those conversations happen where people are like, oh, I'm doing SRE, but I only took a few principles out of the SRE model. Or, we're only focusing on these little nuggets that two SREs can do versus trying to do everything that Google is doing. 

RICH: Well, I have to say you just made my day because if I had the same hot take about SRE as Liz did, [laughter] wow, I'm very impressed with myself. Liz is definitely one of those people that I put up on a pedestal when it comes to SRE stuff. She's so smart.

ANA: My question to her was, what are people doing wrong in the SRE space? So yes, [laughs] it's true, though, like, I talk about the Google SRE book, and I call it a Bible because it is a Bible. But at the same time, it does not mean that once you read it, you have to go to your management team and say, "We're about to go through all these chapters, and we're going to start implementing them." You need to do a lot of undercover work on what's working, what's not working, and what does your business need. Are you literally just always down? Is your team always firefighting? Is there a lack of cross-collaboration?

RICH: Yeah. It was kind of funny at the time when I was at that SREcon where I met those SREs from Google. I afterward had a talk with a friend about it who's a very experienced DevOps person that I probably shouldn't name. But I was talking to them about this, and they said, "Oh, well, the SRE book is very aspirational." [laughter] And I thought that was probably a pretty fair characterization based on what I'd heard. 

And I don't think it was intended to be a Bible, either. It's a bunch of chapters about random topics, not random but these different topics. And the idea that everybody is addressing these topics in exactly the same way was maybe not accurate at that time anyway. I think that, yeah, it's really about finding out what works for you and your team. And, Ana, you, I'm sure remember… from when we worked together.

ANA: Yeah. 

RICH: Really, really awesome person. And he was on a podcast I hosted where we talked about this quite a bit. And he talked about the value of understanding the business. And I think that a lot of times, someone in more of an IC, tactical role doesn't necessarily even understand the business that well. But understanding how it is that your company makes money, just even understanding what it's like to be a customer of that company and what their experience is like, all of those things are very important and can help you when you're looking at those trade-offs and making those choices about things like technologies and what you should be working on.

ADRIANA: And I think on the other side of it, though, then you also need the executives to take a look at what the folks on the front lines are doing as well because I feel like oftentimes, those folks can be so far removed that they don't have that kind of understanding. So having the convergence of those two pieces of information, I think, is what empowers you to be able to have a successful SRE organization, right?

RICH: Yeah, communication is so hard. And I think it's like...especially at these early-stage companies, a lot of times when you're going through these hyper-growth phases where you're hiring a bunch of people. We're kind of in this situation now at Loft Labs. When I started a year and a half ago, I was employee number 4, and I think now we're at 20. So it's not like we've gone crazy. I think our founders have been very thoughtful about the hiring they've done, but we're hiring pretty actively right now. 

And we're already seeing a little bit of the growing pains at times, you know, about how do we facilitate communication with 20 people in 7 different departments now or whatever it is? Because that's a lot different than four people talking to each other. And then you suddenly get to the point of a big enterprise company where you've got these multiple levels of management. [laughter] That information has to filter through both up and down. It can get a lot worse. So, yeah, I think you're absolutely right. 

ADRIANA: Cool. I think we've reached our time. So thank you so much, Rich, for joining us today. It's been a really, really fun conversation. And it's always nice to be able to touch upon some of the same topics that we've discussed with our other guests because I think it really helps to reinforce a lot of these really important points. So, yeah, thanks so much for that.

Now, folks, don't forget to subscribe and give us a shout-out on Twitter via @oncallmemaybe. Be sure to check the show notes on oncallmemaybe.com for additional resources and to connect with us and with our guests on social media. For On-Call Me Maybe, we're your hosts, Adriana Villela and...

ANA: Ana Margarita Medina. Signing off with peace...

RICH: Love and code.
