About us:

Adriana Villela is a Sr. Developer Advocate at Lightstep, based in Toronto, Canada, with over 20 years of experience in technology. She focuses on helping companies achieve reliability greatness by leveraging Observability, SRE, and DevOps practices. Before Lightstep, she was a Sr. Manager at Tucows, running both a Platform Engineering team and an Observability Practices team. Adriana has also worked at various large-scale enterprises in both individual contributor and leadership roles, including Bank of Montreal, Ceridian, and Accenture. Adriana has a popular technical blog on Medium, co-leads the OpenTelemetry End-User Working Group, and is a HashiCorp Ambassador. Find her on Mastodon at @[email protected] to talk all things tech.

Ana Margarita Medina is a Staff Developer Advocate at Lightstep, where she speaks on all things SRE, DevOps, and Reliability, and is a podcast host for On-Call Me Maybe. She is a self-taught engineer with over 12 years of experience, focusing on cloud infrastructure and reliability in the last few. She is also part of the Kubernetes Release Team (v1.25 - v1.27) and has been advising CNCF's Keptn project since 2019. When time permits, she leads efforts to dispel the stigma surrounding mental health and bring more Black and Latinx folks into tech. Catch her on Mastodon at @[email protected] to chat about traveling, diversity in tech, and mental health.

Find us on:

On-Call Me Maybe Podcast Twitter
On-Call Me Maybe Podcast LinkedIn Page
On-Call Me Maybe Podcast Mastodon
On-Call Me Maybe Podcast Instagram
On-Call Me Maybe TikTok
On-Call Me Maybe Podcast YouTube Channel
Adriana’s Twitter
Adriana’s Mastodon
Adriana’s LinkedIn
Adriana’s Instagram
Ana’s Twitter
Ana’s Mastodon
Ana’s LinkedIn
Ana’s Instagram

Show Links:

AWS Well-Architected Framework - Sustainability Pillar
Adrian Cockcroft on "Architecting for Sustainability"
HashiCorp Nomad
Tracetest
Malabi
Helios
Trace-Based Testing
HashiTalks 2023
Julie Gunderson
Conf42: Don’t Forget the Humans with Julie Gunderson and Ana Margarita Medina
Keptn
LitmusChaos
Chaos Mesh
Argo
Cilium
Backstage
Kubernetes Release Team
Crossplane
Adriana’s blog series on running ArgoCD on Kubernetes
ESCAPE 19: Chaos Engineering in a Multi-Cloud World with Ana Margarita Medina
OpenTelemetry End-User Working Group
Open Policy Agent
Andi Grabner
Observability-Landscape-as-Code
Terraform
Service-Level Objectives
Stephen Townshend
Slight Reliability Episode 39 - The Future of SRE with Adriana Villela and Ana Margarita Medina

Additional Links:

CNCF Working Group on Environmental Sustainability
Cloud Carbon Footprint Project on GitHub
TechstrongTV - What 2022 Taught Us About SRE’s Future

Transcript:

RIAAN NOLAN: Don't worry about failure. Fail fast if you do fail. None of us are superstars or anything, so your name doesn't mean shit anyway. [laughter] So put your code out there; if people think it sucks, it sucks; if they like it, they like it. The best thing is just paint your picture. Do your thing, and put it out there because that will help you grow. 

LIZ FONG-JONES: And I think this is an area where we, as developers of observability tools, can really help because an SLO has to be a living, breathing thing, not just a thing that you put up on a dashboard and you look at it 90 days later and, oops, we blew our SLO.

NORA JONES: But we're also all engineers. We've been a part of the big technical aspects of it all. And we've seen the social aspects not really be spoken about very much. I think it's a big miss. And it's honestly a business advantage to be able to talk about the social aspects as well. So we're really trying to give every company that advantage.

ADRIANA: Hey, y'all. Welcome to On-Call Me Maybe, the podcast about DevOps, SRE, observability principles, on-call, and everything in between. I am your host, Adriana Villela. And with me, I have my awesome co-host...

ANA: Ana Margarita Medina.

ADRIANA: And today, we are kicking off Season 2, and I'm super, super, super excited to be doing that. We've had a little hiatus over the holidays, and we've had time to get refreshed and revitalized. Ready to take on 2023, hopefully, knock on wood. How are you feeling? [laughter]

ANA: I'm hoping that the year continues to get better and that this year is better than last year, and that we see more amazing things happen in the industry, but that everyone takes care of themselves a little bit more.

ADRIANA: Yeah, cheers to that. Now, in proper OCMM fashion, we must ask each other what we're drinking. So, what do you have today? [laughs]

ANA: This morning is my famous Guayakí's organic Yerba Mate in enlighten mint flavor. You'll probably find me either having a latte or one of these as I kick-start my day. So we're still rolling out here in the Bay Area. We're trying to get the engine running for today's day. How about you, Adriana?

ADRIANA: I've got green tea. I made myself green tea just before we recorded because I figured I need something a little bit more interesting than the water. I mean, water is great, and drink your water, but green tea today. It's kind of a dreary, rainy day in Toronto today, not compared to the rain that y'all have been getting in California, so big hugs to you. Yeah, every day in the news, there's new stuff [laughs] about the rain. I'm like, ehh. [laughs]

ANA: There's definitely a big hug sent out to California. But I feel like there are so many places in this world right now that, with global warming, they're suffering from such terrible conditions of weather. The amount of flooding that I got to see in pictures and online and in parts of California was just mind-blowing that it can happen. But it's just a reminder that we take care of the world. 

And even though this podcast focuses on reliability, sustainability is equally as important. And maybe this season, we will get a guest that talks a little bit more about that, or next season. Or if y'all want to chat with us on social media, we're always happy to hear folks' thoughts on it.

ADRIANA: Yeah, absolutely. And also, a note for our guests: we now have a presence on Mastodon, TikTok, and Instagram in addition to our LinkedIn and Twitter presence. So you can find us on multiple social media platforms now.

ANA: And you can even put in a request there of your favorite podcast guest that you would love to see be part of On-Call Me Maybe.

ADRIANA: Yes.

ANA: Questions that you want to ask any of our guests, questions for us, any feedback. We'd love to hear it. Let us know what you thought about season one or if you missed us between this break.

ADRIANA: Oh, you know, I want to circle back to what you were saying about sustainability and technology because I don't know if you feel this way. So I've always, like, the environment has been near and dear to my heart since I was really young. And I kind of feel guilty working in tech knowing that the type of work that we do contributes to a certain extent to environmental problems because when you think about things running in the cloud, you've got servers running all the time, consuming a crap ton of energy. And I feel a little bit guilty. And I'm always curious as to how there can be a marriage between what we do and environmental sustainability if there's sustainable tech.

ANA: Hmm. I think that is an interesting question, like, is there sustainable tech? I think, as an industry, we are waking up to the conversation of sustainability slowly. And it has increased more than ever in the last two, three years. And once again, I think this is actually something to attribute to COVID. We've had more of a demand within the supply chain. We've also had slowdowns on air pollution due to travel, and folks are starting to realize, like, oh, if we were to make this change, our world is happier, our world is sadder. And I think that cause and effect made people be like, oh shit, my actions do matter. 

And then specifically to technology, I remember...I want to say it was three or four years ago. If you follow Amazon Web Services, they have something called the AWS Well-Architected framework. It covers five different pillars of how organizations can actually be building their applications. This covers things like making sure that you have a reliability pillar, that you have a security pillar, that you have an operational pillar. 

But two, three years ago, one of the vice presidents at Amazon Web Services, Adrian Cockcroft, created the Sustainability Pillar and created this entire sector within Amazon Web Services that was going to focus on reducing the footprint that they were imprinting in this world or just letting their customers be more aware of it. And I thought a big organization such as them taking that step is a step, and it gets the conversation moving. 

And then it trickles down to what actionable things can companies do? Well, even starting to look at capacity planning is one way to move forward in the sustainability conversation. Like, if you have all of your servers constantly just using 20%-30% of your resources, maybe you shouldn't be running a fleet that large, and maybe you can make it a little bit smaller. And therefore, your footprint is a lot less, and you're damaging earth a little less.
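
The capacity-planning point above can be made concrete with a little arithmetic. The following is a minimal sketch, not something from the episode: it assumes load scales linearly with server count, and the function name `right_sized_fleet` and the 60% target utilization are hypothetical choices for illustration only.

```python
import math


def right_sized_fleet(current_servers: int,
                      avg_utilization: float,
                      target_utilization: float = 0.6) -> int:
    """Estimate the smallest fleet that carries the same load at a target utilization.

    Assumption (illustrative only): load is spread evenly and scales
    linearly with the number of servers.
    """
    if current_servers <= 0:
        raise ValueError("current_servers must be positive")
    if not 0 <= avg_utilization <= 1:
        raise ValueError("avg_utilization must be between 0 and 1")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")

    # Total work, expressed in "fully busy server" units.
    load = current_servers * avg_utilization
    # Round before taking the ceiling to absorb floating-point noise.
    needed = math.ceil(round(load / target_utilization, 9))
    return max(1, needed)


# A fleet of 100 servers idling at 25% could, under these assumptions,
# carry the same load on 42 servers running at about 60%.
print(right_sized_fleet(100, 0.25))
```

Real right-sizing would also have to account for peak traffic and failover headroom, which this sketch deliberately ignores.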

ADRIANA: Yeah, that's so true. That's so true. Because I think sometimes we get so caught up on reliability that we need to make sure customers are happy, systems are up and running that sometimes it can be easy to over-provision resources, especially when you have the cash to burn.

ANA: [laughs]

ADRIANA: Maybe not so much when you don't have the cash to burn. In that case, yeah, it's like...[laughs] and it's worthwhile doing an audit of your systems to make sure that, like, yes, I actually need all of that capacity versus, oh, shoot, I'm using like 10%-20% of my resources.

ANA: Do you have any thoughts on some of the things folks can do to audit what their spend is? I know that there are consultants out there that would help you lower your cloud footprint bill. But when we're thinking about this audit, a lot comes to mind. Because I think this is actually something interesting to throw out there as we kick off the New Year, as folks are going into their jobs thinking like, what are we going to do to survive the economic atmosphere that we have but also bring in a positive value to this organization?

ADRIANA: I'd say the biggest one for me is definitely making sure that you're using what needs to be used. A more maybe not so direct to cloud usage but even related to a certain extent to the type of work that we do, which does involve some travel, is reducing business travel. That's a huge one. I would say consulting travel that's a huge one.

So when I started my career back in 2001, I worked at a consulting company, and it was expected that you were going to be traveling to work at client sites. You were lucky to get an in-town project. I was actually quite lucky that most of my time with the company, I worked on in-town roles. But my husband, who's still at the company, he spent, I think, the first ten years of his career traveling for work. When the pandemic hit and, all of a sudden, everyone was working from home, to me, it was like, oh, well, this finally proves to folks that you don't have to travel for all the work things.

ANA: [laughs]

ADRIANA: So, to me, it's like, make those travel plans if they're absolutely necessary because a lot of the stuff you can accomplish just working from home. Even traveling to an office, if you're working from home, it means that you're not commuting to an office. So it means that you are not driving or you're not taking public transit, which mass transit is awesome, and I will take mass transit whenever I can. I personally hate driving. But even taking mass transit, if you don't need to take that mass transit to get to the office, you're already, I'd say, helping offset some of that carbon footprint compared to having to commute.

ANA: That is true. And it kind of makes you wonder, like, what goal, what wish list can you put this on on reducing your footprint as an individual, as an engineer, as an organization? Like, what small step can you take today that could actually accumulate by the end of the year to be a pretty significant amount?

ADRIANA: Yeah, totally. Another one that comes to mind is I remember at some workplaces, I would always try to log in and out of...when I had a tower, I'd log out of my desktop and power it off for the day. But some companies will put so much crap on your machine that the process of booting up your machine and shutting it down for the day is so lengthy. I remember for a while, at some companies, I'd just leave my machine on.

It kind of defies logic that there's so much crap on your machine that now you're compelled to leave it on to get a little extra bit of productivity, so even something simple like having the ability to power off your machine. I think with a laptop, it's a lot easier because even though I don't power my laptop off and on regularly, I don't always have it plugged in. I'll only plug it in when I absolutely need to.

My personal habit is I always turn off my power bar when I'm not using it because even though it does consume a small amount of power, it still consumes power. So it's these little, little things that you can do in technology to help offset the footprint. And if you've got enough people doing that, well, you can actually make a pretty decent dent.

ANA: I 100% agree. I want to say that I don't always take a lot of these actions, but they definitely come to mind sometimes where it's like, it's that small thing of unplugging your laptop before a long weekend or any weekend. Or just kind of being like, you don't need to be connected to power to run updates, or even to just constantly be charging because I'm not going to be using you for a longer length of time. 

But even like you said, I think it's actually a really nice idea to be like, this is the power strip for my entire work setup. When I'm not at work, the entire strip itself can be off. That's actually something I've always wanted to implement for when I travel; when I go away that it's like, "Alexa, power all of this off," or "Power strip goodbye."

ADRIANA: Yeees.

ANA: And we know that we are not consuming power, like, less risk of any fire emergency. And at the same time, you're just like, my devices are actually fully sleeping, not connected to the internet, not connected to anywhere.

ADRIANA: Yeah, so you have some peace of mind, a little bit of mental health injected in there, too, right?

ANA: [laughs] Are you telling me you're sending my devices on a mental health break?

ADRIANA: [laughs] I think sending your devices on a mental health break sends you on a mental health break, [vocalizing] tan-tan-tan. [laughter]

ANA: On that note, I'll always throw my little snippy where I say that if you travel with your laptop, you're not really going on vacation. [laughs]

ADRIANA: Yes, I totally agree. I've never gone on a vacation with my laptop, thankfully. I always make it a policy: when I'm on vacation, I'm on vacation. I'm not going to respond to your Slack messages. You can text me if it's for fun stuff.

ANA: [laughs]

ADRIANA: Otherwise, we're not talking, [laughs] and I will block your ass. [laughs]

ANA: It's funny because this was just a conversation I recently had with my best friend around laptop going on vacation or not. And I'm one that says, "Don't take your laptop on vacation," and known to do it. And sometimes that I've done it, we've been on an island and having a laptop comes in clutch for locating cars, for locating things to do, restaurants, ordering, getting a new flight, and stuff. It's kind of interesting when you're planning your vacation, and you're like, well, last time I took it, but I told myself not to, but it came in clutch. But wait, I shouldn't do it, right? [laughs]

ADRIANA: And that poses a conundrum because then if you brought a personal laptop with you instead of a work laptop, would it be less tempting to do worky things? Because nowadays, even though work laptops have a lot of the systems that you need to access to be able to do your job, a lot of times organizations have their email and stuff on the cloud, so therefore, you could technically access it from anywhere, which means that even if you're not bringing your work laptop, it can be super, super tempting to log into your email. Oh, let me just see what people are saying on Slack or emailing. [laughs]

ANA: And, I mean, this is a personal preference on how folks handle vacation, I think. And it's a hard conversation because [laughs] I think I interject in this conversation quite often. Because then the same could be argued that they already have that access on their phone. 

ADRIANA: Yes.

ANA: So we might be agreeing no laptops on the trip for work reasons, like, no work laptops. And then it's like, let's not bring personal laptops so that we're not tempted to just fall into your habits: checking Slack, checking Twitter, and doing what you do at home, what you do at your office. 

But then again, we forget that we might have one or two of the applications we use at work already set up on our phone. And just out of boredom, doomscrolling, like, waiting for your food, and you don't know what to do with your hands or time, and you end up checking your email. And you're just like, oh, crap, I'm supposed to be spending time with friends and family right now, oopsie. [laughs]

So it's kind of interesting because, for some people, that doesn't bother them. And for me, I have to have 100% disconnect. I have to unplug that cable and walk away to feel like I'm actually recharging. Some other people don't have that need, and it's a personal preference.

ADRIANA: And I think that's a really good point because I think it shows a lot of self-awareness that you've gotten to the point where you're like, no, this is what I need to stay healthy. And it's true; it depends from one person to the next. But having that level of awareness, I think, is huge because it ensures that you have a healthy vacation.

Because you end up with a couple of problems when it comes to vacations, right? You get the people who don't take vacation because they feel guilty about taking vacation. And then you have the people who take the vacation, but then they're working through a chunk of their vacation, so is it really a vacation?

ANA: Totally. It's always kind of interesting. And I know for me since we left you all last episode thinking about what we were going to do and what we were grateful for, I know I for sure was very grateful for the time off that I had and the people I spent it with and very much being able to disconnect. 

I ended up being kind of bummed out that I didn't get to do as much cooking and Latin American traditions that I was hoping to incorporate during the holiday season. But when work kicked off, I felt recharged. I don't know what the holiday break was for you, Adriana.

ADRIANA: Yeah, I kind of felt the same way. And I don't know if you're like this, but I'm always like, go, go, go. And if I'm not doing something, I feel guilty that I'm not doing something. And during the holidays, the first week was still kind of busy because it was like the week between Christmas and New Year's, so there was stuff to do kind of, sort of. And I decided to repaint my office, my home office, [laughs] so that took a chunk of the week. 

But then, the second week, my husband, my daughter, and I just bummed out completely. Like, we didn't go out much, like outside even. We just bummed around and watched so much TV that we got to the point where we're like, "We've watched too much TV. Let's do something that's not TV." So then we started playing games, but it was super idle, but it was so good.

And I'm proud of the fact that I told myself, like, it's okay to give yourself permission to relax. Because what I always feel at the end of each year is it's been a hustle the entire year. And for me, last year was super stressful with losing my mom and stuff. So it was really nice to take that time to recharge, give myself permission to relax, knowing full well that when January hits, it's going to be all hands on deck because that's just my personality where I've got to be doing something; otherwise, I feel unproductive. So I'm grateful for that time to have been doing nothing, [laughs] especially that second week of break.

ANA: [laughs] I relate there too. I know that my first one was very relaxing. And then my second one was more vegetating, and I very much needed it. I know you mentioned jumping back into work and it being pretty busy. Can you tease listeners with some of the projects that you're interested in working with or that you're already working with that folks we'll be seeing in the upcoming weeks? 

ADRIANA: Oh yeah. So I guess some of the projects that I'm working on right now, I don't know if they're necessarily related to any of the recordings that we're doing. But I've got a HashiTalk coming up in February where I talk about How to Convert Kubernetes Manifests into Nomad Jobspecs. 

I wrote a blog post about it just before the holidays, so I've turned it into a talk. So I've been hard at work on that. It's funny how even having a blog post with the script for a talk it's supposed to make things a little bit easier but converting from a blog post to a talk is a little bit difficult because you're going from all words to visual. So that's been kind of challenging. 

And then this year, I'm super stoked to be playing around with trace-based testing. I've spent some time playing around with Tracetest, which I played around with it when it first came out last year. I think it was sometime in the summer. It was like an early, early version. Then I took it up again in the fall just before the break, I guess, where I had originally played with it because I'm like, oh, I wonder if this thing runs on Kubernetes. I wonder if I can run it on Nomad. So I'm like, that was my project in the summer. 

And then I took it up again thinking, okay, what modifications do I need to make to Tracetest, like the new version of Tracetest, the Nomad jobs to get it to run on Nomad? And it ran pretty smoothly. And it's cool to see how much Tracetest has evolved even since the early days. There's a CLI, and you can define tests programmatically or declaratively through YAML, which has been super cool. So I really want to explore that. 

And then the other cool thing is, you know, on the trace-based testing theme, I met with the founder of Helios, and they also do trace-based testing, but they take a different approach to it, which is really cool. Like, whereas Tracetest is more declarative, and you install the tool on your own cluster, whether it's Kubernetes or Nomad in my case, Helios is a SaaS product, and they've gone for more of an imperative approach. 

So they've got a whole SDK for declaring your tests, and they support different languages. So I had a chance to speak with him last week. And I'm actually super stoked to try that out because, hey, what's better than one trace-based testing tool? Two. So hopefully, in the next few weeks, I'll have a chance to play around with that, and we'll see where it goes. So that's where my mind has been at. 

And, of course, we have been hard at work interviewing folks for OCMM. And we're hoping also to bring content beyond just the podcast. So Ana and I have been brainstorming just fun ways to bring y'all content that's not just audio-based but also some fun visuals, so stay tuned for that as well. How about you, Ana?

ANA: It's a very exciting next few weeks, I think, for you and for myself included. There are definitely a lot of amazing ideas going. And as you've shared, some of the work that you're doing with trace-based testing is very much exciting to see that space continuing to grow, considering it's something that was just a myth, an idea that folks were talking about, I think, in 2018. 

And I think we're pushing the envelope. I think it is driving those conversations to think about how do we actually make some of those Day 2 operations a lot easier? And just make it easier to have the developer do more, care more, actually be an owner when it comes to their code, which is very exciting. I know for me, there's a lot going on always in my life and a lot of moving pieces. 

So I have a few talks coming up; some of them will be talking about the similarities between caring about humans and caring about systems. So I'll be speaking at Conf42 with Julie Gunderson on Don't Forget the Humans. I'll also be keynoting some conferences around site reliability engineering and chaos engineering that have yet to be announced. So when those do come out, we'll be pushing them over on social media. 

And as far as some of the other projects that I have going on for these next few weeks and this year, is that I'll continue being heavily involved with Kubernetes. I'm joining the release team of release notes team again as a shadow, so for Kubernetes version 1.27. And I'll be joining the team again, which is exciting. We just had the kickoff call last year. 

And the release team is beautiful in terms of everything that happens behind the scenes to make all the Kubernetes versions go out. But being part of that program really allows for you to continue learning just how complex systems work and the human side of it, of course, by just staying up to date on the CLI portion of it, the storage aspect of it. Like, there's a lot to learn in Kubernetes, and I constantly feel like I know nothing, and I've been working with it for a few years. But I'm very excited to continue working with Keptn. I joined their governance board in December. 

ADRIANA: Yay.

ANA: So I'll be a little bit more involved in helping shape up what it means for this project to be within CNCF, along with partnering with other open-source projects. And as time frees up, I'm very excited to also be playing around with projects like Argo, and Cilium, and LitmusChaos, Chaos Mesh and seeing how they all come together. I really do enjoy that space of site reliability engineering and understanding your systems more and making them more reliable. And in my wish list for when I have more time and stuff, of course, I do have Backstage and Crossplane to revisit of just really being able to package things up as neatly as I want.

ADRIANA: Yes.

ANA: And really being like, oh, this is the POC of what I wish that I could do if I was starting an organization, an SRE practice from the start. So there's a lot of learning and content to be putting out there. And, hopefully, we get to teach folks a little bit more of how to modernize their applications, especially with those that are coming in from cloud-native environments only, that they're coming in from blended environments of being all bare metal that are slowly transitioning to the cloud, or folks that are completely like no access to the internet. How can they use some of these open-source technologies to start implementing and start going through a digital transformation within their organization?

ADRIANA: Yeah, that's awesome. And those are really cool technologies that you've mentioned. And I can't wait to see what comes out of your explorations. I think you and I were chatting the other day about Argo because I got my start, I guess, on my blog by blogging about running Argo CD on Kubernetes. And I have not touched Argo in a few years. 

ANA: [laughs]

ADRIANA: So I would love to see how it has evolved in the last little while. And Crossplane, I think I played with it back in 2021 when it was still pretty new. And there wasn't a lot of support for creating resources in GCP, there was some. That's another really cool one that would be fun to revisit.

ANA: Same. I think the last time I touched Crossplane had to be around 2019. I was speaking at ESCAPE/19. It was a multi-cloud conference. So I was trying to use Crossplane to do my demo. And after various hours of getting an almost demo working, like, what I was trying to do is just how do you leverage chaos engineering to build out your multi-cloud strategy? Like, making sure that you have latency in mind or that you're thinking about failover in between the clouds and all these little, tiny details that we were having to think about while I was working at Uber. 

So looking at Crossplane, to me, was really interesting because it was kind of like breaking the fold in the industry in that sense, like, not many folks were talking about these spaces. Now that it's been a few years and we've seen the growth of Crossplane, we've seen their community be pretty successful. And a lot more tutorials, a lot more stories are out there. It will be pretty insightful to pick it up and see how it all ties together with all the other advanced open-source projects that we also have out there.

ADRIANA: Yeah, absolutely, absolutely. I think it's going to be a really exciting year. And I also want to make a little plug for OpenTelemetry. Since we're talking about various open-source projects, one of my goals for 2023 is to be even more involved with OpenTelemetry. And last year, I was super excited to have made my first-ever pull request into OpenTelemetry. And I've made a few contributions throughout the year, which is exciting. 

So this year, I also joined the OpenTelemetry End User Working Group, which I'm super stoked for because I think this is such a great opportunity to hear stories of real people using OpenTelemetry in real life. So I think it's going to be super educational. I hope that we can learn a lot and share a lot of stuff with the community around that. 

And my call to action is for anyone who loves using OpenTelemetry and is interested in getting involved: there are so many different ways of getting involved. You can write a blog post for OpenTelemetry talking about how you've used it in your organization, contribute to the docs, make contributions to the OpenTelemetry demo app, which I think has blown up since it first came out in the summer of last year. Now there's like so many different services, so many different moving parts to showcase how cool and awesome it is.

Or if you want to, just join the OpenTelemetry End User Working Group and share your learnings with the rest of the community and take that as an opportunity to learn from others as well. Those are some great and easy lower barrier to entry ways of contributing.

ANA: Oh, man, how can I forget about the OTEL project? I was thinking about all the other open-source projects that I'm so eager about. But getting a chance to be involved in OTEL last year was pretty exciting. I've always tried to stay a little bit away from observability because I always felt like I wasn't going to understand it too much. And last year, I did get a chance to do contributions to projects around OpenTelemetry, and that was pretty exciting. 

But as we think about all these other site reliability engineering projects that are coming together, it's also going to be exciting to see them all roll up into larger initiatives that we get to put together. And I know that through the time that we've been working on On-Call Me Maybe, working with some other guests, one of the other things that has come up for Adriana and I is very much in, like, what is the wish list that we have for SRE going into 2023? 

I think, for me, a lot of it is making sure that we are understanding our system. And the way that I see us understanding our system, of course, is by leveraging observability first. And then, as we do that, making sure that we go back into those principles that infrastructure as code taught us. And we codify everything that we can, including those service-level objectives. 

ADRIANA: Yes.

ANA: As you're setting up those service-level objectives, make sure that they have in mind reliability goals that you're practicing, injecting failure, embracing the failure that you have, that you pause and you learn from that failure and that you're ready to make changes, but that you only do it when it makes sense for your team. Not everyone needs to go and change their entire culture because a brand new SRE book came out.
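
As one way to picture what codifying a service-level objective could look like, here is a minimal sketch. It is not taken from the episode or from any particular SLO tool: the `SLO` class, its field names, and the `checkout-availability` example are hypothetical, and it assumes a simple request-based availability objective.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """A hypothetical, request-based SLO kept in version control like any other code."""
    name: str
    target: float      # e.g. 0.999 means 99.9% of requests should succeed
    window_days: int   # rolling window the target applies to

    def error_budget(self, total_requests: int) -> int:
        """Number of failed requests the window can absorb before the SLO is blown."""
        return round(total_requests * (1 - self.target))

    def budget_remaining(self, total_requests: int, failed_requests: int) -> float:
        """Fraction of the error budget still unspent (negative once overspent)."""
        budget = self.error_budget(total_requests)
        if budget == 0:
            return 0.0
        return 1 - failed_requests / budget


# Assumed example values, for illustration only.
slo = SLO(name="checkout-availability", target=0.999, window_days=30)
print(slo.error_budget(1_000_000))
print(slo.budget_remaining(1_000_000, failed_requests=250))
```

Checking the remaining budget continuously, rather than once at the end of the window, is what keeps an objective like this a living thing instead of a number on a dashboard.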

ADRIANA: Yes, that's right. It's the fit-for-purpose SRE. Do what works for you.

ANA: What was on your SRE wish list? I know you and I have a lot of similarities. But I know there's other stuff that you also think about.

ADRIANA: I think the underlying theme that you and I have hit upon is really honoring SRE and DevOps principles by codifying all the things, which...I'll make this a plug for our Observability-Landscape-as-Code work that we started last year, where we're basically treating the observability landscape as things that can be codified, which includes...it's not an exhaustive list, but it includes instrumenting your application and having a means of sending your telemetry to your observability back end, so configuring, like, an OpenTelemetry Collector programmatically, whether it's Kubernetes or whatever, like via Terraform to deploy your collector somewhere.

Codifying SLOs, which is something that I'm hoping that we can hit this year as part of our wish list. I feel like I'm missing stuff because I'm doing this from memory. Oh, being able to configure an observability back end through code, which is like part of a little demo that Ana and I did last year where we created dashboards in Lightstep through Terraform. 

I think most observability back ends have some Terraform provider that allows for configuration, so we really want to expand on that. So, keeping with that theme of codifying all the things, I want to see greater adoption of policy-as-code. I think there's OPA, Open Policy Agent, out there, which enables that. So wouldn't it be nice to see greater adoption of that sort of thing?

And, of course, trace-based testing is big on my wish list. I'd like to see more of that, whether it's through...whatever tool you choose that's out there, Helios, Tracetest. There's another one called Malabi. I think bringing testing into both developer concerns and reliability concerns is the best way to do it, and trace-based testing makes that possible. 

And then finally, in that similar vein, that means that developers take more ownership of their code. So this whole notion of not throwing it over the wall, which is what DevOps was meant to be but then got bastardized and turned into yet another layer. I would like to see that layer removed and streamlined a little bit more. I'm not saying take out DevOps as a concept. I'm saying take out the DevOps engineer. And feel free to disagree with me. I'm sure that this might be a spicy take in some circles.

ANA: [laughs] Spicy.

ADRIANA: But that's how I feel about it. [laughs]

ANA: No, I think that nails it. And I think that trace-based testing approach is very important on par to where we see the industry going. I know 2019, 2020, Andi Grabner and I were preaching test-based operations as we were advocating for Keptn.

ADRIANA: Nice. 

ANA: So I think it is a slow time coming that we are seeing this movement start happening. I remember in another time, in another wish list conversation I had with you, you wanted to see something more around organizations having SRE workflows that they share with one another.

ADRIANA: Oh yeah, that's right, that's right. Yeah, yeah, because, again, honoring SRE principles there...and we actually...this can be a shameless plug for us. We spoke with Stephen Townshend of the Slight Reliability Podcast. We mentioned this as well where to honor truly the mission of the SRE of codifying all the things that are part of the mission.

There are so many instances where we find ourselves switching from job to job where we're like, oh, I've solved this problem already. But now I have to reinvent the wheel because that knowledge stayed at employer X. And wouldn't it be nice if we had a means of sharing some common workflows? Like, yes, the details are probably going to be a little bit different from employer to employer. But it would be nice if we had some framework, if you will, for that. And we see that a lot with CI/CD systems, more CI, less CD. Wouldn't it be nice to have an equivalent for SRE? So that's definitely top of my wish list.

ANA: I'm definitely hoping that we make this happen with all the open-source tooling that we have that does all these things. But they're not doing it all together in a way that you can easily grab and take from organization to organization and that you get to then share those stories from organization A all the way to organization W.

ADRIANA: Yeah, exactly.

ANA: It's going to be a very, very exciting year. And we very much look forward to having all of our listeners here, the guests that we have that get to shed a little bit more of light into the world of SRE, of engineering, of being a human in the technology space, and such. 

So with that, we would like to thank you all for joining in for Season 2 and very much look forward to having you all join us as we talk about SRE, observability, DevOps, on-call, and everything in between. Don't forget to subscribe and give us a shout-out on Twitter, Mastodon, LinkedIn, Instagram, or TikTok. At this point, we're doing all the socials. [laughter]

ADRIANA: All the socials. [laughter] 

ANA: Be sure to check out the show notes on oncallmemaybe.com for additional resources and to connect with us on social media. For On-Call Me Maybe, we're your hosts Ana Margarita Medina...

ADRIANA: And Adriana Villela. Signing off with...

ANA: Peace.

ADRIANA: Love.

ANA: And Code. [laughs]
