About the guest:

Abby Bangser is a UK-based Principal Engineer at Syntasso, delivering Kratix, an open-source cloud-native framework for building internal platforms on Kubernetes. Her keen interest in supporting internal development comes from over a decade of experience in consulting and product delivery roles across platforms, site reliability, and quality engineering.

Abby is an international keynote speaker, co-host of the #CoffeeOps London meetup, and supports SLOConf as a global captain. Outside of work, Abby spoils her pup Zino and enjoys playing team sports.

Find our guest on:

Abby’s Twitter
Abby’s LinkedIn
Abby’s Mastodon

Find us on:

On Call Me Maybe Podcast Twitter
Adriana’s Twitter
Adriana’s Mastodon
Adriana’s LinkedIn
Adriana’s Instagram
Ana’s Twitter
Ana’s LinkedIn
Ana's Instagram

Show Links:

Syntasso
Kratix
Kubernetes
O11ycast
Clean Code
Thoughtworks
Quality Assurance (QA)
Parveen Khan on OCMM
Tracetest
Kubernetes Controller

Additional Links:

O11ycast Podcast: Ep. 16, Observability and Test Engineers with Abby Bangser of MOO
#CoffeeOps London Meetup
Global SLOConf Captain
Abby at Agile Testing Days 2022
Video: Observability in Testing with Abby Bangser
Ministry of Testing: Meet the Instructor Podcast with Abby Bangser
Slight Reliability Episode 24 - Interview with Abby Bangser

Transcript:

ADRIANA: Hey, y'all. Welcome to On-Call Me Maybe, the podcast about DevOps, SRE, observability principles, on-call, and everything in between. I am your host, Adriana Villela, with my awesome co-host...

ANA: Ana Margarita Medina.

ADRIANA: And today, we are talking to Abby Bangser of Syntasso. So welcome to the show, Abby. 

ABBY: Thank you so much for having me. 

ADRIANA: Awesome. Well, we're super stoked to have you. Ana and I were saying that we've been Twitter-fangirling you. 

[laughter]

ABBY: All around with this crew.

ADRIANA: So yeah, we're super stoked to have you. First things first, what are you drinking today?

ABBY: Yes. So I've brought along one of my favorite beers, the zero, the non-alcoholic version of the Blanc, which is a specialty beer from 1664, but they only sell it in France. All we get is the 1664 lager here in the UK. And so we took a trip to France and brought back some cases.

ADRIANA: That's awesome. Then you're going to have to return to France to get more.

ABBY: Oh, it's an annual pilgrimage to pick these up for sure. [laughs]

ADRIANA: That is awesome. My drink today is a homemade bubble tea sans bubbles. I put basil seeds in mine, and I added some mango juice, so I got a little bit of flair.

ANA: Both of those sound so refreshing, and I'm so jealous because I'm just sipping on a classic iced Coca-Cola, and I'm just like, hmm, something's missing. I need a little bit of a different flavor or just, like, even some boba. Can I do bubble tea with Coca-Cola?

ADRIANA: Oh my God. [laughter] That's so cool.

ABBY: I feel like that's the chaotic something in those alignment quadrants when you put those together.

ANA: Chaotic evil.

ABBY: Yeah, chaotic evil. [laughs]

ADRIANA: That would be some really cool conference swag. I mean, it's perishable, but wouldn't that be neat?

[laughter]

ANA: I feel like we can do some interesting stuff with sodas and bubble tea at conferences that folks are not doing. I think, in general, I would love to see more tech conferences have non-alcoholic options. Just going to events where it's constantly just pushing booze, it's like, wait, that's not the most inclusive space. Like, you don't know where folks are at or the type of environment that it could create.

So it's always really nice when I get to go somewhere and those 0% lagers are showing up more, or alcohol isn't being pushed as the main thing to consume, and it's like, we have cold beverages. Just do things like that, or offer mocktails too.

ADRIANA: Yeah, totally, totally. I hear that the mocktail movement is growing. 

ABBY: The non-alcoholic beer is so good these days as well, beyond just the basic lagers. I find that when you just end up with the one basic lager that's not alcoholic, it's kind of a cop-out. And I've definitely pushed for more non-alcoholic options in places before. 

ADRIANA: Awesome. On to more techy things. Abby, why don't you tell us a little bit about the work that you do?

ABBY: Yeah, absolutely. So as you mentioned, I work at a company called Syntasso. And what we are building is an open-source tool called Kratix, which is trying to help platform engineering teams build platforms. So there are lots of tools for delivering individual platform aspects, like delivering a database to software engineering teams that need one or deploying applications in an effective way.

But how you actually create a coherent story, a product, out of what your platform team delivers is the kind of problem space that we're trying to tackle. So it's been really interesting. It's my first time doing full-time engineering on Kubernetes. We're building a controller to do that. And so it's been really fun getting to know that aspect of development.
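For readers who haven't built one: a Kubernetes controller is a control loop that watches declared resources and works to converge the cluster toward the state they declare. As a rough sketch only, assuming the sigs.k8s.io/controller-runtime library and using a ConfigMap purely for illustration (this is not how Kratix itself is structured):

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// ConfigMapReconciler is a toy reconciler: the manager calls Reconcile
// whenever a watched ConfigMap changes, and the reconciler is responsible
// for converging the cluster toward the state the object declares.
type ConfigMapReconciler struct {
	client.Client
}

func (r *ConfigMapReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var cm corev1.ConfigMap
	if err := r.Get(ctx, req.NamespacedName, &cm); err != nil {
		// The object may have been deleted between the event and now.
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	// A real controller compares desired vs. actual state here and
	// creates, updates, or deletes downstream resources to converge.
	return ctrl.Result{}, nil
}

func main() {
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		panic(err)
	}
	if err := ctrl.NewControllerManagedBy(mgr).
		For(&corev1.ConfigMap{}).
		Complete(&ConfigMapReconciler{Client: mgr.GetClient()}); err != nil {
		panic(err)
	}
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```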

ADRIANA: That's awesome. You've had kind of a varied career. 

ABBY: [laughs]

ADRIANA: I mean, I first heard about you from Ana when I was doing some research for a blog post on observability and testing. So that's where I first heard of you and the work that you were doing, and I heard your interview on o11ycast as well, which I absolutely loved. Yeah, so why don't you tell us a little bit about your career path, how you got to where you are now?

ABBY: Oh, that I do need a beer for. [laughter] So I graduated at a time when it wasn't super obvious or easy to find jobs. I graduated university in 2008. And so there were opportunities, but there were a bit fewer of them than maybe when I started uni. And I didn't really know what I wanted to be when I grew up anyways. I was actually working as a scuba dive master at the time.

ADRIANA: Oh.

ABBY: And I figured I probably had to move on a little bit, wasn't sure to where. And I actually started working for an investment firm just doing data entry. And it was a really small firm. It was just three full-time people and then some data entry people who came in temporarily. And they brought me on full-time to manage that data collection and analysis side of things. That work was actually generating the data we used to make smart investments, and a lot of the time, that meant scraping websites about the opportunities that we had. It was real estate investment, so it was finding information about the real estate that we were looking at.

And so I started using a scripting tool that, I mean, remember, this is 2008. It was basically Selenium written in VBScript, but proprietary software that I installed from a CD. I'm still quite young, but I still have some stories that make you wonder about tech. [laughter]

ADRIANA: Holy cow.

ABBY: It made me understand clean code before I ever knew the terms. I started all of a sudden realizing that I could reuse a script from one website to another if I did a good job of kind of isolating what was unique about that website and all these kinds of aspects of it, and I really enjoyed that. 

And a friend of mine worked at Thoughtworks in legal. And when I was saying I'm enjoying the coding side more than the investment side, she said, "Hey, well, we teach people how to be developers." And I showed up not knowing anything about Thoughtworks and how great it is, with some code I'd written in a text editor after reading Head First Java for a few days. And they kind of looked at me and said, "That's interesting. Probably not quite right for a developer job just yet, but we really like how you think. Come join us as a QA."

And what makes that really ironic is I then spent the next seven years working with them, absolutely pushing against the idea that people who failed out of developer interviews should go into the QA track despite that being how I entered the QA track. Because I felt like if they had put me through the interview for QA, I would have smashed it, and then I would have earned my position in QA rather than it being just, oh, you didn't quite cut it at dev, well, there's this other role. And I like to give them the benefit of the doubt that they knew I'd smash it, and that's why they pushed me that way. But I felt like it was important that you do that process. 

So yeah, I worked at Thoughtworks as a QA for seven years, moving from quite automation-focused to more analysis-focused to more DevOps- and delivery-focused, because if it works on your machine, does that really matter? Got to get to production. And then eventually into production systems and infrastructure-focused work, and then moved on from there to be in-house. I was really excited to get into a product, and I worked on a platform engineering team there, as a QA and lead engineer for a couple of years and then as an SRE for a couple of years. And now, I find myself with my first title as a software developer.

ADRIANA: That is so cool. 

ABBY: What a journey. Sorry, it's a bit of a mouthful there. [laughs]

ANA: No, I love the journey because I think that's the beauty of technology. The space is so large, and there are so many different components that come into technology that until you touch something and you learn about it, you have that, like, oh, this was really cool moment. But then someone guides you in a way that lets you see how everything works. Even your transition from QA to getting closer to reliability and DevOps was like, oh, I'm getting closer to whether things are working properly. But what is that business-level, big-picture view, like, what does the business really need? And then getting a chance to move on to SRE is just always really cool.

I do have a question for you because something that we constantly talk about is the terms always changing, and I'm really curious what your take is. Now that you're an engineer working on a product for building platforms as a service, and you've been an SRE, and you've been an engineer on a platform team, how are things different? For folks that are still trying to answer that question, is SRE supposed to be responsible for a platform, or is that something that's held differently?

ABBY: Oh, that's a tough one. And I think I have to fall back to the "it depends," I'm so sorry. [laughter] But it does, in that I think that it really...I'll go with a lesson I learned. So I actually was chasing the title SRE for a while. As I was getting more into the quality of the system rather than the quality of a feature or an application, I realized that doing that with the title QA could put up barriers that I didn't think were fair, but they existed.

And so I thought if I can switch titles a bit, I can maybe help shuffle some of that responsibility and the opportunity back towards the title QA, but I need to break down those barriers first. So I'm not embarrassed to say I was chasing a title for a while. But when I got to that title, it was very interesting to see how that plays out. SRE is different at different scales of organizations, at different types of organizations, different cultures of organizations.

And in my experience, I was working as the title SRE in an organization that was quite small, and its problems were significantly more about how do we have a reliable, up-to-date database that has the correct backups and disaster recovery and things like that than they were around setting service-level objectives for our very small user base. And if I were to place where I am on it...I think of SRE as a bit of a spectrum, from very deeply technically skilled in an area that can help with those nuanced issues of resilience and reliability for a specific tech, through to the more higher-level, customer-focused, service-level-objective side of things, and I'm probably closer to that side.

So this was maybe a misfit for what I was hoping to work on. Even if I can look at it and go, yes, I see that as viable and reasonable SRE work, it's not the side of SRE work that I'm most interested in. My biggest learning is don't worry so much about titles; worry more about what you are getting to work on.

ADRIANA: Yeah, that's so important. I think I spent so much of my career lamenting the fact that people my age were having these fancy titles and stuff, and I'm like, oh my God, I'm failing at life. And then I'm like, wait, but if I get to do awesome work that fulfills me, then that's what matters, right? 

ABBY: And that's actually how I got to learn as much as I have about platform engineering and things: the job I joined after Thoughtworks. I was looking at a role there, and I knew the manager from the QA community. And she said, "Look, we have this role open for QA. We aren't really actively interviewing for it because it's quite a niche role. And we don't really want generic I-write-Selenium-tests testers. We want people who think more globally about quality and things like that." And she was like, "You'd be perfect for it."

And I was like, "Oh, but QA. [laughter] I'm really nervous about the title because I've seen...I've had brick walls after brick wall, and I busted through a lot of them. But I'm getting pretty tired. I'd like to be able to get into observability, and telemetry, and all these things." And she's like, "I can bring you in as a platform engineer, but your salary is going to reflect your experience there. I think that your experience as a QA puts you in senior bracket, and you get your senior salary, and you'll have that senior-level impact on the team. But I guarantee you that this team will not hold you back based on the title of QA." 

And I sort of just had to take a leap of faith with someone that I trusted, and it worked out brilliantly. So I think sometimes shying away from titles can actually cause problems as well, because they do open doors to more senior roles where you can have more influence over what the role looks like.

ADRIANA: Yeah, absolutely. And also having somebody who you know has your back and your interests at heart. I think that makes such a huge difference because then it makes you feel like you can contribute to the job and really put everything into it, right? 

ABBY: Absolutely.

ANA: I mean, especially being transparent on the salary aspect of it where it's like, this is something that's going to really matter to you because it's how you live your day-to-day. Let's make sure that you're getting what you're worth, like, your years do matter. 

What do you say to folks that continue having this misconception about QA work? I mean, you did extensive work in QA. But I know that the industry still needs a lot of work for QA to get uplifted. Because we had one of our other guests talk a little bit more about how they use observability in QA, and it's like, it's really extensive engineering work, understanding a system and explaining it to someone else. That work is going to possibly get the product engineer promoted, but the underlying work is being done by someone else who's not going to get credit.

ABBY: There's such amazing work being done in QA. Nothing about what I said should be interpreted as me running away from a terrible part of the industry in any way. It's more about how other people perceive things and how other people open doors for you. The challenge is that every role in this industry can feel quite siloed. 

If you're in the SRE role and you're going to SREcon, and you're going to, like, SLOconf, and you're going to these things that are all kind of echo chambers of SRE, you're going to assume that certain things are obvious to other people, or you're going to start to build maybe an inflated view of what you do because no one else is doing these things. Same thing goes for QA, same thing goes for certain specialties within engineering and software engineering and all of that.

So I think the big thing is just to try and build bridges between the different roles as much as possible. And it's important, I think, that the roles around the QA actually identify and lift up the value they can bring. Like, if a QA is sitting there chasing behind some flaky tests every time that you make a change to the UI of the app, of course, they're not going to be adding too much value. 

But if no one else is taking ownership over running tests before they push, there will be pressure for someone to do that, and it will tend to fall onto a QA role. So team ownership is the big push these days. And I think that that's super key to quality, let alone role parity across the team.

ADRIANA: In a similar vein, you talked about how you've been pushing against the, I guess, QA stereotypes. What do you think has been your greatest success when you're doing QA work in terms of really breaking down those silos and getting some more respect for QA that it wasn't getting?

ABBY: Interesting question. I feel like there are some small wins. So one of my favorite things is that a lot of my former colleagues will say, "I asked myself, what would Abby ask?" [laughter] And I just love the idea that I've influenced someone to think harder about the complexities and the risks of what they're working on before immediately just falling to someone else. 

Like, they just got sick of me asking the same questions over and over again to the point where they would come to me and be like, "I've already answered those questions. Ask me new ones." [laughter] And I'm like, great, this is progress, right? So that's quite on the low level, right? Like, the one-to-one person level, that I think is quite a big deal. 

I think the other thing that I'm quite happy and proud of is the community that I'm a part of (I don't want to claim any ownership over it) with bringing observability to the QA space. The number of QAs who are making that transition towards more systems-level quality, with or without an SRE title (I don't think that's important), asking for proof in telemetry before a feature is done, things like that, is just booming. And I would say that I was part of that movement, and that's really quite cool to watch.

ADRIANA: That is so awesome. And we talked to Parveen Khan also about observability and testing. And it's just the fact that there's more of a conversation around that space, I think, is so heartening because it's one of those often forgotten things that should be always top of mind. So I love that it's getting more airtime now.

ABBY: Yeah, absolutely. I was actually going to turn it around for a second and just ask, because both of you work with engineering teams in different capacities, sometimes through more of the DevRel side but also alongside them: how have you seen your relationship with quality and testing change with the different organizations you've worked with and over the years?

ADRIANA: I can say, like, early on, when I started my career, I had a similar situation as you. My first role out of university was on a testing team. And it felt like this was where the non-developers were relegated to, and it felt like a very, I don't know, unfulfilling role at the time. But I feel like over the last several years, over the last couple of decades that I've been working, I've honestly felt like it's evolved so much. 

I think people have started viewing testing in a different light, even when you look at, for example, agile teams. If you were to practice agile properly...I don't know that everyone's practicing agile properly. But in theory, the idea that there isn't a separate tester on your agile team, that testing is the responsibility of everybody on the team, I think that's so cool. And I think it really elevates the importance of testing, and the fact that automated testing is a big deal now as well. Because why should we be sitting here [laughter] clickety-clicking on the same things over and over? That's not to say that exploratory testing isn't important but --

ABBY: Everyone clicking through the same old things.

[laughter]

ADRIANA: Exactly. So it's been quite an evolution over time. And I think it's really great because it reflects the importance of testing. And then, for me, the quote, unquote, "newer concept" in my life of observability and testing is freaking mind-blowing to me, honestly. Because I'm like, of course, this makes perfect sense now, right?

ABBY: I sort of want to wander over to the topic of being on-call and all that. I've been on-call, well, as QA on the platform team. I was an on-call engineer as well, so I have been on-call with the QA title. But I often find that who's on-call is quite a limiting scope sometimes on teams. It's, like, the most senior engineers, and it's deeply technical, deeply hands-on people often. Have you seen QAs and test-focused people be on-call for products alongside people who are front-end and back-end engineers and have different titles?

ADRIANA: Not me. [laughs]

ANA: I was going to say I haven't seen it either. I think we're starting to see more of a movement toward that, especially when it's like, be on-call for the code that you wrote, like, you own it. I think through the last year or two, we've seen a bigger wave of people following that trend. I see that happening more, and I'm assuming that it's happening in some companies; we're just not hearing about it just yet.

I constantly hear more of, like, the product teams having to be the ones on-call and then having to wake up the other corresponding members of the organization, like, oh, it was actually this test that keeps on being flaky that's restarting this dependency that we need for production. [laughs]

ADRIANA: Yeah, yeah. 

ABBY: [laughs] Oh boy. I think it's super important that these different roles that are a part of building the product are also equally a part of being on-call. I think one of the things I learned the most from my experiences was that on-call has very little to do with technical depth for the vast majority of times that thing rings, at least the times when I've been on. The majority of times that I've had things happen, it's been things that I could triage where the problem was and follow some sort of a runbook to get to some level of certainty about what's going on. And then, if I don't know what's going on, I know how to escalate to the next level, right? 

ADRIANA: Right. 

ABBY: And I think having people, even product people, potentially definitely people who are like QA who are in the codebase but maybe not developing features, I think having those people on that rota, you get to feel the pain of that thing that keeps happening coming back around. And I think you can help advocate more for changes and things like that. So yeah, I hope that we'll continue to see that movement that you described of more people on product teams sharing the load of on-call, just without the specific software engineer title or whatever the title is, SRE, whatever the title is in your organization for on-call.

ANA: I think you nailed it. A lot of what you end up doing when you do get paged is having to research and find the information in all your tools. Maybe you do have observability, and it's, like, going straight in there to find that context. But then it's starting to pull the logs, starting to log into a system, or starting a Slack channel, starting communications externally with customers, and a lot of those things any contributor is actually able to help with. We should always be making sure to rotate folks in, to level folks up, to have everyone on the rotations.

I think it sucks when, on teams, it's usually those two or three senior principal engineers that are doing it. It's like, as an intern with enough training, you can definitely go on-call, and maybe do shorter rotations if that's something you work out with your manager. But there are ways that we can do this to have a better space that allows for everyone to be learning from one another. And in my prior role doing chaos engineering, it was a lot of that, where it's like, let's go through those little fires in a very planned way so that we can actually know what to do next on-call and be faster at it, or start automating a lot of these things.

ADRIANA: To add to that, I think it's making on-call less about domain knowledge and more about freedom of information, because everyone has access to the same information easily, which then makes debugging a lot easier. But also, as you said, it'd be having more people involved in being on-call because then, that way, you're kind of lighting a fire under people's asses. Because it's like, oh, okay, the code doesn't end with it dropping into production. So I have an incentive to make this better quality as a result, right?

ABBY: And easier to triage, and easier to understand, and all those aspects as well. It's amazing how I went through a bit of training or preparation before going on-call for the first time. And yet that first incident, when I was the person that was on-call, was still just absolutely terrifying. Even though I had read the document a million times, and I'd done some test runs during the day, I'd done all these things; it's just absolutely terrifying. 

And I think the rubber doesn't really hit the road until you are the only person who's awake, and you have to make decisions about who to wake up or what to do. And all of a sudden, it starts to really click. And I think it really can enhance somebody's skills, and experience, and confidence when they go through that if given the right training going in.

And I mentioned things like starting with some on-call shifts that are during the day, with the right kind of mentorship around you that doesn't jump in and tell you what to do but is just available to you during the day. I think you can really get a lot of the same experiences and start to build those muscles without that terror [laughs] of 3:00 a.m., whatever that terrible song is you have playing for you, or beeping, or whatever coming at you.

ANA: And even just being secondary on on-call shifts. It's, like, even being able to watch everything happen as it's going on or being part of incident reviews. I think reading post-mortems or incident reviews is something that folks don't realize is a bunch of information that one can be learning from, through other organizations, not even just your own. And I remember getting that as a tip when I was an intern going through early-career stuff. It was like, oh yeah, why wouldn't I read the post-mortems of large-scale infrastructure going down? This definitely is going to continue helping me in my career.

ADRIANA: Yeah, I really like that advice.

ANA: As we think of the work that you're doing of building a platform as a service, where is there room for QA in that? Where does QA fall into this? 

ABBY: Everywhere. [laughter] I think, just to be clear, there is no QA role at my current company. There wasn't at my last company either. So where organizations really can benefit is when they think about the techniques and the impacts that QAs can have and make sure that those activities are getting taken care of and those mindsets are being exercised and that kind of thing without needing one person to take on ownership of it. And that can be hard, because when you're trading hats around, it can be quite difficult to remember to put them on at the right times and in the right ways.

And to answer your question more directly, I think of QA as people who can help advocate for user experience. I think of QA as people who can help make judgment calls on the priorities of things based on impact and risk and whether or not things need to be fixed right away. I think a mature QA isn't just going to say, "Oh, a bug, it must be fixed," but to be thoughtful about that. And I think those are obviously quite important when you're making a product, and we're making a product. It might be a platform, might be internal, but it's that. And yeah, so I think all those kinds of questions. 

And then thinking about testing in the space, I think automated testing in this space is really interesting and exploratory because the feedback loops are quite long. The tooling is not quite as mature as with some of the more software-oriented problem spaces. So we're writing a Golang controller, okay, great. That's quite a mature testing ecosystem. You can unit-test your Golang just fine. But for testing of controllers, what kind of tooling lives there for end-to-end testing of that, right? It's not quite the same ecosystem as websites or services. So I think there's a lot of space for testing mindsets and skill sets to make an impact.
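To make that gap concrete: the unit level is well served, for example by controller-runtime's fake client, while the knock-on effects discussed later only show up against a real or simulated API server (envtest, or a throwaway cluster). A minimal sketch of the unit side, reusing the hypothetical ConfigMapReconciler from the earlier example:

```go
package main

import (
	"context"
	"testing"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client/fake"
)

func TestReconcileHandlesExistingObject(t *testing.T) {
	// Seed the in-memory fake client with the object the controller fetches.
	cm := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{Name: "demo", Namespace: "default"},
	}
	c := fake.NewClientBuilder().WithObjects(cm).Build()

	r := &ConfigMapReconciler{Client: c}
	_, err := r.Reconcile(context.Background(), ctrl.Request{
		NamespacedName: types.NamespacedName{Name: "demo", Namespace: "default"},
	})
	if err != nil {
		t.Fatalf("reconcile failed: %v", err)
	}
	// The fake client exercises only this reconciler's logic; end-to-end
	// effects across controllers need envtest or a real cluster to observe.
}
```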

ADRIANA: Hey, here's an idea for the work that you guys are doing: including a QA suite as part of your platform offerings, tan-tan-tan.

ABBY: So when you think about platform offerings, one of the first things you think about is how do you give people a test environment [laughs] and the impacts of how you give people a test environment. Do you give people the ability to spin up a fully production-like environment on the fly, which everybody wishes they had? 

ANA: [laughs]

ABBY: Well, also, though, how long does that let people live on branches for because they have no need to merge? So it's like, I think that the problem space that we're in, which is trying to support platform teams in building their product, is really interesting because we're trying to stay as unopinionated as possible right now. If a team thinks that the right solution is to push everyone towards a shared environment as fast as possible, that might be right for their organization. 

Another organization might want you to have a production-like environment to yourself as much as possible; that might be right for them. But I love how that plays out because I've been in both of those situations. And I've seen what happens after you have that for long enough and how that can...neither is perfect. Neither is terrible. It's just, what is the influence on how the people start interacting when that's the environment they're in?

ANA: Is this like it depends as a service?

[laughter]

ABBY: I mean, as a team, we all have a lot of consulting background, so that might be slipping into our messaging. [laughter] Yeah, it's such an interesting space because it's such an underpinning to every organization, and it influences so much. I think it's Jess who always talks about socio-technical systems. There are lots of people who do now, but I think she was the first person I heard speak about it. How you design your platform, how you design what is easy to request and what isn't, what is easy to use and what isn't, will completely define whether people create a new microservice to introduce that new feature or just tack it on to whatever already exists because getting a new database isn't worth it.

Like, those choices can completely change the architecture of your organization, of your software, of your user experience. And I don't know if those kinds of thinking always get the respect and the time that they deserve.

ANA: For folks listening that are maybe just starting out on a platform, or will be going through this down the road, asking these questions of, like, what should I be doing for my organization, what would you say are two or three questions they should be asking themselves and their users? Like, which way do we go? What do you ask?

ABBY: One big thing for me is thinking about it as a product. So I think a lot of people, and me included, can't sit here and say I'm perfect, and I've done this all along. But I've learned the hard way that if you think about your platform as something that people just have to use because they're your colleagues and they're internal, and they have to use the thing you build, you have a different mindset about what you're building, and you have a different delivery mechanism and feedback mechanism and polish to what you're doing. 

It doesn't mean you need things to be perfect because it is still an internal tool. But you do need to make sure that you're building what people need. Make people want to work with your system rather than just assume they have to work with it because shadow IT is real. [laughs] People will figure out a way around your system if it's not enjoyable. And then you're working on something that's not actually useful. I think we've all been there. 

I at one point was building a product for a client when I was at Thoughtworks. Turns out that nobody liked the user interface or what we were building. One of the employees had figured out how to export the data into Excel and had created an entire UI in Excel that just completely rendered our UI useless. And we were sitting there trying to figure out how to make it better for them. And they were like, don't care, just do not mess with our data format. And I was like, okay, jeez, why not?  [laughter] And that's how we uncovered the Excel front end. 

So yeah, I think treating it like a product, understanding that people don't have to use what you give them, make them want to use it, I think would be the number one thing I would say.

ANA: I think those are amazing points. I think some folks sometimes forget, or you get to the point that you're building without asking questions. Like, you didn't really interview your internal users, and all of a sudden, you're building it in a way that is unusable or doesn't match the API specs being used by the other services that are being implemented.

ABBY: Absolutely. And I think the flip side to that is you can't build for everything. [laughter] So it's like I say, build for everyone, make everyone happy, but it's okay to start with the 80% case and to say, we're going to take care of what most people have as challenges. And we're going to build it in such a way that it can be extended and changed later, but we want to get there.

I think sometimes, as engineers, we try and solve all the problems we run into right away. And that's actually why I'm most excited about working on Kratix: I've talked about this problem for years, years before I was in platform engineering, years when I was transitioning towards DevOps things. It's ten-plus years. And every time we've had conversations, it's always been, but how do we design it so we're not just recreating the APIs of our tools? So we're not just recreating the AWS API, or GCP, or Kubernetes.

Like, we don't want to recreate the wheel. We can just give people GCP [laughs] because that's going to have a lot better documentation than we're going to put together, if we can't figure out how to build the abstraction in a way that makes sense for our company. And that's what we're trying to tackle. It's not an easy problem, but that's why I'm excited about it, because I think that it's a hard problem and one that's worth solving.

ADRIANA: All right, I think that's such a cool problem space. One thing that I was wondering is, with all your prior work on observability and testing, are you looking into doing that as you continue to work on Kratix?

ABBY: Yes. So I had a look at Tracetest, I think from your recommendation to have a look.

ANA: Nice.

ADRIANA: Wooo. [laughs]

ABBY: I'm super excited about it, and I actually owe them a response. [laughs] Because I asked them a question, and they sent back a very cool response about what that might look like in a controller world. Because right now, everything with Tracetest is API-driven. So you can send in a request, and that sets the trace ID, which they can then use with their assertions. Well, what happens when the request is the creation of an object in Kubernetes? How do you set the trace ID for that and track it through your software code? And when does that trace end?

Because your controller can generate knock-on effects, you know, many layers deep, and so there are definitely some challenges there to figure out what that looks like, but it's something that I definitely want to keep chewing on. Because I've found it quite difficult to describe to some people exactly what Kratix does, and that's true of any controller I've seen of any sort of complexity, because it's like, well, it does these things, which then have these knock-on effects, and then these other things happen.

And being able to visualize that in some way via a trace, using trace data, using events, would be brilliant. So it's definitely something I have an interest in, but I do not yet have a solution for. [laughs] So if anyone listening does, please reach out. I'm happy to put in some legwork and help move along other people's ideas in the space because I don't have any yet. [laughs]
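One pattern worth sketching here, as a general idea rather than anything Tracetest or Kratix actually ships: carry W3C trace context on the object itself, as annotations, so a controller can rejoin the trace of the request that created the object. A rough sketch, assuming OpenTelemetry's Go SDK and that a propagator (e.g., propagation.TraceContext) has been registered globally:

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/propagation"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// annotationCarrier adapts Kubernetes object annotations to OpenTelemetry's
// TextMapCarrier, so trace context can ride along on the resource itself.
type annotationCarrier map[string]string

func (c annotationCarrier) Get(key string) string { return c[key] }
func (c annotationCarrier) Set(key, value string) { c[key] = value }
func (c annotationCarrier) Keys() []string {
	keys := make([]string, 0, len(c))
	for k := range c {
		keys = append(keys, k)
	}
	return keys
}

// Compile-time check that the adapter satisfies the carrier interface.
var _ propagation.TextMapCarrier = annotationCarrier{}

// InjectTraceContext is called by whoever creates the object: it stamps the
// caller's current trace context (traceparent, etc.) into the annotations.
func InjectTraceContext(ctx context.Context, obj *metav1.ObjectMeta) {
	if obj.Annotations == nil {
		obj.Annotations = map[string]string{}
	}
	otel.GetTextMapPropagator().Inject(ctx, annotationCarrier(obj.Annotations))
}

// ExtractTraceContext is called inside Reconcile: spans started from the
// returned context join the trace of the request that created the object.
// Where that trace should "end" across knock-on effects is exactly the
// open question raised above.
func ExtractTraceContext(ctx context.Context, obj *metav1.ObjectMeta) context.Context {
	return otel.GetTextMapPropagator().Extract(ctx, annotationCarrier(obj.Annotations))
}
```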

ADRIANA: That's awesome, man. That's quality --

ABBY: Absolutely.

ANA: That's a very interesting problem to solve. But I do agree, visualization of things being spun up by a controller would be kind of cool, because we all have questions sometimes about how Kubernetes works under the hood as things are being built out.

As you still get to work really closely and adjacently to platform and SRE teams and hear all these use cases and the ways every company embodies SRE, what do you think is pretty common that folks continue doing wrong in the SRE space? It's a very spicy-take question.

[laughter]

ABBY: What is it? Yes, spicy take. I'm going to go with quite a generic answer, so you can call BS on me if you'd like. But I think, genuinely, the thing that I most wish to see change is the concept that SRE is one thing, and not having this very real conversation about how reliability and resilience depend on extremely deep technical knowledge in very narrow-scoped areas.

Database reliability is a deeply technical, deeply nuanced piece; so is load balancing; so is each of these layers of the stack that you're in. Each software language has its own nuances, all these things. But then also, there are these big questions of, like, but how does that roll up to user impact? And I don't think that it's super realistic to think that one person can embody that entire spectrum. Definitely not at any one time, possibly not at all, if that's just not their interest or not their skill set.

And so I think just saying, oh, we need SRE is too generic. I wish people would speak more about what reliability means for you, for your team, for your organization at this time. And I think it's okay to say that that changes over time. When you're a small organization, you're probably looking at quite broad strokes of technical competence, not yet really user-focused, because there aren't too many users out there yet. But you're also not hitting too many huge scaling issues that really need nuance; a lot of those problems have been solved.

Then you start moving towards a lot of user-focused stuff because you're starting to scale up, hockey-stick growth. You need to really be focused on what's happening to those users, make sure you don't have attrition. And then with that scale, you start to hit those really deeply technical, really nuanced, possibly first-team-ever-to-hit-it kind of problems, and that's okay. And you need different people at the different stages.

ADRIANA: Yeah, that makes a lot of sense, actually, and I love your take on it...it's so logical when you say it that there are so many aspects of a system, a database, a Kubernetes cluster. You can't be a jack of all trades, not if you want to really properly ensure system reliability, right?

ABBY: Not at certain scales.

ADRIANA: There's a lot of depth that's required. But as you said, when you're in a startup, you kind of have to be, but then as you scale up, you need to have more specialization, and that makes so much sense. The other thing that I really like that you said earlier on, which I think is the best quote, is that SRE is basically caring about the quality of the system, which I think is such a great way to put it because really, that's the most succinct and accurate description of what it is. [laughs]

ABBY: I'm pulling you all into the terrible spider web that QAs have lived in for all of eternity, which is, what is quality? [laughter] When you can answer that, please let me know, because it is the eternal question, I think, that gets talked about a lot.

ADRIANA: Another call to action if anyone has an answer for this question.

ANA: I feel like that's supposed to be answered with some nines.

[laughter]

ABBY: From the SRE space, it will be, yes. 

ADRIANA: All the nines.

ABBY: I think the most common one in the tester space is something along the lines of, like, quality is value to someone, or something to that effect. It's something about the fact that quality means different things to different people at different times. And that's what makes it hard: there isn't a hard-and-fast rule to write your automated tests against. You can write around correctness.

ADRIANA: Yeah, true.

ABBY: But quality is a different beast.

ANA: As we're getting to wrap up the podcast episode, I was wondering, what tips do you have for folks that are trying to break into SRE coming in from the QA space? I think you're one of the few that I got to see transition from QA to SRE and make a big impact. And I think there's a lot of room for folks to be hungry for that, like, just the salary alone and the career opportunities that can come out of it. How do you go about it? What are some of the subjects that they should be going in and learning, or resources, or websites?

ABBY: There are a couple of concrete things that I think you can do. So we talked a little bit earlier today about the idea that on-call shouldn't be limited to only the most senior engineers and only the most deeply technical. And I think getting involved as a QA in that space, offering to be comms on an incident, or offering to facilitate the retrospective or post-mortem, whatever you might call it, about the incident that occurred, things like that are ways to get hands-on experience with a piece of work that you're going to need to do if you are an SRE.

Making on-call something that you are comfortable with and have experience with will lower that barrier to entry a bit. By being on-call, or being involved with the on-call process even if you're not the engineer on-call, I think you will be exposed to a lot more of the tools in the ecosystem of your company.

So if you don't currently look at what your teams do around dashboarding, and metrics, and logs, start asking questions and getting a sense of what normal looks like and what abnormal looks like, and exploring that data. I absolutely love...I ran a workshop at one point about exploratory testing and logs. I mean, you can just jump through that data all day looking for interesting kinds of things in it the same way you would do exploratory testing in a browser or on a web service: setting a question that you want to answer, the same way that you might in exploratory testing.

Like, you might ask yourself, what are the variations of the way people navigate this? Or, what makes people come back the most? You have to start asking questions and seeing if your data can answer them, and if your data can't, that's a really good entry point to start introducing that data, which will hopefully get you involved in how that data gets collected. And that will move you forward as well.
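As a tiny worked example of that set-a-question, ask-the-data loop: given JSON-structured logs (the session_id and path field names here are hypothetical), a few lines of Go can answer "what are the variations of the way people navigate this?":

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
	"strings"
)

// logLine models one JSON log entry; the field names are hypothetical.
type logLine struct {
	SessionID string `json:"session_id"`
	Path      string `json:"path"`
}

func main() {
	// Group the paths each session visited, in order of appearance.
	journeys := map[string][]string{}
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		var l logLine
		if err := json.Unmarshal(scanner.Bytes(), &l); err != nil {
			continue // skip lines that aren't structured JSON
		}
		journeys[l.SessionID] = append(journeys[l.SessionID], l.Path)
	}
	// Count how many sessions followed each distinct navigation route.
	counts := map[string]int{}
	for _, paths := range journeys {
		counts[strings.Join(paths, " > ")]++
	}
	for route, n := range counts {
		fmt.Printf("%5d  %s\n", n, route)
	}
}
```

If the logs can't answer the question, that's the cue to start introducing the missing fields, exactly as described above.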

And then, yeah, I guess the last thing I would say, and this is biased from my own personal experiences, is finding someone who can give you a foot in the door and realizing that, yes, title changes come with salary sometimes and things like that. But yeah, jumping in on opportunities that you can get, trusting people to give you those, I think, is really important. And really, the only way I got to where I am is from people taking a chance on me, so yeah.

ADRIANA: I love that. I love that. Yeah, I had the same experience, just having someone believe in you and giving you a chance. It makes such a huge impact on your career. I think all three of us have experienced that in our careers.

ANA: I love what you said about, like, ask questions, and when you don't have that data, go find ways to put it there.

ADRIANA: Yes.

ANA: Because that is that mentality of improvement in an engineering world where it's like, what can we be doing to make the system better? But when it comes to transitioning from quality to reliability work, it is that of, like, I have all these questions, but our system is not there yet. Like, where do we go from here? How do we go about making sure that we're capturing that information?

ABBY: So often, you get left with just the proxies. People ask questions, and they're like, well, I can't answer that, but I can answer this completely different thing that will make you really excited because the numbers are going up and to the right. [laughs] And you're just like, but, but that doesn't quite answer what I was hoping to get to. So yeah, keeping on that track of, you have a question, you don't yet have the right answers to it, you have proxies you can use to try and estimate, but how you get to where you can actually answer that question, and iterating towards that, is important.

ADRIANA: And then that feeds into making sure that you're collecting the right telemetry from your system. It basically helps you instrument your code better. 

ABBY: Absolutely. 

ADRIANA: And then when you have all those answers, that means you've instrumented properly, right? [laughs]

ABBY: Tell me about anyone who's ever gotten there, and I want to interview them. Bring them on next.

[laughter] 

ADRIANA: Right? I think the best we can do is get as close as possible.

[laughter]

ABBY: Yes, keep iterating towards it. Exactly.

ADRIANA: But I don't think there's ever going to be, like, is it instrumented enough? Because there are always going to be more questions. There's always more that we can do, right?

ANA: Most definitely. Well, with that, thank you so much, Abby, for joining us for today's podcast. We loved all the chats, getting to talk to you about platform, SRE, and QA.

Don't forget to subscribe and give us a shout-out on Twitter via @oncallmemaybe. Check out our LinkedIn page. And don't forget to read the show notes of today's episode on oncallmemaybe.com, where we'll be posting additional resources and ways to connect with our guest on social media. For On-Call Me Maybe, we're your hosts, Ana Margarita Medina...

ADRIANA: And Adriana Villela.

ANA: And signing off...

ABBY: Peace, love, and code.
