Episode #75: Achieving Operational Excellence with Taavi Rehemagi

Serverless Chats

English - November 16, 2020 09:00 - 41 minutes - 40.9 MB


About Taavi Rehemagi

Taavi Rehemägi is the Co-Founder & CEO of Dashbird, a serverless monitoring and intelligence platform for building and operating complex applications in AWS environments. He has over 13 years of experience as a software developer and 5+ years of advocating for the serverless revolution and building serverless applications at various organizations himself.

Twitter: https://twitter.com/rehemagi
Dashbird: https://dashbird.io/

Watch this episode on YouTube: https://youtu.be/xeF19VCuoV0

Transcript

Jeremy: Hi everyone. I'm Jeremy Daly and this is Serverless Chats. Today, I'm chatting with Taavi Rehemägi. Hey Taavi, thanks for joining me.

Taavi: Hey, thank you, Jeremy. Nice to be here.

Jeremy: So you are the CEO and co-founder at Dashbird. So why don't you tell the listeners a little bit about your background and what Dashbird does.

Taavi: Sure. I've been a developer myself for pretty much my entire life. I started coding when I was 14 and since then, before starting Dashbird, I was an employee in two different startups. The last one I was working a lot on serverless. That was in 2016/'17, which led me and some of the team at Dashbird to found this company called Dashbird. We're an operations platform for serverless workloads. We help companies who are building on serverless to achieve excellence with their infrastructures.

Jeremy: Awesome. So we have done a number of shows about observability because observability and serverless seems to be that third-party offshoot that has been missing. There's a lot of things that AWS just didn't really tackle initially with a lot of the observability stuff. Now, they've added quite a few things, but again, it's nowhere near as easy to use as some of these third-party tools like Dashbird are. So there are obviously constant enhancements.

They just launched, and we can get into this in a little bit more detail, but they just launched not too long ago, this idea of the Extensions API for Lambda, which allows tools like Dashbird or whatever to have more control over the life cycle, if you wanted to have control over the life cycle of the Lambda function, being able to get metrics and telemetry data and things like that. But I think there's still a bunch of stuff missing. I think you would agree with me on this, that there's more we have to do in order to understand and observe our serverless applications. So I'd love to get your input because I think Dashbird has sort of a different outlook or I guess a different roadmap for how you want to address the observability problems, and it's super interesting. So why don't we start there? What's missing in your opinion with observability and serverless?

Taavi: Sure. So I think first off observability is one thing we do, but when it comes to operating the serverless infrastructure, we're talking about high-load, at-scale environments, and there's a lot going on there that we try to help companies with. As an engineering team, if you're really building something that has hundreds or thousands of functions, for example, and a lot of different Cloud resources, then the one thing that's really difficult obviously is monitoring data and getting an overview of the activity going on across those resources and across your infrastructure.

But there's also, how do you detect failures and how do you get notified quickly and how do you respond to incidents and solve them? There's also keeping up with things like security and Cloud infrastructure best practices, optimizing for performance and costs. So the monitoring is one part of the puzzle and then having been in this role where we were building a pretty substantial serverless infrastructure, there's a lot going on there. A lot of those things as a team you would have to build yourself and to figure out yourself and to construct strategies around how to improve. So that's really what we're trying to do for our organization. So we're trying to build an abstraction level for operational practices pretty much.

Jeremy: I love that because it's a more sort of holistic approach, I guess, to building serverless. So building and managing a serverless application, as opposed to just sort of being responsible for, I guess, the monitoring aspect of it. Because again operational-wise ... and this is something, I forget who I was talking about this to, but essentially where it's like monitoring and observability in serverless is great when you get an alert that says something went wrong. But it's also really good and comforting to know that something went right.

Right? To know that events are flowing through the system and that the SQS queues are processing correctly, and knowing that those things are working correctly gives you that level of confidence. I think that's really cool. From the Dashbird perspective, and again, I want to keep this a little bit more general. We don't want to just make this all about Dashbird, but I really do love this perspective that you have. What is the vision in terms of being able to manage, not just the monitoring piece of it, but also the operational piece and implementing those best practices? How do you look forward or how do you plan a product that does that?

Taavi: When we started working on Dashbird, obviously we didn't come up with this vision in the first iteration. At first, we were just building a tool to monitor Lambda functions pretty much. What came up early on was conversations with hundreds of people or companies who were actually struggling with this. And after all of those conversations, I think we kind of constructed this hypothesis around what this platform should look like for those teams that were the early adopters. So what Dashbird is today and what we're building it to be is this platform that you can look at as three different pillars, and I can go into those pillars if it makes sense?

Jeremy: Yeah, let's do that.

Taavi: Sure. So the first pillar that we have is a data centralization pillar. So what we do is we connect to your AWS account without any code instrumentation. We don't use Lambda extensions or layers or instrument the code at all. Instead, what we do is we discover the entire Cloud infrastructure that you have and start ingesting all different types of monitoring data for those resources. So that includes things like log data, metric data, tracing data, configuration data, and really everything that the system is putting out externally. And from that extent of data, we're trying to understand the state of the infrastructure and to make that data available to the engineering teams, to be able to search and query and to interrogate that data in all different ways. So basically the first thing is to get everything in one place, to break down the silos between logs and metrics and traces, and to be able to look at services and activity across different services and different resources. So that's the first thing that we do.

Jeremy: Well, let's talk about that for a second. So the idea of instrumentation, so this was something right from the beginning with Lambda that you really couldn't do? Right? I mean you can't install an agent somewhere that just listens to all the activity that happens with a Lambda function. Now we got layers, we got custom runtimes, now we have the Extensions API. So there's different ways that within a Lambda function, you could add some type of instrumentation, ev...
