Show Notes:

On event-driven architectures...

Mike Deck: (Episode #5) I think it's probably easiest to understand when contrasted against a command-driven architecture, which I think is what most of us are used to. So there's this idea that I've got some set of APIs that I go out and call, and I issue commands to them, right? So maybe I have an order service, and I'm calling create order. Downstream from that, there's some invoicing service, so the order service goes out and calls that and says, "Create the invoice, please." That's the standard command-oriented model you typically see with API-driven architectures. In an event-driven architecture, instead of issuing specific, directed commands, you're simply publishing events that describe facts that have happened; these are signals that state has changed within the application. So the order service may publish an event that says, "Hey, an order was created." Now it's up to the downstream services to observe that event and then do the piece of the process they're responsible for. It's a subtle difference, but it's really powerful once you start taking it further down the road, in terms of the ability to decouple your services from one another, right? When you've got a lot of services that need to interact with a number of other ones, you end up with knowledge about all of those downstream services getting consolidated into each of your microservices, and that leads to more coupling; it makes the system more brittle. There's more friction as you try to change those things, so decoupling is a huge benefit you get from moving to an event-driven architecture. And in terms of the relationship to serverless, services like AWS Lambda are fundamentally event-driven. Lambda is about being able to run code in response to events.
So when you move to this model of "Hey, I'm just going to publish information about what happened," it becomes super easy to add on custom business logic with Lambda functions that subscribe to those different events, which gives you the ability to build serverless applications really easily.
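To make the command-versus-event contrast concrete, here is a minimal sketch using a hypothetical in-memory event bus. The `EventBus` class, the `OrderCreated` event name, and the handlers are illustrative assumptions, not an AWS API. The point is that `create_order` only states a fact, so new downstream behavior (such as a new Lambda-style handler) is added by subscribing, without editing the publisher.

```python
from collections import defaultdict
from typing import Callable

# Hypothetical in-memory event bus (illustration only, no AWS API involved).
class EventBus:
    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, detail: dict) -> None:
        # The publisher does not know who, if anyone, is listening.
        for handler in self._subscribers[event_type]:
            handler(detail)

invoices: list[str] = []
emails: list[str] = []

def create_order(bus: EventBus, order_id: str) -> None:
    # A command-driven version would call something like
    # invoicing.create_invoice(order_id) here, baking knowledge of every
    # downstream service into the order service. The event-driven version
    # just publishes the fact that an order was created.
    bus.publish("OrderCreated", {"order_id": order_id})

bus = EventBus()
bus.subscribe("OrderCreated", lambda e: invoices.append(e["order_id"]))
# New behavior added later is just another subscriber; create_order is unchanged.
bus.subscribe("OrderCreated", lambda e: emails.append(f"confirm {e['order_id']}"))

create_order(bus, "o-1")
print(invoices, emails)  # ['o-1'] ['confirm o-1']
```

A managed event router would add delivery guarantees, retries, and filtering that this sketch leaves out.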

On understanding the connectivity of microservices...

Ran Ribenzaft: (Episode #8) We broke them up from a big single monolith into multiple microservices (you can call them microservices, services, nanoservices), but the fact that one giant thing broke into 10 or even hundreds of resources suddenly presents a different problem. You need to understand the interconnectivity between these resources, you need to keep track of the messages going from one service to another, and once something bad happens, you want to see the root cause. This is something you hear over and over: root cause analysis, the ability to jump from the error (which can be a performance issue or an exception in the code) all the way back to the beginning. The beginning can be the user clicking a button on your business website that caused this chain of events. These are the kinds of things you want to see, and in traditional APMs and traditional monitoring solutions, you don't have them. And in the future, you'll find it more and more like that.
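One common way to support the "jump from the error back to the beginning" described above is to have every event record which event caused it. The sketch below makes that assumption. The `Event` record and its `parent_id` field are hypothetical names, not any specific product's schema. It walks the chain backwards from a failed event to the originating user action.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record of one hop in a request chain. parent_id points at
# the event that caused this one, so the chain can be walked backwards.
@dataclass
class Event:
    event_id: str
    parent_id: Optional[str]
    source: str
    error: Optional[str] = None

def root_cause_path(events: list[Event], failed_id: str) -> list[str]:
    """Return the sources from the originating event down to the failed one."""
    by_id = {e.event_id: e for e in events}
    path: list[str] = []
    current: Optional[Event] = by_id[failed_id]
    while current is not None:
        path.append(current.source)
        current = by_id.get(current.parent_id) if current.parent_id else None
    return list(reversed(path))

events = [
    Event("e1", None, "website: Buy button click"),
    Event("e2", "e1", "api-gateway"),
    Event("e3", "e2", "orders-function"),
    Event("e4", "e3", "invoice-function", error="TimeoutError"),
]
failed = next(e for e in events if e.error)
# Path from the user's click to the failing function.
print(root_cause_path(events, failed.event_id))
```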

On monitoring interconnectivity...

Emrah Şamdan: (Episode #12) In serverless, on the other hand, you have different piles of logs, which come out of the box from CloudWatch and from the resources the cloud vendor provides. But these are separate, and they don't give the full picture of what happened in the distributed serverless environment. The problems here are different, too. In a traditional environment, the problem was most of the time about scalability, and you responded to that by giving the system more resources, just increasing its power. With serverless, the problem is that something can go wrong in any part of a distributed network, and you need more than log files. You need all three pillars of observability. The first is traces: in our case, distributed traces, which show the interaction between Lambda functions, managed APIs, managed resources, and third-party APIs, and local traces, which show what happens inside the Lambda function. The other two are metrics and logs.
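Separate per-function log piles can only be stitched into one picture if they share something to join on. The sketch below assumes every log line already carries a trace ID. The `(timestamp_ms, trace_id, message)` tuple layout and the sample lines are hypothetical. It merges one request's lines across two streams in time order. This covers only the logs pillar; real tracing tools also record spans, timings, and metrics.

```python
# Hypothetical log lines from two functions' separate log streams.
# Each line is (timestamp_ms, trace_id, message).
orders_log = [
    (1000, "t-1", "orders: received order"),
    (1005, "t-2", "orders: received order"),
    (1020, "t-1", "orders: published OrderCreated"),
]
invoice_log = [
    (1050, "t-1", "invoice: creating invoice"),
    (1300, "t-1", "invoice: Stripe call failed"),
    (1060, "t-2", "invoice: creating invoice"),
]

def trace_timeline(trace_id: str, *streams: list[tuple[int, str, str]]) -> list[str]:
    """Collect one trace's lines across all streams, ordered by timestamp."""
    lines = [line for stream in streams for line in stream if line[1] == trace_id]
    return [msg for _, _, msg in sorted(lines)]

for msg in trace_timeline("t-1", orders_log, invoice_log):
    print(msg)
```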

On the purpose of AWS X-Ray...

Nitzan Shapira: (Episode #2) You can do it to some extent. X-Ray integrates pretty well with the AWS APIs inside the Lambda function, for example, and will tell you what API calls you made. It's mostly for performance measurement, so you can understand how much time the DynamoDB putItem operation took, or something of that sort. However, it doesn't try to go into the application layer and the data layer. So if information is passed from one function to another via an SNS topic, then goes into S3 and triggers another function, that whole data layer is something X-Ray doesn't look at, because it's meant to measure performance. That's why it isn't able to connect asynchronous events flowing through multiple functions. Again, that's not the tool's purpose. Its purpose is to measure and improve the performance of specific Lambda functions that you want to optimize, for example.
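Connecting asynchronous hops like SNS → function → S3 → function means carrying context through the data itself. The sketch below is a pure in-memory simulation of that idea. The dicts stand in for an SNS message's attributes and an S3 object's user-defined metadata. The `trace-id` key and the `inject`/`extract`/`function_*` helpers are hypothetical and are not an AWS or X-Ray API. The trace ID created in the first function survives two hops and is recovered in the third.

```python
import uuid

# Hypothetical carrier key; not an AWS or X-Ray convention.
TRACE_KEY = "trace-id"

def inject(carrier_meta: dict, trace_id: str) -> dict:
    """Return a copy of the carrier metadata with the trace ID added."""
    return {**carrier_meta, TRACE_KEY: trace_id}

def extract(carrier_meta: dict) -> str:
    """Read the trace ID, or start a new trace if upstream passed none."""
    return carrier_meta.get(TRACE_KEY) or str(uuid.uuid4())

def function_a() -> dict:
    trace_id = str(uuid.uuid4())
    # "Publish" a message; the trace ID rides along in its attributes.
    return {"body": "order o-1", "attributes": inject({}, trace_id)}

def function_b(message: dict) -> dict:
    trace_id = extract(message["attributes"])
    # "Write" an object; the trace ID rides along in its metadata.
    return {"key": "orders/o-1.json", "metadata": inject({}, trace_id)}

def function_c(s3_object: dict) -> str:
    # Triggered by the object write; recovers the same trace ID.
    return extract(s3_object["metadata"])

msg = function_a()
obj = function_b(msg)
print(function_c(obj) == msg["attributes"][TRACE_KEY])  # True
```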

On instrumentation...

Ran Ribenzaft: (Episode #8) Instrumentation is a technique that allows a developer to, let's call it hijack, or add something to, every request they want to instrument. For example, say I'm using Axios in my own code to make calls to a REST API, an external or third-party API. I want to be able to capture each and every request and response going in and out of that resource, of that Axios request. Why would I want to do that? Because I want to capture vital information that I'll be able to ask questions about later on. For example, if Axios is calling Stripe to make a purchase or to send an invoice to my customer, I would want to know how long it takes, because I don't want my customer waiting on the purchase page or waiting for the invoice to arrive in their email. I want to measure how long it takes and put that as a metric in CloudWatch metrics or in any other service. Then I'll be able to ask, "Was there any operation against Stripe that took more than 100 milliseconds?" If so, that's bad, and this is only accomplished using instrumentation. The alternative is to wrap my own code every time I call Stripe or any other service. But the number of annotations you'd have to add to your code is almost unlimited, so you won't get away without proper instrumentation.
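The episode's example uses Axios in Node.js. The sketch below shows the same "hijack once, capture every call" idea in Python by patching a client method with a timing wrapper. `FakePaymentClient` is a hypothetical stand-in for an HTTP client calling a payment provider such as Stripe. Measurements go into a list here, whereas in practice they might be shipped to CloudWatch metrics or another backend. The final query mirrors the "more than 100 milliseconds" question.

```python
import functools
import time

# Collected measurements: (operation label, duration in milliseconds).
measurements: list[tuple[str, float]] = []

def instrument(target, method_name: str, label: str) -> None:
    """Replace target.method_name with a wrapper that times every call."""
    original = getattr(target, method_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            # Recorded even if the call raises.
            elapsed_ms = (time.perf_counter() - start) * 1000
            measurements.append((label, elapsed_ms))

    setattr(target, method_name, wrapper)

# Hypothetical stand-in for an HTTP client calling a payment provider.
class FakePaymentClient:
    def charge(self, amount_cents: int, delay_s: float = 0.0) -> str:
        time.sleep(delay_s)  # simulated network latency
        return f"charged {amount_cents}"

# Patch the class once: every instance and every call site is covered,
# with no per-call annotations in business code.
instrument(FakePaymentClient, "charge", "payments.charge")

client = FakePaymentClient()
client.charge(500)
client.charge(900, delay_s=0.15)

slow = [(name, ms) for name, ms in measurements if ms > 100]
print(slow)
```

Patching the class rather than individual call sites is what keeps the annotation count from growing with the codebase, which is the trade-off the quote describes.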

On the problems with manual instrumentation...

Nitzan Shapira: (Episode #2) It's not just the fact that you can forget. It's also just going to take you a certain amount of time - always - that you're going to basically waste instead of writing your own business software. Even if you do remember to do it every time, it's stil...