Security is Thy Friend with Michael Kehoe of Confluent

On-Call Me Maybe

English - March 07, 2023 05:00 - 39 minutes - 36.5 MB - ★★★★★ - 3 ratings
Technology monitoring tracing distributed tracing sre oncall on-call software software development technology tech

About the guest:

Michael Kehoe is an author, speaker, and Sr Staff Security Engineer at Confluent. Previously he was a Sr Staff Site Reliability Engineer (SRE) at LinkedIn, architecting LinkedIn’s move to Microsoft Azure. Before graduating with a Bachelor of Electrical Engineering from the University of Queensland (Australia), Michael interned at NASA Ames Research Center, building small satellites known as PhoneSats.

While working at LinkedIn, Michael led the company's work on Incident Response, Disaster Recovery, Visibility Engineering & Reliability Principles. He was also embedded with the Profile, Traffic, and Espresso (KV store) teams. After leading LinkedIn’s last physical data-center build, he became the architect for how LinkedIn builds its infrastructure in Azure.

Michael has spoken at numerous events all over the world and has authored the books "Cloud Native Infrastructure with Azure" and "Reducing MTTD for High Severity Incidents".

Find our guest on:

Twitter LinkedIn GitHub Personal Blog

Find us on:

On-Call Me Maybe Podcast Twitter On-Call Me Maybe Podcast LinkedIn Page On-Call Me Maybe Podcast Mastodon On-Call Me Maybe Podcast Instagram On-Call Me Maybe TikTok On Call Me Maybe Podcast YouTube Channel Adriana’s Twitter Adriana’s Mastodon Adriana’s LinkedIn Adriana’s Instagram Ana’s Twitter Ana's Mastodon Ana’s LinkedIn Ana's Instagram

Show Links:

Confluent LinkedIn SRE Lightweight Directory Access Protocol (LDAP) eBPF Liz Rice General Data Protection Regulation (GDPR) Terraform tfsec (security scanner for your Terraform code) Cloud Native Computing Foundation (CNCF) Azure Policy Open Policy Agent (OPA) Kubernetes Admission Controller

Additional Links:

"Cloud Native Infrastructure with Azure" "Reducing MTTD for High Severity Incidents"

Transcript:

ADRIANA: Hey, y'all. Welcome to On-Call Me Maybe, the podcast about DevOps, SRE, observability principles, on-call, and everything in between. I am your host, Adriana Villela, with my awesome co-host...

ANA: Ana Margarita Medina.

ADRIANA: And today we are talking to Michael Kehoe, who works at...I don't know where you work. [laughs]

MICHAEL: Confluent.

[laughter]

ADRIANA: Confluent. Excellent. Welcome, Michael. [laughs]

MICHAEL: Thank you, ladies. It's so great to be recording with you both.

ADRIANA: So, first things first, what are you drinking today?

MICHAEL: Today, it is just water. I just got over a cold recently. So we're recording this in the middle of the day. So water for now, but I've got some coffee liqueur to finish before I get on my plane tomorrow. So I think that will be my after-work drink this evening.

ADRIANA: There you go, goals for the end of the day.

MICHAEL: Absolutely.

ANA: That seems really tasty to look forward to towards the end of the day, so I'm definitely a bit jealous. For me, I'm actually very much feeling in the very festive mood. And I decided to get a white chocolate peppermint mocha with peppermint from Starbucks.

ADRIANA: Yay.

ANA: I'm on that holiday cheer kick today.

ADRIANA: Hooray, hooray. I've got a can of Perrier lime with me and some water to supplement. So not super exciting, I'm afraid.

ANA: [laughs]

ADRIANA: But hydration is hydration, so good vibes all around. [laughs] All right, cool. Well, Michael, we always like to hear from our guests, like, how'd you get into your current career path?

MICHAEL: When I was very young, four or five, I had an interest in computers. And during school, I had the opportunity to tap into that a little bit, but not too much. When I was in college, I got an opportunity to work for my university's IT department. I went to school back in Australia, and the universities are generally much larger there, so we're talking about a user base of 40,000 students, 10,000 staff across dozens of campuses.

This role was in their network department. I started doing low-level tasks, making sure that switch ports worked in offices, setting up network switches. And that got to evolve into helping build out new data centers. It also allowed me to go and do more network engineering tasks. So my university was an ISP as well for not only the university but also commercial customers as well in the form of both residences but also other universities, other schools.

So this gave me a lot of experience to go and help build out actual solutions, build customer experience, customer service mentality a little bit. I got some exposure working with our Linux team as well. So I got to learn Puppet. I got to learn a little bit of LDAP and was able to start putting these different skill sets together.

At the same time, Google recruiters came to campus, and they had a presentation on SRE. And I'm like, oh, this is so cool. You get to do a little bit of coding, a little bit of Linux, a little bit of network, and problem-solving as well. And I really loved the combination of those skill sets. This is back in about 2012, 2013. There was this blog, I think it's still around, called "High Scalability," which talked about real-world architectures of different tech companies.

And I immersed myself in that to learn how all these things work. And so towards the end of college, I did some job interviews, and, thankfully, was able to interview at LinkedIn for an SRE position, and thankfully got the job and then went from there. I was on the Profile team at LinkedIn, and then from there went on to a more central team that handled more infrastructure across the whole site and practices and procedures. And during that time, I also got seconded to a number of different teams to help them through some trying situations.

So I got exposed to everything from the Ingress traffic layer all the way to backend key-value databases and grabbed a bunch of experience across teams, which was really awesome to be able to learn such specific knowledge across a variety of different experiences.

ADRIANA: Cool. So you actually got an SRE role right out of college, then.

MICHAEL: Yes. It was really challenging to do that. A lot of companies won't hire SREs out of college because it's very difficult to get that experience in college unless you've done an internship where the bar is a little bit different. But yes, I graduated and got on the first plane after grabbing my visa and got to work. And I'm very grateful for the opportunities I had at LinkedIn, especially to be able to just immerse myself in so many different areas of the stack.

ADRIANA: That is wild.

ANA: You also got a chance to come into SRE at a prime time. It hadn't necessarily been picked up by a lot of big companies yet, and the book had not come out yet. So folks were kind of still like, oh, there's this thing that you do to keep systems up for humans and Google calls it SRE. And, I mean, at that time, I knew that Facebook called it production engineering, but that was it. Like, nobody was really trying to name-grab or anything. It was a really interesting time, so...

ADRIANA: Probably wasn't solid like it is now [laughs] because it was kind of at the forefront, right?

MICHAEL: I mean, I think at the time, there was a loose definition of what it was. And for those hyper-scale companies like Facebook, like Google, like LinkedIn, and others, I'm sure they worked out what made sense for their companies. At the time, I think a number of other companies were like, what is this thing? And post-Google SRE book publication, companies are now trying to work out how do I make this work in my organization?

And we definitely seem to be at the point where like, okay, this book isn't canon; it is meant to be a guide. And each company finds their own way to put these various principles into practice in a way that makes sense for both the culture of the business but also in terms of how the business is run.

ANA: 100%. And I think that we're still seeing that shift to people realizing that it's not a one-size-fits-all. I think we've had a lot of those conversations in our podcast recently where it's like, nope, you can't just grab the book and apply it all to your systems. It's all a lot of hard work. And as you said, culture comes into play. But the way that Google does it, the way that Facebook or LinkedIn does it does not apply to your enterprise company or your startup.

MICHAEL: Right. A friend of mine who worked at LinkedIn also has his own podcast series. He has a quote that "Culture eats technology for breakfast," which is very true.

ANA: I know that you, like, I met you when you were working at LinkedIn. You were working specifically on eBPF and just sharing SRE practices. What do you feel about that space now, considering eBPF observability has taken a huge rise? And as you were talking about it in 2018, you were the only person at conferences speaking on it in a sense.

MICHAEL: Right. I wish I had more time to spend on it, [laughs] honestly. So eBPF, when I was talking about it at, I think, O'Reilly Conferences and also the SREcon series, it was in its infancy in the industry. The foundations of eBPF started in, I think, 1991. And then the first commits to what's now known as eBPF were in late 2013 or early 2014. And so it took a couple of years for people to start picking it up. And now you see companies specifically like Cilium that have really accelerated its growth, and they've created a very well-known product.

I think the great thing about that space is it is more or less an open-source technology eBPF. So anyone can go and create something exceptionally powerful with it, which is great. Obviously, Cilium has cornered the market and makes it more user-friendly to go and use in a Kubernetes containerized environment. But there's nothing really stopping me from going and building a high-performance load balancer or DDoS system if I really want to.

I think for the space that I'm in now in security, we can go and do very low-level, kernel-level auditing of system call events, or go and do deep introspection of network flows without having to worry about the overhead of that software. And that's so exceptionally cool. So while Cilium has cornered the market for now, I can't see a reason why other companies won't come out with eBPF-based network flow monitoring and different IDS solutions that fit a broader market than just platform-hosted Kubernetes.

ADRIANA: Now, for our listeners who might not have heard of eBPF or have heard of it but aren't really sure what it is, can you provide a super high-level definition description?

MICHAEL: Sure. So BPF essentially allows you to run your own programs in kernel space without needing a kernel module. The best example of this is tcpdump. You define your own filter, and that filter actually gets compiled into kernel bytecode. And then the kernel actually goes and parses the packets coming through your network interface, rather than going and doing that in user space, and that is massively efficient.

So that concept is not particularly new. tcpdump has been around since the early '90s. But with eBPF, you can go and create your own filters and programs, go and create those in userspace. And then, they go and get compiled and get run safely in kernel space, which generates the efficiency of the program. So you can go and do network-level parsing, load balancing, or even look at all the system call events happening on a machine, do that in kernel space, get the results back in userspace, but do it at exceptionally high throughput, which makes it very attractive to system operators such as myself.
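To make the "filter compiled into bytecode" idea concrete, here is a minimal sketch in Python of a toy filter machine. It is only loosely modeled on classic BPF: the `Insn` class, the four opcodes, and the `run_filter` and `make_frame` helpers are all assumptions made for illustration, not the real kernel instruction set or any tcpdump API. Real BPF programs run inside the kernel and, for eBPF, pass through a verifier before they are loaded. The hand-written program expresses roughly what a filter such as `ip and tcp` asks for on an untagged Ethernet frame.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    """One instruction of a toy filter VM (loosely modeled on classic BPF)."""
    op: str      # "ldh" (load 16 bits), "ldb" (load 8 bits), "jeq", or "ret"
    k: int = 0   # packet offset for loads, constant for jeq, verdict for ret
    jt: int = 0  # instructions to skip when a jeq comparison is true
    jf: int = 0  # instructions to skip when a jeq comparison is false


def run_filter(prog, pkt):
    """Run prog over one packet; return bytes to capture (0 means drop)."""
    acc = 0  # single accumulator register
    pc = 0   # program counter
    while pc < len(prog):
        insn = prog[pc]
        pc += 1
        if insn.op in ("ldh", "ldb"):
            size = 2 if insn.op == "ldh" else 1
            if insn.k + size > len(pkt):
                return 0  # out-of-bounds read: reject the packet
            acc = int.from_bytes(pkt[insn.k:insn.k + size], "big")
        elif insn.op == "jeq":
            pc += insn.jt if acc == insn.k else insn.jf
        elif insn.op == "ret":
            return insn.k
        else:
            raise ValueError(f"unknown opcode: {insn.op}")
    return 0  # fell off the end of the program: drop


# Hand-assembled toy program: accept IPv4 frames that carry TCP.
IP_AND_TCP = [
    Insn("ldh", 12),                  # load the EtherType field
    Insn("jeq", 0x0800, jt=0, jf=3),  # IPv4? otherwise skip to the final "ret 0"
    Insn("ldb", 23),                  # load the IPv4 protocol field (14 + 9)
    Insn("jeq", 6, jt=0, jf=1),       # TCP? otherwise skip to the final "ret 0"
    Insn("ret", 65535),               # accept: capture up to 65535 bytes
    Insn("ret", 0),                   # drop
]


def make_frame(ethertype, proto):
    """Build a minimal Ethernet + IPv4 frame with zeroed addresses (test data)."""
    ip_header = bytearray(20)
    ip_header[9] = proto
    return bytes(12) + ethertype.to_bytes(2, "big") + bytes(ip_header)
```

Calling `run_filter(IP_AND_TCP, make_frame(0x0800, 6))` accepts the frame, while a UDP frame (`proto=17`) or a non-IPv4 EtherType is dropped. The point of the design is that the per-packet decision is a short, bounded sequence of loads and compares, which is what makes it cheap to run for every packet.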

ANA: That's super awesome. I mean, it gives a lot of flexibility, and you get to use it in what ways works for your team.

MICHAEL: Absolutely. There are endless ways you can take it. Definitely, the bar to entry has been a little bit challenging, just because you need to be able to write programs that actually run in the kernel, so understanding all the intricacies of writing these programs in C, or Assembly if you're really eager. It is a little bit more challenging, especially for... I've written some network programs, you know, you need to understand the packet structure exactly to make things work. But yeah, if you know what you're doing, the possibilities are endless. And I think that's what makes it really exciting.

ANA: I haven't been following too much of the trends around it. But are we also then seeing, I guess, this being used a lot more in the security space?

MICHAEL: Definitely for doing container-based networking, yes. And Cilium also allows you to do the identity-based policies as well, which I think is very exciting. Because you sort of say, all right, this service can talk to this service, and a bunch of layers of abstraction take care of that for you. I think there's still a space where...the hyperscalers have definitely done it. Like Cloudflare, if you go and look at the eBPF tag on the Cloudflare blog, you'll find endless amounts of ways that they've used eBPF, probably the same with Facebook as well. The rest of the industry, we really haven't seen eBPF IDSs come to the forefront yet where we can really do deep introspection.

Falco uses eBPF as well to do some of its system monitoring. So we're getting there, but I don't think we're going to fully mature for probably another two or three years. I know Liz Rice, who is another person who has written on eBPF before and also works at Cilium; she's working on an O'Reilly book at the moment, which is really going to take us back to first principles and walk people through how to build their own programs. So I'm sure when the publication of that book happens next year, we'll again see a jump in interest in the space.

ANA: That's exciting. One of the questions we also haven't touched upon is how did you transition from SRE to security?

MICHAEL: While I was at LinkedIn, I had an interest in the space. LinkedIn, I think they still do it, had a Security Champions Program. So they took someone from each engineering department and allowed them to go through a quarterly training program. So you did some online Stanford cybersecurity courses and got paired with a security team member to go and work on a security-based project in your engineering team. So I did one of those programs while I was at LinkedIn.

While I was working on some of my projects towards the later part of my tenure at LinkedIn, I was heavily engaged with the security team on helping make our cloud infrastructure secure there. When I was looking for a new role, I was really excited to go and do something a little bit different, something a little bit new, and somewhere where I had a lot of space to grow and a lot of things to learn. So making the jump to Confluent and joining their security team was an exciting step. I definitely have learned a lot to date, especially since Confluent is a true multi-cloud vendor. So there are a lot of interesting challenges to be solved there.

ADRIANA: To continue on the security thread, I feel like, in some ways, security is a thankless job, right? Because I've been in companies where security policies are enacted, and developers are like, "What the hell, man? I need to be able to do my job for blah blah blah blah blah." But then there's like all this stuff happening behind the scenes that security folks can't necessarily talk about. What are your thoughts around this?

MICHAEL: So there's definitely a balance for infosecurity professionals to find in their roles. For myself, I see my partnership working on two levels, sort of similar to SRE. Number one, the business needs our role to be successful for the business to keep growing. Working for a SaaS vendor, customers want certain requirements of our platform, certain guarantees, and it's up to us to help ensure that those are in place.

For the internal customers, which is mostly engineers, we definitely see our role as building a partnership. Yes, we need to put various controls and constraints in place. But how can we go and do that in a developer-friendly manner? For some of the things we have enacted, and I won't go into specifics, we've definitely consulted with various engineering partners of, like, okay, we need to do this. How can we make this user experience work well?

And so definitely taking that SRE mindset of how can we make the business successful on multiple fronts and bringing that to security has definitely been a focus of mine. We don't want to be the organization of saying, "No," we want to be the organization of like, let's work together and find a solution that suits the business across multiple different departments.

ADRIANA: Cool. And have you ever encountered a situation where you're like, you know, I want to enact this security policy, and then you start talking to developers, and they're like, "No, no, no, you haven't thought of XYZ." And how do you ensure that you keep things secure while still keeping the developers happy?

MICHAEL: That's a great question. So the thing that first comes to mind is a quote from Star Wars that "Only Siths believe in absolutes," and security is sort of like that. Like, there are very few absolutes in anything that we do. Again, there are definitely policies that we always want to put in place. And it is challenging sometimes to be able to go and say, "I'm going to blanket-apply this policy."

For any project that we're doing, we're now by default putting a quarter aside of work just to do research on like, okay, we want to go in this direction. What impact will it have across the business? And so by the time it comes to going and executing on that, we're not going to upset a bunch of people. Like, we've already planned and accounted for it.

ADRIANA: Oh, nice. That's refreshing.

MICHAEL: Right. Changing our stance to being more expecting problems rather than going directly to escalations when we're trying to execute that has definitely been helpful. And Confluent is very good in its planning process. So before the quarter happens, we can go and talk with other teams saying, "All right, this is the direction we want to go in. For the next quarter, we're going to work with you to work out how we can make this policy happen. And then the quarter after, we'll need your partnership to go and execute on it." So that model has been very helpful for us in reducing the number of conflicts or potential problems that we have in the execution of our projects.

That being said, also just being a great partner in being involved in the design of projects as early on as we possibly can definitely helps. Again, we're a vendor in a multi-cloud space, and that comes with a number of just unique challenges on the design of our infrastructure. So being involved in how some of our very low-level infrastructure works and designing for those at the beginning definitely ensures that we can be much more successful later on. And instead of things having to go through a design review just before the projects are meant to go live, that happens in the months before. And we avoid surprises at the end.

ADRIANA: The true shift-left as DevOps intended. It's so refreshing to hear that. That's awesome.

MICHAEL: Right. We've taken that shift-left approach as well to our on-call. [Security] on-call is very different to SRE on-call, I found, which surprised me. I guess I didn't think about it a lot. Definitely, for any alerts we receive as a security team, we're looking at ways of, like, how can we automate this away? How can we make this more friendly to ourselves so that we don't have to get woken up by something in the middle of the night?

In the SRE space, I can kind of limit what changes are made to the infrastructure and keep that until the business day. But security people around the world, or researchers, or adversaries, they don't really care about my nine-to-five. So we have to come up with ways to make our on-call exceptionally efficient so that we don't get burnt out during our on-call stints.

ANA: How else have you felt that is kind of different in terms of the principles that you advocate for working in the reliability space versus a security space?

MICHAEL: I think for reliability, it's a little bit easier to justify what you're trying to achieve because the site does need to be up, or the infrastructure needs to be up for you to make money. Security is a little bit more difficult to justify why you're doing something and its impact on the business. For Confluent, we need our product to be secure for our customers to want it. And I think the company as a whole does understand that, which makes things easier.

But, again, making all these things happen really means that we have to be a very plugged-in partner for other teams. Since I joined, we've grown our security team very rapidly over the last 18 months. We've tried to be very diligent in that planning process, in that partnership process. Everyone now understands that, yes, security is very important to the growth of the business and also, of course, to our customers.

And so, now, instead of coming to people when there's a problem, we can tell them in advance, "This is the roadmap that we want to go down over the next time period. This is where we're going to need your partnership, and this is what we're going to be asking of you." And that has been a very successful way to ensure that we have buy-in from all those teams and so that they can also plan for some of the restrictions, or policies, or controls that we want to put in place over a period of time to uplevel what we do on a daily basis.

ADRIANA: Now, do you typically have security folks embedded with development teams as part of the work, or is it more of a team that services multiple teams?

MICHAEL: That's a great question. For our application security team, they're definitely very much embedded, not in a formal sense, but they're very much on the pulse of what different teams are doing. Confluent also has an on-prem offering as well. They need to be very plugged into release cycles, bug management, vulnerability management, et cetera.

For my team, which is the cloud security team, definitely, we've made a bunch of partnerships across the company. So people know to come to us if they've got a question. Over the period of time, we want to get more involved and formalize some of those partnerships. But we're still a very young security team, and we're still growing. So hopefully, in 2023, we can formalize some of those partnerships. And it would be great to have a similar model to what we did at LinkedIn and have those cloud security champions or application security champions in different teams that can be our eyes and ears on the ground on a daily basis.

ANA: That'd be cool to be able to see you completely start up one of those champion programs because it's true, that's one way to help make a practice a lot more sticky in an organization but to also uplevel folks and be like, this is something you're interested in, and you might want a career in it.

MICHAEL: Right. We actually have a cross-team collaborator award that my organization gives out every month. So we give, I think it's a gift card, to someone who's really helped us over the last month. So we definitely would like to recognize those people in the company that are working to help us meet our goals. And I'm sure at some point, we'll expand that into formalizing some sort of program and really solidifying those relationships and also maybe doing some recruiting on the down-low.

ANA: With your role currently, do you find yourself also having to build software, or do you see it more as enforcing policies, reviewing?

MICHAEL: Great question. I haven't done a whole lot of software development while I've been here. That being said, I am rushing to finish one of my software development OKRs before the end of this week. My team definitely wants to build more software to help make our lives run smoother. Building software is really important for my team so that we can help enforce policies. So we need to do a bunch of compliance tasks. Being a vendor, doing those tasks manually is not necessarily fun.

So we're actually building software to go and ensure that, yes, we're ticking these boxes. Yes, this thing has been done. Yes, we're monitoring these things automatically instead of manually. So that has definitely been a focus in our recruiting when we hire people. And then OKRs of, all right, how can I automate this away? How can I make this thing show up by default so that our jobs and the work that we do is scalable as the business grows?

AWS, GCP, Azure, they keep opening new regions. Our footprint in those regions keeps going up quite rapidly. The way that we secure the infrastructure also needs to be scalable with that infrastructure growth. So we're trying to build preventative and monitoring measures for different parts of the business and different parts of the engineering process that scale in a very hands-off manner.

ANA: With your work working on many different cloud providers and Confluent being a multi-cloud vendor, what are some things that companies can be doing currently to stay more secure?

MICHAEL: I think it comes down to doing fundamentals very well. Thinking about if I'm designing a system, how is the system going to scale over time? What are the points of manipulation in the system, and how can we, number one, audit what happens in that system for both regular actions against a system but also malicious actions against that system? And then looking at how can you put preventative controls in that system? If people did that when building things, that would make my job sort of obsolete mostly.

A lot of what my team does is work out how can we ensure that the system doesn't get abused? And if it potentially does or if we need to do an investigation, how can we ensure that we have the data to go and do that and draw positive conclusions? A lot of how we should be building infrastructure going forward needs to be thinking about putting different access controls in place, like in our backend systems.

Now, a thing in the industry, especially with the rise of GDPR and also different policies across different U.S. states, is thinking about access control, auditing from the beginning, and building that into the product or the system that you're building really makes a difference in ensuring that it is scalable over time in the security space.

ANA: It makes perfect sense. Is there any other metric, or observability, or monitoring that kind of also should be getting added to these procedures or these processes or systems?

MICHAEL: That's a great question. We're actually starting to build security scorecards for different teams. So we have a number of different measures of how well the team is doing on various security goals that we want to see. I won't detail all of it publicly. But that gives engineering teams, engineering leaders a good insight to see where we want to see those teams get to over a period of time or how quickly some of those things get implemented as well because that does matter to us.

We've definitely seen in the SRE space...like, at LinkedIn, we had this thing called service scorecard which was like a list of 50 different requirements that we'd like to see your service have. And that was all from dependency management to load balancing preferences, you know, having ownership data correctly listed, et cetera. So we're starting to build something similar. And I've definitely seen a few companies pop up offering similar things, mostly in the productivity engineering space. But I wouldn't be surprised to see it come to security at some point in time, a startup idea maybe for someone out there.
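As a rough illustration of the scorecard idea, a minimal sketch in Python might track a set of pass/fail checks per service and report a score plus the gaps. The `Scorecard` class and the check names in the usage note are hypothetical; the actual LinkedIn and Confluent scorecards are not described in detail here.

```python
from dataclasses import dataclass, field


@dataclass
class Scorecard:
    """Pass/fail checks for one service (check names are illustrative only)."""
    service: str
    checks: dict = field(default_factory=dict)  # check name -> passed (bool)

    def score(self):
        """Fraction of checks passed, from 0.0 to 1.0 (0.0 if there are none)."""
        if not self.checks:
            return 0.0
        return sum(self.checks.values()) / len(self.checks)

    def failing(self):
        """Names of the checks that have not passed yet, sorted alphabetically."""
        return sorted(name for name, ok in self.checks.items() if not ok)
```

For example, `Scorecard("profile-api", {"owner_listed": True, "audit_logging_enabled": True, "dependencies_pinned": False, "load_balancer_configured": True})` scores 0.75, and `failing()` tells the team exactly which item to work on next, which is what gives engineering leaders a view of progress over time.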

ANA: [laughs] I've also seen the productivity ones. And I definitely agree, like, on security, they should be forefront, or even just seeing a little bit more of tying it all together from security reliability, and developer experience, everything that goes behind the scenes of operating systems.

MICHAEL: Right. As we were talking about earlier, SRE is very much a role where you're serving different parts of the business to make it successful. Security, honestly, isn't that much different. It's a slightly different space. But security and infrastructure reliability are more intertwined than people probably give on first thought.

And I've definitely been in situations in my current role where I've been thinking about how do I make this infrastructure reliable but also secure at the same time? And thankfully, in some of those design meetings, I have the SRE experience to draw on and help those engineers make good design decisions from early stages of implementation.

ANA: It's funny you say that because that was actually one of the questions that we planned on asking you. Like, as you have done SRE and security work, what is that intersection?

MICHAEL: Closer than you think. My team definitely is very involved in the running of different security pieces of our infrastructure. We don't necessarily run all of those ourselves, but other teams do. So we're very invested in ensuring that those systems are reliable and also auditable. We definitely have these partner teams where we have partner infrastructure teams where we're very invested in what their roadmap looks like and how we can ensure that what they're doing to scale our infrastructure can be made more secure over a period of time.

ANA: With you being able to work on SRE and security in the past, what are your thoughts around the DevSecOps movement? Is this different from what y'all are doing right now?

MICHAEL: I think this is a really untapped area. So definitely, the DevOps space got all on board with Terraform. I have a lot of personal thoughts about Terraform.

ANA: I mean, podcasts are the perfect place to put all the spicy nuggets. I'm just going to throw that out there. [laughs]

MICHAEL: I'm more than happy to talk about this. I've definitely had people come and have conversations with me and use the word Terraform like it's a Jedi mind trick, [laughter] and so it's like, Terraform fixes all your problems. And Terraform definitely has its place, but Terraform by itself does not do that. Allowing people to go and write Terraform code and then run Terraform apply from your desktop is not that much more secure than allowing them to do that from a CLI or a portal.

All it does is make someone maybe commit it somewhere so people can work out what the hell you're trying to do. And so you definitely need systems that audit that behavior, which Terraform Enterprise provides that. That is also ungodly expensive. For the security space, we definitely see infrastructure as code as a good place for us to be very proactive about our stances.

There is now tfsec and a couple of other different projects out there where we're able to specify policy in pre-commit and tell people, "Okay, you shouldn't do that." So a very classic example is the creation of S3 buckets. You should not be able to create an S3 bucket that is public to the world. And instead of having a monitoring control on that or some sort of later control that happens after the deployment of that S3 bucket, we can say at pre-commit time, "Hey, developer, this is not a good idea. You can expose S3 data to the world, but you have to do it in this way with these approvals."
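The S3 example can be sketched as a small pre-commit style check. This is not tfsec and not how tfsec is implemented; it is a minimal Python illustration that assumes the Terraform plan has already been exported as JSON (for instance with `terraform show -json`) and loaded into a dictionary with `json.load`. The function name `find_public_buckets`, the `approved` allowlist, and the documentation URL are hypothetical, and the check only looks at the `acl` attribute, so it would miss other ways of making a bucket public.

```python
PUBLIC_ACLS = {"public-read", "public-read-write"}
S3_TYPES = {"aws_s3_bucket", "aws_s3_bucket_acl"}
# Hypothetical internal guidance page that the error message points to.
DOCS_URL = "https://docs.example.internal/s3-public-access"


def find_public_buckets(plan, approved=frozenset()):
    """Return one message per S3 resource in the plan that has a public ACL.

    plan:     dict parsed from a Terraform plan in JSON form
    approved: resource addresses with an explicit exception (hypothetical)
    """
    findings = []
    for change in plan.get("resource_changes", []):
        if change.get("type") not in S3_TYPES:
            continue
        # "after" is None when the resource is being destroyed.
        after = (change.get("change") or {}).get("after") or {}
        address = change.get("address", "<unknown>")
        acl = after.get("acl")
        if acl in PUBLIC_ACLS and address not in approved:
            findings.append(
                f"{address}: public ACL {acl!r} is not allowed; "
                f"see {DOCS_URL} for the approved way to expose data"
            )
    return findings
```

A pre-commit hook could print the returned messages and exit non-zero when the list is not empty, so the developer sees the guidance before the change is ever pushed. The `approved` set stands in for the "with these approvals" path.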

ADRIANA: So you're putting some guardrails in place.

MICHAEL: Yeah, absolutely. The cloud providers are getting better at this as well. Like, I really love Azure Policy, where you can specify very granular policy across your cloud infrastructure and also scope that to different management groups or subscriptions. That is a great preventative control. AWS is getting there.

But bringing the experience to the developer at pre-commit time, I think, is a really good experience and especially because those tools can allow us to point to that internal documentation of what to do instead of getting a blanket "you're denied from doing this." So I think the DevSecOps space can really leverage IaC in a very positive manner.

And then there are a bunch of other security projects that are part of Cloud Native Computing Foundation that also enable other security features like signing images, et cetera, and also admission controllers as well. They're very important parts of the security worldview that have now become pretty baked into the development experience and the developer workflow.

ANA: That's really cool. I do like also what you mentioned of being able to tie it into your internal tooling with that pre-commit because you definitely just kind of want to guide them into, like, hey, it's not like a hit to the hand. It's just more of a let me guide you to the best practice that the organization actually wants you to do.

MICHAEL: So even this scorecard product we had at LinkedIn only told developers that they weren't following best practice after they had developed or deployed something. With the DevSecOps movement, moving that all to pre-commit goes a long way to ensuring that, number one, engineers are provided with actionable feedback and that it also becomes a blocking mechanism for them to get their pre-commit or get their commit pushed.

So this becomes a really good forcing function where we can helpfully tell the developer or engineer what to do and ensure that that gets done. So the best practice is adhered to straightaway rather than something being deployed, we find it, then we have to go and ask them to go and fix it, which becomes a pretty bad cycle, especially at scale. So there are a number of different and interesting ways that we can be much more proactive and ensure that everything is secure by default rather than having to chase a team to go and adhere to a best practice retrospectively.

ADRIANA: Which is quite refreshing. Before we wrap up, I did want to ask you a question about policy as a service. Is that something...because you mentioned Azure has some mechanism to let you define --

MICHAEL: Yeah, Azure Policy.

ADRIANA: Yeah. And that AWS is kind of getting there. Are you aware of any tools out there that are kind of cloud-agnostic that give you that policy as a service thing?

MICHAEL: Definitely, there's Open Policy Agent, which is a CNCF project and can be plugged into a number of different things. I think it can be used as an admission controller for Kubernetes. It can be used as a policy evaluator for SSH access. So there are definitely projects out there. And OPA is just a REST API essentially.
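[Editor's illustration] To make "OPA is just a REST API" concrete, here is a minimal sketch of asking an OPA server for a decision through its Data API (a POST to /v1/data/<policy path> with the query wrapped in an "input" field). The server address, the policy path "ssh/authz/allow", and the shape of the input document are assumptions for illustration; they depend entirely on how a given OPA instance and its policies are set up.

```python
import json
import urllib.request


def build_opa_request(base_url: str, policy_path: str, input_doc: dict) -> urllib.request.Request:
    """Build a POST to OPA's Data API with the query wrapped in an "input" field."""
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/data/{policy_path.strip('/')}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_decision(response_body: bytes) -> bool:
    """Treat a missing "result" (undefined decision) as a denial."""
    return json.loads(response_body).get("result", False) is True


def is_allowed(base_url: str, policy_path: str, input_doc: dict) -> bool:
    """Send the query to a running OPA server and return its decision."""
    request = build_opa_request(base_url, policy_path, input_doc)
    with urllib.request.urlopen(request, timeout=5) as response:
        return parse_decision(response.read())
```

Against a locally running server this might be called as is_allowed("http://localhost:8181", "ssh/authz/allow", {"user": "ana", "host": "db-1"}), where the policy path and input fields are hypothetical. Defaulting to a denial when no result comes back is a design choice in this sketch, not something OPA mandates.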

So I think when these systems become a little bit more mature, especially in the way that policy is crafted, OPA I found was a little bit trickier than I expected. Once these projects get a little bit more exposure and we can plug them into a whole variety of systems very easily, then we make life much more simple. And we don't have to worry about applying policies retroactively.

ADRIANA: Cool.

MICHAEL: We also centralize the policy, which is also very helpful.

ADRIANA: Yeah, yeah. That's cool. So I guess keep an eye out for OPA and other such CNCF projects out there. It's only going to get better from here.

MICHAEL: Right.

ANA: It's always cool to see how many other projects there are within the CNCF umbrella. Every time we record, I feel like I learn about a new one or a second one. And I'm like, whoa, [laughs] this space keeps growing.

ADRIANA: I know. It's wild.

MICHAEL: Right. There are obviously a lot of projects in the CNCF space. There are definitely a number that have a lot of potential to really make a huge difference, especially to smaller companies to do things that the hyperscalers were doing a couple of years ago. And they can do that from day one. One of the benefits that startups have is they are completely greenfield and are able to be a lot more nimble. So maybe one day that will be me, not yet. But I would love that opportunity at some point in my career to build from greenfield with these mature cloud-native projects out of the box.

ADRIANA: Yeah, that will be awesome.

ANA: Most definitely. Definitely agree there, too.

ADRIANA: Cool. As we wrap up, do you have any words of wisdom that you want to share with our listeners on security nuggets?

MICHAEL: I think, going back to what I said earlier, only a Sith believes in absolutes, but there are very few absolutes in the work that we do. Security is meant to serve multiple parts of the business. And so the role of a security team is not to say, "No," it's to work out solutions to help the business be successful and meet its various contractual obligations, and take that SRE mindset and try and build off what, you know, SRE has done over the last 15 years and move things to the left, build those partnerships, use software to make our lives easier and to automate as much as we possibly can.

ADRIANA: Love it. Those are really great pieces of advice.

ANA: Well, with that, thank you so much, Michael, for joining us in today's podcast.

MICHAEL: Thank you.

ANA: Don't forget to subscribe and give us a shout-out on all social medias via oncallmemaybe. And be sure to check out the show notes on oncallmemaybe.com for additional resources and to connect with us and our guests on social media. For On-Call Me Maybe, we're your hosts, Ana Margarita Medina...

ADRIANA: And Adriana Villela. Signing off with...

MICHAEL: Peace, love, and code.

ADRIANA: Woo-hoo! Yay.
