This episode features an interview with Holden Karau, an Open Source Engineer at Netflix. Holden is best known for her work on Apache Spark, her advocacy in the open source software movement, and her creation of a variety of related projects including spark-testing-base. Previously, Holden worked at Big Tech companies like Apple, IBM, and Google as a software engineer and developer advocate.

In this episode, Sam sits down with Holden to discuss the data analysis stack, functional programming, and the future of open source data tooling.

-------------------

“These things are not one-off. We may think that they're one-off and they don't need testing, but that's not the reality. When you write something, it needs to be maintainable and as software people, the only real way that I think we know to make something vaguely maintainable is to at least have tests. And these tests need to cover common failure cases that we've experienced. And certainly, there's different approaches to this. There's property-based testing, there's golden sets, all kinds of different options. I don't think necessarily any one approach is right or better here, but I think we need something. We need less untitled 5.IPython Notebook running in production, scheduled every hour. That is not a way to run a company.” – Holden Karau

-------------------

Episode Timestamps:

(02:27): What open source data means to Holden

(04:37): What interested Holden in mathematical computer science

(09:51): What drew Holden to Spark

(12:49): What Holden has learned about cognitive systems

(20:02): What we need to learn as developers and data specialists

(25:28): The future of the data analysis stack

(31:21): Improvements in data tooling over the next 5 years

(34:25): A question Holden wishes she were asked

(40:51): Holden’s advice for open source data project committers

(43:18): Executive producer Audra Montenegro's backstage takeaways

-------------------

Links:

LinkedIn - Connect with Holden

Buy Holden’s books

Visit Holden’s website