Accelerating Computation, Machine Learning, and Data Mesh with Sophie Watson

Open||Source||Data

English - July 06, 2022 14:45 - 38 minutes - 35.3 MB - ★★★★★ - 16 ratings

This episode features an interview with Sophie Watson, Technical Product Marketing Manager at NVIDIA. Previously, Sophie served as a software engineer and principal data scientist at Red Hat, where she used machine learning to solve business problems in the hybrid cloud. Sophie has a PhD in Bayesian statistics and frequently speaks about machine learning workflows on Kubernetes, recommendation engines, and machine learning for search.

In this episode, Sam and Sophie discuss Principal Component Analysis, computational acceleration, and MLOps.

-------------------

“We all start when we get hold of a data set by visualizing it to try to understand it. So that usually for me involves starting with a simple technique, something like PCA, Principal Component Analysis. It's been around since the eighties, probably longer, maybe the sixties. Don't quote me on that. With Principal Component Analysis, we can map our high-dimensional data down to a smaller number of dimensions. Let's map it down to two so that we can visualize it. So we can go ahead and visualize it. But Principal Component Analysis is quite a simple technique in what it's doing and it's just mapping onto key components of our data. We might not be able to see, perhaps, separation of classes if we're working with data that's from a set of classes. Maybe we're looking at transactions, are they fraudulent or are they legitimate? And we might not be able to see that distinction. So that makes us think, "Is there something interesting in my data? Am I going to be able to train a machine learning model?" I don't know. Back in the day, I think the next step would've been, "Oh, let's train a model in C," but now with accelerated compute within a really reasonable amount of time, we can go ahead and use a more sophisticated technique so we can use something like UMAP that's leaning on differential manifolds to do that projection to lower dimensions. And because this technique is slightly more sophisticated, what we find in general is that within the same amount of time, we're able to get more insight into the data. We're able to see the distinction in classes between our data sets. It keeps you in that loop. It keeps you in that productivity state.” – Sophie Watson

-------------------
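
The first step Sophie describes, mapping high-dimensional data down to two dimensions with PCA so it can be plotted, can be sketched roughly as follows. This is a minimal illustration and not code from the episode: it assumes NumPy is available, the helper name `pca_project` is hypothetical, and the "legitimate" and "fraudulent" transactions are synthetic random data made up for the example.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto their top principal components.

    Hypothetical helper for illustration only.
    """
    # PCA works on mean-centered data.
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal
    # directions, ordered by decreasing variance.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Fraction of total variance captured by each kept component.
    explained = singular_values**2 / np.sum(singular_values**2)
    return X_centered @ components.T, explained[:n_components]


# Synthetic stand-in for a labelled data set (an assumption, not real data):
# 200 "legitimate" and 20 "fraudulent" transactions with 10 features each.
rng = np.random.default_rng(0)
legit = rng.normal(loc=0.0, scale=1.0, size=(200, 10))
fraud = rng.normal(loc=3.0, scale=1.0, size=(20, 10))
X = np.vstack([legit, fraud])
labels = np.array([0] * 200 + [1] * 20)

X_2d, explained_ratio = pca_project(X)
print(X_2d.shape)  # (220, 2)
```

The two columns of `X_2d` can be passed to any scatter-plot tool and colored by `labels` to check whether the classes separate. In this synthetic case they do; on real data they may not, which is the situation where the quote suggests reaching for a more sophisticated projection such as UMAP from a manifold-learning library (not shown here).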

Episode Timestamps:

(01:22): What open source data means to Sophie

(02:47): How Sophie is spending her time

(07:52): What excites Sophie about the data science community

(10:13): What Sophie is most excited about in data visibility

(16:29): Data on servers versus data in the cloud

(18:09): Accelerated computation on machine learning

(22:27): Sophie breaks down probabilistic programming

(24:21): What problem Sophie was trying to solve in her career

(32:12): Sophie’s dream job of working for Taylor Swift

(34:48): Sophie’s advice for those interested in open source