Latest Digressions Podcast Episodes

So long, and thanks for all the fish

Linear Digressions - July 26, 2020 23:32 - 35 minutes ★★★★★ - 350 ratings
All good things must come to an end, including this podcast. This is the last episode we plan to release, and it doesn’t cover data science—it’s mostly reminiscing, thanking our wonderful audience (that’s you!), and marveling at how this thing that started out as a side project grew into a huge p...

Technology data science machine learning linear

A Reality Check on AI-Driven Medical Assistants

Linear Digressions - July 19, 2020 23:51 - 14 minutes ★★★★★ - 350 ratings
The data science and artificial intelligence community has made amazing strides in the past few years to algorithmically automate portions of the healthcare process. This episode looks at two computer vision algorithms, one that diagnoses diabetic retinopathy and another that classifies liver can...

Technology data science machine learning linear

A Data Science Take on Open Policing Data

Linear Digressions - July 13, 2020 02:02 - 23 minutes ★★★★★ - 350 ratings
A few weeks ago, we put out a call for data scientists interested in issues of race and racism, or people studying how those topics can be examined with data science methods, to get in touch and come talk to our audience about their work. This week we’re excited to bring on Todd Hendricks, Bay ...

Technology data science machine learning linear

The Data Science Open Source Ecosystem

Linear Digressions - June 29, 2020 02:34 - 23 minutes ★★★★★ - 350 ratings
Open source software is ubiquitous throughout data science, and enables the work of nearly every data scientist in some way or another. Open source projects, however, are disproportionately maintained by a small number of individuals, some of whom are institutionally supported, but many of whom d...

Technology data science machine learning linear

Criminology and Data Science

Linear Digressions - June 15, 2020 01:26 - 30 minutes ★★★★★ - 350 ratings
This episode features Zach Drake, a working data scientist and PhD candidate in the Criminology, Law and Society program at George Mason University. Zach specializes in bringing data science methods to studies of criminal behavior, and got in touch after our last episode (about racially complicat...

Technology data science machine learning linear

Racism, the criminal justice system, and data science

Linear Digressions - June 07, 2020 23:33 - 31 minutes ★★★★★ - 350 ratings
As protests sweep across the United States in the wake of the killing of George Floyd by a Minneapolis police officer, we take a moment to dig into one of the ways that data science perpetuates and amplifies racism in the American criminal justice system. COMPAS is an algorithm that claims to giv...

Technology data science machine learning linear

An interstitial word from Ben

Linear Digressions - June 05, 2020 01:38 - 5 minutes ★★★★★ - 350 ratings
A message from Ben around algorithmic bias, and how our models are sometimes reflections of ourselves.

Technology data science machine learning linear

Convolutional Neural Networks

Linear Digressions - May 31, 2020 21:46 - 21 minutes ★★★★★ - 350 ratings
This is a re-release of an episode that originally aired on April 1, 2018. If you've done image recognition or computer vision tasks with a neural network, you've probably used a convolutional neural net. This episode is all about the architecture and implementation details of convolutional netwo...

Technology data science machine learning linear

Protecting Individual-Level Census Data with Differential Privacy

Linear Digressions - May 18, 2020 01:49 - 21 minutes ★★★★★ - 350 ratings
The power of finely-grained, individual-level data comes with a drawback: it compromises the privacy of potentially anyone and everyone in the dataset. Even for de-identified datasets, there can be ways to re-identify the records or otherwise figure out sensitive personal information. That proble...

Technology data science machine learning linear

Causal Trees

Linear Digressions - May 11, 2020 01:34 - 15 minutes ★★★★★ - 350 ratings
What do you get when you combine the causal inference needs of econometrics with the data-driven methodology of machine learning? Usually these two don’t go well together (deriving causal conclusions from naive data methods leads to biased answers) but economists Susan Athey and Guido Imbens are ...

Technology data science machine learning linear

The Grammar Of Graphics

Linear Digressions - May 04, 2020 01:12 - 35 minutes ★★★★★ - 350 ratings
You may not realize it consciously, but beautiful visualizations have rules. The rules are often implicit and manifest themselves as expectations about how the data is summarized, presented, and annotated so you can quickly extract the information in the underlying data using just visual cues. It’...

Technology data science machine learning linear

Gaussian Processes

Linear Digressions - April 27, 2020 01:33 - 20 minutes ★★★★★ - 350 ratings
It’s pretty common to fit a function to a dataset when you’re a data scientist. But in many cases, it’s not clear what kind of function might be most appropriate: linear? quadratic? sinusoidal? some combination of these, and perhaps others? Gaussian processes introduce a nonparametric option wher...

Technology data science machine learning linear

Keeping ourselves honest when we work with observational healthcare data

Linear Digressions - April 20, 2020 02:43 - 19 minutes ★★★★★ - 350 ratings
The abundance of data in healthcare, and the value we could capture from structuring and analyzing that data, is a huge opportunity. It also presents huge challenges. One of the biggest challenges is how, exactly, to do that structuring and analysis—data scientists working with this data have hun...

Technology data science machine learning linear

Changing our formulation of AI to avoid runaway risks: Interview with Prof. Stuart Russell

Linear Digressions - April 13, 2020 01:55 - 28 minutes ★★★★★ - 350 ratings
AI is evolving incredibly quickly, and thinking now about where it might go next (and how we as a species and a society should be prepared) is critical. Professor Stuart Russell, an AI expert at UC Berkeley, has a formulation for modifications to AI that we should study and try implementing now t...

Technology data science machine learning linear

Putting machine learning into a database

Linear Digressions - April 06, 2020 01:51 - 24 minutes ★★★★★ - 350 ratings
Most data scientists bounce back and forth regularly between doing analysis in databases using SQL and building and deploying machine learning pipelines in R or Python. But if we think ahead a few years, a few visionary researchers are starting to see a world in which the ML pipelines can actuall...

Technology data science machine learning linear

The work-from-home episode

Linear Digressions - March 29, 2020 22:23 - 29 minutes ★★★★★ - 350 ratings
Many of us have the privilege of working from home right now, in an effort to keep ourselves and our families safe and slow the transmission of Covid-19. But working from home is an adjustment for many of us, and can hold some challenges compared to coming in to the office every day. This episode e...

Technology data science machine learning linear

Understanding Covid-19 transmission: what the data suggests about how the disease spreads

Linear Digressions - March 23, 2020 01:03 - 25 minutes ★★★★★ - 350 ratings
Covid-19 is turning the world upside down right now. One thing that’s extremely important to understand, in order to fight it as effectively as possible, is how the virus spreads and especially how much of the spread of the disease comes from carriers who are experiencing no or mild symptoms but ...

Technology data science machine learning linear

Network effects re-release: when the power of a public health measure lies in widespread adoption

Linear Digressions - March 15, 2020 22:43 - 26 minutes ★★★★★ - 350 ratings
This week’s episode is a re-release of a recent episode, which we don’t usually do, but it seems important for understanding what we can all do to slow the spread of Covid-19. In brief, public health measures for infectious diseases get most of their effectiveness from their widespread adoption: m...

Technology data science machine learning linear

Causal inference when you can't experiment: difference-in-differences and synthetic controls

Linear Digressions - March 09, 2020 01:39 - 20 minutes ★★★★★ - 350 ratings
When you need to untangle cause and effect, but you can’t run an experiment, it’s time to get creative. This episode covers difference in differences and synthetic controls, two observational causal inference techniques that researchers have used to understand causality in complex real-world situ...

Technology data science machine learning linear

Better know a distribution: the Poisson distribution

Linear Digressions - March 02, 2020 02:55 - 31 minutes ★★★★★ - 350 ratings
This is a re-release of an episode that originally ran on October 21, 2018. The Poisson distribution is a probability distribution function used for events that happen in time or space. It’s super handy because it’s pretty simple to use and is applicable for tons of things; there are a lot of ...

Technology data science machine learning linear

The Lottery Ticket Hypothesis

Linear Digressions - February 23, 2020 23:03 - 19 minutes ★★★★★ - 350 ratings
Recent research into neural networks reveals that sometimes, not all parts of the neural net are equally responsible for the performance of the network overall. Instead, it seems like (in some neural nets, at least) there are smaller subnetworks present where most of the predictive power resides...

Technology data science machine learning linear

Interesting technical issues prompted by GDPR and data privacy concerns

Linear Digressions - February 17, 2020 01:50 - 20 minutes ★★★★★ - 350 ratings
Data privacy is a huge issue right now, after years of consumers and users gaining awareness of just how much of their personal data is out there and how companies are using it. Policies like GDPR are imposing more stringent rules on who can use what data for what purposes, with an end goal of gi...

Technology data science machine learning linear

Thinking of data science initiatives as innovation initiatives

Linear Digressions - February 10, 2020 01:10 - 17 minutes ★★★★★ - 350 ratings
Put yourself in the shoes of an executive at a big legacy company for a moment, operating in virtually any market vertical: you’re constantly hearing that data science is revolutionizing the world and the firms that survive and thrive in the coming years are those that execute on a data strategy....

Technology data science machine learning linear

Building a curriculum for educating data scientists: Interview with Prof. Xiao-Li Meng

Linear Digressions - February 02, 2020 23:36 - 31 minutes ★★★★★ - 350 ratings
As demand for data scientists grows, and it remains as relevant as ever that practicing data scientists have a solid methodological and technical foundation for their work, higher education institutions are coming to terms with what’s required to educate the next cohorts of data scientists. The h...

Technology data science machine learning linear

Running experiments when there are network effects

Linear Digressions - January 27, 2020 00:13 - 24 minutes ★★★★★ - 350 ratings
Traditional A/B tests assume that whether or not one person got a treatment has no effect on the experiment outcome for another person. But that’s not a safe assumption, especially when there are network effects (like in almost any social context, for instance!). SUTVA, or the stable treatment uni...

Technology data science machine learning linear

Zeroing in on what makes adversarial examples possible

Linear Digressions - January 20, 2020 02:41 - 22 minutes ★★★★★ - 350 ratings
Adversarial examples are really, really weird: pictures of penguins that get classified with high certainty by machine learning algorithms as drumsets, or random noise labeled as pandas, or any one of an infinite number of mistakes in labeling data that humans would never make but computers make ...

Technology data science machine learning linear

Unsupervised Dimensionality Reduction: UMAP vs t-SNE

Linear Digressions - January 13, 2020 00:53 - 29 minutes ★★★★★ - 350 ratings
Dimensionality reduction redux: this episode covers UMAP, an unsupervised algorithm designed to make high-dimensional data easier to visualize, cluster, etc. It’s similar to t-SNE but has some advantages. This episode gives a quick recap of t-SNE, especially the connection it shares with informat...

Technology data science machine learning linear

Data scientists: beware of simple metrics

Linear Digressions - January 05, 2020 22:54 - 24 minutes ★★★★★ - 350 ratings
Picking a metric for a problem means defining how you’ll measure success in solving that problem. Which sounds important, because it is, but oftentimes new data scientists only get experience with a few kinds of metrics when they’re learning and those metrics have real shortcomings when you think...

Technology data science machine learning linear

Communicating data science, from academia to industry

Linear Digressions - December 30, 2019 01:53 - 26 minutes ★★★★★ - 350 ratings
For something as multifaceted and ill-defined as data science, communication and sharing best practices across the field can be extremely valuable but also extremely, well, multifaceted and ill-defined. That doesn’t bother our guest today, Prof. Xiao-Li Meng of the Harvard statistics department, ...

Technology data science machine learning linear

Optimizing for the short-term vs. the long-term

Linear Digressions - December 23, 2019 02:50 - 19 minutes ★★★★★ - 350 ratings
When data scientists run experiments, like A/B tests, it’s really easy to plan on a period of a few days to a few weeks for collecting data. The thing is, the change that’s being evaluated might have effects that last a lot longer than a few days or a few weeks—having a big sale might increase sa...

Technology data science machine learning linear