Latest Digressions Podcast Episodes
So long, and thanks for all the fish
Linear Digressions - July 26, 2020 23:32 - 35 minutes ★★★★★ - 350 ratings
All good things must come to an end, including this podcast. This is the last episode we plan to release, and it doesn’t cover data science; it’s mostly reminiscing, thanking our wonderful audience (that’s you!), and marveling at how this thing that started out as a side project grew into a huge p...
A Reality Check on AI-Driven Medical Assistants
Linear Digressions - July 19, 2020 23:51 - 14 minutes
The data science and artificial intelligence community has made amazing strides in the past few years to algorithmically automate portions of the healthcare process. This episode looks at two computer vision algorithms, one that diagnoses diabetic retinopathy and another that classifies liver can...
A Data Science Take on Open Policing Data
Linear Digressions - July 13, 2020 02:02 - 23 minutes
A few weeks ago, we put out a call asking data scientists interested in issues of race and racism, or people researching how those topics can be studied with data science methods, to get in touch and come talk to our audience about their work. This week we’re excited to bring on Todd Hendricks, Bay ...
The Data Science Open Source Ecosystem
Linear Digressions - June 29, 2020 02:34 - 23 minutes
Open source software is ubiquitous throughout data science, and enables the work of nearly every data scientist in one way or another. Open source projects, however, are disproportionately maintained by a small number of individuals, some of whom are institutionally supported, but many of whom d...
Criminology and Data Science
Linear Digressions - June 15, 2020 01:26 - 30 minutes
This episode features Zach Drake, a working data scientist and PhD candidate in the Criminology, Law and Society program at George Mason University. Zach specializes in bringing data science methods to studies of criminal behavior, and got in touch after our last episode (about racially complicat...
Racism, the criminal justice system, and data science
Linear Digressions - June 07, 2020 23:33 - 31 minutes
As protests sweep across the United States in the wake of the killing of George Floyd by a Minneapolis police officer, we take a moment to dig into one of the ways that data science perpetuates and amplifies racism in the American criminal justice system. COMPAS is an algorithm that claims to giv...
An interstitial word from Ben
Linear Digressions - June 05, 2020 01:38 - 5 minutes
A message from Ben about algorithmic bias, and how our models are sometimes reflections of ourselves.
Convolutional Neural Networks
Linear Digressions - May 31, 2020 21:46 - 21 minutes
This is a re-release of an episode that originally aired on April 1, 2018. If you've done image recognition or computer vision tasks with a neural network, you've probably used a convolutional neural net. This episode is all about the architecture and implementation details of convolutional netwo...
Protecting Individual-Level Census Data with Differential Privacy
Linear Digressions - May 18, 2020 01:49 - 21 minutes
The power of fine-grained, individual-level data comes with a drawback: it compromises the privacy of potentially anyone and everyone in the dataset. Even for de-identified datasets, there can be ways to re-identify the records or otherwise figure out sensitive personal information. That proble...
Causal Trees
Linear Digressions - May 11, 2020 01:34 - 15 minutes
What do you get when you combine the causal inference needs of econometrics with the data-driven methodology of machine learning? Usually these two don’t go well together (deriving causal conclusions from naive data methods leads to biased answers), but economists Susan Athey and Guido Imbens are ...
The Grammar Of Graphics
Linear Digressions - May 04, 2020 01:12 - 35 minutes
You may not realize it consciously, but beautiful visualizations have rules. The rules are often implicit and manifest themselves as expectations about how the data is summarized, presented, and annotated so you can quickly extract the information in the underlying data using just visual cues. It’...
Gaussian Processes
Linear Digressions - April 27, 2020 01:33 - 20 minutes
It’s pretty common to fit a function to a dataset when you’re a data scientist. But in many cases, it’s not clear what kind of function might be most appropriate: linear? quadratic? sinusoidal? some combination of these, and perhaps others? Gaussian processes introduce a nonparametric option wher...
Keeping ourselves honest when we work with observational healthcare data
Linear Digressions - April 20, 2020 02:43 - 19 minutes
The abundance of data in healthcare, and the value we could capture from structuring and analyzing that data, is a huge opportunity. It also presents huge challenges. One of the biggest challenges is how, exactly, to do that structuring and analysis: data scientists working with this data have hun...
Changing our formulation of AI to avoid runaway risks: Interview with Prof. Stuart Russell
Linear Digressions - April 13, 2020 01:55 - 28 minutes
AI is evolving incredibly quickly, and thinking now about where it might go next (and how we as a species and a society should be prepared) is critical. Professor Stuart Russell, an AI expert at UC Berkeley, has a formulation for modifications to AI that we should study and try implementing now t...
Putting machine learning into a database
Linear Digressions - April 06, 2020 01:51 - 24 minutes
Most data scientists regularly bounce back and forth between doing analysis in databases using SQL and building and deploying machine learning pipelines in R or Python. But if we think ahead a few years, a few visionary researchers are starting to see a world in which the ML pipelines can actuall...
The work-from-home episode
Linear Digressions - March 29, 2020 22:23 - 29 minutes
Many of us have the privilege of working from home right now, in an effort to keep ourselves and our families safe and slow the transmission of covid-19. But working from home is an adjustment for many of us, and can pose some challenges compared to coming in to the office every day. This episode e...
Understanding Covid-19 transmission: what the data suggests about how the disease spreads
Linear Digressions - March 23, 2020 01:03 - 25 minutes
Covid-19 is turning the world upside down right now. One thing that’s extremely important to understand, in order to fight it as effectively as possible, is how the virus spreads and especially how much of the spread of the disease comes from carriers who are experiencing no or mild symptoms but ...
Network effects re-release: when the power of a public health measure lies in widespread adoption
Linear Digressions - March 15, 2020 22:43 - 26 minutes
This week’s episode is a re-release of a recent episode. We don’t usually do that, but this one seems important for understanding what we can all do to slow the spread of covid-19. In brief, public health measures for infectious diseases get most of their effectiveness from their widespread adoption: m...
Causal inference when you can't experiment: difference-in-differences and synthetic controls
Linear Digressions - March 09, 2020 01:39 - 20 minutes
When you need to untangle cause and effect, but you can’t run an experiment, it’s time to get creative. This episode covers difference-in-differences and synthetic controls, two observational causal inference techniques that researchers have used to understand causality in complex real-world situ...
Better know a distribution: the Poisson distribution
Linear Digressions - March 02, 2020 02:55 - 31 minutes
This is a re-release of an episode that originally ran on October 21, 2018. The Poisson distribution is a probability distribution function used to model events that happen in time or space. It’s super handy because it’s pretty simple to use and is applicable to tons of things: there are a lot of ...
The Lottery Ticket Hypothesis
Linear Digressions - February 23, 2020 23:03 - 19 minutes
Recent research into neural networks reveals that sometimes, not all parts of the neural net are equally responsible for the performance of the network overall. Instead, it seems like (in some neural nets, at least) there are smaller subnetworks present where most of the predictive power resides...
Interesting technical issues prompted by GDPR and data privacy concerns
Linear Digressions - February 17, 2020 01:50 - 20 minutes
Data privacy is a huge issue right now, after years of consumers and users gaining awareness of just how much of their personal data is out there and how companies are using it. Policies like GDPR are imposing more stringent rules on who can use what data for what purposes, with an end goal of gi...
Thinking of data science initiatives as innovation initiatives
Linear Digressions - February 10, 2020 01:10 - 17 minutes
Put yourself in the shoes of an executive at a big legacy company for a moment, operating in virtually any market vertical: you’re constantly hearing that data science is revolutionizing the world and the firms that survive and thrive in the coming years are those that execute on a data strategy....
Building a curriculum for educating data scientists: Interview with Prof. Xiao-Li Meng
Linear Digressions - February 02, 2020 23:36 - 31 minutes
As demand for data scientists grows, and it remains as relevant as ever that practicing data scientists have a solid methodological and technical foundation for their work, higher education institutions are coming to terms with what’s required to educate the next cohorts of data scientists. The h...
Running experiments when there are network effects
Linear Digressions - January 27, 2020 00:13 - 24 minutes
Traditional A/B tests assume that whether or not one person got a treatment has no effect on the experiment outcome for another person. But that’s not a safe assumption, especially when there are network effects (like in almost any social context, for instance!). SUTVA, or the stable treatment uni...
Zeroing in on what makes adversarial examples possible
Linear Digressions - January 20, 2020 02:41 - 22 minutes
Adversarial examples are really, really weird: pictures of penguins that get classified with high certainty by machine learning algorithms as drumsets, or random noise labeled as pandas, or any one of an infinite number of mistakes in labeling data that humans would never make but computers make ...
Unsupervised Dimensionality Reduction: UMAP vs t-SNE
Linear Digressions - January 13, 2020 00:53 - 29 minutes
Dimensionality reduction redux: this episode covers UMAP, an unsupervised algorithm designed to make high-dimensional data easier to visualize, cluster, etc. It’s similar to t-SNE but has some advantages. This episode gives a quick recap of t-SNE, especially the connection it shares with informat...
Data scientists: beware of simple metrics
Linear Digressions - January 05, 2020 22:54 - 24 minutes
Picking a metric for a problem means defining how you’ll measure success in solving that problem. That sounds important, because it is, but new data scientists often get experience with only a few kinds of metrics while they’re learning, and those metrics have real shortcomings when you think...
Communicating data science, from academia to industry
Linear Digressions - December 30, 2019 01:53 - 26 minutes
For something as multifaceted and ill-defined as data science, communication and sharing best practices across the field can be extremely valuable but also extremely, well, multifaceted and ill-defined. That doesn’t bother our guest today, Prof. Xiao-Li Meng of the Harvard statistics department, ...
Optimizing for the short-term vs. the long-term
Linear Digressions - December 23, 2019 02:50 - 19 minutes
When data scientists run experiments, like A/B tests, it’s really easy to plan on a period of a few days to a few weeks for collecting data. The thing is, the change that’s being evaluated might have effects that last a lot longer than a few days or a few weeks: having a big sale might increase sa...