Data Skeptic

533 episodes - English - Latest episode: about 3 hours ago - ★★★★★ - 477 ratings

The Data Skeptic Podcast features interviews and discussion of topics related to data science, statistics, machine learning, artificial intelligence and the like, all from the perspective of applying critical thinking and the scientific method to evaluate the veracity of claims and efficacy of approaches.

Science Technology machinelearning skepticism datamining datascience science statistics

Episodes

Consensus Voting

September 07, 2020 14:00 - 22 minutes - 26.3 MB

Mashbat Suzuki joins us to discuss the paper How Many Freemasons Are There? The Consensus Voting Mechanism in Metric Spaces. Check out Mashbat’s talk and many other great talks at the 13th Symposium on Algorithmic Game Theory (SAGT 2020).

Voting Mechanisms

August 31, 2020 14:00 - 27 minutes - 31.4 MB

Steven Heilman joins us to discuss his paper Designing Stable Elections. For a general interest article, see: https://theconversation.com/the-electoral-college-is-surprisingly-vulnerable-to-popular-vote-changes-141104 Steven Heilman receives funding from the National Science Foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.

False Consensus

August 24, 2020 22:16 - 33 minutes - 37.9 MB

Sami Yousif joins us to discuss the paper The Illusion of Consensus: A Failure to Distinguish Between True and False Consensus. This work empirically explores how individuals evaluate consensus under different experimental conditions reviewing online news articles. More from Sami at samiyousif.org Link to survey mentioned by Daniel Kerrigan: https://forms.gle/TCdGem3WTUYEP31B8

Fraud Detection in Real Time

August 18, 2020 07:12 - 38 minutes - 35.1 MB

In this solo episode, Kyle gives an overview of the field of fraud detection, with eCommerce as a use case. He discusses some of the techniques and system architectures companies use to fight fraud, with a focus on why these problems need to be approached from a real-time perspective.

Listener Survey Review

August 11, 2020 17:01 - 23 minutes - 26.5 MB

In this episode, Kyle and Linhda review the results of our recent survey. Hear all about the demographic details and how we interpret these results.

Human Computer Interaction and Online Privacy

July 27, 2020 21:43 - 32 minutes - 37.3 MB

Moses Namara from the HATLab joins us to discuss his research into the interaction between privacy and human-computer interaction.

Authorship Attribution of Lennon McCartney Songs

July 20, 2020 15:00 - 33 minutes - 57 MB

Mark Glickman joins us to discuss the paper Data in the Life: Authorship Attribution in Lennon-McCartney Songs.

GANs Can Be Interpretable

July 11, 2020 02:42 - 26 minutes - 30.5 MB

Erik Härkönen joins us to discuss the paper GANSpace: Discovering Interpretable GAN Controls. During the interview, Kyle makes reference to this amazing interpretable GAN controls video and its accompanying codebase. Erik mentions the GANSpace Colab notebook, which is a quick way to try these ideas out for yourself.

Sentiment Preserving Fake Reviews

July 06, 2020 22:48 - 28 minutes - 32.8 MB

David Ifeoluwa Adelani joins us to discuss Generating Sentiment-Preserving Fake Online Reviews Using Neural Language Models and Their Human- and Machine-based Detection.

Interpretability Practitioners

June 26, 2020 16:43 - 32 minutes - 36.8 MB

Sungsoo Ray Hong joins us to discuss the paper Human Factors in Model Interpretability: Industry Practices, Challenges, and Needs.

Facial Recognition Auditing

June 19, 2020 18:34 - 47 minutes - 54.4 MB

Deb Raji joins us to discuss her recent publication Saving Face: Investigating the Ethical Concerns of Facial Recognition Auditing.

Robust Fit to Nature

June 12, 2020 15:56 - 38 minutes - 43.8 MB

Uri Hasson joins us this week to discuss the paper Robust-fit to Nature: An Evolutionary Perspective on Biological (and Artificial) Neural Networks.

Black Boxes Are Not Required

June 05, 2020 19:59 - 32 minutes - 37.2 MB

Deep neural networks are undeniably effective. They rely on so many parameters that they are appropriately described as “black boxes”. While black boxes lack desirable properties like interpretability and explainability, in some cases their accuracy makes them incredibly useful. But does achieving “usefulness” require a black box? Can we be sure an equally valid but simpler solution does not exist? Cynthia Rudin helps us answer that question. We discuss her recent pape...

Robustness to Unforeseen Adversarial Attacks

May 30, 2020 15:29 - 21 minutes - 24.9 MB

Daniel Kang joins us to discuss the paper Testing Robustness Against Unforeseen Adversaries.

Estimating the Size of Language Acquisition

May 22, 2020 21:36 - 25 minutes - 28.7 MB

Frank Mollica joins us to discuss the paper Humans store about 1.5 megabytes of information during language acquisition

Interpretable AI in Healthcare

May 15, 2020 15:49 - 35 minutes - 41 MB

Jayaraman Thiagarajan joins us to discuss the recent paper Calibrating Healthcare AI: Towards Reliable and Interpretable Deep Predictive Models.

Understanding Neural Networks

May 08, 2020 17:07 - 34 minutes - 39.7 MB

What does it mean to understand a neural network? That’s the question posed in this arXiv paper. Kyle speaks with Tim Lillicrap about this and several other big questions.

Self-Explaining AI

May 02, 2020 05:23 - 32 minutes - 36.7 MB

Dan Elton joins us to discuss self-explaining AI. What could be better than an interpretable model? How about a model which explains itself in a conversational way, engaging in a back and forth with the user? We discuss the paper Self-explaining AI as an alternative to interpretable AI, which presents a framework for self-explaining AI.

Plastic Bag Bans

April 24, 2020 15:45 - 34 minutes - 39.9 MB

Becca Taylor joins us to discuss her work studying the impact of plastic bag bans as published in Bag Leakage: The Effect of Disposable Carryout Bag Regulations on Unregulated Bags from the Journal of Environmental Economics and Management. How does one measure the impact of these bans? Are they achieving their intended goals? Join us and find out!

Self Driving Cars and Pedestrians

April 18, 2020 17:58 - 30 minutes - 35.2 MB

We are joined by Arash Kalatian to discuss Decoding pedestrian and automated vehicle interactions using immersive virtual reality and interpretable deep learning.

Computer Vision is Not Perfect

April 10, 2020 17:53 - 26 minutes - 18.1 MB

Julia Evans joins us to help answer the question: why do neural networks think a panda is a vulture? Kyle talks to Julia about her hands-on work fooling neural networks. Julia runs Wizard Zines, which publishes works such as Your Linux Toolbox. You can find her on Twitter @b0rk.

Uncertainty Representations

April 04, 2020 15:18 - 39 minutes - 45.5 MB

Jessica Hullman joins us to share her expertise on data visualization and communication of data in the media. We discuss Jessica’s work on visualizing uncertainty, interviewing visualization designers on why they don't visualize uncertainty, and modeling interactions with visualizations as Bayesian updates. Homepage: http://users.eecs.northwestern.edu/~jhullman/ Lab: MU Collective

AlphaGo, COVID-19 Contact Tracing and New Data Set

March 28, 2020 06:00 - 33 minutes - 38.5 MB

Announcing Journal Club: I am pleased to announce Data Skeptic is launching a new spin-off show called "Journal Club" with similar themes but a very different format to the Data Skeptic everyone is used to. In Journal Club, we will have a regular panel and occasional guest panelists to discuss interesting news items and one featured journal article every week in a roundtable discussion. Each week, I'll be joined by Lan Guo and George Kemp for a discussion of interesting data science relat...

Visualizing Uncertainty

March 20, 2020 15:00 - 32 minutes - 37.6 MB

Interpretability Tooling

March 13, 2020 15:00 - 42 minutes - 48.8 MB

Pramit Choudhary joins us to talk about the methodologies and tools used to assist with model interpretability.

Shapley Values

March 06, 2020 20:29 - 20 minutes - 23 MB

Kyle and Linhda discuss how Shapley Values might be a good tool for determining what makes the cut for a home renovation.

Anchors as Explanations

February 28, 2020 14:46 - 37 minutes - 42.5 MB

We welcome back Marco Tulio Ribeiro to discuss research he has done since our original discussion on LIME. In particular, we ask the question Are Red Roses Red? and discuss how Anchors provide high-precision, model-agnostic explanations. Please take our listener survey.

Mathematical Models of Ecological Systems

February 22, 2020 00:10 - 36 minutes - 42 MB

Adversarial Explanations

February 14, 2020 23:10 - 36 minutes - 42.2 MB

Walt Woods joins us to discuss his paper Adversarial Explanations for Understanding Image Classification Decisions and Improved Neural Network Robustness with co-authors Jack Chen and Christof Teuscher.

ObjectNet

February 07, 2020 16:00 - 38 minutes - 44.2 MB

Andrei Barbu joins us to discuss ObjectNet - a new kind of vision dataset. In contrast to ImageNet, ObjectNet seeks to provide images that are more representative of the types of images an autonomous machine is likely to encounter in the real world. Collecting a dataset in this way required careful use of Mechanical Turk to get Turkers to provide a corpus of images that removes some of the bias found in ImageNet. http://0xab.com/

Visualization and Interpretability

January 31, 2020 16:00 - 35 minutes - 41 MB

Enrico Bertini joins us to discuss how data visualization can be used to help make machine learning more interpretable and explainable. Find out more about Enrico at http://enrico.bertini.io/. More from Enrico with co-host Moritz Stefaner on the Data Stories podcast!

Interpretable One Shot Learning

January 26, 2020 05:00 - 30 minutes - 35.1 MB

We welcome Su Wang back to Data Skeptic to discuss the paper Distributional modeling on a diet: One-shot word learning from text only.

Fooling Computer Vision

January 22, 2020 18:38 - 25 minutes - 23.3 MB

Wiebe van Ranst joins us to talk about a project in which specially designed printed images can fool a computer vision system, preventing it from identifying a person. Their attack targets the popular YOLO2 pre-trained image recognition model and is thus likely to be widely applicable.

Algorithmic Fairness

January 14, 2020 02:31 - 42 minutes - 48.2 MB

This episode includes an interview with Aaron Roth, author of The Ethical Algorithm.

Interpretability

January 07, 2020 08:33 - 32 minutes - 37.7 MB

Machine learning has shown a rapid expansion into every sector and industry. With increasing reliance on models and increasing stakes for the decisions of models, questions of how models actually work are becoming increasingly important to ask. Welcome to Data Skeptic Interpretability. In this episode, Kyle interviews Christoph Molnar about his book Interpretable Machine Learning. Thanks to our sponsor, the Gartner Data & Analytics Summit going on in Grapevine, TX on...

NLP in 2019

December 31, 2019 11:51 - 38 minutes - 31 MB

A recap of the year in NLP.

The Limits of NLP

December 24, 2019 01:18 - 29 minutes - 23.8 MB

We are joined by Colin Raffel to discuss the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer".

Jumpstart Your ML Project

December 15, 2019 17:25 - 20 minutes - 23.5 MB

Seth Juarez joins us to discuss the toolbox of options available to a data scientist to jumpstart or extend their machine learning efforts.

Serverless NLP Model Training

December 10, 2019 02:13 - 29 minutes - 24 MB

Alex Reeves joins us to discuss some of the challenges around building a serverless, scalable, generic machine learning pipeline. This is a technical deep dive on architecting solutions and a discussion of some of the design choices made.

Team Data Science Process

December 03, 2019 22:54 - 41 minutes - 37.9 MB

Buck Woody joins Kyle to share experiences from the field and the application of the Team Data Science Process - a popular six-phase workflow for doing data science.

Ancient Text Restoration

December 01, 2019 06:25 - 41 minutes - 36.5 MB

Thea Sommerschield joins us this week to discuss the development of Pythia - a machine learning model trained to assist in the reconstruction of ancient language text.

ML Ops

November 27, 2019 08:18 - 36 minutes - 41.8 MB

Kyle met up with Damian Brady at MS Ignite 2019 to discuss machine learning operations.

Annotator Bias

November 23, 2019 21:46 - 25 minutes - 23.7 MB

Modern deep learning approaches to natural language processing are voracious in their demands for large corpora to train on. Folk wisdom used to hold that around 100k documents were required for effective training. The availability of broadly trained, general-purpose models like BERT has made it possible to do transfer learning to achieve novel results on much smaller corpora. Thanks to these advancements, an NLP researcher might get value out of fewer examples since they can use ...

NLP for Developers

November 20, 2019 03:00 - 29 minutes - 26.6 MB

While at MS Build 2019, Kyle sat down with Lance Olson from the Applied AI team to talk about how tools like cognitive services and cognitive search enable non-data scientists to access relatively advanced NLP tools out of the box, and how more advanced data scientists can focus more time on the bigger-picture problems.

Indigenous American Language Research

November 13, 2019 09:40 - 22 minutes - 18.3 MB

Manuel Mager joins us to discuss natural language processing for low and under-resourced languages.  We discuss current work in this area and the Naki Project which aggregates research on NLP for native and indigenous languages of the American continent.

Talking to GPT-2

October 31, 2019 19:45 - 29 minutes - 23.4 MB

GPT-2 is yet another in a succession of models like ELMo and BERT which adopt a similar deep learning architecture and train an unsupervised model on a massive text corpus. As we have been covering recently, these approaches are showing tremendous promise, but how close are they to an AGI? Our guest today, Vazgen Davidyants, wondered exactly that and had conversations with a chatbot running GPT-2. We discuss his experiences as well as some novel thoughts on artificial intelligence.

Reproducing Deep Learning Models

October 23, 2019 01:15 - 22 minutes - 20.8 MB

Rajiv Shah attempted to reproduce an earthquake-predicting deep learning model.  His results exposed some issues with the model.  Kyle and Rajiv discuss the original paper and Rajiv's analysis.

What BERT is Not

October 14, 2019 21:02 - 27 minutes - 24.7 MB

Allyson Ettinger joins us to discuss her work in computational linguistics, specifically in exploring some of the ways in which the popular natural language processing approach BERT has limitations.

SpanBERT

October 08, 2019 08:27 - 24 minutes - 22.7 MB

Omer Levy joins us to discuss "SpanBERT: Improving Pre-training by Representing and Predicting Spans". https://arxiv.org/abs/1907.10529

Twitter Mentions

@sami_r_yousif 2 Episodes
@leerosevere 2 Episodes
@halfak 1 Episode
@boreshkin 1 Episode
@tomlevenson 1 Episode
@mark_azurecat 1 Episode
@randal_olson 1 Episode
@karthick_sh 1 Episode
@andersdrachen 1 Episode
@iamzareenf 1 Episode
@rajiinio 1 Episode
@chengtao_chu 1 Episode
@antoine77340 1 Episode
@samuelmehr 1 Episode
@rajcs4 1 Episode
@anderssandberg 1 Episode
@celestiaward 1 Episode
@akalatian 1 Episode
@niftyc 1 Episode
@maverickpramit 1 Episode