Data Skeptic
533 episodes - English - Latest episode: about 3 hours ago - ★★★★★ - 477 ratings
The Data Skeptic Podcast features interviews and discussion of topics related to data science, statistics, machine learning, artificial intelligence and the like, all from the perspective of applying critical thinking and the scientific method to evaluate the veracity of claims and efficacy of approaches.
Episodes
Consensus Voting
September 07, 2020 14:00 - 22 minutes - 26.3 MB
Mashbat Suzuki joins us to discuss the paper How Many Freemasons Are There? The Consensus Voting Mechanism in Metric Spaces. Check out Mashbat’s and many other great talks at the 13th Symposium on Algorithmic Game Theory (SAGT 2020).
Voting Mechanisms
August 31, 2020 14:00 - 27 minutes - 31.4 MB
Steven Heilman joins us to discuss his paper Designing Stable Elections. For a general interest article, see: https://theconversation.com/the-electoral-college-is-surprisingly-vulnerable-to-popular-vote-changes-141104
Steven Heilman receives funding from the National Science Foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.
False Consensus
August 24, 2020 22:16 - 33 minutes - 37.9 MB
Sami Yousif joins us to discuss the paper The Illusion of Consensus: A Failure to Distinguish Between True and False Consensus. This work empirically explores how individuals evaluate consensus under different experimental conditions reviewing online news articles. More from Sami at samiyousif.org. Link to survey mentioned by Daniel Kerrigan: https://forms.gle/TCdGem3WTUYEP31B8
Fraud Detection in Real Time
August 18, 2020 07:12 - 38 minutes - 35.1 MB
In this solo episode, Kyle overviews the field of fraud detection with eCommerce as a use case. He discusses some of the techniques and system architectures used by companies to fight fraud with a focus on why these things need to be approached from a real-time perspective.
Listener Survey Review
August 11, 2020 17:01 - 23 minutes - 26.5 MB
In this episode, Kyle and Linhda review the results of our recent survey. Hear all about the demographic details and how we interpret these results.
Human Computer Interaction and Online Privacy
July 27, 2020 21:43 - 32 minutes - 37.3 MB
Moses Namara from the HATLab joins us to discuss his research into the interaction between privacy and human-computer interaction.
Authorship Attribution of Lennon McCartney Songs
July 20, 2020 15:00 - 33 minutes - 57 MB
Mark Glickman joins us to discuss the paper Data in the Life: Authorship Attribution in Lennon-McCartney Songs.
GANs Can Be Interpretable
July 11, 2020 02:42 - 26 minutes - 30.5 MB
Erik Härkönen joins us to discuss the paper GANSpace: Discovering Interpretable GAN Controls. During the interview, Kyle makes reference to this amazing interpretable GAN controls video and its accompanying codebase found here. Erik mentions the GANSpace Colab notebook, which is a rapid way to try these ideas out for yourself.
Sentiment Preserving Fake Reviews
July 06, 2020 22:48 - 28 minutes - 32.8 MB
David Ifeoluwa Adelani joins us to discuss Generating Sentiment-Preserving Fake Online Reviews Using Neural Language Models and Their Human- and Machine-based Detection.
Interpretability Practitioners
June 26, 2020 16:43 - 32 minutes - 36.8 MB
Sungsoo Ray Hong joins us to discuss the paper Human Factors in Model Interpretability: Industry Practices, Challenges, and Needs.
Facial Recognition Auditing
June 19, 2020 18:34 - 47 minutes - 54.4 MB
Deb Raji joins us to discuss her recent publication Saving Face: Investigating the Ethical Concerns of Facial Recognition Auditing.
Robust Fit to Nature
June 12, 2020 15:56 - 38 minutes - 43.8 MB
Uri Hasson joins us this week to discuss the paper Robust-fit to Nature: An Evolutionary Perspective on Biological (and Artificial) Neural Networks.
Black Boxes Are Not Required
June 05, 2020 19:59 - 32 minutes - 37.2 MB
Deep neural networks are undeniably effective. They rely on such a high number of parameters that they are appropriately described as “black boxes”. While black boxes lack desirable properties like interpretability and explainability, in some cases, their accuracy makes them incredibly useful. But does achieving “usefulness” require a black box? Can we be sure an equally valid but simpler solution does not exist? Cynthia Rudin helps us answer that question. We discuss her recent pape...
Robustness to Unforeseen Adversarial Attacks
May 30, 2020 15:29 - 21 minutes - 24.9 MB
Daniel Kang joins us to discuss the paper Testing Robustness Against Unforeseen Adversaries.
Estimating the Size of Language Acquisition
May 22, 2020 21:36 - 25 minutes - 28.7 MB
Frank Mollica joins us to discuss the paper Humans store about 1.5 megabytes of information during language acquisition.
Interpretable AI in Healthcare
May 15, 2020 15:49 - 35 minutes - 41 MB
Jayaraman Thiagarajan joins us to discuss the recent paper Calibrating Healthcare AI: Towards Reliable and Interpretable Deep Predictive Models.
Understanding Neural Networks
May 08, 2020 17:07 - 34 minutes - 39.7 MB
What does it mean to understand a neural network? That’s the question posed in this arXiv paper. Kyle speaks with Tim Lillicrap about this and several other big questions.
Self-Explaining AI
May 02, 2020 05:23 - 32 minutes - 36.7 MB
Dan Elton joins us to discuss self-explaining AI. What could be better than an interpretable model? How about a model which explains itself in a conversational way, engaging in a back and forth with the user? We discuss the paper Self-explaining AI as an alternative to interpretable AI, which presents a framework for self-explaining AI.
Plastic Bag Bans
April 24, 2020 15:45 - 34 minutes - 39.9 MB
Becca Taylor joins us to discuss her work studying the impact of plastic bag bans as published in Bag Leakage: The Effect of Disposable Carryout Bag Regulations on Unregulated Bags from the Journal of Environmental Economics and Management. How does one measure the impact of these bans? Are they achieving their intended goals? Join us and find out!
Self Driving Cars and Pedestrians
April 18, 2020 17:58 - 30 minutes - 35.2 MB
We are joined by Arash Kalatian to discuss Decoding pedestrian and automated vehicle interactions using immersive virtual reality and interpretable deep learning.
Computer Vision is Not Perfect
April 10, 2020 17:53 - 26 minutes - 18.1 MB
Julia Evans joins us to help answer the question: why do neural networks think a panda is a vulture? Kyle talks to Julia about her hands-on work fooling neural networks. Julia runs Wizard Zines, which publishes works such as Your Linux Toolbox. You can find her on Twitter @b0rk.
Uncertainty Representations
April 04, 2020 15:18 - 39 minutes - 45.5 MB
Jessica Hullman joins us to share her expertise on data visualization and communication of data in the media. We discuss Jessica’s work on visualizing uncertainty, interviewing visualization designers on why they don't visualize uncertainty, and modeling interactions with visualizations as Bayesian updates.
Homepage: http://users.eecs.northwestern.edu/~jhullman/
Lab: MU Collective
AlphaGo, COVID-19 Contact Tracing and New Data Set
March 28, 2020 06:00 - 33 minutes - 38.5 MB
Announcing Journal Club: I am pleased to announce Data Skeptic is launching a new spin-off show called "Journal Club" with similar themes but a very different format to the Data Skeptic everyone is used to. In Journal Club, we will have a regular panel and occasional guest panelists to discuss interesting news items and one featured journal article every week in a roundtable discussion. Each week, I'll be joined by Lan Guo and George Kemp for a discussion of interesting data science relat...
Visualizing Uncertainty
March 20, 2020 15:00 - 32 minutes - 37.6 MB
Interpretability Tooling
March 13, 2020 15:00 - 42 minutes - 48.8 MB
Pramit Choudhary joins us to talk about the methodologies and tools used to assist with model interpretability.
Shapley Values
March 06, 2020 20:29 - 20 minutes - 23 MB
Kyle and Linhda discuss how Shapley Values might be a good tool for determining what makes the cut for a home renovation.
Anchors as Explanations
February 28, 2020 14:46 - 37 minutes - 42.5 MB
We welcome back Marco Tulio Ribeiro to discuss research he has done since our original discussion on LIME. In particular, we ask the question Are Red Roses Red? and discuss how Anchors provide high precision model-agnostic explanations. Please take our listener survey.
Mathematical Models of Ecological Systems
February 22, 2020 00:10 - 36 minutes - 42 MB
Adversarial Explanations
February 14, 2020 23:10 - 36 minutes - 42.2 MB
Walt Woods joins us to discuss his paper Adversarial Explanations for Understanding Image Classification Decisions and Improved Neural Network Robustness with co-authors Jack Chen and Christof Teuscher.
ObjectNet
February 07, 2020 16:00 - 38 minutes - 44.2 MB
Andrei Barbu joins us to discuss ObjectNet - a new kind of vision dataset. In contrast to ImageNet, ObjectNet seeks to provide images that are more representative of the types of images an autonomous machine is likely to encounter in the real world. Collecting a dataset in this way required careful use of Mechanical Turk to get Turkers to provide a corpus of images that removes some of the bias found in ImageNet. http://0xab.com/
Visualization and Interpretability
January 31, 2020 16:00 - 35 minutes - 41 MB
Enrico Bertini joins us to discuss how data visualization can be used to help make machine learning more interpretable and explainable. Find out more about Enrico at http://enrico.bertini.io/. More from Enrico with co-host Moritz Stefaner on the Data Stories podcast!
Interpretable One Shot Learning
January 26, 2020 05:00 - 30 minutes - 35.1 MB
We welcome Su Wang back to Data Skeptic to discuss the paper Distributional modeling on a diet: One-shot word learning from text only.
Fooling Computer Vision
January 22, 2020 18:38 - 25 minutes - 23.3 MB
Wiebe van Ranst joins us to talk about a project in which specially designed printed images can fool a computer vision system, preventing it from identifying a person. Their attack targets the popular YOLOv2 pre-trained image recognition model, and thus is likely to be widely applicable.
Algorithmic Fairness
January 14, 2020 02:31 - 42 minutes - 48.2 MB
This episode includes an interview with Aaron Roth, author of The Ethical Algorithm.
Interpretability
January 07, 2020 08:33 - 32 minutes - 37.7 MB
Machine learning has shown a rapid expansion into every sector and industry. With increasing reliance on models and increasing stakes for the decisions of models, questions of how models actually work are becoming increasingly important to ask. Welcome to Data Skeptic Interpretability. In this episode, Kyle interviews Christoph Molnar about his book Interpretable Machine Learning. Thanks to our sponsor, the Gartner Data & Analytics Summit going on in Grapevine, TX on...
The Limits of NLP
December 24, 2019 01:18 - 29 minutes - 23.8 MB
We are joined by Colin Raffel to discuss the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer".
Jumpstart Your ML Project
December 15, 2019 17:25 - 20 minutes - 23.5 MB
Seth Juarez joins us to discuss the toolbox of options available to a data scientist to jumpstart or extend their machine learning efforts.
Serverless NLP Model Training
December 10, 2019 02:13 - 29 minutes - 24 MB
Alex Reeves joins us to discuss some of the challenges around building a serverless, scalable, generic machine learning pipeline. This is a technical deep dive on architecting solutions and a discussion of some of the design choices made.
Team Data Science Process
December 03, 2019 22:54 - 41 minutes - 37.9 MB
Buck Woody joins Kyle to share experiences from the field and the application of the Team Data Science Process - a popular six-phase workflow for doing data science.
Ancient Text Restoration
December 01, 2019 06:25 - 41 minutes - 36.5 MB
Thea Sommerschield joins us this week to discuss the development of Pythia - a machine learning model trained to assist in the reconstruction of ancient language text.
ML Ops
November 27, 2019 08:18 - 36 minutes - 41.8 MB
Kyle met up with Damian Brady at MS Ignite 2019 to discuss machine learning operations.
Annotator Bias
November 23, 2019 21:46 - 25 minutes - 23.7 MB
The modern deep learning approaches to natural language processing are voracious in their demands for large corpora to train on. Folk wisdom used to hold that around 100k documents were required for effective training. The availability of broadly trained, general-purpose models like BERT has made it possible to do transfer learning to achieve novel results on much smaller corpora. Thanks to these advancements, an NLP researcher might get value out of fewer examples since they can use ...
NLP for Developers
November 20, 2019 03:00 - 29 minutes - 26.6 MB
While at MS Build 2019, Kyle sat down with Lance Olson from the Applied AI team to talk about how tools like cognitive services and cognitive search enable non-data scientists to access relatively advanced NLP tools out of the box, and how more advanced data scientists can focus more time on the bigger-picture problems.
Indigenous American Language Research
November 13, 2019 09:40 - 22 minutes - 18.3 MB
Manuel Mager joins us to discuss natural language processing for low and under-resourced languages. We discuss current work in this area and the Naki Project which aggregates research on NLP for native and indigenous languages of the American continent.
Talking to GPT-2
October 31, 2019 19:45 - 29 minutes - 23.4 MB
GPT-2 is yet another in a succession of models like ELMo and BERT which adopt a similar deep learning architecture and train an unsupervised model on a massive text corpus. As we have been covering recently, these approaches are showing tremendous promise, but how close are they to an AGI? Our guest today, Vazgen Davidyants, wondered exactly that, and had conversations with a chatbot running GPT-2. We discuss his experiences as well as some novel thoughts on artificial intelligence.
Reproducing Deep Learning Models
October 23, 2019 01:15 - 22 minutes - 20.8 MB
Rajiv Shah attempted to reproduce an earthquake-predicting deep learning model. His results exposed some issues with the model. Kyle and Rajiv discuss the original paper and Rajiv's analysis.
What BERT is Not
October 14, 2019 21:02 - 27 minutes - 24.7 MB
Allyson Ettinger joins us to discuss her work in computational linguistics, specifically in exploring some of the ways in which the popular natural language processing approach BERT has limitations.
SpanBERT
October 08, 2019 08:27 - 24 minutes - 22.7 MB
Omer Levy joins us to discuss "SpanBERT: Improving Pre-training by Representing and Predicting Spans". https://arxiv.org/abs/1907.10529