Data Skeptic

533 episodes - English - Latest episode: about 1 hour ago - ★★★★★ - 477 ratings

The Data Skeptic Podcast features interviews and discussion of topics related to data science, statistics, machine learning, artificial intelligence and the like, all from the perspective of applying critical thinking and the scientific method to evaluate the veracity of claims and efficacy of approaches.

Science Technology machinelearning skepticism datamining datascience science statistics

Episodes

ARiMA is not Sufficient

August 30, 2021 13:00 - 22 minutes - 25.8 MB

Chongshou Li, Associate Professor at Southwest Jiaotong University in China, joins us today to talk about his work Why are the ARIMA and SARIMA not Sufficient.

Comp Engine

August 23, 2021 13:00 - 36 minutes - 41.3 MB

Ben Fulcher, Senior Lecturer at the School of Physics at the University of Sydney in Australia, comes on today to talk about his project Comp Engine. Follow Ben on Twitter: @bendfulcher. For posts about time series analysis: @comptimeseries. Website: comp-engine.org

Detecting Ransomware

August 16, 2021 13:00 - 31 minutes - 35.9 MB

Nitin Pundir, a PhD candidate at the University of Florida who works at the Florida Institute for Cybersecurity Research, comes on today to talk about his work “RanStop: A Hardware-assisted Runtime Crypto-Ransomware Detection Technique.” FICS Research Lab: https://fics.institute.ufl.edu/ LinkedIn: https://www.linkedin.com/in/nitin-pundir470/

GANs in Finance

August 09, 2021 13:00 - 23 minutes - 26.5 MB

Florian Eckerli, a recent graduate of Zurich University of Applied Sciences, comes on the show today to discuss his work Generative Adversarial Networks in Finance: An Overview.

Predicting Urban Land Use

August 02, 2021 13:00 - 27 minutes - 31 MB

Today on the show we have Daniel Omeiza, a doctoral student in the computer science department of the University of Oxford, who joins us to talk about his work Efficient Machine Learning for Large-Scale Urban Land-Use Forecasting in Sub-Saharan Africa.

Opportunities for Skillful Weather Prediction

July 26, 2021 13:00 - 34 minutes - 31.3 MB

Today on the show we have Elizabeth Barnes, Associate Professor in the department of Atmospheric Science at Colorado State University, who joins us to talk about her work Identifying Opportunities for Skillful Weather Prediction with Interpretable Neural Networks. Find more from the Barnes Research Group on their site. Weather is notoriously difficult to predict. Complex systems are demanding of computational power. Further, the chaotic nature of, well, nature, makes accurate forecasting e...

Predicting Stock Prices

July 19, 2021 13:00 - 34 minutes - 39.2 MB

Today on the show we have Andrea Fronzetti Colladon (@iandreafc), currently working at the University of Perugia and inventor of the Semantic Brand Score, who joins us to talk about his work studying human communication and social interaction. We discuss the paper Look inside. Predicting Stock Prices by Analyzing an Enterprise Intranet Social Network and Using Word Co-Occurrence Networks.

N-Beats

July 12, 2021 15:04 - 34 minutes - 39.2 MB

Today on the show we have Boris Oreshkin (@boreshkin), a Senior Research Scientist at Unity Technologies, who joins us to talk about his work N-BEATS: Neural Basis Expansion Analysis for Interpretable Time Series Forecasting. Works Mentioned: N-BEATS: Neural Basis Expansion Analysis for Interpretable Time Series Forecasting by Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados, and Yoshua Bengio https://arxiv.org/abs/1905.10437

Translation Automation

July 06, 2021 01:48 - 36 minutes - 33.1 MB

Today we are back with another episode discussing AI in the workplace. AI has facilitated, is facilitating, and will continue to facilitate the automation of work done by humans. Sometimes this may be an entire role. Other times it may automate a particular part of a role, scaling the worker's effectiveness. Carl Stimson, a freelance Japanese-to-English translator, comes on the show to talk about his work in translation and his perspective on how AI will change translation in the future.

Time Series at the Beach

June 28, 2021 13:00 - 23 minutes - 26.3 MB

Shane Ross, Professor of Aerospace and Ocean Engineering at Virginia Tech, comes on today to talk about his work “Beach-level 24-hour forecasts of Florida red tide-induced respiratory irritation.”

Automatic Identification of Outlier Galaxy Images

June 21, 2021 19:11 - 36 minutes - 41.7 MB

Lior Shamir, Associate Professor of Computer Science at Kansas University, joins us today to talk about the recent paper Automatic Identification of Outliers in Hubble Space Telescope Galaxy Images. Follow Lior on Twitter @shamir_lior

Do We Need Deep Learning in Time Series

June 16, 2021 16:10 - 29 minutes - 33.5 MB

Shereen Elsayed and Daniela Thyssens, both PhD students at Hildesheim University in Germany, come on today to talk about the work “Do We Really Need Deep Learning Models for Time Series Forecasting?”

Detecting Drift

June 11, 2021 00:05 - 27 minutes - 31.3 MB

Sam Ackerman, Research Data Scientist at IBM Research Labs in Haifa, Israel, joins us today to talk about his work Detection of Data Drift and Outliers Affecting Machine Learning Model Performance Over Time. Check out Sam's IBM statistics/ML blog at: http://www.research.ibm.com/haifa/dept/vst/ML-QA.shtml  

Darts Library for Time Series

May 31, 2021 14:52 - 25 minutes - 28.8 MB

Julien Herzen, PhD graduate from EPFL in Switzerland, comes on today to talk about his work with Unit8 and the development of the Python library Darts.

Forecasting Principles and Practice

May 24, 2021 14:57 - 31 minutes - 36.2 MB

Welcome to Timeseries! Today’s episode is an interview with Rob Hyndman, Professor of Statistics at Monash University in Australia, and author of Forecasting: Principles and Practice.

Prerequisites for Time Series

May 21, 2021 18:36 - 8 minutes - 14.6 MB

Today's experimental episode uses sound to describe some basic ideas from time series. This episode includes lag, seasonality, trend, noise, heteroskedasticity, decomposition, smoothing, feature engineering, and deep learning.  

Orders of Magnitude

May 07, 2021 18:55 - 33 minutes - 76 MB

Today’s show is in two parts. First, Linhda joins us to review the episodes from Data Skeptic: Pilot Season and give her feedback on each of the topics. Second, we introduce our new segment “Orders of Magnitude”. It’s a statistical game show in which participants must identify the true statistic hidden in a list of statistics which are off by at least an order of magnitude. Claudia and Vanessa join as our first contestants. Below are the sources of our questions. Heights https://en.wiki...

They're Coming for Our Jobs

May 03, 2021 16:00 - 43 minutes - 40 MB

AI has facilitated, is facilitating, and will continue to facilitate the automation of work done by humans. Sometimes this may be an entire role. Other times it may automate a particular part of a role, scaling the worker's effectiveness. Unless progress in AI inexplicably halts, the tasks done by humans vs. machines will continue to evolve. Today’s episode is a speculative conversation about what the future may hold. Co-Host of Squaring the Strange Podcast, Caricature Artist, and an Academic Editor, Celestia Ward jo...

Pandemic Machine Learning Pitfalls

April 26, 2021 07:00 - 40 minutes - 36.9 MB

Today on the show we have Derek Driggs, a PhD student at the University of Cambridge. He comes on to discuss the work Common Pitfalls and Recommendations for Using Machine Learning to Detect and Prognosticate for COVID-19 Using Chest Radiographs and CT Scans. Help us vote for the next theme of Data Skeptic! Vote here: https://dataskeptic.com/vote

Flesch Kincaid Readability Tests

April 19, 2021 07:50 - 20 minutes - 18.7 MB

Given a document in English, how can you estimate how easily someone will be able to read it? Does it require college-level reading comprehension, or is it something a much younger student could read and understand? While these questions are useful to ask, they don't admit a simple answer. One option is to use one of the two (essentially identical) Flesch-Kincaid readability tests. These are simple calculations which provide you with a rough estimate of the reading ease....

Fairness Aware Outlier Detection

April 09, 2021 15:30 - 39 minutes - 36.2 MB

Today on the show we have Shubhranshu Shekar, a PhD student at Carnegie Mellon University, who joins us to talk about his work, FAIROD: Fairness-aware Outlier Detection.

Life May be Rare

April 05, 2021 14:24 - 43 minutes - 39.6 MB

Today on the show Dr. Anders Sandberg, Senior Research Fellow at the Future of Humanity Institute at Oxford University, comes on to share his work “The Timing of Evolutionary Transitions Suggests Intelligent Life is Rare.” Works Mentioned: Paper: “The Timing of Evolutionary Transitions Suggests Intelligent Life is Rare” by Andrew E Snyder-Beattie, Anders Sandberg, K Eric Drexler, and Michael B Bonsall. Twitter: @anderssandberg

Social Networks

March 29, 2021 14:21 - 49 minutes - 45.6 MB

Mayank Kejriwal, Research Professor at the University of Southern California and Researcher at the Information Sciences Institute, joins us today to discuss his work and his new book. Works Mentioned: “Knowledge Graphs: Fundamentals, Techniques, and Applications” by Mayank Kejriwal, Craig A. Knoblock, and Pedro Szekely

The QAnon Conspiracy

March 22, 2021 14:39 - 43 minutes - 40.2 MB

QAnon is a conspiracy theory born in the underbelly of the internet. While easy to disprove, these cryptic ideas captured the minds of many people and (in part) paved the way to the 2021 storming of the US Capitol. This is a contemporary conspiracy which came into existence and grew in a very digital way. This makes it possible for researchers to study this phenomenon in a way not accessible in previous conspiracy theories of similar popularity. This episode is not so much a debunking ...

Benchmarking Vision on Edge vs Cloud

March 15, 2021 12:00 - 47 minutes - 54.8 MB

Karthick Shankar, master's student at Carnegie Mellon University, and Somali Chaterji, Assistant Professor at Purdue University, join us today to discuss the paper “JANUS: Benchmarking Commercial and Open-Source Cloud and Edge Platforms for Object and Anomaly Detection Workloads.” Works Mentioned: https://ieeexplore.ieee.org/abstract/document/9284314 “JANUS: Benchmarking Commercial and Open-Source Cloud and Edge Platforms for Object and Anomaly Detection Workloads.” by: Karthick Shanka...

Goodhart's Law in Reinforcement Learning

March 05, 2021 13:00 - 37 minutes - 42.5 MB

Hal Ashton, a PhD student at University College London, joins us today to discuss a recent work, Causal Campbell-Goodhart’s law and Reinforcement Learning. "Only buy honey from a local producer." - Hal Ashton Works Mentioned: “Causal Campbell-Goodhart’s law and Reinforcement Learning” by Hal Ashton; “The Book of Why” by Judea Pearl. Thanks to our sponsor! When your business is ready to make that next hire, find the right person with LinkedIn Jobs. Just visit Li...

Video Anomaly Detection

March 01, 2021 14:00 - 24 minutes - 22 MB

Yuqi Ouyang, in his second year of PhD study at the University of Warwick in England, joins us today to discuss his work “Video Anomaly Detection by Estimating Likelihood of Representations.” Works Mentioned: Video Anomaly Detection by Estimating Likelihood of Representations https://arxiv.org/abs/2012.01468 by: Yuqi Ouyang, Victor Sanchez

Fault Tolerant Distributed Gradient Descent

February 22, 2021 14:30 - 36 minutes - 33 MB

Nirupam Gupta, a Computer Science postdoctoral researcher at EPFL in Switzerland, joins us today to discuss his work “Byzantine Fault-Tolerance in Peer-to-Peer Distributed Gradient-Descent.” Works Mentioned: https://arxiv.org/abs/2101.12316 Byzantine Fault-Tolerance in Peer-to-Peer Distributed Gradient-Descent by Nirupam Gupta and Nitin H. Vaidya Conference Details: https://georgetown.zoom.us/meeting/register/tJ0sc-2grDwjEtfnLI0zPnN-GwkDvJdaOxXF

Decentralized Information Gathering

February 15, 2021 13:30 - 32 minutes - 37.7 MB

Mikko Lauri, Post Doctoral researcher at the University of Hamburg, Germany, comes on the show today to discuss the work Information Gathering in Decentralized POMDPs by Policy Graph Improvements. Follow Mikko: @mikko_lauri Github https://laurimi.github.io/

Leaderless Consensus

February 05, 2021 17:47 - 27 minutes - 31.4 MB

Balaji Arun, a PhD student in the Systems Software Research Group at Virginia Tech, joins us today to discuss his research on distributed systems through the paper “Taming the Contention in Consensus-based Distributed Systems.” Works Mentioned: “Taming the Contention in Consensus-based Distributed Systems” by Balaji Arun, Sebastiano Peluso, Roberto Palmieri, Giuliano Losa, and Binoy Ravindran https://www.ssrg.ece.vt.edu/papers/tdsc20-author-version.pdf “Fast Paxos” by Lesl...

Automatic Summarization

January 29, 2021 16:00 - 27 minutes - 25.6 MB

Maartje ter Hoeve, PhD student at the University of Amsterdam, joins us today to discuss her research in automated summarization through the paper “What Makes a Good Summary? Reconsidering the Focus of Automatic Summarization.” Works Mentioned: “What Makes a Good Summary? Reconsidering the Focus of Automatic Summarization.” by Maartje ter Hoeve, Julia Kiseleva, and Maarten de Rijke Twitter: https://twitter.com/maartjeterhoeve Website: https:...

Gerrymandering

January 22, 2021 16:00 - 34 minutes - 39.1 MB

Brian Brubach, Assistant Professor in the Computer Science Department at Wellesley College, joins us today to discuss his work “Meddling Metrics: the Effects of Measuring and Constraining Partisan Gerrymandering on Voter Incentives.” Works Mentioned: Meddling Metrics: the Effects of Measuring and Constraining Partisan Gerrymandering on Voter Incentives by Brian Brubach, Aravind Srinivasan, and Shawn Zhao

Even Cooperative Chess is Hard

January 15, 2021 18:02 - 23 minutes - 26.5 MB

Aside from victory questions like “can black force a checkmate on white in 5 moves?” many novel questions can be asked about a game of chess. Some questions are trivial (e.g. “How many pieces does white have?”) while more computationally challenging questions can contribute interesting results in computational complexity theory. In this episode, Josh Brunner, Master's student in Theoretical Computer Science at MIT, joins us to discuss his recent paper Complexity of Retrograde and Helpmate ...

Consecutive Votes in Paxos

January 11, 2021 14:00 - 30 minutes - 34.5 MB

Eli Goldweber, a graduate student at the University of Michigan, comes on today to share his work in applying formal verification to systems and a modification to the Paxos protocol discussed in the paper On the Significance of Consecutive Ballots in Paxos. Works Mentioned: Previous Episode on Paxos https://dataskeptic.com/blog/episodes/2020/distributed-consensus Paper: On the Significance of Consecutive Ballots in Paxos by: Eli Goldweber, Nuda Zhang, and Manos Kapritsos Thanks to our sp...

Visual Illusions Deceiving Neural Networks

January 01, 2021 14:00 - 33 minutes - 30.9 MB

Today on the show we have Adrian Martin, a postdoctoral researcher from the University of Pompeu Fabra in Barcelona, Spain. He comes on the show to discuss his research from the paper “Convolutional Neural Networks can be Deceived by Visual Illusions.” Works Mentioned in Paper: “Convolutional Neural Networks can be Deceived by Visual Illusions.” by Alexander Gomez-Villa, Adrian Martin, Javier Vazquez-Corral, and Marcelo Bertalmio Examples: Snake Illusions https://www.illusions...

Earthquake Detection with Crowd-sourced Data

December 25, 2020 16:21 - 29 minutes - 27 MB

Have you ever wanted to hear what an earthquake sounds like? Today on the show we have Omkar Ranadive, Computer Science master's student at Northwestern University, who collaborates with Suzan van der Lee, an Earth and Planetary Sciences professor at Northwestern University, on the crowd-sourcing project Earthquake Detective. Works Mentioned: Paper: Applying Machine Learning to Crowd-sourced D...

Byzantine Fault Tolerant Consensus

December 22, 2020 13:00 - 35 minutes - 40.7 MB

Byzantine fault tolerance (BFT) is a desirable property in a distributed computing environment. BFT means the system can survive the loss of nodes and nodes becoming unreliable. There are many different protocols for achieving BFT, though not all options can scale to large network sizes. Ted Yin joins us to explain BFT, survey the wide variety of protocols, and share details about HotStuff.

Alpha Fold

December 11, 2020 17:45 - 23 minutes - 21.3 MB

Kyle shared some initial reactions to the announcement about Alpha Fold 2's celebrated performance in the CASP14 prediction.  By many accounts, this exciting result means protein folding is now a solved problem. Thanks to our sponsors! Brilliant is a great last-minute gift idea! Give access to 60 + interactive courses including Quantum Computing and Group Theory. There's something for everyone at Brilliant. They have award-winning courses, taught by teachers, researchers and professionals ...

Arrow's Impossibility Theorem

December 04, 2020 16:39 - 26 minutes - 30.1 MB

Above all, everyone wants voting to be fair. What does fair mean and how can we measure it? Kenneth Arrow posited a simple set of conditions that one would certainly desire in a voting system. For example, unanimity - if everyone picks candidate A, then A should win! Yet surprisingly, under a few basic assumptions, this theorem demonstrates that no voting system exists which can satisfy all the criteria. This episode is a discussion about the structure of the proof and some of its implic...

Face Mask Sentiment Analysis

November 27, 2020 18:56 - 41 minutes - 37.7 MB

As the COVID-19 pandemic continues, the public (or at least those with Twitter accounts) are sharing their personal opinions about mask-wearing via Twitter. What does this data tell us about public opinion? How does it vary by demographic? What, if anything, can make people change their minds? Today we speak to Neil Yeung and Jonathan Lai, undergraduate students in the Department of Computer Science at the University of Rochester, and Professor of Computer Science Jiebo Luo to discuss ...

Counting Briberies in Elections

November 20, 2020 16:26 - 37 minutes - 43.4 MB

Niclas Boehmer, second-year PhD student at Berlin Institute of Technology, comes on today to discuss the computational complexity of bribery in elections through the paper “On the Robustness of Winners: Counting Briberies in Elections.” Links Mentioned: https://www.akt.tu-berlin.de/menue/team/boehmer_niclas/ Works Mentioned: “On the Robustness of Winners: Counting Briberies in Elections.” by Niclas Boehmer, Robert Bredereck, Piotr Faliszewski, and Rolf Niedermeier Thanks to our sponsors: ...

Sybil Attacks on Federated Learning

November 13, 2020 18:25 - 31 minutes - 36.1 MB

Clement Fung, a Societal Computing PhD student at Carnegie Mellon University, discusses his research in security of machine learning systems and a defense against targeted sybil-based poisoning called FoolsGold. Works Mentioned: The Limitations of Federated Learning in Sybil Settings Twitter: @clemfung Website: https://clementfung.github.io/ Thanks to our sponsors: Brilliant - Online learning platform. Check out Geometry Fundamentals! Visit Brilliant.org/dataskeptic for 20% off...

Differential Privacy at the US Census

November 06, 2020 16:13 - 29 minutes - 34 MB

Simson Garfinkel, Senior Computer Scientist for Confidentiality and Data Access at the US Census Bureau, discusses his work modernizing the Census Bureau disclosure avoidance system from private to public disclosure avoidance techniques using differential privacy. Some of the discussion revolves around the topics in the paper Randomness Concerns When Deploying Differential Privacy.   WORKS MENTIONED: “Calibrating Noise to Sensitivity in Private Data Analysis” by Cynthia Dwork, Frank McS...

Distributed Consensus

October 30, 2020 05:36 - 27 minutes - 31.7 MB

Heidi Howard, a Computer Science research fellow at Cambridge University, discusses Paxos, Raft, and consensus in distributed systems alongside her work “Paxos vs. Raft: Have we reached consensus on distributed consensus?” She goes into detail about the leaders in Paxos and Raft and how the Raft consensus algorithm actually inspired her to pursue her PhD. Paxos vs Raft paper: https://arxiv.org/abs/2004.05074 Leslie Lamport paper “The Part-Time Parliament” https://lamport.a...

ACID Compliance

October 23, 2020 13:00 - 23 minutes - 27.2 MB

Linhda joins Kyle today to talk through A.C.I.D. Compliance (atomicity, consistency, isolation, and durability). Together, these four properties help ensure that database transactions are processed reliably. Kyle uses examples such as Google Sheets, bank transactions, and even the game Rummikub. Thanks to this week's sponsors: Monday.com - Their Apps Challenge is underway and available at monday.com/dataskeptic Brilliant - Check out their Quantum Computing Course, ...

National Popular Vote Interstate Compact

October 16, 2020 15:24 - 30 minutes - 35 MB

Patrick Rosenstiel joins us to discuss the National Popular Vote.

Defending the p-value

October 12, 2020 13:00 - 30 minutes - 34.4 MB

Yudi Pawitan joins us to discuss his paper Defending the P-value.

Retraction Watch

October 05, 2020 15:00 - 32 minutes - 36.7 MB

Ivan Oransky joins us to discuss his work documenting the scientific peer-review process at retractionwatch.com.  

Crowdsourced Expertise

September 21, 2020 14:00 - 27 minutes - 31.8 MB

Derek Lim joins us to discuss the paper Expertise and Dynamics within Crowdsourced Musical Knowledge Curation: A Case Study of the Genius Platform.  

The Spread of Misinformation Online

September 14, 2020 14:00 - 35 minutes - 40.7 MB

Neil Johnson joins us to discuss the paper The online competition between pro- and anti-vaccination views.

Twitter Mentions

@sami_r_yousif 2 Episodes
@leerosevere 2 Episodes
@halfak 1 Episode
@boreshkin 1 Episode
@tomlevenson 1 Episode
@mark_azurecat 1 Episode
@randal_olson 1 Episode
@karthick_sh 1 Episode
@andersdrachen 1 Episode
@iamzareenf 1 Episode
@rajiinio 1 Episode
@chengtao_chu 1 Episode
@antoine77340 1 Episode
@samuelmehr 1 Episode
@rajcs4 1 Episode
@anderssandberg 1 Episode
@celestiaward 1 Episode
@akalatian 1 Episode
@niftyc 1 Episode
@maverickpramit 1 Episode