Data Skeptic
533 episodes - English - ★★★★★ - 477 ratings
The Data Skeptic Podcast features interviews and discussion of topics related to data science, statistics, machine learning, artificial intelligence, and the like, all from the perspective of applying critical thinking and the scientific method to evaluate the veracity of claims and the efficacy of approaches.
Episodes
ARIMA is not Sufficient
August 30, 2021 13:00 - 22 minutes - 25.8 MB
Chongshou Li, Associate Professor at Southwest Jiaotong University in China, joins us today to talk about his work “Why are the ARIMA and SARIMA not Sufficient.”
Comp Engine
August 23, 2021 13:00 - 36 minutes - 41.3 MB
Ben Fulcher, Senior Lecturer at the School of Physics at the University of Sydney in Australia, comes on today to talk about his project Comp Engine. Follow Ben on Twitter: @bendfulcher. For posts about time series analysis: @comptimeseries. Website: comp-engine.org
Detecting Ransomware
August 16, 2021 13:00 - 31 minutes - 35.9 MB
Nitin Pundir, a PhD candidate at the University of Florida who works at the Florida Institute for Cybersecurity Research, comes on today to talk about his work “RanStop: A Hardware-assisted Runtime Crypto-Ransomware Detection Technique.” FICS Research Lab: https://fics.institute.ufl.edu/ LinkedIn: https://www.linkedin.com/in/nitin-pundir470/
GANs in Finance
August 09, 2021 13:00 - 23 minutes - 26.5 MB
Florian Eckerli, a recent graduate of the Zurich University of Applied Sciences, comes on the show today to discuss his work “Generative Adversarial Networks in Finance: An Overview.”
Predicting Urban Land Use
August 02, 2021 13:00 - 27 minutes - 31 MB
Today on the show we have Daniel Omeiza, a doctoral student in the Department of Computer Science at the University of Oxford, who joins us to talk about his work “Efficient Machine Learning for Large-Scale Urban Land-Use Forecasting in Sub-Saharan Africa.”
Opportunities for Skillful Weather Prediction
July 26, 2021 13:00 - 34 minutes - 31.3 MB
Today on the show we have Elizabeth Barnes, Associate Professor in the Department of Atmospheric Science at Colorado State University, who joins us to talk about her work “Identifying Opportunities for Skillful Weather Prediction with Interpretable Neural Networks.” Find more from the Barnes Research Group on their site. Weather is notoriously difficult to predict. Complex systems demand a great deal of computational power. Further, the chaotic nature of, well, nature, makes accurate forecasting e...
Predicting Stock Prices
July 19, 2021 13:00 - 34 minutes - 39.2 MB
Today on the show we have Andrea Fronzetti Colladon (@iandreafc), currently at the University of Perugia and inventor of the Semantic Brand Score, who joins us to talk about his work studying human communication and social interaction. We discuss the paper “Look inside. Predicting Stock Prices by Analyzing an Enterprise Intranet Social Network and Using Word Co-Occurrence Networks.”
N-Beats
July 12, 2021 15:04 - 34 minutes - 39.2 MB
Today on the show we have Boris Oreshkin (@boreshkin), a Senior Research Scientist at Unity Technologies, who joins us to talk about his work “N-BEATS: Neural Basis Expansion Analysis for Interpretable Time Series Forecasting.” Works Mentioned: “N-BEATS: Neural Basis Expansion Analysis for Interpretable Time Series Forecasting” by Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados, and Yoshua Bengio https://arxiv.org/abs/1905.10437 Social media: LinkedIn, Twitter
Translation Automation
July 06, 2021 01:48 - 36 minutes - 33.1 MB
Today we are back with another episode discussing AI in the workplace. AI has facilitated, is facilitating, and will continue to facilitate the automation of work done by humans. Sometimes this may be an entire role. Other times it may automate a particular part of a role, scaling the person's effectiveness. Carl Stimson, a freelance Japanese-to-English translator, comes on the show to talk about his work in translation and his perspective on how AI will change translation in the future.
Time Series at the Beach
June 28, 2021 13:00 - 23 minutes - 26.3 MB
Shane Ross, Professor of Aerospace and Ocean Engineering at Virginia Tech, comes on today to talk about his work “Beach-level 24-hour forecasts of Florida red tide-induced respiratory irritation.”
Automatic Identification of Outlier Galaxy Images
June 21, 2021 19:11 - 36 minutes - 41.7 MB
Lior Shamir, Associate Professor of Computer Science at Kansas State University, joins us today to talk about the recent paper “Automatic Identification of Outliers in Hubble Space Telescope Galaxy Images.” Follow Lior on Twitter: @shamir_lior
Do We Need Deep Learning in Time Series
June 16, 2021 16:10 - 29 minutes - 33.5 MB
Shereen Elsayed and Daniela Thyssens, both PhD students at Hildesheim University in Germany, come on today to talk about the work “Do We Really Need Deep Learning Models for Time Series Forecasting?”
Detecting Drift
June 11, 2021 00:05 - 27 minutes - 31.3 MB
Sam Ackerman, Research Data Scientist at IBM Research Labs in Haifa, Israel, joins us today to talk about his work “Detection of Data Drift and Outliers Affecting Machine Learning Model Performance Over Time.” Check out Sam's IBM statistics/ML blog at: http://www.research.ibm.com/haifa/dept/vst/ML-QA.shtml
Darts Library for Time Series
May 31, 2021 14:52 - 25 minutes - 28.8 MB
Julien Herzen, a PhD graduate from EPFL in Switzerland, comes on today to talk about his work with Unit8 and the development of the Python library Darts.
Forecasting Principles and Practice
May 24, 2021 14:57 - 31 minutes - 36.2 MB
Welcome to Timeseries! Today’s episode is an interview with Rob Hyndman, Professor of Statistics at Monash University in Australia, and author of Forecasting: Principles and Practice.
Prerequisites for Time Series
May 21, 2021 18:36 - 8 minutes - 14.6 MB
Today's experimental episode uses sound to describe some basic ideas from time series, including lag, seasonality, trend, noise, heteroskedasticity, decomposition, smoothing, feature engineering, and deep learning.
Orders of Magnitude
May 07, 2021 18:55 - 33 minutes - 76 MB
Today’s show is in two parts. First, Linhda joins us to review the episodes from Data Skeptic: Pilot Season and give her feedback on each of the topics. Second, we introduce our new segment “Orders of Magnitude,” a statistical game show in which participants must identify the true statistic hidden in a list of statistics that are off by at least an order of magnitude. Claudia and Vanessa join as our first contestants. Below are the sources of our questions. Heights https://en.wiki...
They're Coming for Our Jobs
May 03, 2021 16:00 - 43 minutes - 40 MB
AI has facilitated, is facilitating, and will continue to facilitate the automation of work done by humans. Sometimes this may be an entire role. Other times it may automate a particular part of a role, scaling the person's effectiveness. Unless progress in AI inexplicably halts, the division of tasks between humans and machines will continue to evolve. Today’s episode is a speculative conversation about what the future may hold. Co-Host of Squaring the Strange Podcast, Caricature Artist, and an Academic Editor, Celestia Ward jo...
Pandemic Machine Learning Pitfalls
April 26, 2021 07:00 - 40 minutes - 36.9 MB
Today on the show we have Derek Driggs, a PhD student at the University of Cambridge, who comes on to discuss the work “Common Pitfalls and Recommendations for Using Machine Learning to Detect and Prognosticate for COVID-19 Using Chest Radiographs and CT Scans.” Help us vote for the next theme of Data Skeptic! Vote here: https://dataskeptic.com/vote
Flesch Kincaid Readability Tests
April 19, 2021 07:50 - 20 minutes - 18.7 MB
Given a document in English, how can you estimate how easily someone will be able to read it? Does it require a college level of reading comprehension, or is it something a much younger student could read and understand? While these questions are useful to ask, they don't admit a simple answer. One option is to use one of the two (closely related) Flesch-Kincaid readability tests. These are simple calculations which provide you with a rough estimate of the reading ease...
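As a rough illustration of how simple these calculations are, here is a minimal Python sketch of both tests using their standard published coefficients: Flesch Reading Ease = 206.835 - 1.015 × (words/sentences) - 84.6 × (syllables/words), and Flesch-Kincaid Grade Level = 0.39 × (words/sentences) + 11.8 × (syllables/words) - 15.59. The function names are hypothetical, and the sentence splitter and vowel-group syllable counter are crude assumptions; real tools count syllables more carefully, so treat the output as an approximation.

```python
import re


def count_syllables(word: str) -> int:
    """Heuristic syllable count: runs of vowels, minus a likely silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent (e.g. "make"), except in "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_scores(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for English text."""
    # Naive segmentation: sentences end at . ! or ?, words are runs of letters.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    reading_ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade_level = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return reading_ease, grade_level


if __name__ == "__main__":
    ease, grade = flesch_scores("The cat sat on the mat.")
    print(f"reading ease = {ease:.2f}, grade level = {grade:.2f}")
```

Both tests use the same two ratios with different weights, which is why they tend to agree: longer sentences and longer words lower the reading ease and raise the grade level. For very short, simple sentences the formulas can produce a reading ease above 100 or a negative grade level.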
Fairness Aware Outlier Detection
April 09, 2021 15:30 - 39 minutes - 36.2 MB
Today on the show we have Shubhranshu Shekhar, a PhD student at Carnegie Mellon University, who joins us to talk about his work “FAIROD: Fairness-aware Outlier Detection.”
Life May be Rare
April 05, 2021 14:24 - 43 minutes - 39.6 MB
Today on the show, Dr. Anders Sandberg, Senior Research Fellow at the Future of Humanity Institute at Oxford University, comes on to share his work “The Timing of Evolutionary Transitions Suggests Intelligent Life is Rare.” Works Mentioned: Paper: “The Timing of Evolutionary Transitions Suggests Intelligent Life is Rare” by Andrew E. Snyder-Beattie, Anders Sandberg, K. Eric Drexler, and Michael B. Bonsall Twitter: @anderssandburg
Social Networks
March 29, 2021 14:21 - 49 minutes - 45.6 MB
Mayank Kejriwal, Research Professor at the University of Southern California and Researcher at the Information Sciences Institute, joins us today to discuss his work and his new book “Knowledge Graphs: Fundamentals, Techniques, and Applications.” Works Mentioned: “Knowledge Graphs: Fundamentals, Techniques, and Applications” by Mayank Kejriwal, Craig A. Knoblock, and Pedro Szekely
The QAnon Conspiracy
March 22, 2021 14:39 - 43 minutes - 40.2 MB
QAnon is a conspiracy theory born in the underbelly of the internet. While easy to disprove, its cryptic ideas captured the minds of many people and (in part) paved the way to the 2021 storming of the US Capitol. This is a contemporary conspiracy which came into existence and grew in a very digital way. This makes it possible for researchers to study the phenomenon in a way that was not possible for previous conspiracy theories of similar popularity. This episode is not so much a debunking ...
Benchmarking Vision on Edge vs Cloud
March 15, 2021 12:00 - 47 minutes - 54.8 MB
Karthick Shankar, a Master's student at Carnegie Mellon University, and Somali Chaterji, Assistant Professor at Purdue University, join us today to discuss the paper “JANUS: Benchmarking Commercial and Open-Source Cloud and Edge Platforms for Object and Anomaly Detection Workloads.” Works Mentioned: https://ieeexplore.ieee.org/abstract/document/9284314 “JANUS: Benchmarking Commercial and Open-Source Cloud and Edge Platforms for Object and Anomaly Detection Workloads” by Karthick Shanka...
Goodhart's Law in Reinforcement Learning
March 05, 2021 13:00 - 37 minutes - 42.5 MB
Hal Ashton, a PhD student at University College London, joins us today to discuss his recent work “Causal Campbell-Goodhart’s law and Reinforcement Learning.” "Only buy honey from a local producer." - Hal Ashton Works Mentioned: Paper: “Causal Campbell-Goodhart’s law and Reinforcement Learning” by Hal Ashton; Book: “The Book of Why” by Judea Pearl. Thanks to our sponsor! When your business is ready to make that next hire, find the right person with LinkedIn Jobs. Just visit Li...
Video Anomaly Detection
March 01, 2021 14:00 - 24 minutes - 22 MB
Yuqi Ouyang, in his second year of PhD study at the University of Warwick in England, joins us today to discuss his work “Video Anomaly Detection by Estimating Likelihood of Representations.” Works Mentioned: “Video Anomaly Detection by Estimating Likelihood of Representations” by Yuqi Ouyang and Victor Sanchez https://arxiv.org/abs/2012.01468
Fault Tolerant Distributed Gradient Descent
February 22, 2021 14:30 - 36 minutes - 33 MB
Nirupam Gupta, a Computer Science Postdoctoral Researcher at EPFL in Switzerland, joins us today to discuss his work “Byzantine Fault-Tolerance in Peer-to-Peer Distributed Gradient-Descent.” Works Mentioned: “Byzantine Fault-Tolerance in Peer-to-Peer Distributed Gradient-Descent” by Nirupam Gupta and Nitin H. Vaidya https://arxiv.org/abs/2101.12316 Conference Details: https://georgetown.zoom.us/meeting/register/tJ0sc-2grDwjEtfnLI0zPnN-GwkDvJdaOxXF
Decentralized Information Gathering
February 15, 2021 13:30 - 32 minutes - 37.7 MB
Mikko Lauri, a postdoctoral researcher at the University of Hamburg, Germany, comes on the show today to discuss the work “Information Gathering in Decentralized POMDPs by Policy Graph Improvements.” Follow Mikko: @mikko_lauri GitHub: https://laurimi.github.io/
Leaderless Consensus
February 05, 2021 17:47 - 27 minutes - 31.4 MB
Balaji Arun, a PhD student in the Systems Software Research Group at Virginia Tech, joins us today to discuss his research on distributed systems through the paper “Taming the Contention in Consensus-based Distributed Systems.” Works Mentioned: “Taming the Contention in Consensus-based Distributed Systems” by Balaji Arun, Sebastiano Peluso, Roberto Palmieri, Giuliano Losa, and Binoy Ravindran https://www.ssrg.ece.vt.edu/papers/tdsc20-author-version.pdf “Fast Paxos” by Lesl...
Automatic Summarization
January 29, 2021 16:00 - 27 minutes - 25.6 MB
Maartje ter Hoeve, a PhD student at the University of Amsterdam, joins us today to discuss her research in automatic summarization through the paper “What Makes a Good Summary? Reconsidering the Focus of Automatic Summarization.” Works Mentioned: “What Makes a Good Summary? Reconsidering the Focus of Automatic Summarization” by Maartje ter Hoeve, Julia Kiseleva, and Maarten de Rijke Contact: Twitter: https://twitter.com/maartjeterhoeve Website: https:...
Gerrymandering
January 22, 2021 16:00 - 34 minutes - 39.1 MB
Brian Brubach, Assistant Professor in the Computer Science Department at Wellesley College, joins us today to discuss his work “Meddling Metrics: the Effects of Measuring and Constraining Partisan Gerrymandering on Voter Incentives.” Works Mentioned: “Meddling Metrics: the Effects of Measuring and Constraining Partisan Gerrymandering on Voter Incentives” by Brian Brubach, Aravind Srinivasan, and Shawn Zhao
Even Cooperative Chess is Hard
January 15, 2021 18:02 - 23 minutes - 26.5 MB
Aside from victory questions like “Can Black force a checkmate on White in 5 moves?”, many novel questions can be asked about a game of chess. Some questions are trivial (e.g., “How many pieces does White have?”), while more computationally challenging questions can contribute interesting results in computational complexity theory. In this episode, Josh Brunner, a Master's student in Theoretical Computer Science at MIT, joins us to discuss his recent paper Complexity of Retrograde and Helpmate ...
Consecutive Votes in Paxos
January 11, 2021 14:00 - 30 minutes - 34.5 MB
Eli Goldweber, a graduate student at the University of Michigan, comes on today to share his work in applying formal verification to systems and a modification to the Paxos protocol discussed in the paper “On the Significance of Consecutive Ballots in Paxos.” Works Mentioned: Previous Episode on Paxos https://dataskeptic.com/blog/episodes/2020/distributed-consensus Paper: “On the Significance of Consecutive Ballots in Paxos” by Eli Goldweber, Nuda Zhang, and Manos Kapritsos Thanks to our sp...
Visual Illusions Deceiving Neural Networks
January 01, 2021 14:00 - 33 minutes - 30.9 MB
Today on the show we have Adrian Martin, a postdoctoral researcher at Pompeu Fabra University in Barcelona, Spain. He joins us to discuss his research from the paper “Convolutional Neural Networks can be Deceived by Visual Illusions.” Works Mentioned: Paper: “Convolutional Neural Networks can be Deceived by Visual Illusions” by Alexander Gomez-Villa, Adrian Martin, Javier Vazquez-Corral, and Marcelo Bertalmio Examples: Snake Illusions https://www.illusions...
Earthquake Detection with Crowd-sourced Data
December 25, 2020 16:21 - 29 minutes - 27 MB
Have you ever wanted to hear what an earthquake sounds like? Today on the show we have Omkar Ranadive, a Computer Science Master's student at Northwestern University, who collaborates with Suzan van der Lee, an Earth and Planetary Sciences professor at Northwestern University, on the crowd-sourcing project Earthquake Detective. Works Mentioned: Paper: Applying Machine Learning to Crowd-sourced D...
Byzantine Fault Tolerant Consensus
December 22, 2020 13:00 - 35 minutes - 40.7 MB
Byzantine fault tolerance (BFT) is a desirable property in a distributed computing environment. BFT means the system keeps working correctly even when some nodes crash or behave arbitrarily, including sending conflicting or malicious messages. There are many different protocols for achieving BFT, though not all of them scale to large networks. Ted Yin joins us to explain BFT, survey the wide variety of protocols, and share details about HotStuff.
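The episode description doesn't give numbers, but a classic result for Byzantine consensus under partial synchrony (the setting HotStuff targets) is that n nodes can tolerate at most f Byzantine nodes when n ≥ 3f + 1, with quorums of n - f votes. The Python sketch below only does that arithmetic; it is not HotStuff itself, and the function names are hypothetical.

```python
def max_byzantine_faults(n: int) -> int:
    """Largest f such that n >= 3f + 1, i.e. the most Byzantine nodes n replicas can tolerate."""
    if n < 1:
        raise ValueError("need at least one node")
    return (n - 1) // 3


def quorum_size(n: int) -> int:
    """Votes needed so that any two quorums share at least one honest node.

    With q = n - f, two quorums overlap in at least 2q - n = n - 2f >= f + 1 nodes,
    and at most f of those can be Byzantine. Because at most f nodes are faulty,
    n - f honest nodes are always available to form a quorum.
    """
    return n - max_byzantine_faults(n)


if __name__ == "__main__":
    for n in (4, 7, 10, 100):
        print(f"n={n}: tolerates f={max_byzantine_faults(n)}, quorum={quorum_size(n)}")
```

For example, a 4-node cluster tolerates one Byzantine node and needs 3 matching votes; adding more nodes only helps once n reaches the next value of 3f + 1.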
Alpha Fold
December 11, 2020 17:45 - 23 minutes - 21.3 MB
Kyle shares some initial reactions to the announcement of AlphaFold 2's celebrated performance in the CASP14 protein structure prediction assessment. By many accounts, this exciting result means protein folding is now a solved problem. Thanks to our sponsors! Brilliant is a great last-minute gift idea! Give access to 60+ interactive courses including Quantum Computing and Group Theory. There's something for everyone at Brilliant. They have award-winning courses, taught by teachers, researchers and professionals ...
Arrow's Impossibility Theorem
December 04, 2020 16:39 - 26 minutes - 30.1 MB
Above all, everyone wants voting to be fair. What does fair mean, and how can we measure it? Kenneth Arrow posited a simple set of conditions that one would certainly desire in a voting system. For example, unanimity: if everyone picks candidate A, then A should win! Yet surprisingly, Arrow's theorem demonstrates that, under a few basic assumptions, no ranked voting system choosing among three or more candidates can satisfy all the criteria. This episode is a discussion about the structure of the proof and some of its implic...
Face Mask Sentiment Analysis
November 27, 2020 18:56 - 41 minutes - 37.7 MB
As the COVID-19 pandemic continues, the public (or at least those with Twitter accounts) are sharing their personal opinions about mask-wearing via Twitter. What does this data tell us about public opinion? How does it vary by demographic? What, if anything, can make people change their minds? Today we speak to Neil Yeung and Jonathan Lai, undergraduate students in the Department of Computer Science at the University of Rochester, and Jiebo Luo, Professor of Computer Science, to discuss ...
Counting Briberies in Elections
November 20, 2020 16:26 - 37 minutes - 43.4 MB
Niclas Boehmer, a second-year PhD student at the Berlin Institute of Technology, comes on today to discuss the computational complexity of bribery in elections through the paper “On the Robustness of Winners: Counting Briberies in Elections.” Links Mentioned: https://www.akt.tu-berlin.de/menue/team/boehmer_niclas/ Works Mentioned: “On the Robustness of Winners: Counting Briberies in Elections” by Niclas Boehmer, Robert Bredereck, Piotr Faliszewski, and Rolf Niedermeier Thanks to our sponsors: ...
Sybil Attacks on Federated Learning
November 13, 2020 18:25 - 31 minutes - 36.1 MB
Clement Fung, a Societal Computing PhD student at Carnegie Mellon University, discusses his research in security of machine learning systems and a defense against targeted sybil-based poisoning called FoolsGold. Works Mentioned: “The Limitations of Federated Learning in Sybil Settings” Twitter: @clemfung Website: https://clementfung.github.io/ Thanks to our sponsors: Brilliant - Online learning platform. Check out Geometry Fundamentals! Visit Brilliant.org/dataskeptic for 20% off...
Differential Privacy at the US Census
November 06, 2020 16:13 - 29 minutes - 34 MB
Simson Garfinkel, Senior Computer Scientist for Confidentiality and Data Access at the US Census Bureau, discusses his work modernizing the Census Bureau's disclosure avoidance system, moving from private (unpublished) disclosure avoidance techniques to public ones based on differential privacy. Some of the discussion revolves around topics in the paper “Randomness Concerns When Deploying Differential Privacy.” Works Mentioned: “Calibrating Noise to Sensitivity in Private Data Analysis” by Cynthia Dwork, Frank McS...
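The Dwork et al. paper named in this episode's notes introduced the idea of calibrating noise to a query's sensitivity. Below is a minimal Python sketch of that Laplace mechanism for a single counting query. It is an illustration under simplifying assumptions, not the Census Bureau's production system, and the function names are hypothetical. In the spirit of the randomness paper, it defaults to `random.SystemRandom` (the operating system's entropy source) rather than Python's seedable Mersenne Twister. Naive floating-point sampling like this also has known weaknesses, so real deployments use more carefully engineered samplers.

```python
import random
from typing import Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count: int, epsilon: float, sensitivity: float = 1.0,
                  rng: Optional[random.Random] = None) -> float:
    """Laplace mechanism: add noise with scale sensitivity / epsilon to a query answer.

    A counting query changes by at most 1 when one person is added or removed,
    so its sensitivity is 1.
    """
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    # Default to the OS entropy source; pass a seeded random.Random only for testing.
    rng = rng if rng is not None else random.SystemRandom()
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    print(private_count(1000, epsilon=0.5))
```

Smaller values of epsilon mean a larger noise scale (here 1 / 0.5 = 2) and therefore stronger privacy but less accurate published counts.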
Distributed Consensus
October 30, 2020 05:36 - 27 minutes - 31.7 MB
Heidi Howard, a Computer Science research fellow at the University of Cambridge, discusses Paxos, Raft, and consensus in distributed systems, along with her work “Paxos vs. Raft: Have we reached consensus on distributed consensus?” She goes into detail about leaders in Paxos and Raft and how the Raft consensus algorithm inspired her to pursue her PhD. Paxos vs Raft paper: https://arxiv.org/abs/2004.05074 Leslie Lamport paper “The Part-Time Parliament” https://lamport.a...
ACID Compliance
October 23, 2020 13:00 - 23 minutes - 27.2 MB
Linhda joins Kyle today to talk through ACID compliance (atomicity, consistency, isolation, and durability). Together, these four properties ensure that database transactions are processed reliably: each transaction either completes fully or has no effect, leaves the database in a valid state, is not disturbed by concurrent transactions, and survives crashes once committed. Kyle uses examples such as Google Sheets, bank transactions, and even the game Rummikub. Thanks to this week's sponsors: Monday.com - Their Apps Challenge is underway and available at monday.com/dataskeptic Brilliant - Check out their Quantum Computing Course, ...
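To make atomicity and consistency concrete, here is a minimal sketch of the bank-transfer example using Python's built-in sqlite3 module. The table, account names, and helper functions are hypothetical. A CHECK constraint stands in for the consistency rule "balances may not go negative," and the connection's context manager commits on success and rolls back if any statement fails. The credit is applied before the debit on purpose, so a rejected debit shows that the earlier credit is undone too. Isolation and durability are not exercised here.

```python
import sqlite3


def make_bank() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst as one atomic transaction; return False if rejected."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # CHECK constraint failed: src would go negative
        return False


def balances(conn: sqlite3.Connection) -> dict:
    """Return current balances as {name: balance}."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


if __name__ == "__main__":
    bank = make_bank()
    print(transfer(bank, "alice", "bob", 30), balances(bank))
    print(transfer(bank, "bob", "alice", 500), balances(bank))
```

The first transfer succeeds, leaving alice with 70 and bob with 80. The second is rejected because bob would go negative, and the 500 already credited to alice is rolled back with it, so the balances stay at 70 and 80 rather than ending up half-applied.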
National Popular Vote Interstate Compact
October 16, 2020 15:24 - 30 minutes - 35 MB
Patrick Rosenstiel joins us to discuss the National Popular Vote Interstate Compact.
Defending the p-value
October 12, 2020 13:00 - 30 minutes - 34.4 MB
Yudi Pawitan joins us to discuss his paper “Defending the P-value.”
Retraction Watch
October 05, 2020 15:00 - 32 minutes - 36.7 MB
Ivan Oransky joins us to discuss his work at retractionwatch.com, which tracks retractions of scientific papers as a window into the scientific process.
Crowdsourced Expertise
September 21, 2020 14:00 - 27 minutes - 31.8 MB
Derek Lim joins us to discuss the paper “Expertise and Dynamics within Crowdsourced Musical Knowledge Curation: A Case Study of the Genius Platform.”
The Spread of Misinformation Online
September 14, 2020 14:00 - 35 minutes - 40.7 MB
Neil Johnson joins us to discuss the paper “The online competition between pro- and anti-vaccination views.”