Data Skeptic

532 episodes - English - Latest episode: 5 days ago - ★★★★★ - 477 ratings

The Data Skeptic Podcast features interviews and discussion of topics related to data science, statistics, machine learning, artificial intelligence and the like, all from the perspective of applying critical thinking and the scientific method to evaluate the veracity of claims and efficacy of approaches.

Science Technology machinelearning skepticism datamining datascience science statistics

Episodes

Measuring Trust in Robots with Likert Scales

April 03, 2023 18:05 - 47 minutes - 54.5 MB

We are joined by two guests today: Mariah, a Ph.D. student in the CORE Robotics Lab at Georgia Tech, and Matthew Gombolay, the Director of the CORE Robotics Lab. They discuss practices for measuring a respondent’s perception in a survey.

CAREER Prediction

March 27, 2023 15:36 - 40 minutes - 35.3 MB

Ever wondered what your next career would be? Today, Keyon Vafa, a computer science Ph.D. student at Columbia University, joins us to discuss his latest research on developing a machine-learning model for career prediction. Keyon spoke extensively about how the model was developed and the possibilities it brings.

The Panel Study of Income Dynamics

March 21, 2023 06:13 - 34 minutes - 31.2 MB

Noura Insolera, a Research Investigator with the Panel Study of Income Dynamics (PSID), joins us to share how PSID conducts longitudinal household surveys. She also shared some interesting findings from their data exploration, particularly observations and trends in food insecurity.

Survey Design Working Session

March 14, 2023 18:10 - 1 hour - 56.5 MB

Susan Gerbic joins Kyle to review some of the surveys Data Skeptic has launched, draft a new survey about podcast listening habits, and then review the results of that survey. You can see those results at the link below. https://survey.dataskeptic.com/survey/result/1675102237053 Watch the videos Susan mentioned on her YouTube page at the link below. https://www.youtube.com/playlist?list=PL7VAuaQDhPTVaLeI1IcpYph5lH19xA1u4

Bot Detection and Dyadic Surveys

March 06, 2023 18:08 - 35 minutes - 40.5 MB

The use of social bots to fill out online surveys is becoming prevalent. Today, we speak with Sara Bybee, a postdoctoral research scholar at the University of Utah. Drawing on her research, Sara shares how she detected social bots, strategies to curb them, and how underrepresented groups can be better represented in surveys.

Reproducible ESP Testing

February 20, 2023 14:00 - 47 minutes - 54.4 MB

Our guest today is Zoltán Kekecs, a Ph.D. holder in Behavioural Science. Zoltán highlights the problem of low replicability in journal papers and illustrates how researchers can better ensure complete replication of their research and findings. He used Bem’s experiment as an example, talking extensively about his methodology and results.

A Survey of Data Science Methodologies

February 13, 2023 17:43 - 24 minutes - 29.4 MB

On the show, Iñigo Martinez, a Ph.D. student at the University of Navarra, shares the results of his survey, which investigated how data practitioners carry out data science projects. He revealed the methodologies typically used by data practitioners and the success factors in data science projects.

Opinion Dynamics Models

February 06, 2023 20:17 - 35 minutes - 40.9 MB

On the show today, Dino Carpentras, a post-doctoral researcher in the Computational Social Science group at ETH Zürich, joins us to discuss how opinion dynamics models are built and validated. He explained why quantifying opinions is complex and shared strategies for developing robust models for measuring and predicting public opinion.

Casual Affective Triggers

January 30, 2023 19:58 - 35 minutes - 41.2 MB

Crafting survey questions is one thing, but getting your audience to answer them is another. On the show today, we speak with Alexander Nolte, an Associate Professor at the University of Tartu. Alexander discussed the use of Casual Affective Triggers (CATs) to incentivize people to accept survey invitations and improve the completion rate. He revealed the impact of CATs on survey response rates from a study he conducted.

Conversational Surveys

January 23, 2023 14:00 - 39 minutes - 45.9 MB

Traditional surveys ask rigid, predetermined questions, which restricts the information they can gather. Today, Ziang Xiao, a Postdoc Researcher in the FATE group at Microsoft Research Montréal, talks about conversational surveys, a type of survey that asks questions based on preceding answers. He discussed the benefits of conversational surveys and some of the challenges they pose.

Do Results Generalize for Privacy and Security Surveys

January 17, 2023 20:39 - 40 minutes - 27.9 MB

Today, Jenny Tang, a Ph.D. student in societal computing at Carnegie Mellon University, discusses her work on the generalizability of privacy and security surveys conducted on platforms such as Amazon MTurk and Prolific. Jenny shared the drawbacks of using such online platforms, the discrepancies observed in the samples drawn, and key insights from her results.

4 out of 5 Data Scientists Agree

January 10, 2023 16:46 - 28 minutes - 33 MB

This episode kicks off the new season of the show, Data Skeptic: Surveys.  Linhda rejoins the show for a conversation with Kyle about her experience taking surveys and what questions she has for the season.  Lastly, Kyle announces the launch of survey.dataskeptic.com, a new site we're launching to gather your opinions.  Please take a moment and share your thoughts!

Crowdfunded Board Games

December 26, 2022 14:00 - 34 minutes - 39.5 MB

It may be intuitive to think crowdfunding a project drives its innovation and novelty, but there are no empirical studies that prove this. On the show, Johannes Wachs shares his research that sought to determine whether crowdfunding truly drives innovation. He used board games as a case study and shared the results he found.

Russian Election Interference Effectiveness

December 19, 2022 14:05 - 41 minutes - 48.1 MB

There were reports of Russia’s interference in the 2016 US elections. In today’s episode, Koustuv Saha, a researcher at Microsoft Research, walks us through the effect of targeted ads for political campaigns. Using practical examples, he discusses how targeted ads can propagate fake news, their ripple effects on electioneering, and how to find a sweet spot with targeted ads.

Placement Laundering Fraud

December 15, 2022 17:38 - 32 minutes - 37.7 MB

There is an unsung kind of ad fraud brewing in the ad tech space — placement laundering fraud. On the show, Jeff Kline discusses what placement laundering fraud is, how it can be identified, and possible solutions to it. Listen to learn more.

Data Clean Rooms

December 12, 2022 17:54 - 31 minutes - 36.4 MB

Bosko Milekic, the Co-founder of Optable, a data collaboration platform for the media and advertising industry, joins us today. Bosko talked about clean rooms, the technology driving data privacy during collaboration. He discussed why clean rooms are gaining widespread adoption and how users can take advantage of Optable’s clean room platform for a secure data-sharing experience.

Dark Patterns in Site Design

December 05, 2022 17:04 - 34 minutes - 39.9 MB

Kerstin Bongard-Blanchy is a Research Associate at the University of Luxembourg. She joins us to discuss her study that investigated dark patterns in web designs. She discussed the results, the effect of dark patterns on users, whether an average user can detect them, and the way forward to a more ethical web space.

Internet Advertising Bureau Media Lab

December 03, 2022 17:19 - 37 minutes - 35.8 MB

We are joined by Anthony Katsur, the CEO of IAB Tech Lab. Anthony discusses standards within the ad tech industry. He explained how IAB Tech Lab sets and propagates global standards, actions to ensure compliance from advertisers, and industry trends toward a more privacy-centric ad tech space.

Your Mouse Reveals Your Gender and Age

November 28, 2022 14:00 - 39 minutes - 45.7 MB

When we navigate a webpage, it is fairly easy for our mouse movements to be tracked and collected. Today, Luis Leiva, a Professor of Computer Science, discusses how this mouse-tracking data can be used to predict age, gender, and user attention. He also discusses the privacy concerns with mouse-tracking data and possible ways such tracking can be curtailed.

Measuring Web Search Behavior

November 21, 2022 15:56 - 36 minutes - 33.1 MB

On the show, Aleksandra Urman and Mykola Makhortykh join us to discuss their work on the comparative analysis of web search behavior using web tracking data. They shared interesting results from their analysis, covering user preferences for search engines, demographic patterns, and differences in how men and women surf the net.

StrategyQA and Big Bench

November 18, 2022 03:39 - 41 minutes - 48.2 MB

Did Aristotle Use a Laptop?  That's a question from the StrategyQA benchmark which highlights the stretch goals for current artificial intelligence systems.  Answering a question like that requires several cognitive steps and reasoning.  Constructing a dataset of similarly challenging questions is a major undertaking.  On today's episode, Mor Geva returns to share details about the creation of StrategyQA and the larger Big Bench dataset it has been included in.

Ad Blockers Effect on News Consumption

November 14, 2022 16:17 - 38 minutes - 30.1 MB

At first glance, the use of ad blockers reduces the revenue of news publishers, but this may not be completely true. On the show today, Shunyao Yan, an Assistant Professor in Marketing at Leavey School of Business, Santa Clara University, discusses the effect of ad blockers on news consumption and how ad blockers can potentially be helpful for news publishers.

Your Consent is Worth 75 Euros a Year

November 07, 2022 15:00 - 24 minutes - 27.5 MB

People who do not want their data tracked and shared online can pay a small fee at a cookie paywall. But are the websites keeping to their side of the bargain? Victor Morel, a Postdoc candidate at the Chalmers University of Technology, joins us to discuss his work on auditing the activities of cookie paywalls. He discussed the findings from his analysis and proposed some solutions for making cookie paywalls more transparent.

Automated Email Generation for Targeted Attacks

October 31, 2022 15:52 - 45 minutes - 51.6 MB

The advancement of generative language models has been a force for good, but also for evil. On the show, Avisha Das, a post-doctoral scholar at the University of Texas Health Center, joins us to discuss how attackers use machine learning to create phishing emails that do not arouse suspicion. She also discussed how she used an RNN for automated email generation, with the goal of defeating statistical detectors.

Tribal Marketing

October 24, 2022 15:33 - 37 minutes - 32.3 MB

Peter Gloor, a Research Scientist at the MIT Center for Collective Intelligence, takes us into the new world of tribe classification. He discussed at length the need for such classification on the internet and how he built a machine learning model that does it. Listen to find out more!

Nano-targetted Facebook Ads

October 17, 2022 12:55 - 44 minutes - 39.8 MB

Debiasing GPT-3 Job Ads

October 10, 2022 13:00 - 48 minutes - 43.6 MB

We hear about the impressive achievements of GPT-3 models, but such large generative models come with their own biases. On the show today, Conrad Borchers, a Ph.D. student in Human-Computer Interaction, joins us to discuss the bias in GPT-3 for job ads and how such large models can be de-biased. Listen to learn more!

ML Ops in Production

October 06, 2022 21:03 - 41 minutes - 37.2 MB

Moses Guttman from Clear ML joins us to share insights about how organizations leveraging machine learning keep their programs on track.  While many parallels exist between the software development life cycle (SWLC) and the machine learning development life cycle, successful deployments of ML in production have demonstrated that a unique set of tools is required.  Moses and I discuss the emergence of ML Ops, success stories, and how modern teams leverage tools like Clear ML's open source sol...

Ad Network Tomography

October 03, 2022 13:00 - 35 minutes - 28.3 MB

Data sharing in the ad tech space has largely been a black box system. While it is obvious the data is being collected, the data sharing process is obscure to users. On the show today, Maaz Bin Musa and Rishab, both researchers at the University of Iowa, speak about the importance of data transparency and ATOM, their tool for data transparency. Listen to find out how ATOM uncovers data-sharing relationships in the ad-tech space.

First Party Tracking Cookies

September 26, 2022 13:00 - 35 minutes - 31.8 MB

When you accept cookies on a website, you cannot tell whether the cookies are used for tracking your personal data or not. Shaoor Munir’s machine learning model can. On the show today, Shaoor, a Ph.D. student at the University of California, discussed the world of first-party cookies and how he developed a machine learning model that predicts whether a first-party cookie is used for tracking purposes.

The Harms of Targeted Weight Loss Ads

September 19, 2022 13:00 - 34 minutes - 40.9 MB

Liza Gak, a Ph.D. student at UC Berkeley, joins us to discuss her research on harmful weight loss advertising. She discussed how weight loss ads are not fact-checked, and how they typically target the most vulnerable. She extensively discussed her interview process, data analysis, and results. Listen for more!

Podcast Advertising

September 12, 2022 13:00 - 35 minutes - 35.6 MB

Growing your podcast to the point of monetization is not a walk in the park. Today, Rob Walch, the VP of Podcast Relations at Libsyn, talks about podcast advertising. He discussed how advertising works, how to grow your audience, and some blueprints for becoming a successful podcaster. Listen for more.

Fairness in e-Commerce Search

September 05, 2022 14:41 - 40 minutes - 35.7 MB

When we search for products in e-commerce stores, we do not care what goes on under the hood to generate the results. However, there may be an intentional algorithmic effort to steer us toward a particular product. On the show today, Abhisek Dash and Saptarshi Ghosh discuss their research on fairness in the search results of Amazon smart speakers.

Fraudulent Amazon Reviewers

August 29, 2022 13:00 - 41 minutes - 37.8 MB

Chances are that you have bought a product online largely because of the reviews you saw. Unfortunately, not all reviews are genuine. Today, Rajvardhan Oak shares some insight from his research on fraudulent Amazon reviews. He explained the inner workings of fraudulent reviews and revealed key insights from his qualitative and quantitative study.

Ad Targeting in Amazon Smart Speakers

August 22, 2022 13:14 - 32 minutes - 29.7 MB

While we give attention to textual data on the web, many do not know the unique power of echo interactions with smart devices for ad targeting. Today, our guest, Umar Iqbal, joins us to discuss his study on using Amazon Smart Speakers for ad targeting. He shared revealing details about how voice data is captured and analysed for ad purposes. Listen to find out more.

Adwords with Unknown Budgets

August 15, 2022 13:00 - 34 minutes - 31.3 MB

Rajan Udwani, an Assistant Professor at the University of California, Berkeley, joins us to discuss his work on AdWords with unknown budgets. He discussed the previous approaches to ad allocation, as well as his own approach, which introduces randomization for better results. Listen for more.

ML Ops Best Practices

August 12, 2022 12:00 - 30 minutes - 34.5 MB

Today, we are joined by Piotr Niedźwiedź, Founder and CEO of Neptune.ai. Piotr discusses common MLOps activities by data science teams and how they can take advantage of Neptune.ai for better experiment tracking and efficiency. Listen for more!

Affiliate Marketing Rabbithole

August 08, 2022 12:46 - 52 minutes - 47.9 MB

Affiliate marketing creates an opportunity for marketers to gain a commission by promoting a product or service. Cookies are typically used for tracking, and the advertiser whose product or service is being featured pays the marketer only on transactions. Today's episode covers those approaches and is also a story of conflict between two large companies and how one affiliate marketer got caught in the middle.

Monetization of Youtube Conspiracy Theorists

August 01, 2022 13:00 - 54 minutes - 49.5 MB

Cameron Ballard joins us today to discuss his work around YouTube conspiracy theories. He revealed interesting observations about conspiracy theories on YouTube including how predatory ads are most common in conspiracy theory videos and how YouTube’s algorithm subtly works for predatory ads. 

User Perceptions of Problematic Ads

July 25, 2022 13:00 - 37 minutes - 34.9 MB

Eric Zeng joins us to discuss his study on understanding bad ads and the efforts that can be taken to limit them online. He discussed how he and his co-authors scraped a large amount of ad data and applied a machine learning algorithm, along with the statistical results they obtained.

Political Digital Advertising Analysis

July 21, 2022 18:15 - 35 minutes - 30.6 MB

NaLette Brodnax, a political scientist and an Assistant Professor in the McCourt School of Public Policy at Georgetown University, joins us to discuss her work on analyzing digital advertisements for political campaigns. She used data for electoral campaigns on Facebook to answer questions that help us better understand how digital ads affect the outcome of elections. Click here for additional show notes! Thanks to our sponsor! https://neptune.ai/ Log, store, query, display, organize...

Fraud Detection in Crowdfunding Campaigns

July 18, 2022 15:13 - 35 minutes - 28.9 MB

Artificial Intelligence and Auction Design

July 11, 2022 12:59 - 43 minutes - 41.3 MB

Privacy Preference Signals

July 04, 2022 13:00 - 33 minutes - 30.3 MB

Have you ever wondered what goes on under the hood when you accept a website’s cookies? Today, Maximilian Hils, a PhD student in Computer Science at the University of Innsbruck, Austria, dissects the ad tech industry and the standards put in place to protect users’ data. He also shares his thoughts on the use of VPNs as well as other tools that help shield your data from prying eyes on the internet. Click here for additional show notes Thanks to our sponsor: https://clear.ml/ ClearML i...

Neural Architecture Search for CTR Prediction

June 27, 2022 15:19 - 28 minutes - 41.4 MB

Ravi Krishna joins us today to talk about his recent work on a differentiable NAS framework for ads CTR prediction. He discussed what CTR prediction is about and why his NAS framework helps in building neural networks for better ads recommendation. Listen to learn about methodology, related literature and his results. Click for additional show notes Thanks to our sponsor: https://astrato.io Astrato is a modern BI and analytics platform built for the Snowflake Data Cloud. A next-generati...

Algorithmic PPC Management

June 21, 2022 22:10 - 43 minutes - 40.2 MB

Effectively managing a large budget of pay per click advertising demands software solutions. When spending multi-million dollar budgets on hundreds of thousands of keywords, an effective algorithmic strategy is required to optimize marketing objectives. In this episode, Nathan Janos joins us to share insights from his work in the ad tech industry. Click for additional show notes Thanks to our sponsor! https://wandb.com/ The developer-first MLOps platform. Build better models faster wi...

Data Skeptic: Ad Tech

June 18, 2022 03:44 - 42 minutes - 40 MB

Increasingly, people get most if not all of the information they consume online. Across websites, videos, apps, and other destinations, we’re consistently served advertisements alongside the organic content we search for or discover. Targeted ads make it possible for you to discover relevant new products you might otherwise not have heard about. Targeting can also open a Pandora’s box of ethical considerations. Online advertising is a complex network of automated systems. Algorithm...

The Reliability of Mobile Phone Data

June 13, 2022 05:31 - 49 minutes - 56.7 MB

Our mobile phones generate an incredible amount of data inbound and outbound. In today’s episode, Nishant Kishore, a PhD graduate of Harvard University in Infectious Disease Epidemiology, explains how mobility data from mobile phones can be captured and analysed to understand the spread of infectious diseases. Click here for additional show notes Thanks to our sponsor! https://neptune.ai/ Log, store, query, display, organize, and compare all your model metadata in a single place

Haywire Algorithms

June 06, 2022 13:00 - 33 minutes - 38.4 MB

The pandemic changed how we lived. And this had a ripple effect on the performance of machine learning models. Ravi Parikh joins us today to discuss how the pandemic has affected the performance of machine learning models in clinical care and some actionable steps to fix it. Click here for additional show notes Thanks to our sponsor: Astera Centerprise is a no-code data integration platform that allows users to build ETL/ELT pipelines for modern data warehousing and analytics.

School Reopening Analysis

May 30, 2022 14:00 - 33 minutes - 38.1 MB

Carly Lupton-Smith joins us today to speak about her research, which investigated the consistency between household and county measures of school reopening. Carly is a doctoral researcher in Biostatistics at Johns Hopkins Bloomberg School of Public Health. Listen to learn about her findings. Click here for additional show notes on our website! Thanks to our sponsor! ClearML is an open-source MLOps solution users love to customize, helping you easily Track, Orchestrate, and Automate ML work...

Twitter Mentions

@sami_r_yousif 2 Episodes
@leerosevere 2 Episodes
@halfak 1 Episode
@boreshkin 1 Episode
@tomlevenson 1 Episode
@mark_azurecat 1 Episode
@randal_olson 1 Episode
@karthick_sh 1 Episode
@andersdrachen 1 Episode
@iamzareenf 1 Episode
@rajiinio 1 Episode
@chengtao_chu 1 Episode
@antoine77340 1 Episode
@samuelmehr 1 Episode
@rajcs4 1 Episode
@anderssandberg 1 Episode
@celestiaward 1 Episode
@akalatian 1 Episode
@niftyc 1 Episode
@maverickpramit 1 Episode