Data Skeptic
532 episodes - English - Latest episode: 5 days ago - ★★★★★ - 477 ratings
The Data Skeptic Podcast features interviews and discussion of topics related to data science, statistics, machine learning, artificial intelligence and the like, all from the perspective of applying critical thinking and the scientific method to evaluate the veracity of claims and efficacy of approaches.
Episodes
Measuring Trust in Robots with Likert Scales
April 03, 2023 18:05 - 47 minutes - 54.5 MB - We are joined by two guests today, Mariah, a Ph.D. student in the CORE Robotics Lab at Georgia Tech, and Matthew Gombolay, the Director of the CORE Robotics Lab. They both discuss practices for measuring a respondent’s perception in a survey.
CAREER Prediction
March 27, 2023 15:36 - 40 minutes - 35.3 MB - Ever wondered what your next career would be? Today, Keyon Vafa, a computer science Ph.D. student at Columbia University, joins us to discuss his latest research on developing a machine-learning model for career prediction. Keyon speaks at length about how the model was developed and the possibilities it opens up.
The Panel Study of Income Dynamics
March 21, 2023 06:13 - 34 minutes - 31.2 MB - Noura Insolera, a Research Investigator with the Panel Study of Income Dynamics (PSID), joins us to share how PSID conducts longitudinal household surveys. She also shares some findings from their data exploration, particularly observations and trends in food insecurity.
Survey Design Working Session
March 14, 2023 18:10 - 1 hour - 56.5 MB - Susan Gerbic joins Kyle to review some of the surveys Data Skeptic has launched, draft a new survey about podcast listening habits, and then review the results of that survey. You can see those results at the link below. https://survey.dataskeptic.com/survey/result/1675102237053 Watch the videos Susan mentioned on her YouTube page at the link below. https://www.youtube.com/playlist?list=PL7VAuaQDhPTVaLeI1IcpYph5lH19xA1u4
Bot Detection and Dyadic Surveys
March 06, 2023 18:08 - 35 minutes - 40.5 MB - The use of social bots to fill out online surveys is becoming prevalent. Today, we speak with Sara Bybee, a postdoctoral research scholar at the University of Utah. Drawing on her research, Sara shares how she detected social bots, strategies to curb them, and how underrepresented groups can be better represented in surveys.
Reproducible ESP Testing
February 20, 2023 14:00 - 47 minutes - 54.4 MB - Our guest today is Zoltán Kekecs, a Ph.D. holder in Behavioural Science. Zoltán highlights the problem of low replicability in journal papers and illustrates how researchers can better ensure complete replication of their research and findings. Using Bem’s experiment as an example, he talks at length about his methodology and results.
A Survey of Data Science Methodologies
February 13, 2023 17:43 - 24 minutes - 29.4 MB - On the show, Iñigo Martinez, a Ph.D. student at the University of Navarra, shares the results of his survey, which investigated how data practitioners carry out data science projects. He reveals the methodologies data practitioners typically use and the success factors in data science projects.
Opinion Dynamics Models
February 06, 2023 20:17 - 35 minutes - 40.9 MB - On the show today, Dino Carpentras, a post-doctoral researcher at the Computational Social Science group at ETH Zürich, joins us to discuss how opinion dynamics models are built and validated. He explains why quantifying opinions is complex and describes strategies for developing robust models for measuring and predicting public opinion.
Casual Affective Triggers
January 30, 2023 19:58 - 35 minutes - 41.2 MB - Crafting survey questions is one thing, but getting your audience to fill out the survey is another. On the show today, we speak with Alexander Nolte, an Associate Professor at the University of Tartu. Alexander discusses the use of Casual Affective Triggers (CATs) to encourage people to accept survey invitations and improve the completion rate. He reveals the impact of CATs on survey response rates in a study he conducted.
Conversational Surveys
January 23, 2023 14:00 - 39 minutes - 45.9 MB - Traditional surveys ask rigid, fixed questions, which restricts the information that can be gathered. Today, Ziang Xiao, a Postdoc Researcher in the FATE group at Microsoft Research Montréal, talks about conversational surveys, a type of survey that asks questions based on preceding answers. He discusses the benefits of conversational surveys and some of the challenges they pose.
Do Results Generalize for Privacy and Security Surveys
January 17, 2023 20:39 - 40 minutes - 27.9 MB - Today, Jenny Tang, a Ph.D. student of societal computing at Carnegie Mellon University, discusses her work on the generalization of privacy and security surveys on platforms such as Amazon MTurk and Prolific. Jenny shares the drawbacks of using such online platforms, the discrepancies observed in the samples drawn, and key insights from her results.
4 out of 5 Data Scientists Agree
January 10, 2023 16:46 - 28 minutes - 33 MB - This episode kicks off the new season of the show, Data Skeptic: Surveys. Linhda rejoins the show for a conversation with Kyle about her experience taking surveys and what questions she has for the season. Lastly, Kyle announces the launch of survey.dataskeptic.com, a new site for gathering your opinions. Please take a moment and share your thoughts!
Crowdfunded Board Games
December 26, 2022 14:00 - 34 minutes - 39.5 MB - It may be intuitive to think crowdfunding a project drives its innovation and novelty, but there are no empirical studies that prove this. On the show, Johannes Wachs shares his research that sought to determine whether crowdfunding truly drives innovation. He used board games as a case study and shared the results he found.
Russian Election Interference Effectiveness
December 19, 2022 14:05 - 41 minutes - 48.1 MB - There were reports of Russia’s interference in the 2016 US elections. In today’s episode, Koustuv Saha, a researcher at Microsoft Research, walks us through the effect of targeted ads for political campaigns. Using practical examples, he discusses how targeted ads can propagate fake news, their ripple effects on electioneering, and how to find a sweet spot with targeted ads.
Placement Laundering Fraud
December 15, 2022 17:38 - 32 minutes - 37.7 MB - There is an unsung kind of ad fraud brewing in the ad tech space — placement laundering fraud. On the show, Jeff Kline discusses what placement laundering fraud is, how it can be identified, and possible solutions to it. Listen to learn more.
Data Clean Rooms
December 12, 2022 17:54 - 31 minutes - 36.4 MB - Bosko Milekic, the Co-founder of Optable, a data collaboration platform for the media and advertising industry, joins us today. Bosko talks about clean rooms, the technology that preserves data privacy during collaboration. He discusses why clean rooms are gaining widespread adoption and how users can use Optable’s clean room platform for secure data sharing.
Dark Patterns in Site Design
December 05, 2022 17:04 - 34 minutes - 39.9 MB - Kerstin Bongard-Blanchy is a Research Associate at the University of Luxembourg. She joins us to discuss her study that investigated dark patterns in web designs. She discusses the results, the effect of dark patterns on users, whether an average user can detect them, and the way forward to a more ethical web space.
Internet Advertising Bureau Media Lab
December 03, 2022 17:19 - 37 minutes - 35.8 MB - We are joined by Anthony Katsur, the CEO of IAB Tech Lab. Anthony discusses standards within the ad tech industry. He explains how IAB Tech Lab sets and propagates global standards, actions to ensure compliance from advertisers, and industry trends for a more privacy-centric ad tech space.
Your Mouse Reveals Your Gender and Age
November 28, 2022 14:00 - 39 minutes - 45.7 MB - When we navigate a webpage, it is fairly easy for our mouse movement to be tracked and collected. Today, Luis Leiva, a Professor of Computer Science, discusses how this mouse-tracking data can be used to predict age, gender and user attention. He also discusses the privacy concerns with mouse-tracking data and possible ways the tracking can be curtailed.
Measuring Web Search Behavior
November 21, 2022 15:56 - 36 minutes - 33.1 MB - On the show, Aleksandra Urman and Mykola Makhortykh join us to discuss their work on the comparative analysis of web search behavior using web tracking data. They share results from their analysis covering user preferences for search engines, demographic patterns, and differences in how men and women surf the net.
StrategyQA and Big Bench
November 18, 2022 03:39 - 41 minutes - 48.2 MB - Did Aristotle Use a Laptop? That's a question from the StrategyQA benchmark which highlights the stretch goals for current artificial intelligence systems. Answering a question like that requires several cognitive steps and reasoning. Constructing a dataset of similarly challenging questions is a major undertaking. On today's episode, Mor Geva returns to share details about the creation of StrategyQA and the larger Big Bench dataset it has been included in.
Ad Blockers Effect on News Consumption
November 14, 2022 16:17 - 38 minutes - 30.1 MB - At first glance, ad blockers appear to reduce the revenue of news publishers, but this may not be completely true. On the show today, Shunyao Yan, an Assistant Professor in Marketing at Leavey School of Business, Santa Clara University, discusses the effect of ad blockers on news consumption and how ad blockers can potentially be helpful for news publishers.
Your Consent is Worth 75 Euros a Year
November 07, 2022 15:00 - 24 minutes - 27.5 MB - People who do not want their data tracked and shared online can pay a small fee at a cookie paywall. But are the websites keeping to their side of the bargain? Victor Morel, a Postdoc candidate at the Chalmers University of Technology, joins us to discuss his work on auditing the activities of cookie paywalls. He discusses the findings from his analysis and proposes some solutions for making cookie paywalls more transparent.
Automated Email Generation for Targeted Attacks
October 31, 2022 15:52 - 45 minutes - 51.6 MB - The advancement of generative language models has been a force for good, but also for evil. On the show, Avisha Das, a post-doctoral scholar at the University of Texas Health Center, joins us to discuss how attackers use machine learning to create phishing emails that do not arouse suspicion. She also discusses how she used an RNN for automated email generation, with the goal of defeating statistical detectors.
Tribal Marketing
October 24, 2022 15:33 - 37 minutes - 32.3 MB - Peter Gloor, a Research Scientist at the MIT Center for Collective Intelligence, takes us into the world of tribe classification. He discusses at length the need for such classification on the internet and how he built a machine learning model that does it. Listen to find out more!
Nano-targetted Facebook Ads
October 17, 2022 12:55 - 44 minutes - 39.8 MB
Debiasing GPT-3 Job Ads
October 10, 2022 13:00 - 48 minutes - 43.6 MB - We hear about the impressive achievements of GPT-3 models, but such large generative models come with their own biases. On the show today, Conrad Borchers, a Ph.D. student in Human-Computer Interaction, joins us to discuss the bias in GPT-3 for job ads and how such large models can be de-biased. Listen to learn more!
ML Ops in Production
October 06, 2022 21:03 - 41 minutes - 37.2 MB - Moses Guttman from Clear ML joins us to share insights about how organizations leveraging machine learning keep their programs on track. While many parallels exist between the software development life cycle (SDLC) and the machine learning development life cycle, successful deployments of ML in production have demonstrated that a unique set of tools is required. Moses and I discuss the emergence of ML Ops, success stories, and how modern teams leverage tools like Clear ML's open source sol...
Ad Network Tomography
October 03, 2022 13:00 - 35 minutes - 28.3 MB - Data sharing in the ad tech space has largely been a black box system. While it is obvious the data is being collected, the data sharing process is obscure to users. On the show today, Maaz Bin Musa and Rishab, both researchers at the University of Iowa, speak about the importance of data transparency and ATOM, their tool for achieving it. Listen to find out how ATOM uncovers data-sharing relationships in the ad-tech space.
First Party Tracking Cookies
September 26, 2022 13:00 - 35 minutes - 31.8 MB - When you accept cookies on a website, you cannot tell whether the cookies are used for tracking your personal data or not. Shaoor Munir’s machine learning model does that. On the show today, the Ph.D. student at the University of California discusses the world of first-party cookies and how he developed a machine learning model that predicts whether a first-party cookie is used for tracking purposes.
The Harms of Targeted Weight Loss Ads
September 19, 2022 13:00 - 34 minutes - 40.9 MB - Liza Gak, a Ph.D. student at UC Berkeley, joins us to discuss her research on harmful weight loss advertising. She discussed how weight loss ads are not fact-checked, and how they typically target the most vulnerable. She extensively discussed her interview process, data analysis, and results. Listen for more!
Podcast Advertising
September 12, 2022 13:00 - 35 minutes - 35.6 MB - Growing your podcast to the point of monetization is not a walk in the park. Today, Rob Walch, the VP of Podcast Relations at Libsyn, talks about podcast advertising. He discusses how advertising works, how to grow your audience, and some blueprints for being a successful podcaster. Listen for more.
Fairness in e-Commerce Search
September 05, 2022 14:41 - 40 minutes - 35.7 MB - When we search for products in e-commerce stores, we do not care what goes on under the hood to generate the results. However, there may be an intentional algorithmic effort to steer us toward a particular product. On the show today, Abhisek Dash and Saptarshi Ghosh discuss their research on fairness in the search results of Amazon smart speakers.
Fraudulent Amazon Reviewers
August 29, 2022 13:00 - 41 minutes - 37.8 MB - Chances are that you have bought a product online largely because of the reviews you saw. Unfortunately, not all reviews are genuine. Today, Rajvardhan Oak shares some insight from his research on fraudulent Amazon reviews. He explains the inner workings of fraudulent reviews and reveals key insights from his qualitative and quantitative study.
Ad Targeting in Amazon Smart Speakers
August 22, 2022 13:14 - 32 minutes - 29.7 MB - While we give attention to textual data on the web, many do not know the unique power of echo interactions with smart devices for ad targeting. Today, our guest, Umar Iqbal, joins us to discuss his study on using Amazon Smart Speakers for ad targeting. He shares revealing findings about how voice data is captured and analysed for ad purposes. Listen to find out more.
Adwords with Unknown Budgets
August 15, 2022 13:00 - 34 minutes - 31.3 MB - Rajan Udwani, an Assistant Professor at the University of California, Berkeley, joins us to discuss his work on AdWords with unknown budgets. He discusses previous approaches to ad allocation, as well as his own approach, which introduces randomization for better results. Listen for more.
ML Ops Best Practices
August 12, 2022 12:00 - 30 minutes - 34.5 MB - Today, we are joined by Piotr Niedźwiedź, Founder and CEO of Neptune.ai. Piotr discusses common MLOps activities by data science teams and how they can take advantage of Neptune.ai for better experiment tracking and efficiency. Listen for more!
Affiliate Marketing Rabbithole
August 08, 2022 12:46 - 52 minutes - 47.9 MB - Affiliate marketing creates an opportunity for marketers to gain a commission by promoting a product or service. Cookies are typically used for tracking, and the advertiser whose product or service is being featured pays the marketer only on transactions. Today's episode covers those approaches and is also a story of conflict between two large companies and how one affiliate marketer got caught in the middle.
Monetization of Youtube Conspiracy Theorists
August 01, 2022 13:00 - 54 minutes - 49.5 MB - Cameron Ballard joins us today to discuss his work around YouTube conspiracy theories. He revealed interesting observations about conspiracy theories on YouTube including how predatory ads are most common in conspiracy theory videos and how YouTube’s algorithm subtly works for predatory ads.
User Perceptions of Problematic Ads
July 25, 2022 13:00 - 37 minutes - 34.9 MB - Eric Zeng joins us to discuss his study on understanding bad ads and the efforts that can be taken to limit bad ads online. He discusses how he and his co-authors scraped a large amount of ad data, applied a machine learning algorithm, and obtained the corresponding statistical results.
Political Digital Advertising Analysis
July 21, 2022 18:15 - 35 minutes - 30.6 MB - NaLette Brodnax, a political scientist and an Assistant Professor in the McCourt School of Public Policy at Georgetown University, joins us to discuss her work on analyzing digital advertisements for political campaigns. She used data for electoral campaigns on Facebook to answer questions that help us better understand how digital ads affect the outcome of elections. Click here for additional show notes! Thanks to our sponsor! https://neptune.ai/ Log, store, query, display, organize...
Fraud Detection in Crowdfunding Campaigns
July 18, 2022 15:13 - 35 minutes - 28.9 MB
Artificial Intelligence and Auction Design
July 11, 2022 12:59 - 43 minutes - 41.3 MB
Privacy Preference Signals
July 04, 2022 13:00 - 33 minutes - 30.3 MB - Have you ever wondered what goes on under the hood when you accept a website’s cookies? Today, Maximilian Hils, a PhD student in Computer Science at the University of Innsbruck, Austria, dissects the ad tech industry and the standards put in place to protect users’ data. He also shares his thoughts on the use of VPNs as well as other tools that help shield your data from prying eyes on the internet. Click here for additional show notes Thanks to our sponsor: https://clear.ml/ ClearML i...
Neural Architecture Search for CTR Prediction
June 27, 2022 15:19 - 28 minutes - 41.4 MB - Ravi Krishna joins us today to talk about his recent work on a differentiable NAS framework for ads CTR prediction. He discusses what CTR prediction is and why his NAS framework helps in building neural networks for better ad recommendation. Listen to learn about methodology, related literature and his results. Click for additional show notes Thanks to our sponsor: https://astrato.io Astrato is a modern BI and analytics platform built for the Snowflake Data Cloud. A next-generati...
Algorithmic PPC Management
June 21, 2022 22:10 - 43 minutes - 40.2 MB - Effectively managing a large budget of pay per click advertising demands software solutions. When spending multi-million dollar budgets on hundreds of thousands of keywords, an effective algorithmic strategy is required to optimize marketing objectives. In this episode, Nathan Janos joins us to share insights from his work in the ad tech industry. Click for additional show notes Thanks to our sponsor! https://wandb.com/ The developer-first MLOps platform. Build better models faster wi...
Data Skeptic: Ad Tech
June 18, 2022 03:44 - 42 minutes - 40 MB - Increasingly, people get most if not all of the information they consume online. Alongside the web sites, videos, apps, and other destinations, we’re consistently served advertisements alongside the organic content we search for or discover. Targeted ads make it possible for you to discover relevant new products you might otherwise not have heard about. Targeting can also open a Pandora’s box of ethical considerations. Online advertising is a complex network of automated systems. Algorithm...
The Reliability of Mobile Phone Data
June 13, 2022 05:31 - 49 minutes - 56.7 MB - Our mobile phones generate an incredible amount of data inbound and outbound. In today’s episode, Nishant Kishore, a PhD graduate of Harvard University in Infectious Disease Epidemiology, explains how mobility data from mobile phones can be captured and analysed to understand the spread of infectious diseases. Click here for additional show notes Thanks to our sponsor! https://neptune.ai/ Log, store, query, display, organize, and compare all your model metadata in a single place
Haywire Algorithms
June 06, 2022 13:00 - 33 minutes - 38.4 MB - The pandemic changed how we lived. And this had a ripple effect on the performance of machine learning models. Ravi Parikh joins us today to discuss how the pandemic has affected the performance of machine learning models in clinical care and some actionable steps to fix it. Click here for additional show notes Thanks to our sponsor: Astera Centerprise is a no-code data integration platform that allows users to build ETL/ELT pipelines for modern data warehousing and analytics.
School Reopening Analysis
May 30, 2022 14:00 - 33 minutes - 38.1 MB - Carly Lupton-Smith joins us today to speak about her research, which investigated the consistency between household and county measures of school reopening. Carly is a doctoral researcher in Biostatistics at Johns Hopkins Bloomberg School of Public Health. Listen to learn about her findings. Click here for additional show notes on our website! Thanks to our sponsor! ClearML is an open-source MLOps solution users love to customize, helping you easily Track, Orchestrate, and Automate ML work...