Journal Club

22 episodes - English - Latest episode: almost 4 years ago

Welcome to a brand new show from Data Skeptic entitled "Journal Club".

Each episode will feature a regular panel and one revolving guest seat. The group will discuss a few topics related to data science and focus on one featured scholarly paper which is discussed in detail.

Mathematics Science Education computerscience machinelearning modelinterpretability

Episodes

Deep Fakes in a Court Room, Mass COVID-19 Testing with Biosensors, and BLEURT

July 06, 2020 17:26 - 38 minutes - 43.6 MB

We are back with our regular panel this week! Starting off, Lan brings us the article "Biosensors May Hold the Key to Mass Coronavirus Testing," which covers tech startups beginning to develop chips that signal the presence of coronavirus RNA, antibodies, and antigens. George brings us a blog post all about BLEURT, titled "BLEURT: Learning Robust Metrics for Text Generation." Last but not least, Kyle discusses the main paper this week! He brought us a paper discussing Dee...

Covid-19 Misinformation, GPT-3, and Movement Pruning

July 01, 2020 14:19 - 41 minutes - 47.9 MB

We're back with special guest panelist Leonardo Apolonio! He brings us the main paper this week, titled "Movement Pruning: Adaptive Sparsity by Fine-Tuning." George shows us a blog post discussing GPT-3. Lan introduces us to an article about misinformation related to Covid-19. Last but not least, Kyle also has a Covid-19 topic, addressing contact tracing apps!

Open Source AI for Everyone, Diagnosing Blindness and Histogram Reweighting

June 24, 2020 19:06 - 32 minutes - 36.7 MB

Another week, another episode! We are back again with our regular panelists. George brings us a clinical field study of an AI being used to diagnose blindness. Lan discusses the article titled "AI Infrastructure for Everyone, Now Open Sourced." Last but not least, Kyle brings us our paper for the week: "Extending Machine Learning Classification Capabilities with Histogram Reweighting."

Chip Design, Teaching Google, and Fooling LIME and SHAP

June 16, 2020 18:31 - 32 minutes - 37.4 MB

This week's episode has the regular panel back together! George brought us the blog post from Google AI, "Chip Design with Deep Reinforcement Learning." Kyle brings us a news item from CNET, "How People with Down Syndrome Are Improving Google Assistant." Lan brings us the paper this week! She discusses "Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods." All works mentioned will be linked in the show notes.

Hateful Memes, Carbon Emissions, and Detecting Ear Infections with Neural Networks

June 10, 2020 20:32 - 44 minutes - 50.5 MB

This week on Journal Club we have another panelist! Jesus Rogel-Salazar joins us to discuss the paper "Automatic Detection of Tympanic Membrane and Middle Ear Infection." Kyle talks about the relationship between Covid-19 and carbon emissions. George tells us about the new Hateful Memes Challenge from Facebook. Lan joins us to talk about Google's AI Explorables. All mentioned work can be found in the show notes.

Animal Olympics, Whatsapp, and Models for Healthcare

June 03, 2020 20:08 - 41 minutes - 47.6 MB

This week we have a guest joining us, Francisco J. Azuaje G! He brings us the paper "How to Develop Machine Learning Models for Healthcare." Lan discusses "Animal AI Olympics," a reinforcement learning competition inspired by animal cognition. Kyle talks about WhatsApp and discusses the article "Why New Contact Tracing Apps Have A Critical WhatsApp-Sized Problem." Last but not least: George! He brings us his blog post comparing TF-IDF and BERT vectorisation for speaker prediction.  A...

Deeply Tough Framework, Grammar for Agents, and Too Much Screen Time?

May 26, 2020 21:24 - 34 minutes - 39.4 MB

Today on the show, Kyle discusses research suggesting that time on screens has little impact on kids' social skills. Lan talks about DeeplyTough, a deep learning framework targeting the protein pocket matching problem: whether a pair of protein pockets can bind to the same ligand. George's paper this week is about defining a grammar for interpretable agents. By basing this formalism on a corpus of human explanation dialogues, the authors hope to produce a more "grounded" proto...

Chemical Space, AI Microscope, and Panda or Gibbon?

May 19, 2020 14:09 - 31 minutes - 35.7 MB

George talks about OpenAI's Microscope, a collection of visualisations of the neurons and layers in six famous vision models. This library hopes to make analysis of these models a community effort. Lan talks about exploring chemical space with AI and how that may change pharmaceutical drug discovery and development. Kyle leads a discussion about the paper "Extending Adversarial Attacks to Produce Adversarial Class Probability Distributions," which shows another control that an adversarial att...

Encryption Keys, Connect Four, and Data Nutrition Labels

May 14, 2020 13:30 - 41 minutes - 47.1 MB

Today George takes inspiration (and the gym environment) from Kaggle's ConnectX competition and shows off an attempt to design an interpretable Connect 4 agent with DQN! Lan discusses the paper "The Dataset Nutrition Label," a framework to facilitate higher data quality standards by Sarah Holland and co-authors from the Assembly program at the Berkman Klein Center at Harvard University & MIT Media Lab. Last but not least, Kyle leads the panel in a discussion about encryption keys! ...

ML Cancer Diagnosis, Robot Assistants, and Watermarking Data

May 06, 2020 18:49 - 31 minutes - 35.5 MB

Today George talks about the use of machine learning to diagnose cancer from a blood test. By sampling cell-free DNA, this test can identify 50 different types of cancer and the localized tissue of origin with >90% accuracy. Lan leads a discussion of what robots and researchers in robotics may be able to contribute toward fighting the COVID-19 pandemic. Last but not least, Kyle leads the panel in a discussion about watermarking data!

Humanitarian AI, PyTorch Models, and Saliency Maps

April 30, 2020 23:50 - 27 minutes - 31.6 MB

George's paper this week is "Sanity Checks for Saliency Maps." This work takes stock of a group of techniques that generate local interpretability and assesses their trustworthiness through two 'sanity checks'. From this analysis, Adebayo et al. demonstrate that a number of these tools are invariant to the model's weights and could lead a human observer into confirmation bias. Kyle discusses AI and brings the question: how can AI help in a humanitarian crisis? Last but not least, Lan brings u...

Adversarial Examples, Protein Folding, and Shapley Values

April 28, 2020 20:28 - 45 minutes - 52.6 MB

George dives into his blog post experimenting with Scott Lundberg's SHAP library. By training an XGBoost model on a dataset about academic attainment and alcohol consumption, can we develop a global interpretation of the underlying relationships? Lan leads the discussion of the paper "Adversarial Examples Are Not Bugs, They Are Features" by Ilyas and colleagues. This paper proposes a new perspective on the adversarial susceptibility of machine learning models by teasing apart the 'robust' and the ...

Tools For Misusing GPT2, Tensorflow, and ML Unfairness

April 15, 2020 12:00 - 25 minutes - 29.7 MB

Today on the show, George leads a discussion about the Giant Language model Test Room (GLTR). Lan presents a news item, "Setting Fairness Goals with TensorFlow Constrained Optimization Library." This library lets users configure and train machine learning problems based on multiple different metrics, making it easy to formulate and solve many problems of interest to the fairness community. Last but not least, Kyle discusses ML unfairness and juvenile recidivism in Catalonia.

Dark Secrets of Bert, Radioactive Data, and Vanishing Gradients

April 08, 2020 15:39 - 40 minutes - 46.7 MB

Today on the show, Lan presents a blog post revealing the dark secrets of BERT. This work uses telling visualizations of self-attention patterns before and after fine-tuning to probe what happens in the fine-tuned BERT. George brings a novel technique to the show, "radioactive data," a marriage of data and steganography. This work from Facebook AI Research gives us the ability to know exactly who's been training models on our data. Last but not least, Kyle discusses the work "Learning Important Features Through Propagating Activation Differences."

Dopamine, Deep Q Networks, and Hey Alexa!

April 01, 2020 21:42 - 36 minutes - 41.4 MB

Lan presents a blog post from Google DeepMind about dopamine and temporal difference learning. This is the story of a fruitful collaboration between neuroscience and AI researchers, who found that the activity of dopamine neurons in the mouse ventral tegmental area during a learnt probabilistic reward task was consistent with distributional temporal-difference reinforcement learning. That's a mouthful; go read it yourself! Kyle: Hey Alexa! Sorry I fooled you ... George presents his first attempts at designing an Auto...

AlphaGo, COVID-19 Contact Tracing, and New Data Set

March 27, 2020 01:39 - 31 minutes - 36.4 MB

George led a discussion about AlphaGo - The Movie | Full Documentary. Lan informed us about the COVID-19 Open Research Dataset. Kyle shared some thoughts about the paper Beyond R_0: the importance of contact tracing when predicting epidemics.

Google's New Data Engine, Activation Atlas, and LIME

March 22, 2020 17:10 - 38 minutes - 44.1 MB

George discusses Google's Dataset Search leaving its closed beta program and what potential applications it will have for businesses, scholars, and hobbyists. Alex brings an article about Activation Atlases, and we discuss its applicability to machine learning interpretability. Lan leads a discussion about the paper "Attention is not Explanation" by Sarthak Jain and Byron C. Wallace. It explores the relationship between attention weights and feature importance scores (spoilers in the t...

Albert, Seinfeld, and Explainable AI

March 22, 2020 17:06 - 36 minutes - 41.6 MB

Kyle discusses Google's recent open-sourcing of ALBERT, a variant of the famous BERT model for natural language processing. ALBERT is more compact and uses fewer parameters. George leads a discussion about the paper "Explainable Artificial Intelligence: Understanding, Visualizing, and Interpreting Deep Learning Models" by Samek, Wiegand, and Müller. This work introduces two tools for generating local interpretability and a novel metric to objectively compare the quality of explanations. Last ...

Chess Transformer, Kaggle Scandal, and Interpretability Zoo

March 12, 2020 15:58 - 44 minutes - 50.4 MB

Lan tells the story of a transformer learning to play chess. The experiment was to fine-tune a GPT-2 transformer model on a corpus of 2.4M chess games in standard notation, then to see if it can 'play chess' by gene...