Dark Secrets of BERT, Radioactive Data, and Vanishing Gradients
Journal Club
English - April 08, 2020 15:39 - 40 minutes - 46.7 MB
Tags: mathematics, science, education, computer science, machine learning, model interpretability
Lan presents a blog post revealing the dark secrets of BERT. The work uses telling visualizations of self-attention patterns before and after fine-tuning to probe what actually happens inside a fine-tuned BERT.
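As a rough sketch of what those visualizations show: each row of a self-attention weight matrix is one token's attention distribution over the sequence, and it is these rows that get rendered as heatmaps. This is a minimal single-head NumPy version with made-up dimensions, not BERT's actual multi-head implementation:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) token embeddings.
    Returns the attention-weight matrix and the attended output.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over keys: each row is one token's attention pattern.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights, weights @ V

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 5, 8, 4  # toy sizes for illustration
X = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d_head)) for _ in range(3))
weights, out = self_attention(X, Wq, Wk, Wv)
# Each row of `weights` sums to 1 -- one attention pattern per token.
assert np.allclose(weights.sum(axis=-1), 1.0)
```

The blog post compares these heatmaps across layers and heads, before and after fine-tuning, to see which patterns the task actually changes.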
George brings a novel technique to the show: "radioactive data," a marriage of data and steganography. This work from Facebook AI Research makes it possible to detect, with statistical confidence, whether someone has trained a model on our data.
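A toy sketch of the idea, with hypothetical names and parameters rather than FAIR's actual method (which marks images so their features shift along a secret carrier direction, and uses a proper statistical test): data is "marked" with a secret direction, a model trained on marked data tends to align its weights with that direction, and detection checks whether the alignment exceeds chance.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 512                        # feature dimension (illustrative)
u = rng.standard_normal(d)
u /= np.linalg.norm(u)         # the secret "radioactive" carrier direction

def mark(features, alpha=0.1):
    """Nudge feature vectors toward the secret carrier."""
    return features + alpha * u

def detect(w, u, threshold=0.1):
    """Flag a classifier weight vector as trained-on-marked-data if its
    cosine similarity with the carrier exceeds the threshold. Under the
    null (no marking), the cosine is near 0 for large d."""
    cos = w @ u / (np.linalg.norm(w) * np.linalg.norm(u))
    return cos > threshold

# Toy illustration: a weight vector that leaked alignment from marked
# data vs. a clean one trained on unmarked data.
w_marked = rng.standard_normal(d) + 5.0 * u
w_clean = rng.standard_normal(d)
leaked = detect(w_marked, u)   # expected: True
clean = detect(w_clean, u)     # expected: False with high probability
```

The real method works in a trained network's feature space and quantifies the detection with a p-value rather than a fixed threshold.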
Kyle discusses "Learning Important Features Through Propagating Activation Differences" (the DeepLIFT paper).
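DeepLIFT attributes an output change (relative to a reference input) to each input via its "rescale rule," which avoids the vanishing-gradient problem of saturated activations. A minimal single-ReLU-unit illustration, with made-up weights, not the full DeepLIFT implementation:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def deeplift_rescale(x, x_ref, w, b):
    """DeepLIFT contributions for y = relu(w @ x + b) vs. a reference input.

    Linear rule: input i contributes w_i * (x_i - x_ref_i) to the
    pre-activation change. Rescale rule: scale those contributions by
    delta_out / delta_preact to propagate through the ReLU.
    """
    z, z_ref = w @ x + b, w @ x_ref + b
    delta_z = z - z_ref
    delta_y = relu(z) - relu(z_ref)
    multiplier = delta_y / delta_z if delta_z != 0 else 0.0
    contribs = w * (x - x_ref) * multiplier
    return contribs, delta_y

x = np.array([1.0, 2.0, -1.0])
x_ref = np.zeros(3)                  # reference/baseline input
w = np.array([0.5, -0.25, 1.0])
b = 0.1
contribs, delta_y = deeplift_rescale(x, x_ref, w, b)
# Summation-to-delta: the contributions account exactly for the
# output change, even where the ReLU's gradient is zero.
assert np.isclose(contribs.sum(), delta_y)
```

Note that a pure-gradient method would assign zero importance here (the ReLU is inactive at `x`), while DeepLIFT still recovers the full output difference.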