Dark Secrets of Bert, Radioactive Data, and Vanishing Gradients
Journal Club
English - April 08, 2020 15:39 - 40 minutes - 46.7 MB
Tags: mathematics, science education, computer science, machine learning, model interpretability
Previous Episode: Dopamine Deep Q Networks and Hey Alexa!
Next Episode: Tools For Misusing GPT2, Tensorflow, and ML Unfairness
Today on the show, Lan presents a blog post revealing the dark secrets of BERT. Using telling visualizations of self-attention patterns before and after fine-tuning, this work probes what happens inside a fine-tuned BERT. George brings a novel technique to the show, "radioactive data": a marriage of data and steganography. This work from Facebook AI Research makes it possible to detect whether someone has trained a model on our data. Last but not least, Kyle discusses the paper "Learning Important Features Through Propagating Activation Differences."