Dark Secrets of BERT, Radioactive Data, and Vanishing Gradients
Journal Club
English - April 08, 2020 15:39 - 40 minutes - 46.7 MB
Tags: mathematics, science, education, computer science, machine learning, model interpretability
Lan presents a blog post revealing the dark secrets of BERT. The work uses telling visualizations of self-attention patterns before and after fine-tuning to probe what actually happens inside a fine-tuned BERT.
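As a rough sketch of what those visualizations show: each row of a self-attention weight matrix is one token's attention distribution over the sequence, and it is these rows that get rendered as heatmaps. This is a minimal single-head NumPy version with made-up dimensions, not BERT's actual multi-head implementation:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) token embeddings.
    Returns the attention-weight matrix and the attended output.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over keys: each row is one token's attention pattern.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights, weights @ V

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 5, 8, 4  # toy sizes for illustration
X = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d_head)) for _ in range(3))
weights, out = self_attention(X, Wq, Wk, Wv)
# Each row of `weights` sums to 1 -- one attention pattern per token.
assert np.allclose(weights.sum(axis=-1), 1.0)
```

The blog post compares these heatmaps across layers and heads, before and after fine-tuning, to see which patterns the task actually changes.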
George brings a novel technique to the show: "radioactive data," a marriage of data and steganography. This work from Facebook AI Research makes it possible to detect, with statistical confidence, whether someone has trained a model on our data.
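A toy sketch of the idea, with hypothetical names and parameters rather than FAIR's actual method (which marks images so their features shift along a secret carrier direction, and uses a proper statistical test): data is "marked" with a secret direction, a model trained on marked data tends to align its weights with that direction, and detection checks whether the alignment exceeds chance.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 512                        # feature dimension (illustrative)
u = rng.standard_normal(d)
u /= np.linalg.norm(u)         # the secret "radioactive" carrier direction

def mark(features, alpha=0.1):
    """Nudge feature vectors toward the secret carrier."""
    return features + alpha * u

def detect(w, u, threshold=0.1):
    """Flag a classifier weight vector as trained-on-marked-data if its
    cosine similarity with the carrier exceeds the threshold. Under the
    null (no marking), the cosine is near 0 for large d."""
    cos = w @ u / (np.linalg.norm(w) * np.linalg.norm(u))
    return cos > threshold

# Toy illustration: a weight vector that leaked alignment from marked
# data vs. a clean one trained on unmarked data.
w_marked = rng.standard_normal(d) + 5.0 * u
w_clean = rng.standard_normal(d)
leaked = detect(w_marked, u)   # expected: True
clean = detect(w_clean, u)     # expected: False with high probability
```

The real method works in a trained network's feature space and quantifies the detection with a p-value rather than a fixed threshold.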
Kyle discusses "Learning Important Features Through Propagating Activation Differences" (the DeepLIFT paper).
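DeepLIFT attributes an output change (relative to a reference input) to each input via its "rescale rule," which avoids the vanishing-gradient problem of saturated activations. A minimal single-ReLU-unit illustration, with made-up weights, not the full DeepLIFT implementation:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def deeplift_rescale(x, x_ref, w, b):
    """DeepLIFT contributions for y = relu(w @ x + b) vs. a reference input.

    Linear rule: input i contributes w_i * (x_i - x_ref_i) to the
    pre-activation change. Rescale rule: scale those contributions by
    delta_out / delta_preact to propagate through the ReLU.
    """
    z, z_ref = w @ x + b, w @ x_ref + b
    delta_z = z - z_ref
    delta_y = relu(z) - relu(z_ref)
    multiplier = delta_y / delta_z if delta_z != 0 else 0.0
    contribs = w * (x - x_ref) * multiplier
    return contribs, delta_y

x = np.array([1.0, 2.0, -1.0])
x_ref = np.zeros(3)                  # reference/baseline input
w = np.array([0.5, -0.25, 1.0])
b = 0.1
contribs, delta_y = deeplift_rescale(x, x_ref, w, b)
# Summation-to-delta: the contributions account exactly for the
# output change, even where the ReLU's gradient is zero.
assert np.isclose(contribs.sum(), delta_y)
```

Note that a pure-gradient method would assign zero importance here (the ReLU is inactive at `x`), while DeepLIFT still recovers the full output difference.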