Dark Secrets of Bert, Radioactive Data, and Vanishing Gradients
Journal Club
English - April 08, 2020 15:39 - 40 minutes - 46.7 MB
Tags: mathematics, science education, computer science, machine learning, model interpretability
Previous Episode: Dopamine Deep Q Networks and Hey Alexa!
Next Episode: Tools For Misusing GPT2, Tensorflow, and ML Unfairness
Today on the show, Lan presents a blog post revealing the dark secrets of BERT. Using telling visualizations of self-attention patterns before and after fine-tuning, this work probes what happens inside a fine-tuned BERT. George brings a novel technique to the show, "radioactive data": a marriage of data and steganography. This work from Facebook AI Research makes it possible to detect whether someone has trained a model on our data. Last but not least, Kyle discusses the paper "Learning Important Features Through Propagating Activation Differences."