#wikipedia #reinforcementlearning #languagemodels




Original paper review here: https://youtu.be/XHGh19Hbx48




Machel Reid and Yutaro Yamada join me to discuss their recent paper on language model pre-training for decision transformers in offline reinforcement learning.




OUTLINE:


0:00 - Intro


1:00 - Brief paper, setup & idea recap


7:30 - Main experimental results & high standard deviations


10:00 - Why is there no clear winner?


13:00 - Why are bigger models not a lot better?


14:30 - What’s behind the name ChibiT?


15:30 - Why is iGPT underperforming?


19:15 - How are tokens distributed in Reinforcement Learning?


22:00 - What other domains could have good properties to transfer?


24:20 - A deeper dive into the models' attention patterns


33:30 - Codebase, model sizes, and compute requirements


37:30 - Scaling behavior of pre-trained models


40:05 - What did not work out in this project?


42:00 - How can people get started and where to go next?




Paper: https://arxiv.org/abs/2201.12122


Code: https://github.com/machelreid/can-wik...


My Video on Decision Transformer: https://youtu.be/-buULmf7dec




Abstract:


Fine-tuning reinforcement learning (RL) models has been challenging because of a lack of large-scale off-the-shelf datasets as well as high variance in transferability among different environments. Recent work has looked at tackling offline RL from the perspective of sequence modeling with improved results as a result of the introduction of the Transformer architecture. However, when the model is trained from scratch, it suffers from slow convergence speeds. In this paper, we look to take advantage of this formulation of reinforcement learning as sequence modeling and investigate the transferability of pre-trained sequence models on other domains (vision, language) when fine-tuned on offline RL tasks (control, games). To this end, we also propose techniques to improve transfer between these domains. Results show consistent performance gains in terms of both convergence speed and reward on a variety of environments, accelerating training by 3-6x and achieving state-of-the-art performance in a variety of tasks using Wikipedia-pretrained and GPT2 language models. We hope that this work not only sheds light on the potential of leveraging generic sequence modeling techniques and pre-trained models for RL, but also inspires future work on sharing knowledge between generative modeling tasks of completely different domains.
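
For a concrete picture of the setup discussed in the video, here is a minimal sketch (PyTorch plus the Hugging Face transformers library) of how a Decision-Transformer-style policy could be initialized from a pre-trained GPT-2 backbone. This is not the authors' implementation; the class and parameter names (state_dim, act_dim, max_ep_len) are illustrative assumptions, and the linked repository has the actual code.

```python
# Minimal sketch (assumption, not the paper's code): fine-tune a pre-trained
# GPT-2 backbone on offline RL trajectories framed as sequence modeling.
import torch
import torch.nn as nn
from transformers import GPT2Model


class PretrainedDecisionTransformer(nn.Module):
    def __init__(self, state_dim, act_dim, max_ep_len=1000):
        super().__init__()
        # Load language-pretrained GPT-2 weights as the sequence-model backbone.
        self.backbone = GPT2Model.from_pretrained("gpt2")
        hidden = self.backbone.config.n_embd  # 768 for the small GPT-2

        # Separate linear embeddings for returns-to-go, states, and actions.
        self.embed_return = nn.Linear(1, hidden)
        self.embed_state = nn.Linear(state_dim, hidden)
        self.embed_action = nn.Linear(act_dim, hidden)
        self.embed_timestep = nn.Embedding(max_ep_len, hidden)

        # Predict the next action from each state token's hidden representation.
        self.predict_action = nn.Linear(hidden, act_dim)

    def forward(self, returns_to_go, states, actions, timesteps):
        # returns_to_go: (B, T, 1), states: (B, T, state_dim),
        # actions: (B, T, act_dim), timesteps: (B, T)
        B, T = states.shape[0], states.shape[1]
        time_emb = self.embed_timestep(timesteps)

        r = self.embed_return(returns_to_go) + time_emb
        s = self.embed_state(states) + time_emb
        a = self.embed_action(actions) + time_emb

        # Interleave (return, state, action) into one sequence of length 3T.
        tokens = torch.stack((r, s, a), dim=2).reshape(B, 3 * T, -1)

        # A full implementation would also handle GPT-2's own position
        # embeddings and attention masking; omitted here for brevity.
        hidden_states = self.backbone(inputs_embeds=tokens).last_hidden_state

        # Use the hidden state at each state token to predict its action.
        state_hidden = hidden_states.reshape(B, T, 3, -1)[:, :, 1]
        return self.predict_action(state_hidden)
```

Training then proceeds as ordinary supervised sequence modeling: sample trajectory segments from the offline dataset and minimize, e.g., an MSE loss between predicted and logged actions while fine-tuning the whole backbone.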




Authors: Machel Reid, Yutaro Yamada, Shixiang Shane Gu




Links:


TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick


YouTube: https://www.youtube.com/c/yannickilcher


Twitter: https://twitter.com/ykilcher


Discord: https://discord.gg/4H8xxDF


BitChute: https://www.bitchute.com/channel/yann...


LinkedIn: https://www.linkedin.com/in/ykilcher


BiliBili: https://space.bilibili.com/2017636191




If you want to support me, the best thing to do is to share the content :)




If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):


SubscribeStar: https://www.subscribestar.com/yannick...


Patreon: https://www.patreon.com/yannickilcher


Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq


Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2


Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m


Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
