#80- Layer pruning and Mixture of Depths.

Life with AI

English - April 18, 2024 08:05 - 13 minutes - 18.7 MB - ★★★★★ - 1 rating

Previous Episode: #79- LoRA and QLoRA.

Next Episode: #81- Llama 3.

Hey guys, continuing the series of episodes about PEFT, in this episode I talk about inference optimization techniques for LLMs.

I talk about layer pruning, where we prune consecutive layers of the LLM with almost no loss in model performance.
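To make the idea concrete, here is a minimal sketch of one way to pick which consecutive layers to drop. It assumes the selection heuristic is "find the block of n layers whose input and output representations are most similar, measured by angular distance", which is my reading of the linked paper and not something spelled out in these notes. The function names and the toy hidden states are hypothetical, and the fine-tuning step that usually follows pruning is left out.

```python
import math

def angular_distance(u, v):
    """Angle between two vectors, scaled to the range [0, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.acos(cos) / math.pi

def choose_block_to_prune(hidden_states, n):
    """Pick the start index of the n consecutive layers to drop.

    hidden_states[i] is the input to layer i (assumed layout); the last
    entry is the output of the final layer. The block whose input and
    output are closest is assumed to be the least important one.
    """
    num_layers = len(hidden_states) - 1
    best_start, best_dist = None, float("inf")
    for start in range(num_layers - n + 1):
        dist = angular_distance(hidden_states[start], hidden_states[start + n])
        if dist < best_dist:
            best_start, best_dist = start, dist
    return best_start

def prune_layers(layers, start, n):
    """Return the layer list with layers start .. start+n-1 removed."""
    return layers[:start] + layers[start + n:]

# Toy example: 4 layers, so 5 hidden states for a single token.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [0.1, 1.0], [0.12, 1.0], [1.0, 1.0]]
layers = ["layer0", "layer1", "layer2", "layer3"]
start = choose_block_to_prune(hidden_states, n=2)
pruned = prune_layers(layers, start, n=2)
```

In a real model the hidden states would be averaged over many tokens from a calibration dataset instead of coming from a single toy vector.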

I also talk about Mixture of Depths, a technique similar to Mixture of Experts, where we have a router that chooses which tokens will be processed in which layer of the LLM.
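The routing idea can be sketched in a few lines. This is a simplified illustration and not the paper's implementation. It assumes a linear router that gives each token a scalar score, a fixed capacity (how many tokens the layer may process), and that tokens not selected skip the layer unchanged through the residual connection. `mod_block`, `router_weights`, and `block_fn` are hypothetical names.

```python
def mod_block(tokens, router_weights, block_fn, capacity):
    """Route only the top-`capacity` tokens through one layer.

    tokens: list of vectors (one per token)
    router_weights: vector used to compute a scalar score per token (assumed linear router)
    block_fn: stand-in for the layer's computation (attention + MLP in a real model)
    """
    # Scalar router score per token: dot product with the router weights.
    scores = [sum(w * x for w, x in zip(router_weights, t)) for t in tokens]
    # Indices of the highest-scoring tokens, limited by capacity.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    selected = set(ranked[:capacity])

    outputs = []
    for i, t in enumerate(tokens):
        if i in selected:
            # Processed token: residual plus the block output scaled by the score.
            update = block_fn(t)
            outputs.append([x + scores[i] * u for x, u in zip(t, update)])
        else:
            # Skipped token: passes through unchanged.
            outputs.append(list(t))
    return outputs, selected

# Toy example: 4 tokens, the layer may process only 2 of them.
tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 0.5]]
outputs, selected = mod_block(
    tokens,
    router_weights=[1.0, 0.0],
    block_fn=lambda t: [1.0 for _ in t],  # dummy layer for illustration
    capacity=2,
)
```

Here tokens 0 and 2 get the highest scores, so only they pay the compute cost of the layer; the other two are copied through untouched.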

Paper MoD: https://arxiv.org/pdf/2404.02258.pdf

Paper layer pruning: https://arxiv.org/pdf/2403.17887v1.pdf

Instagram of the podcast: https://www.instagram.com/podcast.lifewithai

LinkedIn of the podcast: https://www.linkedin.com/company/life-with-ai