LLM Efficient Speculative Decoding - Search Videos

🌵 Speculative Speculative Decoding: What if your draft model could speculate while the target model is still verifying? That's the idea behind Speculative Speculative Decoding (SSD). I've been… | Maxime Labonne | 15 comments

15 views · 2 months ago

How to Quadruple LLM Decoding Performance with Speculative Decoding (SpD) and Microscaling (MX) Formats on Qualcomm® Cloud AI 100

Speculative Decoding — Think Fast⚡, Then Think Right✅

Faster LLMs: Accelerate Inference with Speculative Decoding

Secret AI Architecture Runs LLMs on Mobile 4x Faster #Shorts

1 view · 1 week ago

YouTube · CollapsedLatents

Multi-Token Prediction (MTP): Accelerating Local Models with no Quality Loss

1.4K views · 1 week ago

YouTube · Onchain AI Garage

Why LLMs Don't 'Hallucinate' — A Lecture on Spherically Constrained Stochasticity and DIF

43K views · 2 weeks ago

YouTube · Goju Tech Talk

What is Speculative Decoding ?

38 views · 2 weeks ago

YouTube · DeepManim

Don't use speculative decoding until you watch this

7 views · 1 month ago

YouTube · DigitalOcean

Recurrent Transformer: Better LLM Decoding

31 views · 3 weeks ago

YouTube · AI Research Roundup

Speculation is all you need: Intro to Speculative Decoding for High Performance Inference

753 views · 2 months ago

600 Toks/Second Gemma4-26B —The Setting That Actually Wins (vLLM + Dflash Speculative Decoding)

3.4K views · 2 weeks ago

YouTube · Tech-Practice

Speculative Decoding: 2-3x Faster LLMs for Free

1 view · 1 month ago

YouTube · The AI Century

Speculative Speculative Decoding: How to Parallelize Drafting and ... for 2x Faster LLM Inference

178 views · 2 months ago

[IDSL Seminar'26] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration

DFlash: Faster LLM Inference with Speculative Decoding

100 views · 1 week ago

Speculative Decoding • LLM Acceleration Patterns

1 view · 1 month ago

YouTube · Technical Interview Essentials A–Z

5 AI Terms Devs Are Quietly Searching More — April 2026

194 views · 4 weeks ago

YouTube · Colony-AI

Speculative Decoding & Inference Speed — 2-3x Faster LLMs With Zero Quality Loss

YouTube · Jeff Heidelberger

Beam Search vs Greedy Decoding: LLM Tradeoffs for Production and Interviews

7 views · 1 month ago

Researchers found a way to make LLMs 8.5x faster! (without compromising accuracy)
Speculative decoding is quite an effective way to address the single-token bottleneck in traditional LLM inference.
A small "draft" model first generates the next several tokens, then the large model verifies all of them at once in a single forward pass.
If a token at any position is wrong, you keep everything before it and restart from there. This never does worse than normal decoding.
But current drafters in Speculati

10K views · 2 weeks ago

x.com · Avi Chawla
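The draft-then-verify loop described in the post above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any particular paper's or library's method: it assumes greedy decoding, and the "models" (`toy_target`, `toy_draft`) are hypothetical deterministic functions standing in for real networks. On the first mismatch it keeps the accepted prefix and substitutes the target's own token, as greedy speculative decoding typically does. The target is called once per position here, whereas a real system scores all drafted positions in one batched forward pass. Sampling-based variants (rejection-sampling acceptance) and the extra "bonus" token taken when every draft is accepted are omitted.

```python
from typing import Callable, List

# Hypothetical toy "model": maps a token sequence to its greedy next token.
# Real systems use neural networks that return logits over a vocabulary.
Model = Callable[[List[int]], int]


def toy_target(seq: List[int]) -> int:
    """Stand-in for the large target model (illustrative only)."""
    return (seq[-1] * 3 + 1) % 10


def toy_draft(seq: List[int]) -> int:
    """Stand-in for the small draft model: agrees with the target except
    after token 4, so some drafted tokens get rejected."""
    return 0 if seq[-1] == 4 else (seq[-1] * 3 + 1) % 10


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target-model call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(model(seq))
    return seq


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding with draft length k."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    while len(seq) < goal:
        # 1) The draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))
        # 2) The target checks each proposed position. A real system scores
        #    all positions in one batched forward pass; this loop only
        #    simulates that.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok != expected:
                # 3) First mismatch: keep the prefix before it, substitute
                #    the target's own token, and start a new draft round.
                accepted.append(expected)
                break
            accepted.append(tok)
        seq.extend(accepted)
    return seq


if __name__ == "__main__":
    out = speculative_decode(toy_target, toy_draft, [1], n_new=12, k=4)
    print(out)
    print(out == greedy_decode(toy_target, [1], 12))  # True
```

Every token the loop keeps is exactly what the target would have produced greedily at that position, so the output matches plain greedy decoding. Each round also adds at least one token, which is why, in this idealized setting, it never produces fewer tokens per target pass than normal decoding; the speedup depends on how often the draft agrees with the target.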

SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving | Proceedings of the Tenth ACM/IEEE Symposium on Edge Computing

SpeContext: Enabling Efficient Long-context Reasoning with Speculative Context Sparsity in LLMs | Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2

Speculative Speculative Decoding for Faster LLM Inference

2.1K views · 2 months ago

YouTube · Rajistics - data science, AI, and machine learning

LLM Jargons Explained

2K views · Mar 3, 2024

YouTube · Sachin Kalsi

Speculative Decoding Explained

7.8K views · Dec 21, 2023

YouTube · Trelis Research

LLM Decoding Strategies Explained!

836 views · Apr 13, 2025

YouTube · Beyond Tokens

3. Decoding LLM Models

139 views · 3 months ago

YouTube · RajeevK.S.Official

Google's STATIC: Faster Constrained Decoding!

37 views · 2 months ago

YouTube · The AI Opus

Set Block Decoding: Faster LLM Inference

60 views · 8 months ago

YouTube · AI Research Roundup
