arXiv Preprint arXiv 2505 21136
Openvino Docker Quick Start
Vllm GitHub Windows
Ai Agent with LLM Project
Uim2lm
KV Gokkun Reduced
K80 LLM
Inference
LLM
Split Inference
What Is
Speculative Execution
LLM
Paged Attention Breakthrough
RVC LLM
UI
Sqampling in Lmmqs
Capacity Estimate
LLM
Decoding
Llsd File in Word
LLM
in a Nut Shell
LLM
Speed Comparison
LLM
Flow Router
Deep Plunge Modeling
Intellect 1
LLM
🌵 Speculative Speculative Decoding: What if your draft model could speculate while the target model is still verifying? That's the idea behind Speculative Speculative Decoding (SSD). I've been… | Maxime Labonne | 15 comments
15 views
2 months ago
linkedin.com
How to Quadruple LLM Decoding Performance with Speculative Decoding (SpD) and Microscaling (MX) Formats on Qualcomm® Cloud AI 100
Aug 1, 2024
qualcomm.com
Speculative Decoding — Think Fast⚡, Then Think Right✅
Apr 13, 2025
substack.com
Faster LLMs: Accelerate Inference with Speculative Decoding
11 months ago
ibm.com
2:09
Secret AI Architecture Runs LLMs on Mobile 4x Faster #Shorts
1 views
1 week ago
YouTube
CollapsedLatents
17:15
Multi-Token Prediction (MTP): Accelerating Local Models with no Quality Loss
1.4K views
1 week ago
YouTube
Onchain AI Garage
1:42:48
Why LLMs Don't 'Hallucinate' — A Lecture on Spherically Constrained Stochasticity and DIF
43K views
2 weeks ago
YouTube
Goju Tech Talk
3:08
What is Speculative Decoding ?
38 views
2 weeks ago
YouTube
DeepManim
7:09
Don't use speculative decoding until you watch this
7 views
1 month ago
YouTube
DigitalOcean
4:13
Recurrent Transformer: Better LLM Decoding
31 views
3 weeks ago
YouTube
AI Research Roundup
40:19
Speculation is all you need: Intro to Speculative Decoding for High Performance Inference
753 views
2 months ago
YouTube
Modal
8:27
600 Toks/Second Gemma4-26B —The Setting That Actually Wins (vLLM + Dflash Speculative Decoding)
3.4K views
2 weeks ago
YouTube
Tech-Practice
5:04
Speculative Decoding: 2-3x Faster LLMs for Free
1 views
1 month ago
YouTube
The AI Century
23:40
Speculative Speculative Decoding: How to Parallelize Drafting and ... for 2x Faster LLM Inference
178 views
2 months ago
YouTube
Xiaol.x
13:54
[IDSL Seminar'26] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
6 days ago
YouTube
IDSL
0:23
DFlash: Faster LLM Inference with Speculative Decoding
100 views
1 week ago
YouTube
OnlyCS
0:31
Speculative Decoding • LLM Acceleration Patterns
1 views
1 month ago
YouTube
Technical Interview Essentials A–Z
0:48
5 AI Terms Devs Are Quietly Searching More — April 2026
194 views
4 weeks ago
YouTube
Colony-AI
12:45
Speculative Decoding & Inference Speed — 2-3x Faster LLMs With Zero Quality Loss
2 weeks ago
YouTube
Jeff Heidelberger
10:41
Beam Search vs Greedy Decoding: LLM Tradeoffs for Production and Interviews
7 views
1 month ago
YouTube
Wei Sun
0:26
Researchers found a way to make LLMs 8.5x faster! (without compromising accuracy) Speculative decoding is quite an effective way to address the single-token bottleneck in traditional LLM inference. A small "draft" model first generates the next several tokens, then the large model verifies all of them at once in a single forward pass. If a token at any position is wrong, you keep everything before it and restart from there. This never does worse than normal decoding. But current drafters in Speculati
10K views
2 weeks ago
x.com
Avi Chawla
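The draft-then-verify loop described in the snippet above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any library's implementation: both "models" are hypothetical toy functions that map a token prefix to the next token, decoding is greedy, and the target's single batched forward pass is simulated with an ordinary loop.

```python
from typing import Callable, List

# Hypothetical model interface (an assumption for this sketch):
# a function from a token prefix to the next greedy token.
NextToken = Callable[[List[int]], int]


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding; returns the prompt plus max_new tokens."""
    tokens = list(prompt)
    goal = len(prompt) + max_new
    while len(tokens) < goal:
        # 1) The small draft model proposes k tokens, one after another.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2) The target model checks every proposed position. A real system
        #    does this in one batched forward pass; here it is a plain loop.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok != expected:
                # First wrong draft token: keep everything before it and
                # substitute the target's own choice at this position.
                accepted.append(expected)
                break
            accepted.append(tok)
        else:
            # All k drafts were right; the same pass yields one extra token.
            accepted.append(target(tokens + accepted))

        tokens.extend(accepted)
    return tokens[:goal]


def greedy_decode(model: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Plain one-token-at-a-time greedy decoding, used as the baseline."""
    tokens = list(prompt)
    for _ in range(max_new):
        tokens.append(model(tokens))
    return tokens


# Toy stand-ins for the large and small models (purely illustrative).
def target_model(prefix: List[int]) -> int:
    return (sum(prefix) * 31 + len(prefix)) % 50


def draft_model(prefix: List[int]) -> int:
    guess = target_model(prefix)
    # Deliberately wrong whenever the prefix length is a multiple of 3.
    return guess if len(prefix) % 3 else (guess + 1) % 50
```

Because every kept token is either a draft token the target agreed with or the target's own choice, the output matches plain greedy decoding with the target model, and each round makes progress by at least one token. Sampling-based variants use a probabilistic acceptance rule instead of the exact-match check shown here.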
SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving | Proceedings of the Tenth ACM/IEEE Symposium on Edge Computing
3 months ago
acm.org
SpeContext: Enabling Efficient Long-context Reasoning with Speculative Context Sparsity in LLMs | Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2
2 months ago
acm.org
1:23
Speculative Speculative Decoding for Faster LLM Inference
2.1K views
2 months ago
YouTube
Rajistics - data science, AI, and machine learning
2:04
LLM Jargons Explained
2K views
Mar 3, 2024
YouTube
Sachin Kalsi
37:34
Speculative Decoding Explained
7.8K views
Dec 21, 2023
YouTube
Trelis Research
12:20
LLM Decoding Strategies Explained!
836 views
Apr 13, 2025
YouTube
Beyond Tokens
1:32
3. Decoding LLM Models
139 views
3 months ago
YouTube
RajeevK.S.Official
0:14
Google's STATIC: Faster Constrained Decoding!
37 views
2 months ago
YouTube
The AI Opus
2:55
Set Block Decoding: Faster LLM Inference
60 views
8 months ago
YouTube
AI Research Roundup