I hope to implement some acceleration technologies for Large Language Models (LLMs) because I enjoy doing this myself and love the challenge of bringing research papers into real-world applications.
If there are any technologies you'd like to develop or discuss, feel free to reach out. Thanks!
I'm excited to dive deeper into AI research!
- 2024/12/16: Add the `Medusa-1 Training Script v2`
- 2024/12/15: Add the `Medusa-1 Training Script`
- 2024/12/12: Update the KV Cache support for Speculative Decoding
- 2024/12/04: Add the `Kangaroo Training Script v2`
- 2024/11/26: Add the `Kangaroo Training Script`
- 2024/11/22: Update the `Target Model Keep Generation Mechanism` experiment
- 2024/11/18: Update the `Self-Speculative Decoding` experiment results of `google--gemma-2-9b-it`.
- 2024/11/12: Reviewing implementation challenges for `Self-Speculative Decoding` and evaluating model compatibility for improved efficiency.
- 2024/11/10: Initial setup for `Self-Speculative Decoding` completed; data pipeline in place for testing draft-and-verify.
- 2024/11/08: `Speculative Decoding` successfully implemented. Verified improved inference time with no noticeable accuracy degradation.
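The 2024/11/08 entry refers to the draft-and-verify loop at the heart of speculative decoding. The following is a minimal greedy sketch of that loop, not the repo's implementation: the "models", function names, and `gamma` default are all stand-ins I chose for illustration.

```python
# Toy sketch of greedy speculative decoding (draft-and-verify). The "models"
# here are deterministic functions mapping a token sequence to the next token;
# in practice they would be a small draft LLM and a large target LLM.

def greedy(model, prompt, max_new_tokens):
    """Baseline: plain greedy decoding with a single model."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tuple(tokens)))
    return tokens

def speculative_decode(target, draft, prompt, max_new_tokens, gamma=4):
    """Draft `gamma` tokens cheaply, then verify them against the target model."""
    tokens = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(tokens) < limit:
        # 1) Draft model proposes gamma tokens autoregressively (cheap).
        ctx, proposals = list(tokens), []
        for _ in range(gamma):
            proposals.append(draft(tuple(ctx)))
            ctx.append(proposals[-1])
        # 2) Target model verifies the proposals; a real implementation scores
        #    all gamma positions in one batched forward pass.
        for proposed in proposals:
            if len(tokens) >= limit:
                break
            expected = target(tuple(tokens))
            if proposed == expected:
                tokens.append(proposed)   # draft token accepted
            else:
                tokens.append(expected)   # rejected: keep the target's token
                break                     # and restart drafting from here
    return tokens

# Deterministic stand-ins: the draft agrees with the target most of the time.
def target_model(ctx):
    return (3 * sum(ctx) + len(ctx)) % 11

def draft_model(ctx):
    return target_model(ctx) if len(ctx) % 3 else 0
```

With greedy verification the output is token-for-token identical to running the target model alone — `speculative_decode(target_model, draft_model, [1, 2, 3], 10)` equals `greedy(target_model, [1, 2, 3], 10)` — so the speedup comes purely from checking the accepted draft tokens in parallel, which is why no accuracy degradation is expected.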
- Batched Speculative Decoding
- Prompt lookup decoding: Determine timeline after reviewing initial implementations.
- UAG Integration: Assess when to integrate after `Medusa` and `Kangaroo` are in place.
- 2024/11/08 | Complete `Speculative Decoding` following the paper Fast Inference from Transformers via Speculative Decoding
- 2024/11/15 | Implement `Self-Speculative Decoding` as per Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding
    - LayerSkip model architecture
    - Bayesian Optimization for Layer Skip Selection (AR)
    - Adaptive Draft-Exiting Mechanism
    - Optimization
    - Bayesian Optimization for Layer Skip Selection (Speed)
    - `gemma-2-9b-it` experiment
- 2024/11/22 | Develop `Kangaroo` following Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting
    - Kangaroo model
    - Training Script
    - Implement double early exits to improve speed.
- 2024/11/29 | Implement `Medusa` from Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
    - Medusa model
    - Training Script (Medusa-1)
    - Testing
- 2025/03 | Implement `Hydra` from Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding
- 2025/03 | Implement `Lookahead Decoding` from Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
- 2025/04 | Implement `Eagle` from EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
- 2025/04 | Implement `Eagle-2` from EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees
- 2025/04 | Implement `Eagle-3` from EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
- TBD | Implement `Batched Speculative Decoding` from The Synergy of Speculative Decoding and Batching in Serving Large Language Models
- TBD | Implement `prompt lookup decoding` from the prompt-lookup-decoding GitHub repository
- TBD | Implement `UAG` (Universal Assisted Generation) from the Universal Assisted Generation blog
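Among the planned items, prompt lookup decoding is the simplest: its draft step needs no model at all, since candidate tokens are copied from text already in the context. Here is a minimal sketch of that draft step under my own assumptions — the function name and the `max_ngram`/`num_draft` defaults are illustrative, not taken from the linked repository.

```python
# Toy sketch of prompt lookup decoding's draft step. Instead of a draft model,
# candidate continuations are copied from earlier context: if the last few
# generated tokens also occur earlier, the tokens that followed them there are
# proposed as draft tokens for the target model to verify.

def prompt_lookup_draft(tokens, max_ngram=3, num_draft=5):
    """Return draft tokens found by matching the tail n-gram earlier in `tokens`."""
    for n in range(max_ngram, 0, -1):      # prefer longer (more specific) matches
        tail = tokens[-n:]
        if len(tail) < n:
            continue                       # context shorter than the n-gram
        # Scan earlier occurrences of the tail n-gram, most recent first;
        # the range excludes the tail's own position.
        for i in range(len(tokens) - n - 1, -1, -1):
            if tokens[i:i + n] == tail:
                continuation = tokens[i + n:i + n + num_draft]
                if continuation:
                    return continuation    # draft tokens to verify in one pass
    return []                              # no match: fall back to normal decoding
```

For example, `prompt_lookup_draft("the cat sat on the mat . the cat".split())` matches the trailing bigram "the cat" at the start of the context and proposes `["sat", "on", "the", "mat", "."]`; the target model then accepts or rejects these in a single verification pass, exactly as in the other speculative methods above. This is why the technique shines on repetitive tasks such as summarization and code editing.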