Speculative Decoding Draft Models • Collection • OpenVINO-optimized, efficient draft models for speculative decoding • 2 items • Updated 2 days ago • 6
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation • Paper • 2408.02545 • Published Aug 5 • 35
Dynamic-TinyBERT: Boost TinyBERT's Inference Efficiency by Dynamic Sequence Length • Paper • 2111.09645 • Published Nov 18, 2021