--- title: LLMLingua emoji: 📝 colorFrom: red colorTo: yellow sdk: gradio sdk_version: 4.36.0 app_file: app.py pinned: false license: mit --- Check out the configuration reference at https://huggingface.co./docs/hub/spaces-config-reference
| Project Page | LLMLingua | LongLLMLingua | LLMLingua-2 | LLMLingua Demo | LLMLingua-2 Demo |
## News - 🦚 We're excited to announce the release of **LLMLingua-2**, boasting a 3x-6x speed improvement over LLMLingua! For more information, check out our [paper](https://arxiv.org/abs/2403.), visit the [project page](https://llmlingua.com/llmlingua-2.html), and explore our [demo](https://huggingface.co./spaces/microsoft/LLMLingua-2). - 🤳 Talk slides are available in [AI Time Jan, 24](https://drive.google.com/file/d/1fzK3wOvy2boF7XzaYuq2bQ3jFeP1WMk3/view?usp=sharing). - 🖥 EMNLP'23 slides are available in [Session 5](https://drive.google.com/file/d/1GxQLAEN8bBB2yiEdQdW4UKoJzZc0es9t/view) and [BoF-6](https://drive.google.com/file/d/1LJBUfJrKxbpdkwo13SgPOqugk-UjLVIF/view). - 📚 Check out our new [blog post](https://medium.com/@iofu728/longllmlingua-bye-bye-to-middle-loss-and-save-on-your-rag-costs-via-prompt-compression-54b559b9ddf7) discussing RAG benefits and cost savings through prompt compression. See the script example [here](https://github.com/microsoft/LLMLingua/blob/main/examples/Retrieval.ipynb). - 🎈 Visit our [project page](https://llmlingua.com/) for real-world case studies in RAG, Online Meetings, CoT, and Code. - 👨🦯 Explore our ['./examples'](https://github.com/microsoft/LLMLingua/blob/main/examples) directory for practical applications, including [RAG](https://github.com/microsoft/LLMLingua/blob/main/examples/RAG.ipynb), [Online Meeting](https://github.com/microsoft/LLMLingua/blob/main/examples/OnlineMeeting.ipynb), [CoT](https://github.com/microsoft/LLMLingua/blob/main/examples/CoT.ipynb), [Code](https://github.com/microsoft/LLMLingua/blob/main/examples/Code.ipynb), and [RAG using LlamaIndex](https://github.com/microsoft/LLMLingua/blob/main/examples/RAGLlamaIndex.ipynb). - 👾 LongLLMLingua is now part of the [LlamaIndex pipeline](https://github.com/run-llama/llama_index/blob/main/llama_index/postprocessor/longllmlingua.py), a widely-used RAG framework. ## TL;DR LLMLingua utilizes a compact, well-trained language model (e.g., GPT2-small, LLaMA-7B) to identify and remove non-essential tokens in prompts. This approach enables efficient inference with large language models (LLMs), achieving up to 20x compression with minimal performance loss. - [LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models](https://aclanthology.org/2023.emnlp-main.825/) (EMNLP 2023)