Nicolas Patry

Narsil

AI & ML interests

None yet

Recent Activity

Articles

Organizations

Hugging Face, Safetensors, BigScience Workshop, Hugging Face Internal Testing Organization, superb, Deepmind, Text Generation Inference, BigScience Catalogue Data Dev, HuggingFaceM4, Hugging Face H4, Hugging Face Extreme-Scale, H4 Red Team, Code Llama, gg-hf, On-device Squad, hsramall, Tinkering, gg-tt, Hugging Face Discord Community, Meta Llama, nltpt, s0409

Narsil's activity

posted an update 13 days ago
Performance leap: TGI v3 is out. Processes 3x more tokens, 13x faster than vLLM on long prompts. Zero config!



3x more tokens.

By reducing our memory footprint, we’re able to ingest many more tokens, and more dynamically, than before. A single L4 (24GB) can handle 30k tokens on Llama 3.1-8B, while vLLM barely reaches 10k. A lot of work went into reducing the footprint of the runtime, and its effects are best seen in smaller, constrained environments.
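To see why KV-cache memory is the limiting factor here, a back-of-the-envelope budget helps. The model-shape numbers below come from the public Llama 3.1 8B configuration; the runtime-overhead figure is an assumption for illustration, not a TGI-measured value:

```python
# Back-of-the-envelope KV-cache budget for Llama 3.1 8B on a 24 GB L4.
# Overhead figure (activations, graphs, fragmentation) is assumed.

GIB = 1024**3

n_layers = 32      # transformer layers
n_kv_heads = 8     # grouped-query attention: 8 KV heads
head_dim = 128
dtype_bytes = 2    # fp16 / bf16

# Both K and V are cached, per layer, per KV head.
kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * dtype_bytes
print(kv_bytes_per_token)  # 131072 bytes = 128 KiB per token

gpu_mem = 24 * GIB
weights = 16 * GIB   # ~8B params in fp16
overhead = 4 * GIB   # assumed runtime overhead

kv_budget = gpu_mem - weights - overhead
max_tokens = kv_budget // kv_bytes_per_token
print(max_tokens)  # 32768 -> roughly the 30k figure above
```

At 128 KiB of cache per token, every GiB the runtime stops wasting buys ~8k extra tokens of context, which is why footprint reductions matter most on small cards.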
13x faster

On long prompts (200k+ tokens), conversation replies take 27.5s in vLLM, while they take only 2s in TGI. How so? We keep the initial conversation around, so when a new reply comes in, we can answer almost instantly. The overhead of the lookup is ~5us. Thanks @Daniël de Kok for the beast of a data structure.
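The idea of "keeping the conversation around" can be sketched as a prefix cache: map a conversation's token prefix to its cached KV state, so a follow-up request only prefills the new tokens. This is a toy illustration with hypothetical names; TGI's actual structure (a radix trie over token blocks) is considerably more refined:

```python
# Toy prefix cache: longest-prefix lookup from token ids to a cached
# KV-state handle. All names here are illustrative, not TGI's API.

class PrefixNode:
    def __init__(self):
        self.children = {}    # token id -> PrefixNode
        self.kv_block = None  # opaque handle to cached KV state

class PrefixCache:
    def __init__(self):
        self.root = PrefixNode()

    def insert(self, tokens, kv_block):
        """Record that this exact token prefix has cached KV state."""
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, PrefixNode())
        node.kv_block = kv_block

    def longest_prefix(self, tokens):
        """Return (n_matched, kv_block) for the longest cached prefix."""
        node, best = self.root, (0, None)
        for i, t in enumerate(tokens):
            node = node.children.get(t)
            if node is None:
                break
            if node.kv_block is not None:
                best = (i + 1, node.kv_block)
        return best

cache = PrefixCache()
cache.insert([1, 2, 3, 4], kv_block="blk-A")          # earlier turn
hit, blk = cache.longest_prefix([1, 2, 3, 4, 5, 6])   # new reply
print(hit, blk)  # 4 blk-A -> only tokens 5 and 6 need prefill
```

On a 200k-token conversation, matching the prefix and skipping its recomputation is what turns a 27.5s prefill into a near-instant reply.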
Zero config

That’s it. Remove all the flags you are using and you’re likely to get the best performance. By evaluating the hardware and model, TGI automatically selects values that give the best performance. In production, we no longer use any flags in our deployments. We kept all existing flags around; they may come in handy in niche scenarios.

Read more: https://huggingface.co./docs/text-generation-inference/conceptual/chunking
reacted to alex-abb's post with 🔥 6 months ago
Hi everyone!
I'm Alex, I'm 16, and I've been doing an internship at Hugging Face for a little over a week. I've already learned a lot about using and prompting LLMs. With @victor as my tutor, I've just finished a Space that analyzes your feelings by prompting an LLM chat model. The aim is to extend it so that it can categorize Hugging Face posts.

alex-abb/LLM_Feeling_Analyzer
reacted to mitkox's post with ❤️ 6 months ago
I've made an on device AI comparison between open source, Apple Intelligence, and Microsoft Copilot+ PC. This OS and applications level integration will bring GenAI to everyone, be it consumers or businesses, over the next year.

Communities and BigTech hold divergent visions regarding the problems they aim to solve, ways to lock in users and enterprises, as well as their commercialization and GTM strategies.

I'm aware that this table has the potential to expand into an epic 30-page saga during an in-depth analysis, but hey, it's a beginning. Do you think I should throw in a few more comparisons? I'm all ears for your thoughts and critiques!

Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.
reacted to dvilasuero's post with 🔥🤗 7 months ago
Today is a huge day in Argilla’s history. We couldn’t be more excited to share this with the community: we’re joining Hugging Face!

We’re embracing a larger mission, becoming part of a brilliant and kind team and a shared vision about the future of AI.

Over the past year, we’ve been collaborating with Hugging Face on countless projects: becoming a launch partner of Docker Spaces, empowering the community to clean Alpaca translations into Spanish and other languages, launching argilla/notus-7b-v1 building on Zephyr’s learnings, the Data is Better Together initiative with hundreds of community contributors, and releasing argilla/OpenHermesPreferences, one of the largest open preference-tuning datasets.

After more than 2,000 Slack messages and over 60 people collaborating for over a year, it already felt like we were part of the same team, pushing in the same direction. After a week of the smoothest transition you can imagine, we’re now the same team.

To those of you who’ve been following us, this won’t be a huge surprise, but it will be a big deal in the coming months. This acquisition means we’ll double down on empowering the community to build and collaborate on high quality datasets, we’ll bring full support for multimodal datasets, and we’ll be in a better place to collaborate with the Open Source AI community. For enterprises, this means that the Enterprise Hub will unlock highly requested features like single sign-on and integration with Inference Endpoints.

As a founder, I am proud of the Argilla team. We're now part of something bigger and a larger team but with the same values, culture, and goals. Grateful to have shared this journey with my beloved co-founders Paco and Amélie.

Finally, huge thanks to the Chief Llama Officer @osanseviero for sparking this and being such a great partner during the acquisition process.

Would love to answer any questions you have so feel free to add them below!
replied to flashback29's post 7 months ago

Are you sure you're using the appropriate token?
Does it still happen?

If it persists, the error very likely comes from the token not being the one you expect.
If it's really not that, we can double-check things.

posted an update 7 months ago
reacted to jeffboudier's post with 🚀 8 months ago
posted an update 8 months ago
reacted to VictorSanh's post with 🔥 8 months ago
Glad to see Idefics2 making its way into the awesome OpenVLM Leaderboard which ranks VLMs. 🏆
2nd in its category (<10B parameters and open weights)!

While InternLM-XComposer2 uses proprietary data, Idefics2 is built solely using openly available data.

Leaderboard: opencompass/open_vlm_leaderboard
Model: HuggingFaceM4/idefics2-8b