π Supercharge your LLM apps with Langfuse on Hugging Face Spaces!
Langfuse brings end-to-end observability and tooling to accelerate your dev workflow from experiments through production
Now available as a Docker Space directly on the HF Hub! π€
π Trace everything: monitor LLM calls, retrieval, and agent actions with popular frameworks 1β£ One-click deployment: on Spaces with persistent storage and integrated OAuth π Simple Prompt Management: Version, edit, and update without redeployment β Intuitive Evals: Collect user feedback, run model/prompt evaluations, and improve quality π Dataset Creation: Build datasets directly from production data to enhance future performance
Kudos to the Langfuse team for this collab and the awesome, open-first product theyβre building! π @marcklingen@Clemo@MJannik
Cosmos is a family of pre-trained models purpose-built for generating physics-aware videos and world states to advance physical AI development. The release includes Tokenizers nvidia/cosmos-tokenizer-672b93023add81b66a8ff8e6
π Releasing a new zeroshot-classifier based on ModernBERT! Some key takeaways:
- β‘ Speed & efficiency: It's multiple times faster and uses significantly less memory than DeBERTav3. You can use larger batch sizes and enabling bf16 (instead of fp16) gave me a ~2x speed boost as well - π Performance tradeoff: It performs slightly worse than DeBERTav3 on average across my zeroshot classification task collection - π§ Use cases: I recommend using it for scenarios requiring speed and a larger context window (8k). - π‘ Whatβs next? Iβm preparing a newer version trained on better + longer synthetic data to fully leverage the 8k context window and improve upon the training mix of my older zeroshot-v2.0 models. I also hope that there will be a multilingual variant in the future.
After some heated discussion π₯, we clarify our intent re. storage limits on the Hub
TL;DR: - public storage is free, and (unless blatant abuse) unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible - private storage is paid above a significant free tier (1TB if you have a paid account, 100GB otherwise)
We optimize our infrastructure continuously to scale our storage for the coming years of growth in Machine learning, to the benefit of the community π₯
Six predictions for AI in 2025 (and a review of how my 2024 predictions turned out):
- There will be the first major public protest related to AI - A big company will see its market cap divided by two or more because of AI - At least 100,000 personal AI robots will be pre-ordered - China will start to lead the AI race (as a consequence of leading the open-source AI race). - There will be big breakthroughs in AI for biology and chemistry. - We will begin to see the economic and employment growth potential of AI, with 15M AI builders on Hugging Face.
How my predictions for 2024 turned out:
- A hyped AI company will go bankrupt or get acquired for a ridiculously low price β (Inflexion, AdeptAI,...)
- Open-source LLMs will reach the level of the best closed-source LLMs β with QwQ and dozens of others
- Big breakthroughs in AI for video, time-series, biology and chemistry β for video π΄for time-series, biology and chemistry
- We will talk much more about the cost (monetary and environmental) of AI β Monetary π΄Environmental (π’)
- A popular media will be mostly AI-generated β with NotebookLM by Google
- 10 millions AI builders on Hugging Face leading to no increase of unemployment πcurrently 7M of AI builders on Hugging Face
Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models at similar GPU RAM usage and tokens throughputs.
- SmolVLM generates tokens 7.5 to 16 times faster than Qwen2-VL! π€― - Other models at this size crash a laptop, but SmolVLM comfortably generates 17 tokens/sec on a macbook! π - SmolVLM can be fine-tuned on a Google collab! Or process millions of documents with a consumer GPU! - SmolVLM even outperforms larger models in video benchmarks, despite not even being trained on videos!
This is no Woodstock AI but will be fun nonetheless haha. Iβll be hosting a live workshop with team members next week about the Enterprise Hugging Face hub.
1,000 spots available first-come first serve with some surprises during the stream!
Maybe like me you have always wanted a super easy way to compare llama3.2-1B vs. llama3.2-3B? or the same model with different temperatures?
Trying and comparing warm Inference API models has never been easier! Just go to https://hf.co/playground, set your token and you're ready to go. We'll keep improving, feedback welcome π