QvQ-72B-Preview, an open-weight model for visual reasoning, just released by the Alibaba_Qwen team: Qwen/qvq-676448c820912236342b9888
✨ Combines visual understanding & language reasoning
✨ Scores 70.3 on MMMU
✨ Outperforms Qwen2-VL-72B-Instruct in complex problem-solving
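If you want to poke at it locally, here is a minimal loading sketch, assuming the model reuses the Qwen2-VL interface in transformers and that the checkpoint lives at Qwen/QVQ-72B-Preview (both are assumptions; check the model card in the collection above):

```python
# Hedged sketch: load QvQ-72B-Preview, assuming it follows the Qwen2-VL loading path.
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "Qwen/QVQ-72B-Preview"  # assumed repo id; see the linked collection
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
```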
Megrez-3B-Omni 🔥 an on-device multimodal LLM by Infinigence AI, another startup emerging from the Tsinghua University ecosystem.
Model: Infinigence/Megrez-3B-Omni
Demo: Infinigence/Megrez-3B-Omni
✨ Supports analysis of image, text, and audio modalities
✨ Leads in bilingual speech (English & Chinese) input, multi-turn conversations, and voice-based queries
✨ Outperforms in scene understanding and OCR across major benchmarks
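A minimal sketch for trying it, assuming the repo ships custom modeling code loaded via trust_remote_code (the actual chat/multimodal call signature may differ; see the model card):

```python
# Hedged sketch: load Megrez-3B-Omni, assuming custom code behind trust_remote_code.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Infinigence/Megrez-3B-Omni"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)
```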
LLaMA-O1-PRM and LLaMA-O1-Reinforcement will be released this weekend. We have implemented a novel reinforcement fine-tuning (RFT) pipeline that teaches models reasoning and reward labeling without human annotation.
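As a rough illustration of the general idea (not the released pipeline), one way to get reward labels without humans is to sample several reasoning traces per prompt, score them with a learned reward model, and fine-tune on the best ones. A minimal sketch with hypothetical placeholder functions:

```python
# Generic sketch of reward-labeled RFT (NOT the authors' actual pipeline).
# `generate`, `reward`, and `finetune` are hypothetical placeholders.
from typing import Callable, List, Tuple

def rft_step(
    prompts: List[str],
    generate: Callable[[str, int], List[str]],  # sample n reasoning traces from the policy
    reward: Callable[[str, str], float],        # learned reward / PRM score, no human labels
    finetune: Callable[[List[Tuple[str, str]]], None],
    n_samples: int = 8,
) -> None:
    selected: List[Tuple[str, str]] = []
    for prompt in prompts:
        candidates = generate(prompt, n_samples)
        best = max(candidates, key=lambda c: reward(prompt, c))
        selected.append((prompt, best))
    finetune(selected)  # supervised update on the self-selected traces
```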
Reacted to julien-c's post with ❤️🔥, 15 days ago
After some heated discussion 🔥, we clarify our intent re. storage limits on the Hub
TL;DR:
- public storage is free, and (unless blatant abuse) unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible
- private storage is paid above a significant free tier (1 TB if you have a paid account, 100 GB otherwise)
We continuously optimize our infrastructure to scale our storage for the coming years of growth in machine learning, to the benefit of the community 🔥
Last week was crazy in open-source AI, with important model and dataset releases every day.
Here are the most important ones I've pinned:
Cohere released Global-MMLU, a multilingual version of MMLU, to evaluate AI models' world knowledge in many languages!
🦙 Meta released Llama-3.3-70B-Instruct, a 70B model that's on par with Llama-3.1-405B-Instruct, GPT-4o and Claude. Probably my new go-to for agentic workflows (a minimal loading sketch follows this list).
FishAudio released fish-speech-1.5, a multilingual text-to-speech model
Microsoft Research released TRELLIS, an extremely impressive image-to-3D model, which you can try here: JeffreyXiang/TRELLIS
Yesterday, Hugging Face released FineWeb 2, a new version that extends the previous FineWeb to over 1000 languages, with extended coverage of Russian, Mandarin, German, Japanese, Spanish, and French: a huge, high-quality dataset of > 3 trillion words! HuggingFaceFW/fineweb-2
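For the Llama release above, a minimal text-generation sketch with a recent transformers pipeline; the full repo id is assumed (the model is gated, so you need accepted access and to be logged in), and it needs a lot of GPU memory in bf16:

```python
# Hedged sketch: chat with Llama-3.3-70B-Instruct via the transformers pipeline.
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.3-70B-Instruct",  # assumed repo id
    torch_dtype="auto",
    device_map="auto",
)
messages = [{"role": "user", "content": "Plan the steps to call a public API politely."}]
out = chat(messages, max_new_tokens=256)
print(out[0]["generated_text"][-1]["content"])  # last message is the assistant reply
```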
Now let's go build and make this week as productive as the last one!
Open Preference Dataset for Text-to-Image Generation by the 🤗 Community
Open Image Preferences is an Apache 2.0 licensed dataset for text-to-image generation. This dataset contains 10K text-to-image preference pairs across common image generation categories, while using different model families and varying prompt complexities.
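A quick way to look at the preference pairs with the datasets library; the repo id below is an assumption, so check the announcement for the exact dataset name:

```python
# Hedged sketch: browse the text-to-image preference pairs. The repo id is assumed.
from datasets import load_dataset

ds = load_dataset("data-is-better-together/open-image-preferences-v1", split="train")
print(ds)            # dataset size and feature names
print(ds[0].keys())  # expect prompt plus chosen/rejected image columns
```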
We applied the same data-driven approach that led to SOTA English performance in 🍷 FineWeb to thousands of languages.
🔥 FineWeb2 has 8 TB of compressed text data and outperforms other multilingual datasets in our experiments.
The dataset is released under the permissive ODC-By 1.0 license, and the 💻 code to reproduce it and our evaluations is public.
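Since the languages live in separate configs, streaming a single one is the cheapest way to look at the data; the config name below ("rus_Cyrl") is an assumption about the language_script naming scheme:

```python
# Hedged sketch: stream one language config of FineWeb 2 instead of downloading 8 TB.
from datasets import load_dataset

fw2 = load_dataset(
    "HuggingFaceFW/fineweb-2",
    name="rus_Cyrl",   # assumed config name (language_script)
    split="train",
    streaming=True,
)
for i, doc in enumerate(fw2):
    print(doc["text"][:200])  # peek at the first few documents
    if i >= 2:
        break
```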
We will very soon announce a big community project, and are working on a blog post walking you through the entire dataset creation process. Stay tuned!
HunyuanVideo 📹 The new open video generation model by Tencent!
tencent/HunyuanVideo
zh-ai-community/video-models-666afd86cfa4e4dd1473b64c
✨ 13B parameters: probably the largest open video model to date
✨ Unified architecture for image & video generation
✨ Powered by advanced features: MLLM Text Encoder, 3D VAE, and Prompt Rewrite
✨ Delivers stunning visuals, diverse motion, and unparalleled stability
Fully open with code & weights
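A minimal sketch for pulling the released weights from the Hub with huggingface_hub; inference itself runs through the project's own code, and the local directory below is just an example:

```python
# Hedged sketch: download the HunyuanVideo weights; run inference with the project's code.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="tencent/HunyuanVideo",        # repo id from the post
    local_dir="./HunyuanVideo-weights",    # example path
)
print(local_dir)
```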
Zhipu AI, the Chinese generative AI startup behind CogVideo, just launched its first productized AI agent, AutoGLM 🔥
https://agent.aminer.cn
With simple text or voice commands, it:
✨ Simulates phone operations effortlessly
✨ Autonomously handles 50+ step tasks
✨ Seamlessly operates across apps
Powered by Zhipu's "Decoupled Interface" and "Self-Evolving Learning Framework" to achieve major performance gains in Phone Use and Web Browser Use!
Meanwhile, GLM4-Edge is now on the Hugging Face Hub
THUDM/glm-edge-6743283c5809de4a7b9e0b8b
Packed with advanced dialogue + multimodal models:
📱 1.5B / 2B models: built for mobile & in-car systems
💻 4B / 5B models: optimized for PCs
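A minimal sketch for the smallest chat variant; the exact repo id ("THUDM/glm-edge-1.5b-chat") and the need for trust_remote_code are assumptions, so check the collection linked above:

```python
# Hedged sketch: load a GLM-Edge chat model. The repo id is assumed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "THUDM/glm-edge-1.5b-chat"  # assumed repo id from the GLM-Edge collection
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)
```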