What an eventful day in Open Source LLMs today:
Mistral released Codestral Mamba π
> Beats DeepSeek QwenCode; the best model under 10B params and competitive with Codestral 22B
> Mamba 2 architecture - supports up to 256K context
> Apache 2.0 licensed, perfect for a local code assistant
> Transformers & llama.cpp integration upcoming!
Model checkpoint: https://huggingface.co./mistralai/mamba-codestral-7B-v0.1
Hugging Face dropped SmolLM π€
> Beats MobileLLM, Qwen 0.5B, Phi 1.5B and more!
> 135M, 360M, and 1.7B param model checkpoints
> Trained on 600B high-quality synthetic + FineWeb Edu tokens
> Architecture: Llama + GQA + 2048 ctx length
> Ripe for fine-tuning and on-device deployments.
> Works out of the box with Transformers!
Model checkpoints: HuggingFaceTB/smollm-6695016cad7167254ce15966
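Since SmolLM works out of the box with Transformers, here is a minimal sketch of loading the smallest checkpoint. The model id `HuggingFaceTB/SmolLM-135M` is assumed from the collection name above, and the prompt is purely illustrative; swap in the 360M or 1.7B checkpoint the same way.

```python
# Minimal sketch: run SmolLM-135M with transformers.
# Assumes the checkpoint id HuggingFaceTB/SmolLM-135M (inferred from the
# HuggingFaceTB collection linked above) and a CPU-friendly 135M model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM-135M"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative prompt; SmolLM is a base model, so expect plain continuation.
inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The 135M checkpoint is small enough to run on CPU, which is what makes it interesting for the on-device deployments mentioned above.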
Mistral released Mathstral 7B β
> 56.6% on MATH and 63.47% on MMLU
> Same architecture as Mistral 7B
> Works out of the box with Transformers & llama.cpp
> Released under Apache 2.0 license
Model checkpoint: https://huggingface.co./mistralai/mathstral-7B-v0.1
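Mathstral also works out of the box with Transformers; a minimal sketch, using the checkpoint id from the link above. The prompt is illustrative, and `torch_dtype`/`device_map` are just sensible defaults for a 7B model (the `device_map="auto"` path needs `accelerate` installed).

```python
# Minimal sketch: run Mathstral 7B with transformers.
# Checkpoint id taken from the model link above; prompt is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/mathstral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half-precision to fit a 7B model in memory
    device_map="auto",           # requires `accelerate`; drop for plain CPU
)

prompt = "What is the derivative of x^3 + 2x?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```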
Pretty dope day for open-source ML. Can't wait to see what the community builds with these models and to support them further! π€
What's your favourite from today's releases?