Here's a quick walkthrough of the first drop of material that works toward the use case:
- It starts with a fundamental introduction to reinforcement learning, answering questions like "what is a reward?" and "how do we create an environment for a language model?"
- Then it focuses on DeepSeek R1 by walking through the paper and highlighting key aspects. This is an old-school way to learn ML topics, but it always works.
- Next, it takes you to Transformer Reinforcement Learning (TRL) and demonstrates potential reward functions you could use. This is cool because it uses Marimo notebooks to visualise the reward.
- Finally, Maxime walks us through a real training notebook that uses GRPO to reduce generation length. I'm really into this because it works, and Maxime took the time to validate it and share assets and logs from his own runs for you to compare against.
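To make the reward idea concrete, here is a minimal sketch of a length-based reward function. It is not the one from Maxime's notebook. It assumes the convention TRL's GRPO trainer uses for custom rewards (a callable that receives `completions` plus extra keyword arguments and returns one float per completion), and the name `length_reward` and the `target_chars` parameter are hypothetical.

```python
def length_reward(completions, target_chars=200, **kwargs):
    """Score each completion: 0.0 at or under the target length, negative beyond it."""
    rewards = []
    for completion in completions:
        # Assumption: completions are plain strings, or chat-style lists of
        # message dicts where the first message holds the generated text.
        text = completion if isinstance(completion, str) else completion[0]["content"]
        overshoot = max(0, len(text) - target_chars)
        # Penalty grows linearly with the number of characters over the target.
        rewards.append(-overshoot / target_chars)
    return rewards


if __name__ == "__main__":
    samples = ["short answer", "x" * 400]
    print(length_reward(samples))
```

Because the function is plain Python, you can try it on a handful of strings before plugging it into a trainer. A character count is a rough proxy; counting tokens with the model's tokenizer would track generation length more closely.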
Maxime's work and notebooks have been a major part of the open source community over the last few years. I, like everyone, have learnt so much from them.
We just published the LlamaIndex unit for the agents course, and it offers a great contrast with the smolagents unit by looking at:
- What makes llama-index stand out
- How the LlamaHub is used for integrations
- Creating QueryEngine components
- Using agents and tools
- Agentic and multi-agent workflows
The team has been working flat out on this for a few weeks, supported by Logan Markewich and Laurie Voss over at LlamaIndex.
This week we are releasing the first framework unit in the course, and it's on smolagents. This is what the unit covers:
- Why should you use smolagents vs another library?
- How to build agents that use code
- How to build multi-agent systems
- How to use vision language models for browser use
The team has been working flat out on this for a few weeks, led by @sergiopaniego and supported by smolagents author @m-ric.
AGENTS + FINETUNING! This week Hugging Face learn has a whole pathway on finetuning for agentic applications. You can follow these two courses to get knowledge on levelling up your agent game beyond prompts:
NEW COURSE! We're cooking hard on Hugging Face courses, and it's not just agents. The NLP course is getting the same treatment with a new chapter on Supervised Fine-Tuning!
I created the Tools gallery, which makes tools specifically developed by/for smolagents searchable and visible. This will help with:
- inspiration
- best practices
- finding cool tools
This first unit of the course sets you up with all the fundamentals to become a pro in agents.
- What's an AI Agent?
- What are LLMs?
- Messages and Special Tokens
- Understanding AI Agents through the Thought-Action-Observation Cycle
- Thought, Internal Reasoning and the Re-Act Approach
- Actions, Enabling the Agent to Engage with Its Environment
- Observe, Integrating Feedback to Reflect and Adapt
Why do I love it? Because it facilitates teaching and learning!
Over the past few months I've engaged with (no joke) thousands of students working with SmolLM.
- People have inferred, fine-tuned, aligned, and evaluated this smol model.
- People used their own machines and free tools like Colab, Kaggle, and Spaces.
- People tackled use cases in their job, for fun, in their own language, and with their friends.
Datasets on the Hugging Face Hub rely on Parquet files. We can interact with these files using DuckDB as a fast in-memory database system. One of DuckDB's features is vector similarity search, which can be used with or without an index.