We just published the LlamaIndex unit for the agents course, and it is set to offer a great contrast between the smolagents unit by looking at
- What makes llama-index stand-out - How the LlamaHub is used for integrations - Creating QueryEngine components - Using agents and tools - Agentic and multi-agent workflows
The team has been working flat-out on this for a few weeks. Supported by Logan Markewich and Laurie Voss over at LlamaIndex.
I created the Tools gallery, which makes tools specifically developed by/for smolagents searchable and visible. This will help with: - inspiration - best practices - finding cool tools
Datasets on the Hugging Face Hub rely on parquet files. We can interact with these files using DuckDB as a fast in-memory database system. One of DuckDBโs features is vector similarity search which can be used with or without an index.
You can now use the Synthetic Data Generator with your own domain-specific seed data to generate a dataset for fine-tuning retrieval or reranking models.
You can now use the "Synthetic Data Generator" at a much larger scale with your preferred inference engine: Ollama, vLLM, TGI, and serverless inference! ๐ฅ
Introducing the Synthetic Data Generator, a user-friendly application that takes a no-code approach to creating custom datasets with Large Language Models (LLMs). The best part: A simple step-by-step process, making dataset creation a non-technical breeze, allowing anyone to create datasets and models in minutes and without any code.
Open Preference Dataset for Text-to-Image Generation by the ๐ค Community
Open Image Preferences is an Apache 2.0 licensed dataset for text-to-image generation. This dataset contains 10K text-to-image preference pairs across common image generation categories, while using different model families and varying prompt complexities.
This is amazing for cheap models fine-tunes without the hassle of actual deployment! TIL: LoRA fine-tunes for models on the Hub can directly be used for inference!