---
license: llama3.2
tags:
- unsloth
- query-expansion
datasets:
- s-emanuilov/query-expansion
base_model:
- meta-llama/Llama-3.2-3B-Instruct
---

# Query Expansion Model - based on Llama-3.2-3B

Fine-tuned Llama-3.2-3B model for generating search query expansions.

Part of a collection of query expansion models available in different architectures and sizes.

## Overview

**Task:** Search query expansion  
**Base model:** [Llama-3.2-3B-Instruct](https://huggingface.co./meta-llama/Llama-3.2-3B-Instruct)  
**Training data:** [Query Expansion Dataset](https://huggingface.co./datasets/s-emanuilov/query-expansion)

## Variants

### LoRA adapters

- [Qwen2.5-3B](https://huggingface.co./s-emanuilov/query-expansion-Qwen2.5-3B)
- [Qwen2.5-7B](https://huggingface.co./s-emanuilov/query-expansion-Qwen2.5-7B)

### GGUF variants

- [Qwen2.5-3B-GGUF](https://huggingface.co./s-emanuilov/query-expansion-Qwen2.5-3B-GGUF)
- [Qwen2.5-7B-GGUF](https://huggingface.co./s-emanuilov/query-expansion-Qwen2.5-7B-GGUF)
- [Llama-3.2-3B-GGUF](https://huggingface.co./s-emanuilov/query-expansion-Llama-3.2-3B-GGUF)

Each GGUF model is available in several quantization formats: F16, Q8_0, Q5_K_M, Q4_K_M, and Q3_K_M. A minimal loading sketch for the GGUF files is included at the end of this card.

## Details

This model enhances search and retrieval systems by generating semantically relevant query expansions. It can be useful for:

- Advanced RAG systems
- Search enhancement
- Query preprocessing
- Low-latency query expansion

## Usage

```python
import torch
from unsloth import FastLanguageModel
from transformers import TextStreamer

# Model configuration
MODEL_NAME = "s-emanuilov/query-expansion-Llama-3.2-3B"
MAX_SEQ_LENGTH = 2048
DTYPE = torch.float16  # set to None to auto-detect the best dtype for your GPU
LOAD_IN_4BIT = True

# Load model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=MODEL_NAME,
    max_seq_length=MAX_SEQ_LENGTH,
    dtype=DTYPE,
    load_in_4bit=LOAD_IN_4BIT,
)

# Enable faster inference
FastLanguageModel.for_inference(model)

# Prompt template used during fine-tuning
PROMPT_TEMPLATE = """Below is a search query. Generate relevant expansions and related terms that would help broaden and enhance the search results.

### Query:
{query}

### Expansions:
{output}"""

# Prepare input
query = "apple stock"
inputs = tokenizer(
    [PROMPT_TEMPLATE.format(query=query, output="")],
    return_tensors="pt",
).to("cuda")

# Generate with streaming output (the streamer prints tokens as they arrive)
streamer = TextStreamer(tokenizer)
output = model.generate(
    **inputs,
    streamer=streamer,
    max_new_tokens=128,
)
```

## Example

**Input:** "apple stock"

**Expansions:**

- "apple market"
- "apple news"
- "apple stock price"
- "apple stock forecast"

## Citation

If you find my work helpful, feel free to cite it.

```
```
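
## GGUF usage

For the GGUF variants listed above, a lightweight local runtime such as [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) can load the quantized files directly. The snippet below is a minimal sketch rather than an officially tested workflow for this model: the `filename` glob and the `stop` setting are assumptions, so verify the actual file names in the Hub repo before downloading.

```python
# Minimal llama-cpp-python sketch (assumption: not an officially documented
# workflow for this model). The filename glob is a guess - check the real
# GGUF file names in the Hub repo before downloading.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="s-emanuilov/query-expansion-Llama-3.2-3B-GGUF",
    filename="*Q4_K_M.gguf",  # one of: F16, Q8_0, Q5_K_M, Q4_K_M, Q3_K_M
    n_ctx=2048,
)

# Same prompt format as the transformers example above
prompt = (
    "Below is a search query. Generate relevant expansions and related terms "
    "that would help broaden and enhance the search results.\n\n"
    "### Query:\napple stock\n\n### Expansions:\n"
)

result = llm(prompt, max_tokens=128, stop=["###"])
print(result["choices"][0]["text"])
```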