base_model: Spestly/AwA-1.5B
tags:
  - text-generation-inference
  - transformers
  - unsloth
  - qwen2
  - trl
  - llama-cpp
  - gguf-my-repo
license: apache-2.0
language:
  - en
library_name: transformers

Triangle104/AwA-1.5B-Q6_K-GGUF

This model was converted to GGUF format from Spestly/AwA-1.5B using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.


Model details:

AwA (Answers with Athena) is my portfolio project, showcasing a cutting-edge Chain-of-Thought (CoT) reasoning model. I created AwA to excel in providing detailed, step-by-step answers to complex questions across diverse domains. This model represents my dedication to advancing AI’s capability for enhanced comprehension, problem-solving, and knowledge synthesis.

Key Features

Chain-of-Thought Reasoning: AwA delivers step-by-step breakdowns of solutions, mimicking logical human thought processes.

Domain Versatility: Performs exceptionally across a wide range of domains, including mathematics, science, literature, and more.

Adaptive Responses: Adjusts answer depth and complexity based on input queries, catering to both novices and experts.

Interactive Design: Designed for educational tools, research assistants, and decision-making systems.

Intended Use Cases

Educational Applications: Supports learning by breaking down complex problems into manageable steps.

Research Assistance: Generates structured insights and explanations in academic or professional research.

Decision Support: Enhances understanding in business, engineering, and scientific contexts.

General Inquiry: Provides coherent, in-depth answers to everyday questions.

Type: Chain-of-Thought (CoT) Reasoning Model

Base Architecture: Adapted from Qwen2

Parameters: 1.54B

Fine-tuning: Specialized fine-tuning on Chain-of-Thought reasoning datasets to enhance step-by-step explanatory capabilities.

Ethical Considerations

Bias Mitigation: I have taken steps to minimise biases in the training data. However, users are encouraged to cross-verify outputs in sensitive contexts.

Limitations: May not provide exhaustive answers for niche topics or domains outside its training scope.

User Responsibility: Designed as an assistive tool, not a replacement for expert human judgment.

Usage

Option A: Local

Using locally with the Transformers library

Use a pipeline as a high-level helper

from transformers import pipeline

messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe = pipeline("text-generation", model="Spestly/AwA-1.5B")
pipe(messages)
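
When the pipeline is given chat messages, it returns the whole conversation rather than just the new text. The sketch below shows one way to pull out only the assistant's reply. The helper names `build_messages`, `last_reply`, and `ask`, the optional system prompt, and `max_new_tokens=512` are illustrative assumptions, not part of the original model card.

```python
def build_messages(question, system_prompt=None):
    """Build a chat-style message list for the text-generation pipeline."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": question})
    return messages


def last_reply(pipeline_output):
    """Return the assistant text from a chat-style pipeline result.

    For chat input, "generated_text" holds the full message list,
    with the newly generated assistant message in last position.
    """
    conversation = pipeline_output[0]["generated_text"]
    return conversation[-1]["content"]


def ask(question, max_new_tokens=512):
    """Load the model and answer one question (downloads weights on first use)."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import pipeline

    pipe = pipeline("text-generation", model="Spestly/AwA-1.5B")
    output = pipe(build_messages(question), max_new_tokens=max_new_tokens)
    return last_reply(output)
```

Calling `ask("What is 17 * 23? Explain step by step.")` should return the model's step-by-step answer as a plain string. Raise `max_new_tokens` if the chain of thought gets cut off.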

Option B: API & Space

You can use the AwA Hugging Face Space or the AwA API (coming soon!).

Roadmap

More AwA model sizes, e.g., 7B and 14B
Create the AwA API via the spestly package

Use with llama.cpp

Install llama.cpp through brew (works on Mac and Linux)

brew install llama.cpp

Invoke the llama.cpp server or the CLI.

CLI:

llama-cli --hf-repo Triangle104/AwA-1.5B-Q6_K-GGUF --hf-file awa-1.5b-q6_k.gguf -p "The meaning to life and the universe is"

Server:

llama-server --hf-repo Triangle104/AwA-1.5B-Q6_K-GGUF --hf-file awa-1.5b-q6_k.gguf -c 2048
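
Once the server is running, it can be queried over HTTP. The following is a minimal sketch using only the Python standard library. It assumes the server listens on its default address (127.0.0.1, port 8080) and uses the OpenAI-compatible `/v1/chat/completions` endpoint. The helper names `build_payload` and `ask_server` and the sampling values are illustrative assumptions.

```python
import json
import urllib.request

# Assumption: llama-server was started locally with its default host and port.
SERVER_URL = "http://127.0.0.1:8080/v1/chat/completions"


def build_payload(question, max_tokens=512, temperature=0.7):
    """Build the JSON body for a single-turn chat completion request."""
    return {
        "messages": [{"role": "user", "content": question}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def ask_server(question, url=SERVER_URL, timeout=120):
    """Send one question to the running server and return the reply text."""
    body = json.dumps(build_payload(question)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.load(response)
    # The response follows the OpenAI chat completion layout.
    return reply["choices"][0]["message"]["content"]
```

With `-c 2048`, the prompt and the generated tokens share a 2048-token context, so keep `max_tokens` well below that for long questions.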

Note: You can also use this checkpoint directly by following the usage steps listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.

git clone https://github.com/ggerganov/llama.cpp

Step 2: Move into the llama.cpp folder and build it with the LLAMA_CURL=1 flag, along with any other hardware-specific flags (for example, LLAMA_CUDA=1 for Nvidia GPUs on Linux).

cd llama.cpp && LLAMA_CURL=1 make

Step 3: Run inference through the main binary.

./llama-cli --hf-repo Triangle104/AwA-1.5B-Q6_K-GGUF --hf-file awa-1.5b-q6_k.gguf -p "The meaning to life and the universe is"

or

./llama-server --hf-repo Triangle104/AwA-1.5B-Q6_K-GGUF --hf-file awa-1.5b-q6_k.gguf -c 2048
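
The commands above fetch the model through `--hf-repo` each time they need it. As an alternative sketch, the GGUF file can be downloaded once with the `huggingface_hub` package and passed to `llama-cli` by path with `-m`. The helper names and the default values for `-n` and `-c` below are illustrative assumptions, as is the presence of `llama-cli` on your PATH.

```python
import shlex

REPO_ID = "Triangle104/AwA-1.5B-Q6_K-GGUF"
FILENAME = "awa-1.5b-q6_k.gguf"


def download_model():
    """Download the GGUF file (or reuse the cached copy) and return its local path."""
    # Imported lazily so the command builder works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=FILENAME)


def build_cli_command(model_path, prompt, n_predict=256, ctx_size=2048, binary="llama-cli"):
    """Build the llama-cli argument list for a single prompt."""
    return [
        binary,
        "-m", str(model_path),   # path to the local GGUF file
        "-p", prompt,            # prompt text
        "-n", str(n_predict),    # maximum number of tokens to generate
        "-c", str(ctx_size),     # context size in tokens
    ]


def shell_line(command):
    """Render the argument list as a copy-pasteable shell command."""
    return shlex.join(command)
```

For example, `print(shell_line(build_cli_command(download_model(), "Explain step by step why 0.1 + 0.2 != 0.3")))` prints a command that you can paste into a terminal. Depending on the llama.cpp version, `llama-cli` may start in interactive conversation mode, so check `llama-cli --help` before scripting around it.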