Triangle104
committed on
Update README.md
README.md
CHANGED
@@ -29,12 +29,15 @@ Key Features
 ⚡ Lightweight and Efficient
 -
 Compact Size: At just 3.09 billion parameters, Athena-1 3B offers excellent performance with reduced computational requirements.
+
 Instruction Following: Fine-tuned for precise and reliable adherence to user prompts.
+
 Coding and Mathematics: Proficient in solving coding challenges and handling mathematical tasks.
 
 Long-Context Understanding
 -
 Context Length: Supports up to 32,768 tokens, enabling the processing of moderately lengthy documents or conversations.
+
 Token Generation: Can generate up to 8K tokens of output.
 
 Multilingual Support
@@ -46,15 +49,21 @@ Japanese, Korean, Vietnamese, Thai, Arabic, and more.
 Structured Data & Outputs
 -
 Structured Data Interpretation: Processes structured formats like tables and JSON.
+
 Structured Output Generation: Generates well-formatted outputs, including JSON and other structured formats.
 
 Details
 -
 Base Model: Qwen/Qwen2.5-3B-Instruct
+
 Architecture: Transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings.
+
 Parameters: 3.09B total (2.77B non-embedding).
+
 Layers: 36
+
 Attention Heads: 16 for Q, 2 for KV.
+
 Context Length: Up to 32,768 tokens.
 
 Applications
@@ -62,10 +71,15 @@ Applications
 Athena 3B is designed for a variety of real-world applications:
 
 Conversational AI: Build fast, responsive, and lightweight chatbots.
+
 Code Generation: Generate, debug, or explain code snippets.
+
 Mathematical Problem Solving: Assist with calculations and reasoning.
+
 Document Processing: Summarize and analyze moderately large documents.
+
 Multilingual Applications: Support for global use cases with diverse language requirements.
+
 Structured Data: Process and generate structured data, such as tables and JSON.
 
 Quickstart
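
A rough sketch, separate from the diff itself, of what the Details entries "Layers: 36", "Attention Heads: 16 for Q, 2 for KV" and "Context Length: Up to 32,768 tokens" imply for key/value cache memory. The layer count, head counts and context length come from the README. The head dimension of 128 and the 2-byte (16-bit) cache values are assumptions that the README does not state. The helper `kv_cache_bytes` is a hypothetical name used only for this illustration.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, tokens, bytes_per_value=2):
    """Approximate KV-cache size in bytes.

    Each layer stores one key and one value vector (hence the factor 2)
    per KV head and per token.
    """
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_value


# Values taken from the Details list in the README.
LAYERS = 36
Q_HEADS = 16
KV_HEADS = 2
CONTEXT_TOKENS = 32_768

# ASSUMPTION: per-head dimension; not stated in the README.
HEAD_DIM = 128

# Cache size with 2 KV heads shared across the 16 query heads.
gqa_bytes = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT_TOKENS)

# Hypothetical comparison: one KV head per query head.
mha_bytes = kv_cache_bytes(LAYERS, Q_HEADS, HEAD_DIM, CONTEXT_TOKENS)

print(f"2 KV heads:  {gqa_bytes / 2**30:.3f} GiB at full context")
print(f"16 KV heads: {mha_bytes / 2**30:.3f} GiB at full context")
print(f"reduction factor: {mha_bytes // gqa_bytes}")
```

Under these assumptions, sharing 2 KV heads across 16 query heads shrinks the cache by a factor of 8 at any context length, which is one reason a 3B model can plausibly serve a 32,768-token window on modest hardware. Actual memory use depends on the runtime, the quantization, and the real head dimension.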