Triangle104 committed on
Commit
2c25403
·
verified ·
1 Parent(s): bd2bf16

Update README.md

Files changed (1)
  1. README.md +14 -0
README.md CHANGED
@@ -29,12 +29,15 @@ Key Features
29
  ⚡ Lightweight and Efficient
30
  -
31
  Compact Size: At just 3.09 billion parameters, Athena-1 3B offers excellent performance with reduced computational requirements.
 
32
  Instruction Following: Fine-tuned for precise and reliable adherence to user prompts.
 
33
  Coding and Mathematics: Proficient in solving coding challenges and handling mathematical tasks.
34
 
35
  📖 Long-Context Understanding
36
  -
37
  Context Length: Supports up to 32,768 tokens, enabling the processing of moderately lengthy documents or conversations.
 
38
  Token Generation: Can generate up to 8K tokens of output.
39
 
40
  🌍 Multilingual Support
@@ -46,15 +49,21 @@ Japanese, Korean, Vietnamese, Thai, Arabic, and more.
46
  📊 Structured Data & Outputs
47
  -
48
  Structured Data Interpretation: Processes structured formats like tables and JSON.
 
49
  Structured Output Generation: Generates well-formatted outputs, including JSON and other structured formats.
50
 
51
  Details
52
  -
53
  Base Model: Qwen/Qwen2.5-3B-Instruct
 
54
  Architecture: Transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings.
 
55
  Parameters: 3.09B total (2.77B non-embedding).
 
56
  Layers: 36
 
57
  Attention Heads: 16 for Q, 2 for KV.
 
58
  Context Length: Up to 32,768 tokens.
59
 
60
  Applications
@@ -62,10 +71,15 @@ Applications
62
  Athena 3B is designed for a variety of real-world applications:
63
 
64
  Conversational AI: Build fast, responsive, and lightweight chatbots.
 
65
  Code Generation: Generate, debug, or explain code snippets.
 
66
  Mathematical Problem Solving: Assist with calculations and reasoning.
 
67
  Document Processing: Summarize and analyze moderately large documents.
 
68
  Multilingual Applications: Support for global use cases with diverse language requirements.
 
69
  Structured Data: Process and generate structured data, such as tables and JSON.
70
 
71
  Quickstart
 
29
  ⚡ Lightweight and Efficient
30
  -
31
  Compact Size: At just 3.09 billion parameters, Athena-1 3B offers excellent performance with reduced computational requirements.
32
+
33
  Instruction Following: Fine-tuned for precise and reliable adherence to user prompts.
34
+
35
  Coding and Mathematics: Proficient in solving coding challenges and handling mathematical tasks.
36
 
37
  📖 Long-Context Understanding
38
  -
39
  Context Length: Supports up to 32,768 tokens, enabling the processing of moderately lengthy documents or conversations.
40
+
41
  Token Generation: Can generate up to 8K tokens of output.
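
The two limits above interact when sizing a prompt. The following is a minimal sketch, assuming that generated tokens count against the same 32,768-token window as the prompt and that "8K" means 8 × 1024 tokens; neither assumption is stated in the model card, and `max_prompt_tokens` is a hypothetical helper name.

```python
# Limits taken from the model card.
CONTEXT_WINDOW = 32_768

# Assumption: "8K" is read as 8 * 1024 tokens.
MAX_NEW_TOKENS = 8 * 1024


def max_prompt_tokens(context_window=CONTEXT_WINDOW, max_new_tokens=MAX_NEW_TOKENS):
    """Largest prompt that still leaves room for a full-length reply.

    Assumes prompt and output share one context window.
    """
    if max_new_tokens >= context_window:
        raise ValueError("max_new_tokens must be smaller than the context window")
    return context_window - max_new_tokens


print(max_prompt_tokens())        # 24576
print(max_prompt_tokens(max_new_tokens=1024))  # 31744
```

Under these assumptions, a prompt longer than 24,576 tokens would need a smaller output budget to fit.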
42
 
43
  🌍 Multilingual Support
 
49
  📊 Structured Data & Outputs
50
  -
51
  Structured Data Interpretation: Processes structured formats like tables and JSON.
52
+
53
  Structured Output Generation: Generates well-formatted outputs, including JSON and other structured formats.
54
 
55
  Details
56
  -
57
  Base Model: Qwen/Qwen2.5-3B-Instruct
58
+
59
  Architecture: Transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings.
60
+
61
  Parameters: 3.09B total (2.77B non-embedding).
62
+
63
  Layers: 36
64
+
65
  Attention Heads: 16 for Q, 2 for KV.
66
+
67
  Context Length: Up to 32,768 tokens.
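
To illustrate why the 16 query heads / 2 key-value heads split matters, here is a rough estimate of KV-cache memory at full context. It is a sketch under stated assumptions: the per-head dimension (128) and the 2-byte cache precision are not given in the model card and are assumed here, and `kv_cache_bytes` is a hypothetical helper name.

```python
# Figures from the Details list.
NUM_LAYERS = 36
NUM_Q_HEADS = 16
NUM_KV_HEADS = 2
CONTEXT_LENGTH = 32_768

# Assumptions, not stated in the model card.
HEAD_DIM = 128       # assumed per-head dimension
BYTES_PER_VALUE = 2  # assumed fp16/bf16 cache precision


def kv_cache_bytes(num_tokens, num_kv_heads=NUM_KV_HEADS):
    """Estimate KV-cache size in bytes for one sequence.

    The leading factor 2 accounts for storing both keys and values.
    """
    return 2 * NUM_LAYERS * num_kv_heads * HEAD_DIM * BYTES_PER_VALUE * num_tokens


per_token = kv_cache_bytes(1)
full_gqa = kv_cache_bytes(CONTEXT_LENGTH)
# Hypothetical comparison: one KV head per query head (no grouping).
full_mha = kv_cache_bytes(CONTEXT_LENGTH, num_kv_heads=NUM_Q_HEADS)

print(f"{per_token / 1024:.0f} KiB per token")        # 36 KiB per token
print(f"{full_gqa / 2**30:.3f} GiB at full context")  # 1.125 GiB at full context
print(f"{full_mha / 2**30:.3f} GiB with 16 KV heads") # 9.000 GiB with 16 KV heads
```

Under these assumptions, sharing 2 KV heads across 16 query heads cuts the cache to one eighth of what an ungrouped layout would need.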
68
 
69
  Applications
 
71
  Athena 3B is designed for a variety of real-world applications:
72
 
73
  Conversational AI: Build fast, responsive, and lightweight chatbots.
74
+
75
  Code Generation: Generate, debug, or explain code snippets.
76
+
77
  Mathematical Problem Solving: Assist with calculations and reasoning.
78
+
79
  Document Processing: Summarize and analyze moderately large documents.
80
+
81
  Multilingual Applications: Support for global use cases with diverse language requirements.
82
+
83
  Structured Data: Process and generate structured data, such as tables and JSON.
84
 
85
  Quickstart