Triangle104 committed on
Commit 1efb71a · verified · 1 Parent(s): 8c80314

Update README.md

Files changed (1)
  1. README.md +58 -0
README.md CHANGED
@@ -28,6 +28,64 @@ base_model: microsoft/phi-4
  This model was converted to GGUF format from [`microsoft/phi-4`](https://huggingface.co/microsoft/phi-4) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
  Refer to the [original model card](https://huggingface.co/microsoft/phi-4) for more details on the model.
 
+ ---
+ Model details:
+ -
+ Developers
+ -
+ Microsoft Research
+
+ Description
+ -
+ phi-4 is a state-of-the-art open model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to ensure that small capable models were trained with data focused on high quality and advanced reasoning.
+
+ phi-4 underwent a rigorous enhancement and alignment process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures.
+
+ Architecture
+ -
+ 14B parameters, dense decoder-only Transformer model
+
+ Inputs
+ -
+ Text, best suited for prompts in the chat format
+
+ Context length
+ -
+ 16K tokens
+
+ GPUs
+ -
+ 1920 H100-80G
+
+ Training time
+ -
+ 21 days
+
+ Training data
+ -
+ 9.8T tokens
+
+ Outputs
+ -
+ Generated text in response to input
+
+ Dates
+ -
+ October 2024 – November 2024
+
+ Status
+ -
+ Static model trained on an offline dataset with cutoff dates of June 2024 and earlier for publicly available data
+
+ Release date
+ -
+ December 12, 2024
+
+ License
+ -
+ MIT
+
+ ---
  ## Use with llama.cpp
  Install llama.cpp through brew (works on Mac and Linux)