shimmyshimmer
committed on
Update README.md
README.md
CHANGED
@@ -58,9 +58,6 @@ Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source
 Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training.
 In addition, its training process is remarkably stable.
 Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks.
-<p align="center">
-<img width="80%" src="figures/benchmark.png">
-</p>
 
 ## 2. Model Summary
 