Training run to compare Mixture-of-Depths, Bitnet

Wandb Report


Four models trained for 100k steps on Dolma:

  • OLMo-50M - 50M parameter model
  • OLMo-50M-bitlinear - 50M parameter bitnet model
  • OLMo-50M-mod - 50M parameter mixture-of-depths model
  • OLMo-50M-mod-bitlinear - 50M parameter mixture-of-depths bitnet model
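The bitlinear variants replace standard linear layers with BitNet-style quantized ones. As a rough illustration (not the exact code used in these runs), a BitLinear forward pass quantizes weights to ternary values {-1, 0, 1} with an absmean scale and activations to 8-bit with an absmax scale; the function names below are hypothetical:

```python
import numpy as np

def weight_quant(w):
    # BitNet b1.58-style absmean quantization: scale by mean |w|,
    # then round and clip weights to the ternary set {-1, 0, 1}
    scale = np.mean(np.abs(w)) + 1e-8
    q = np.clip(np.round(w / scale), -1, 1)
    return q, scale

def act_quant(x):
    # per-token absmax quantization of activations to [-127, 127]
    scale = np.max(np.abs(x), axis=-1, keepdims=True) + 1e-8
    q = np.clip(np.round(x / scale * 127), -127, 127)
    return q, scale

def bitlinear(x, w):
    # sketch of a BitLinear forward: quantized matmul, then rescale
    qw, sw = weight_quant(w)
    qx, sx = act_quant(x)
    y = qx @ qw.T              # integer-valued matmul
    return y * (sw * sx / 127.0)  # undo both quantization scales
```

In training, a straight-through estimator is typically used so gradients flow through the rounding; this sketch shows only the inference-style forward pass.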

The repo contains zip files with the training states and other files for each model. I am not the author of the mixture-of-depths implementation; it can be found here. This is the first run; a few things might be broken, and it is still a work in progress.
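For context on the mod variants: mixture-of-depths lets a router pick a fixed fraction of tokens per layer to process, while the rest skip the layer through the residual path. A minimal sketch of that routing step, assuming a simple linear router and a generic `block` function (both hypothetical, not the implementation used here):

```python
import numpy as np

def mod_forward(x, router_w, block, capacity=0.125):
    # x: (seq, d) token activations for one layer.
    # The router scores every token; only the top-k tokens
    # (k = capacity * seq) are processed by the block, the rest
    # pass through unchanged via the residual connection.
    seq, d = x.shape
    k = max(1, int(seq * capacity))
    scores = x @ router_w                 # (seq,) router logits
    top = np.argsort(scores)[-k:]         # indices of routed tokens
    out = x.copy()
    # weight the block output by the router score, which is what
    # keeps the routing decision differentiable during training
    out[top] = x[top] + scores[top, None] * block(x[top])
    return out
```

With `capacity=0.125`, only 1 in 8 tokens pays the compute cost of the block at that layer, which is the main efficiency argument for mixture-of-depths.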


Dataset used to train 0-hero/BitMoD: Dolma