mmnga
/

DeepSeek-V3-slice-jp64

Model card Files Files and versions Community

DeepSeek-V3-slice-jp64 / README.md

mmnga's picture

Update README.md

cb13c9b verified 26 days ago

|

history blame contribute delete

1.72 kB

	---
	license: other
	language:
	- ja
	base_model:
	- deepseek-ai/DeepSeek-V3
	---
	# DeepSeek-V3-slice-jp64

	## 実験モデルです
	本モデルは [DeepSeek-V3](https://huggingface.co./deepseek-ai/DeepSeek-V3) をベースに、日本語の例文を元に頻出する MoE (Mixture of Experts) の各レイヤーごとのexpertsを厳選して再構成したモデルです。
	元のモデルでは 256 のexpertsを搭載していますが、日本語出力における安定性とパフォーマンスのバランスを重視し、各層で頻出する 64 のexpertsを使用するように調整しています。

	### 例文出力時の各layerごとのexpertsの頻出分布
	![](layer_topk_idx_distribution_bubble.png)
	---

	## ライセンス
	ご使用前にライセンスファイルをご確認ください。
	[DeepSeek-V3](https://huggingface.co./deepseek-ai/DeepSeek-V3) こちらのライセンスをそのまま使用しています。

	## 特徴

	- MoEモデルのexpertsから、日本語の例文出力をして各layerごとに頻出する64のexpertをして組み直したモデルです。
	- 16ではまともに動かず、32では安定しなかったため64expertsにしています。
	- scripts/layer_topk_idx_distribution.json
	- 各layerごとに頻出順に128のexpertのrankが記録されています。
	- scripts/deepseek_slice.py
	- 元モデル（bf16）から、64のexpertを使用したモデル（bf16）を作成します。
	- scripts/model_test.py
	- モデル実行用テスト用のスクリプトです。コメントアウトされている例文を元に頻出するexpertを計測しています

	---

	## 使い方
	`scripts/model_test.py`に実行コードあります