QuantFactory/sarashina2.1-1b-sft-GGUF

This is a quantized version of Aratako/sarashina2.1-1b-sft, created using llama.cpp.

Original Model Card

sarashina2.1-1b-sft

This model is sbintuitions/sarashina2.1-1b instruction-tuned on data the author had on hand.

This model is built with Qwen.

Usage

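from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# Load the model and tokenizer; device_map="auto" places the weights on the available device(s)
model = AutoModelForCausalLM.from_pretrained("Aratako/sarashina2.1-1b-sft", device_map="auto", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("Aratako/sarashina2.1-1b-sft")
# Stream generated text to stdout, omitting the prompt and special tokens
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Japanese prompt: "Please tell me about SoftBank Corp."
messages = [
    {"role": "user", "content": "ソフトバンク株式会社について教えてください。"}
]

# Apply the chat template and append the assistant generation prompt
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
# Greedy decoding (do_sample=False), up to 1024 new tokens
output_ids = model.generate(
    input_ids,
    max_new_tokens=1024,
    do_sample=False,
    streamer=streamer,
    eos_token_id=tokenizer.eos_token_id,
)

Example output: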

ソフトバンク株式会社は、日本の大手通信会社の一つで、主に携帯電話やインターネットサービス、そして様々なデジタル製品やサービスを提供しています。

1. **事業内容**:
   - 携帯電話事業: ソフトバンクモバイル（現・ソフトバンク）として、携帯電話やスマートフォンの販売を行っています。
   - インターネットサービス: ソフトバンクBB（現・ソフトバンクBB）として、インターネット接続サービスやISP事業を行っています。
   - デジタル製品: スマートフォンやタブレット、ウェアラブルデバイスなどの開発・販売も行っています。
   - クラウドサービス: ソフトバンククラウドとして、クラウドサービスを提供しています。
   - 金融サービス: ソフトバンクファイナンスとして、クレジットカードやローンなどの金融サービスを提供しています。

2. **グループ会社**:
   - ソフトバンクグループ: ソフトバンク株式会社の親会社であり、通信事業だけでなく、テクノロジー、メディア、金融など多岐にわたる事業を展開しています。
   - ソフトバンクBB: インターネット接続サービスやISP事業を行う子会社です。
   - ソフトバンクモバイル: 携帯電話事業を行う子会社です。
   - ソフトバンクコマース&サービス: コマース事業を行う子会社です。
   - ソフトバンクテクノロジー: ITソリューション事業を行う子会社です。

3. **特徴**:
   - ソフトバンクは、通信事業だけでなく、テクノロジーやメディア、金融など多岐にわたる分野で事業を展開しており、その規模と影響力は非常に大きいです。
   - 「ソフトバンクグループ」として、通信事業だけでなく、テクノロジー、メディア、金融など多岐にわたる事業を展開しています。
   - 「Pepper」などのロボット開発や、AI（人工知能）技術を活用したサービス開発にも力を入れています。

4. **社会貢献活動**:
   - ソフトバンクは、社会貢献活動にも積極的に取り組んでおり、教育支援や災害時の救援活動など、様々な形で社会に貢献しています。

ソフトバンクは、通信業界だけでなく、幅広い分野で革新的なサービスを提供し、日本の社会や経済に大きな影響を与えています。

Example response

Input (from elyza/ELYZA-tasks-100)

<|im_start|>user
仕事の熱意を取り戻すためのアイデアを5つ挙げてください。<|im_end|>
<|im_start|>assistant

Output

もちろん、以下にいくつかのアイデアを提案します：

1. 目標設定: 明確な目標を設定し、それを達成するための具体的なステップを計画します。これにより、仕事に対するモチベーションが高まります。

2. 休憩を取る: 長時間働くと集中力が低下します。定期的に休憩を取り、リフレッシュすることが重要です。

3. 新しいスキルを学ぶ: 新しい技術や知識を習得することで、自己成長を感じ、それが仕事への情熱を再び引き出すことができます。

4. 健康的なライフスタイルを維持する: 適度な運動や十分な睡眠は、精神的な健康を維持し、エネルギーレベルを高めます。

5. ネットワーキング: 同僚や他の専門家との交流を通じて新たな視点を得ることができます。これはあなたの視野を広げ、新たな機会を見つける手助けになります。
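
The input in the example above is laid out in ChatML, which matches the `chat_template: chatml` setting in the training config. The following is a minimal sketch of how such a prompt string can be assembled by hand. The helper name `build_chatml_prompt` is hypothetical, and the layout is assumed to be the standard ChatML one; the template shipped with the tokenizer (used by `tokenizer.apply_chat_template`) remains the authoritative source.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string.

    Hypothetical helper for illustration only; it assumes the standard
    ChatML layout and is not taken from the model's own code.
    """
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>ROLE ... <|im_end|>
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt(
    [{"role": "user", "content": "仕事の熱意を取り戻すためのアイデアを5つ挙げてください。"}]
)
print(prompt)
```

When using transformers, calling `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` is preferable, since it reads the template stored with the tokenizer rather than relying on an assumed layout.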

License

This model inherits the Sarashina Model NonCommercial License Agreement from the base model. Use is subject to a number of restrictions, so please review them carefully.

Notes

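The author of this model has no affiliation whatsoever with SB Intuitions Corp., the creator of the base model. If you find any problem with this model, please contact me, its author.
The safety of this model has not been verified. Please keep this in mind.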

See axolotl config

axolotl version: 0.5.2

base_model: sbintuitions/sarashina2.1-1b
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

hub_model_id: Aratako/sarashina2.1-1b-sft
hub_strategy: "end"
push_dataset_to_hub:
hf_use_auth_token: true

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_cross_entropy: false
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true

load_in_8bit: false
load_in_4bit: false
strict: false

chat_template: chatml

datasets:
  - path: Aratako/Magpie-Tanuki-Qwen2.5-72B-Answered
    type: chat_template
    field_messages: messages
    message_field_role: role
    message_field_content: content
  - path: Aratako/magpie-qwen2.5-32b-reasoning-100k-formatted
    type: chat_template
    field_messages: conversations
    message_field_role: role
    message_field_content: content
  - path: Aratako/Open-Platypus-Japanese-masked-formatted
    type: chat_template
    field_messages: conversations
    message_field_role: role
    message_field_content: content
  - path: kanhatakeyama/wizardlm8x22b-logical-math-coding-sft_additional-ja
    type: chat_template
    field_messages: messages
    message_field_role: role
    message_field_content: content
  - path: kanhatakeyama/ramdom-to-fixed-multiturn-Calm3
    split: 20240806filtered
    type: chat_template
    field_messages: messages
    message_field_role: role
    message_field_content: content
  - path: llm-jp/magpie-sft-v1.0
    type: chat_template
    field_messages: conversations
    message_field_role: role
    message_field_content: content
  - path: Aratako/aya-ja-evol-instruct-calm3-dpo-masked-sft
    type: chat_template
    field_messages: messages
    message_field_role: role
    message_field_content: content
  - path: Aratako/aya-ja-nemotron-dpo-masked-sft
    type: chat_template
    field_messages: messages
    message_field_role: role
    message_field_content: content
  - path: Aratako/Synthetic-JP-EN-Coding-Dataset-801k
    split: "train[0:50000]"
    type: chat_template
    field_messages: messages
    message_field_role: role
    message_field_content: content
  - path: Aratako/orca-agentinstruct-1M-v1-selected-2
    type: chat_template
    field_messages: messages
    message_field_role: role
    message_field_content: content
  - path: Aratako/Synthetic-JP-EN-Coding-Dataset-801k-50k
    type: chat_template
    field_messages: messages
    message_field_role: role
    message_field_content: content

shuffle_merged_datasets: true
dataset_prepared_path: /workspace/data/fft-data-sarashina
val_set_size: 0.002
output_dir: /workspace/data/1b-fft-out

sequence_len: 4096
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

adapter:
lora_model_dir:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:
lora_fan_in_fan_out:

wandb_project: 1b-fft
wandb_entity: aratako-lm
wandb_watch:
wandb_name: fft-attempt-1
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 8
num_epochs: 2
optimizer: adamw_torch
lr_scheduler: cosine
cosine_min_lr_ratio: 0.1
learning_rate: 0.00002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: false
early_stopping_patience:
auto_resume_from_checkpoints: true
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

save_strategy: steps
save_steps: 100
save_total_limit: 1

warmup_steps: 20
eval_steps: 100
eval_batch_size: 1
eval_table_size:
eval_max_new_tokens:
debug:
deepspeed: /workspace/axolotl/deepspeed_configs/zero1.json
weight_decay: 0.01
fsdp:
fsdp_config:
special_tokens:
  pad_token: <pad>

tokens:
  - "<|im_start|>"
  - "<|im_end|>"
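
The learning-rate settings in the config above (`learning_rate: 0.00002`, `warmup_steps: 20`, `lr_scheduler: cosine`, `cosine_min_lr_ratio: 0.1`) describe a warmup followed by a cosine decay that stops at 10% of the peak rate. The sketch below illustrates one common formulation of such a schedule. It is an assumption about the shape, not axolotl's actual implementation, and `TOTAL_STEPS` is a hypothetical run length chosen to be roughly consistent with the ~1300 steps logged in the training results.

```python
import math

PEAK_LR = 2e-5       # learning_rate from the config
WARMUP_STEPS = 20    # warmup_steps from the config
MIN_LR_RATIO = 0.1   # cosine_min_lr_ratio from the config
TOTAL_STEPS = 1320   # assumption: approximate run length, not stated in the config


def lr_at(step, total_steps=TOTAL_STEPS):
    """Learning rate at a given optimizer step: linear warmup, then cosine decay to a floor."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    progress = min(progress, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    # Scale the cosine term so the rate bottoms out at MIN_LR_RATIO * PEAK_LR
    return PEAK_LR * (MIN_LR_RATIO + (1.0 - MIN_LR_RATIO) * cosine)


for step in (0, 10, 20, 670, 1320):
    print(f"step {step:>4}: lr = {lr_at(step):.2e}")
```

Under these assumptions the rate peaks at 2e-5 once warmup ends and decays toward 2e-6 by the final step, rather than toward zero as a plain cosine schedule would.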

sarashina2.1-1b-sft

This model is a fine-tuned version of sbintuitions/sarashina2.1-1b on the datasets listed in the axolotl config above. It achieves the following results on the evaluation set:

Loss: 0.9366

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 8
eval_batch_size: 1
seed: 42
distributed_type: multi-GPU
num_devices: 8
gradient_accumulation_steps: 4
total_train_batch_size: 256
total_eval_batch_size: 8
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 20
num_epochs: 2
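
The total batch sizes listed above follow from the per-device values. As a quick check, the sketch below recomputes them, and also derives an upper bound on tokens per optimizer step using `sequence_len: 4096` from the axolotl config. The token figure is an upper bound only, since packed sequences can still contain padding.

```python
MICRO_BATCH_SIZE = 8   # train_batch_size (per device)
NUM_DEVICES = 8        # num_devices
GRAD_ACCUM_STEPS = 4   # gradient_accumulation_steps
EVAL_BATCH_SIZE = 1    # eval_batch_size (per device)
SEQUENCE_LEN = 4096    # sequence_len from the axolotl config

# Sequences consumed per optimizer step across all devices
total_train_batch_size = MICRO_BATCH_SIZE * NUM_DEVICES * GRAD_ACCUM_STEPS
# Evaluation uses no gradient accumulation
total_eval_batch_size = EVAL_BATCH_SIZE * NUM_DEVICES
# Upper bound on token positions per optimizer step
max_tokens_per_step = total_train_batch_size * SEQUENCE_LEN

print(total_train_batch_size)  # 256
print(total_eval_batch_size)   # 8
print(max_tokens_per_step)     # 1048576
```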

Training results

Training Loss	Epoch	Step	Validation Loss
1.2935	0.0015	1	1.4733
0.985	0.1515	100	1.0491
0.9131	0.3029	200	1.0156
0.9174	0.4544	300	0.9935
0.9257	0.6058	400	0.9806
0.869	0.7573	500	0.9694
0.8874	0.9087	600	0.9608
0.8041	1.0594	700	0.9557
0.8348	1.2109	800	0.9512
0.8353	1.3624	900	0.9466
0.8145	1.5138	1000	0.9432
0.8057	1.6653	1100	0.9400
0.838	1.8167	1200	0.9381
0.8446	1.9682	1300	0.9366
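
The table above can be summarized with a few lines of code. The sketch below copies the logged rows, finds the checkpoint with the lowest validation loss, checks whether validation loss fell at every evaluation, and makes a rough steps-per-epoch estimate from the last row. The estimate is approximate because the logged epoch values are rounded.

```python
# (training_loss, epoch, step, validation_loss) rows copied from the table above
rows = [
    (1.2935, 0.0015, 1, 1.4733),
    (0.985, 0.1515, 100, 1.0491),
    (0.9131, 0.3029, 200, 1.0156),
    (0.9174, 0.4544, 300, 0.9935),
    (0.9257, 0.6058, 400, 0.9806),
    (0.869, 0.7573, 500, 0.9694),
    (0.8874, 0.9087, 600, 0.9608),
    (0.8041, 1.0594, 700, 0.9557),
    (0.8348, 1.2109, 800, 0.9512),
    (0.8353, 1.3624, 900, 0.9466),
    (0.8145, 1.5138, 1000, 0.9432),
    (0.8057, 1.6653, 1100, 0.9400),
    (0.838, 1.8167, 1200, 0.9381),
    (0.8446, 1.9682, 1300, 0.9366),
]

# Row with the lowest validation loss
best = min(rows, key=lambda r: r[3])

# True if validation loss decreased at every logged evaluation
val_losses = [r[3] for r in rows]
monotonic = all(a > b for a, b in zip(val_losses, val_losses[1:]))

# Rough estimate of optimizer steps per epoch from the last logged row
steps_per_epoch = rows[-1][2] / rows[-1][1]

print(f"best step: {best[2]}, validation loss: {best[3]}")
print(f"validation loss strictly decreasing: {monotonic}")
print(f"approx. steps per epoch: {steps_per_epoch:.0f}")
```

The lowest validation loss in the table is the final logged value, which matches the evaluation loss of 0.9366 reported at the top of this section.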

Framework versions

Transformers 4.46.3
PyTorch 2.3.1+cu121
Datasets 3.1.0
Tokenizers 0.20.3
