ruGPT-3.5-13B / chain of thought

A LoRA adapter for ruGPT3.5-13B, trained on the evilfreelancer/ru-chain-of-thought-sharegpt dataset. That dataset is a Russian translation of isaiahbjork/chain-of-thought-sharegpt, produced with the utrobinmv/t5_translate_en_ru_zh_small_1024 model. The translation script is attached as a Gist.
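A minimal sketch of how such an adapter might be loaded for inference with `transformers` and `peft`. This is not the author's documented usage: the 4-bit bf16 loading simply mirrors the training configuration, the helper names `load_model` and `generate` are hypothetical, and the plain-text prompt format is an assumption, since the expected prompt template is not stated here.

```python
BASE_ID = "ai-forever/ruGPT-3.5-13B"
ADAPTER_ID = "evilfreelancer/ruGPT3.5-13B-lora-chain-of-thought"


def load_model():
    """Load the 4-bit base model and attach the LoRA adapter (hypothetical helper)."""
    # Heavy imports are kept local so the module can be imported cheaply.
    import torch
    from peft import PeftModel
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    # Assumption: same quantization as in training (4-bit weights, bf16 compute).
    quant = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
    base = AutoModelForCausalLM.from_pretrained(
        BASE_ID,
        quantization_config=quant,
        device_map="auto",
    )
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    model.eval()
    return tokenizer, model


def generate(prompt, max_new_tokens=256):
    """Generate a completion for a plain-text prompt (prompt format is an assumption)."""
    import torch

    tokenizer, model = load_model()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Calling `generate(...)` with a Russian question downloads the base weights on first use and needs a GPU with `bitsandbytes` installed. For repeated calls, load the model once with `load_model()` and reuse it.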

Configuration: https://github.com/EvilFreelancer/impruver/blob/main/configs/ruGPT35_13B_lora_cot.yaml

The adapter was trained on a single RTX 4090. Training required roughly 20 GB of VRAM and took 19 minutes.

output_dir: ./models/ruGPT35_13B_lora_cot
train_path: ./train.ruGPT35_13B_cot.jsonl
val_path: ./val.ruGPT35_13B_cot.jsonl

datasets:
  - name: evilfreelancer/ru-chain-of-thought-sharegpt
    converter: impruver.conversations_to_messages

model:
  class: transformers.AutoModelForCausalLM
  name: ai-forever/ruGPT-3.5-13B
  load_in_4bit: true
  load_in_8bit: false
  dtype: bf16

lora:
  r: 16
  lora_alpha: 16
  lora_dropout: 0.05
  bias: none
  target_modules: [ c_attn ]
  task_type: CAUSAL_LM

tokenizer:
  class: transformers.AutoTokenizer
  name: ai-forever/ruGPT-3.5-13B
  max_tokens_count: 1200

trainer:
  eval_strategy: steps
  save_strategy: steps
  eval_steps: 100
  save_steps: 100
  per_device_train_batch_size: 1
  per_device_eval_batch_size: 1
  gradient_accumulation_steps: 5
  logging_steps: 1
  learning_rate: 0.0002
  num_train_epochs: 2
  lr_scheduler_type: cosine
  warmup_steps: 16
  optim: adamw_8bit
  metric_for_best_model: eval_loss
  load_best_model_at_end: true
  save_total_limit: 2
  seed: 42
  remove_unused_columns: false
  max_grad_norm: 1.0
  weight_decay: 0.08
  torch_compile: false
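
A few quantities implied by the configuration above can be worked out by hand. The sketch below does the arithmetic. The batch-size and scaling values come directly from the config, while the hidden size (5120) and layer count (40) used for the parameter estimate are assumptions about the base model's architecture, not values stated in this card. All function names are hypothetical.

```python
def effective_batch_size(per_device, grad_accum, num_devices=1):
    """Number of samples contributing to each optimizer step."""
    return per_device * grad_accum * num_devices


def lora_scaling(lora_alpha, r):
    """Standard LoRA scaling factor applied to the low-rank update."""
    return lora_alpha / r


def lora_params_per_linear(in_features, out_features, r):
    """Trainable parameters LoRA adds to one linear layer (matrices A and B)."""
    return r * (in_features + out_features)


# Values taken from the trainer and lora sections of the config.
batch = effective_batch_size(per_device=1, grad_accum=5)
scale = lora_scaling(lora_alpha=16, r=16)

# Assumption: GPT-2 style blocks with hidden size 5120 and 40 layers,
# where c_attn projects hidden -> 3 * hidden (fused Q, K, V).
hidden, layers = 5120, 40
trainable = lora_params_per_linear(hidden, 3 * hidden, r=16) * layers

print(batch)      # 5
print(scale)      # 1.0
print(trainable)  # 13107200
```

Under these assumptions the adapter trains about 13 million parameters, a small fraction of the 13B base model, which is consistent with the run fitting on a single 24 GB card with 4-bit base weights.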