
Model Card: Custom Language Model

Overview

This model was trained on the WikiText-103 dataset and generates text from input prompts.

Dataset

Dataset Used: WikiText-103

Source: Hugging Face Datasets

Dataset Details: The WikiText-103 dataset is a collection of over 100 million tokens extracted from the set of verified "Good" and "Featured" articles on Wikipedia. It is designed for language modeling and other text generation tasks.
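For reference, the dataset can be loaded directly from Hugging Face Datasets. The configuration name below ("wikitext-103-raw-v1") is an assumption, since the card does not state which WikiText-103 variant was used:

from datasets import load_dataset

# Load WikiText-103 (the "wikitext-103-raw-v1" config is an assumption;
# the card does not specify which variant was used)
dataset = load_dataset("wikitext", "wikitext-103-raw-v1")
print(dataset["train"][0]["text"])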

Data Cleaning

To ensure high-quality input for training, the dataset underwent the following cleaning steps (a minimal sketch follows the list):

  1. Removal of non-standard characters and punctuation.
  2. Tokenization using BERT's tokenizer.
  3. Lowercasing all text.
  4. Filtering out any overly short or long sequences to maintain a consistent input size.
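The sketch below illustrates these steps. The regular expression, the bert-base-uncased tokenizer, and the length bounds (8 to 512 tokens) are assumptions, as the card does not specify the exact values used:

import re
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed BERT tokenizer

def clean_text(text, min_len=8, max_len=512):  # length bounds are assumptions
    # Remove non-standard characters and punctuation
    text = re.sub(r"[^a-zA-Z0-9\s]", "", text)
    # Lowercase all text
    text = text.lower()
    # Tokenize with BERT's tokenizer
    tokens = tokenizer.tokenize(text)
    # Filter out overly short or long sequences
    if not (min_len <= len(tokens) <= max_len):
        return None
    return tokens

print(clean_text("The WikiText-103 dataset contains over 100 million tokens!"))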

Neural Network Definition

The neural network used for this model is based on a transformer architecture with the following specifications (a sketch follows the list):

  • Model Type: BERT-based transformer
  • Number of Layers: 5
  • Dropout: Applied at each layer to prevent overfitting
  • Optimizer: AdamW with a learning rate of 5e-5
  • Loss Function: Cross-entropy loss for language modeling
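One way such a model could be defined with the transformers library is sketched below. The hidden size, number of attention heads, and dropout probability are assumptions; the card only specifies the layer count, optimizer, and loss:

from transformers import BertConfig, BertLMHeadModel

# 5-layer BERT-style decoder for language modeling
# (hidden size, attention heads, and dropout values are assumptions)
config = BertConfig(
    num_hidden_layers=5,
    hidden_size=768,
    num_attention_heads=12,
    hidden_dropout_prob=0.1,
    attention_probs_dropout_prob=0.1,
    is_decoder=True,  # needed for causal language modeling
)
model = BertLMHeadModel(config)  # computes cross-entropy loss when labels are passed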

Training Details

The model was trained on an NVIDIA L4 GPU with the following resources:

  • CPU Cores: 16
  • System RAM: 62.8 GB
  • GPU RAM: 22.5 GB
  • Disk: 201.2 GB

Training Configuration:

  • Batch Size: Dynamic, adjusted based on GPU RAM availability
  • Epochs: 50
  • Initial Learning Rate: 5e-5
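A minimal training-loop sketch under these settings is shown below. The model is assumed to be defined as in the architecture sketch above, and train_dataloader is a placeholder for batches of token IDs; the card only specifies the optimizer, learning rate, and epoch count:

import torch
from torch.optim import AdamW

optimizer = AdamW(model.parameters(), lr=5e-5)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

model.train()
for epoch in range(50):
    # train_dataloader is a placeholder; in practice its batch size was
    # adjusted dynamically based on available GPU memory
    for input_ids in train_dataloader:
        input_ids = input_ids.to(device)
        outputs = model(input_ids=input_ids, labels=input_ids)  # cross-entropy LM loss
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()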

Training Results

Training involved several experiments with different batch sizes and epoch counts. The final training loss was plotted to visualize the model's performance.
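A plot like that can be produced with a snippet along these lines; epoch_losses is a placeholder for the per-epoch losses recorded during training:

import matplotlib.pyplot as plt

# epoch_losses: placeholder list of average training loss per epoch
plt.plot(range(1, len(epoch_losses) + 1), epoch_losses)
plt.xlabel("Epoch")
plt.ylabel("Training loss")
plt.title("Training loss per epoch")
plt.show()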

Usage

To use this model, load it from the Hugging Face Hub and generate text as follows:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("RicardoPoleo/DL_LLM_from_scratch_2")
model = AutoModelForCausalLM.from_pretrained("RicardoPoleo/DL_LLM_from_scratch_2")

# Encode a prompt and generate a continuation
input_text = "Once upon a time"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)  # adjust length as needed

print(tokenizer.decode(outputs[0], skip_special_tokens=True))