trojblue/distill-q-align-aesthetic-siglip2-base
This model is a fine-tuned version of google/siglip2-base-patch16-512, specialized for predicting image aesthetics in anime illustrations based on Q-Align's quality metrics.
It achieves the following performance on the evaluation set:
- Loss: 0.0225
- MSE: 0.0225
- RMSE: 0.1502
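As a sanity check, the reported RMSE is simply the square root of the MSE (the small difference from 0.1502 is presumably rounding of intermediate values during evaluation):

```python
import math

# Reported evaluation MSE from the metrics above.
mse = 0.0225

# RMSE is the square root of MSE.
rmse = math.sqrt(mse)
print(round(rmse, 4))  # 0.15
```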
NOTE: the original Q-Align model was not well suited to anime aesthetics, and its scores carry a somewhat "western" feel. For practical use, the distilled Q-Align quality-score model may be slightly preferable.
Model Description
This model is a distilled version of the original Q-Align model, using Siglip2-base as its backbone. It is designed for simpler, faster, and more practical deployment compared to the original model. Benefits include:
- Easier integration (no special transformer version required).
- Smaller model size and efficient batching.
- Produces results very close to the original Q-Align model.
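A minimal inference sketch illustrating the simpler integration. The loading class and output shape are assumptions (a SigLIP2 backbone with a single regression logit loaded via `AutoModelForImageClassification`); verify against the repository's config before use:

```python
# Hypothetical inference sketch for the distilled aesthetic scorer.
# Assumptions (verify against the model repository): the checkpoint
# loads via AutoModelForImageClassification and exposes one regression
# logit approximating the Q-Align aesthetic score.
import torch
from transformers import AutoImageProcessor, AutoModelForImageClassification

MODEL_ID = "trojblue/distill-q-align-aesthetic-siglip2-base"

def load_scorer(model_id: str = MODEL_ID):
    """Load the processor and model once; reuse them for batched scoring."""
    processor = AutoImageProcessor.from_pretrained(model_id)
    model = AutoModelForImageClassification.from_pretrained(model_id)
    model.eval()
    return processor, model

@torch.no_grad()
def score_images(images, processor, model):
    """Return one aesthetic score per PIL image, batched for efficiency."""
    inputs = processor(images=images, return_tensors="pt")
    logits = model(**inputs).logits  # (batch, 1) under the assumption above
    return logits.squeeze(-1).tolist()
```

Because the backbone is a standard SigLIP2 vision model, no pinned transformers fork is needed, and batching works out of the box.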
Intended Uses & Limitations
This model is designed specifically for Image Quality Assessment (IQA) tasks on anime illustrations.
It distinguishes image quality from aesthetics, making it suitable for image filtering, recommendations, and curation tasks where quality assessment is essential.
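As a toy illustration of the curation use case, predicted scores (however obtained) can drive a simple threshold filter. The image names and the roughly 1–5 score scale (as in Q-Align) are hypothetical placeholders, not model output:

```python
# Toy curation filter; scores here are hypothetical placeholders,
# standing in for the model's aesthetic predictions.
def filter_by_score(scores: dict, threshold: float) -> list:
    """Keep image ids whose predicted aesthetic score meets the threshold."""
    return sorted(k for k, v in scores.items() if v >= threshold)

predicted = {"img_a.png": 4.2, "img_b.png": 2.1, "img_c.png": 3.8}
print(filter_by_score(predicted, threshold=3.5))  # ['img_a.png', 'img_c.png']
```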
Training and Evaluation Data
The model was trained using approximately 5.8 million anime illustrations sourced equally from Danbooru and Twitter. The training procedure involved:
- Generating aesthetic predictions using the original Q-Align model.
- Training the Siglip2-base model to mimic these predictions.
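The two steps above amount to standard knowledge distillation with an MSE objective. A minimal numpy sketch, where a linear student and synthetic teacher scores stand in for the real SigLIP2 and Q-Align models:

```python
import numpy as np

rng = np.random.default_rng(1337)

# Stand-ins: "features" plays the role of SigLIP2 image embeddings,
# "teacher" the Q-Align aesthetic predictions; both are synthetic here.
features = rng.normal(size=(512, 16))
true_w = rng.normal(size=16)
teacher = features @ true_w + 0.05 * rng.normal(size=512)

# Linear student trained to mimic the teacher with MSE loss (step 2).
w = np.zeros(16)
lr = 0.05
for _ in range(200):
    pred = features @ w
    grad = 2 * features.T @ (pred - teacher) / len(teacher)
    w -= lr * grad

mse = float(np.mean((features @ w - teacher) ** 2))
print(f"student-teacher MSE: {mse:.4f}")
```

The real training replaces the linear map with the full SigLIP2 backbone plus a regression head, but the objective is the same: minimize the MSE between student predictions and teacher scores.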
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 320
- eval_batch_size: 256
- seed: 1337
- optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 1000
- num_epochs: 1.0
- mixed_precision_training: Native AMP
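The schedule above (linear warmup to 5e-06 over 1,000 steps, then cosine decay) can be reproduced in a few lines. The total step count (~14,700) is inferred from the results table below, and cosine decay to zero is an assumed interpretation; the exact Hugging Face scheduler may differ slightly:

```python
import math

def lr_at_step(step, base_lr=5e-06, warmup_steps=1000, total_steps=14700):
    """Linear warmup followed by cosine decay to zero (an assumed
    interpretation of lr_scheduler_type=cosine with warmup_steps=1000)."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))

print(lr_at_step(0))      # 0.0 (start of warmup)
print(lr_at_step(1000))   # 5e-06 (peak, end of warmup)
print(lr_at_step(14700))  # 0.0 (end of training)
```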
Training results
| Training Loss | Epoch | Step | Validation Loss | MSE | RMSE |
|---|---|---|---|---|---|
| 0.0405 | 0.1634 | 2400 | 0.0348 | 0.0348 | 0.1867 |
| 0.0371 | 0.3268 | 4800 | 0.0297 | 0.0297 | 0.1724 |
| 0.0369 | 0.4902 | 7200 | 0.0311 | 0.0311 | 0.1764 |
| 0.0315 | 0.6536 | 9600 | 0.0237 | 0.0237 | 0.1540 |
| 0.0330 | 0.8170 | 12000 | 0.0228 | 0.0228 | 0.1510 |
| 0.0303 | 0.9805 | 14400 | 0.0224 | 0.0224 | 0.1498 |
Framework versions
- Transformers 4.49.0
- Pytorch 2.1.0+cu121
- Datasets 3.3.2
- Tokenizers 0.21.0