Model Card for cub-200-bird-classifier-swin

Model Description

This model was finetuned for the "Feather in Focus!" Kaggle competition of the Information Studies Master's Applied Machine Learning course at the University of Amsterdam. The goal of the competition was to apply novel approaches to achieve the highest possible accuracy on a bird classification task with 200 classes. We were given a labeled dataset of 3926 images and an unlabeled dataset of 4000 test images. Out of 32 groups and 1083 submissions, we achieved the #1 accuracy on the test set with a score of 0.87950.

Training Details

The base model we finetuned, microsoft/swin-large-patch4-window12-384-in22k, was pre-trained on ImageNet-21k; see https://huggingface.co/microsoft/swin-large-patch4-window12-384-in22k.
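As a rough illustration of how such a checkpoint can be prepared for a 200-class task, the sketch below loads it through the transformers Auto classes and swaps in a new classification head. The function name and constants are hypothetical, and this is not necessarily how the authors loaded the model. The `ignore_mismatched_sizes=True` flag is needed because the pre-trained head was sized for the ImageNet-21k label set rather than 200 classes.

```python
BASE_CHECKPOINT = "microsoft/swin-large-patch4-window12-384-in22k"
NUM_CLASSES = 200  # bird classes in the competition


def load_swin_classifier(checkpoint=BASE_CHECKPOINT, num_classes=NUM_CLASSES):
    """Load the pre-trained backbone with a freshly initialised classifier head.

    Hypothetical helper: downloads the checkpoint on first use.
    """
    # Imported lazily so that importing this module does not require transformers.
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    processor = AutoImageProcessor.from_pretrained(checkpoint)
    model = AutoModelForImageClassification.from_pretrained(
        checkpoint,
        num_labels=num_classes,         # size of the new classification head
        ignore_mismatched_sizes=True,   # discard the original pre-training head
    )
    return processor, model
```

Calling `load_swin_classifier()` returns the image processor (resizing and normalisation for 384x384 inputs) together with the model, whose new head is randomly initialised and still has to be finetuned.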

Preprocessing

Data augmentation was applied to the training data in a custom Torch dataset class. Because the dataset is small, augmented images were added as duplicates alongside the originals rather than replacing them. The only augmentations applied were horizontal flips and rotations (10 degrees), chosen to match the relatively homogeneous dataset.
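A minimal sketch of the duplicate-and-augment idea, assuming a map-style dataset in which the first half of the indices returns the original samples and the second half returns augmented copies. The class name and constructor arguments are hypothetical; the authors' actual dataset class is not shown in this card. Any object with `__len__` and `__getitem__` can be passed to a PyTorch `DataLoader`, so the sketch stays free of framework imports.

```python
class DuplicatedAugmentedDataset:
    """Map-style dataset: each sample appears once as-is and once augmented."""

    def __init__(self, samples, augment, base_transform=None):
        self.samples = samples  # sequence of (image, label) pairs
        self.augment = augment  # callable applied only to the duplicated copy
        # Transform applied to every image (e.g. resize + normalise); identity by default.
        self.base_transform = base_transform or (lambda image: image)

    def __len__(self):
        # Originals plus one augmented duplicate per original.
        return 2 * len(self.samples)

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        image, label = self.samples[index % len(self.samples)]
        if index >= len(self.samples):
            # Second half of the index range: augmented duplicate.
            image = self.augment(image)
        return self.base_transform(image), label
```

With torchvision, the `augment` argument could for instance be `transforms.Compose([transforms.RandomHorizontalFlip(), transforms.RandomRotation(10)])`. Doubling the dataset this way is consistent with the reported growth from 3533 to 7066 training samples.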

Finetuning

Around 50 different models were finetuned, including various vision transformers (ViTs) and CNNs. All models were trained for 10 epochs, and after each epoch the model was saved if it had the best evaluation accuracy so far.
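The "keep the best model by evaluation accuracy" logic can be sketched as follows. The function and the `save_fn` callback are hypothetical names; in a real run the accuracies would come from an evaluation pass after each epoch, and `save_fn` would write the model weights to disk.

```python
def track_best_checkpoint(epoch_accuracies, save_fn):
    """Call save_fn whenever an epoch beats the best evaluation accuracy so far.

    epoch_accuracies: iterable yielding one evaluation accuracy per epoch
                      (may be a generator that trains and evaluates lazily).
    save_fn:          callable(epoch, accuracy) that persists the current model.
    Returns (best_epoch, best_accuracy), with epochs numbered from 1.
    """
    best_accuracy, best_epoch = float("-inf"), None
    for epoch, accuracy in enumerate(epoch_accuracies, start=1):
        if accuracy > best_accuracy:
            best_accuracy, best_epoch = accuracy, epoch
            save_fn(epoch, accuracy)  # overwrite the stored "best" checkpoint
    return best_epoch, best_accuracy
```

Only strictly better epochs trigger a save, so ties keep the earlier checkpoint.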

Finetuning Data

The finetuning data is a subset of the CUB-200-2011 dataset (http://www.vision.caltech.edu/datasets/cub_200_2011/). We finetuned the model on 3533 samples of the labeled dataset we were given, selected by a split stratified on the label (7066 samples including augmented images).

Finetuning Hyperparameters

Hyperparameter	Value
Optimizer	AdamW
Learning Rate	1e-4
Batch Size	32
Epochs	2
Weight Decay	*
Class Weight	*
Label Smoothing	*
Scheduler	*
Mixed Precision	Torch AMP

*These parameters were intentionally left unset because setting them gave poor results.
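To show how the settings in the table fit together, here is a hedged sketch of one training epoch with AdamW at a learning rate of 1e-4 and Torch AMP. It is not the authors' training script. The function name is hypothetical, the tiny linear model is only a stand-in for the Swin classifier, and the model is assumed to return raw logits (a transformers model would need `.logits` on its output). Weight decay is left at the PyTorch default here, which is one possible reading of "not set".

```python
import torch
from torch import nn


def train_one_epoch(model, loader, optimizer, scaler, device):
    """Run one epoch with automatic mixed precision and return the mean loss."""
    model.train()
    # Plain cross-entropy: no class weights and no label smoothing, as in the table.
    criterion = nn.CrossEntropyLoss()
    use_amp = device.type == "cuda"  # AMP is only enabled on GPU in this sketch
    total_loss, n_batches = 0.0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad(set_to_none=True)
        with torch.autocast(device_type=device.type, enabled=use_amp):
            loss = criterion(model(images), labels)
        # The scaler guards against fp16 gradient underflow; it is a no-op when disabled.
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
        total_loss += loss.item()
        n_batches += 1
    return total_loss / max(n_batches, 1)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(16, 200).to(device)  # stand-in model, NOT the Swin classifier
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # no scheduler attached
scaler = torch.cuda.amp.GradScaler(enabled=device.type == "cuda")
```

In practice `loader` would be a `DataLoader` over the training set with `batch_size=32`.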

Evaluation Data

The evaluation data is a subset of the CUB-200-2011 dataset (http://www.vision.caltech.edu/datasets/cub_200_2011/). We evaluated the model on 393 samples of the labeled dataset we were given, selected by a split stratified on the label.
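A stratified split keeps the class proportions roughly equal in the training and evaluation subsets. The sketch below is a dependency-free illustration with hypothetical names. The reported sizes (3533 and 393 out of 3926) are consistent with holding out about 10%, but the exact procedure and random seed are not stated in this card, so the defaults here are assumptions. scikit-learn's `train_test_split` with its `stratify` argument is a common library alternative.

```python
import random
from collections import defaultdict


def stratified_split(labels, eval_fraction=0.1, seed=0):
    """Split sample indices into train/eval sets, class by class.

    Returns (train_indices, eval_indices), both sorted.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_indices, eval_indices = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Hold out the same fraction of every class, but at least one sample.
        n_eval = max(1, round(len(indices) * eval_fraction))
        eval_indices.extend(indices[:n_eval])
        train_indices.extend(indices[n_eval:])
    return sorted(train_indices), sorted(eval_indices)
```

Because rounding happens per class, the total evaluation size can differ slightly from `eval_fraction` times the dataset size when classes are uneven.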

Testing Data

The testing data is an unlabeled subset of 4000 images from the CUB-200-2011 dataset (http://www.vision.caltech.edu/datasets/cub_200_2011/). After finetuning, the best model, as judged on the evaluation data, was loaded and used to predict the labels of the unlabeled test set. These predicted labels were submitted to the Kaggle competition as a CSV file, and Kaggle returned the test accuracy.
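Writing the predictions to a submission file can be sketched as below. The column names `id` and `label` and the function name are assumptions for illustration; the competition's actual submission format is not given in this card.

```python
import csv
import io


def write_submission(image_ids, predicted_labels, out_file):
    """Write one (id, label) row per test image to an open text file."""
    if len(image_ids) != len(predicted_labels):
        raise ValueError("need exactly one predicted label per image id")
    writer = csv.writer(out_file)
    writer.writerow(["id", "label"])  # assumed header, not confirmed by the source
    writer.writerows(zip(image_ids, predicted_labels))


# Small in-memory demonstration with made-up ids and labels.
buffer = io.StringIO()
write_submission([1, 2, 3], [17, 3, 199], buffer)
```

For a real submission, `out_file` would be a file opened with `open(path, "w", newline="")`, and `predicted_labels` would hold the arg-max class of the model's logits for each test image.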

Poster

*Novel approaches were not applied when finetuning the final model, as they did not improve accuracy.

Emiel/cub-200-bird-classifier-swin