Model Card for Drexel Metadata Generator

This model was designed to generate metadata for images of museum fish specimens. In addition to the metadata and quality metrics produced by our initial model (detailed in Automatic Metadata Generation for Fish Specimen Image Collections, bioRxiv), this updated model also generates various geometric and statistical properties of the mask generated over the biological specimen. Examples of the new analytical features include convex area, eccentricity, perimeter, and skew. The updates further improve accuracy and reduce the time and labor cost relative to human metadata generation.

Model Details

Model Description

This model is based on Facebook AI Research’s (FAIR) Detectron2 tool, which includes an implementation of the Mask R-CNN architecture. The model can identify five object classes: fish, fish eyes, rulers, and the numbers two and three on rulers, as shown in Fig. 2 below. Notably, it returns pixel-by-pixel masks over detected objects.

Figure 2.

Furthermore, Detectron2 places no restriction on the number of object classes, can detect an arbitrary number of objects within a given image, and is relatively straightforward to train on COCO-format datasets. Objects with a confidence score of at least 30% are retained for analysis. See the tables below for the number of class instances in the full (aggregate), INHS, and UWZM training datasets.

Table 1: Aggregate training dataset

Class  Number of Instances
Fish   391
Ruler  1095
Eye    550
Two    194
Three  194

Table 2: INHS Training Dataset

Class  Number of Instances
Fish   312
Ruler  1016
Eye    471
Two    115
Three  115

Table 3: UWZM Training Dataset

Class  Number of Instances
Fish   79
Ruler  79
Eye    79
Two    79
Three  79
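
Per-class counts like those in the tables can be tallied directly from a COCO-format annotation file. The sketch below is an illustration, not the project's own tooling. It assumes only the standard COCO layout (a categories list with id and name fields, and an annotations list with a category_id field); the function name and the toy data are made up for the example.

```python
import json
from collections import Counter


def count_class_instances(coco: dict) -> dict:
    """Return {class name: number of annotated instances} for a COCO-style dict."""
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    counts = Counter(id_to_name[ann["category_id"]] for ann in coco["annotations"])
    # Include classes without annotations so the resulting table has no gaps.
    return {name: counts.get(name, 0) for name in id_to_name.values()}


# Toy annotation data (hypothetical, not taken from the real datasets).
toy_coco = json.loads("""
{
  "categories": [
    {"id": 1, "name": "fish"},
    {"id": 2, "name": "ruler"},
    {"id": 3, "name": "eye"}
  ],
  "annotations": [
    {"id": 1, "image_id": 1, "category_id": 1},
    {"id": 2, "image_id": 1, "category_id": 2},
    {"id": 3, "image_id": 1, "category_id": 3},
    {"id": 4, "image_id": 2, "category_id": 2}
  ]
}
""")

print(count_class_instances(toy_coco))
```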

See the Glossary below for a detailed list of the properties generated by the model.

  • Developed by: Joel Pepper and Kevin Karnani
  • Model type: PyTorch pickle file (.pth)
  • License: MIT
  • Finetuned from model: detectron2 v0.6

Model Sources

Uses

Direct Use

Object detection is currently performed on five classes (fish, fish eyes, rulers, and the twos and threes found on rulers). The current setup is run on the INHS and UWZM biological specimen image repositories.

Current Criteria

  1. Image must contain a fish species (no eels, seashells, butterflies, seahorses, snakes, etc.).
  2. Image must contain only one of each class (except eyes).
  3. Specimen body must lie along the image plane from a side view.
  4. Ruler must be consistent (only two ruler types, INHS + UWZM, were used in training set).
  5. Fish must not be obscured by another object (petri dish for example).
  6. Whole body of fish must be present (no heads, tails, or standalone features).
  7. Fish body must not be folded and should have no curvature.
  8. These criteria do not need to be adhered to if the model is properly set up or modified for a specific use case.
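
Criterion 2 lends itself to an automatic check once the detected class labels are available. The helper below is a hypothetical sketch: the function name and its input format (a flat list of detected class labels) are assumptions for illustration and are not part of gen_metadata.py.

```python
from collections import Counter

# The five class labels named in this card.
CLASSES = ("fish", "ruler", "eye", "two", "three")


def classes_detected_more_than_once(detected_labels):
    """Return the classes (other than 'eye') that appear more than once."""
    counts = Counter(detected_labels)
    return sorted(
        label for label in CLASSES
        if label != "eye" and counts[label] > 1
    )


# Two eyes are allowed, so nothing is flagged here.
print(classes_detected_more_than_once(["fish", "eye", "eye", "ruler", "two", "three"]))
# Two fish and two rulers both break the single-instance criterion.
print(classes_detected_more_than_once(["fish", "fish", "ruler", "ruler", "eye"]))
```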

Bias, Risks, and Limitations

  • This model was trained solely for use on fish specimens.
  • The model can detect and process multiple fish within a single image, although the capability is not extensively tested.
  • The model was only trained on rectangular, machine-printed tags that are aligned with the image, so tags placed at an angle may not be handled correctly.

The authors have declared that no conflict of interest exists.

How to Get Started with the Model

Dependencies

Every dependency is stored in a Pipfile. To set this up, run the following commands:

pip install pipenv
pipenv install

Additional OS-dependent installations may be required.

Running

To generate the metadata, run the following command:

pipenv run python3 gen_metadata.py [file_or_dir_name]

Usage:

gen_metadata.py [-h] [--device {cpu,cuda}] [--outfname OUTFNAME] [--maskfname MASKFNAME] [--visfname VISFNAME]
                       file_or_directory [limit]

The limit positional argument caps the number of files processed and only applies when a directory is passed.
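
The usage line above can be reproduced with a small argparse parser. This is a reconstruction for illustration, not the script's actual source: the flag names and choices come from the usage text, while the default of cuda follows the Device Configuration note below and the integer type for limit is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented gen_metadata.py interface."""
    parser = argparse.ArgumentParser(prog="gen_metadata.py")
    parser.add_argument("--device", choices=["cpu", "cuda"], default="cuda")
    # The next three flags are documented as single-file options.
    parser.add_argument("--outfname")
    parser.add_argument("--maskfname")
    parser.add_argument("--visfname")
    parser.add_argument("file_or_directory")
    # Optional positional; assumed to be an integer file count.
    parser.add_argument("limit", nargs="?", type=int)
    return parser


args = build_parser().parse_args(["--device", "cpu", "images/", "10"])
print(args.device, args.file_or_directory, args.limit)
```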

Device Configuration

By default, gen_metadata.py requires a GPU (cuda). To use a CPU instead, pass the --device cpu argument to gen_metadata.py.

Single File Usage

The following three arguments are only supported when processing a single image file:

  • --outfname <filename> - When passed, the script saves the output metadata JSON to <filename> instead of printing it to the console (the default behavior when processing one file).
  • --maskfname <filename> - Enables logic to save an output mask to <filename> for the single input file.
  • --visfname <filename> - Saves the output visualization to <filename> instead of the hard-coded location.

These arguments are meant to simplify adding gen_metadata.py to a workflow that processes files individually.
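
A per-file workflow might assemble one command per image, as in the sketch below. The helper name, the output naming scheme, and the .png extensions for the mask and visualization are assumptions chosen for illustration; only the flags themselves come from the usage text above.

```python
from pathlib import Path


def build_command(image_path, out_dir, device="cpu"):
    """Build the gen_metadata.py command line for a single image."""
    image_path = Path(image_path)
    out_dir = Path(out_dir)
    stem = image_path.stem
    return [
        "pipenv", "run", "python3", "gen_metadata.py",
        "--device", device,
        "--outfname", str(out_dir / f"{stem}.json"),
        "--maskfname", str(out_dir / f"{stem}_mask.png"),  # extension assumed
        "--visfname", str(out_dir / f"{stem}_vis.png"),    # extension assumed
        str(image_path),
    ]


cmd = build_command("images/specimen_001.jpg", "results")
print(" ".join(cmd))
# To execute the command: import subprocess; subprocess.run(cmd, check=True)
```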

Running with Singularity

A Docker container is automatically built for each drexel_metadata release. This container has the requirements installed and includes the model file. To run the Singularity container for a specific version, follow this pattern:

singularity run docker://ghcr.io/hdr-bgnn/drexel_metadata:<release> gen_metadata.py ...

Training Details

Training Data

The training images were labeled by hand using makesense.ai (Skalski, P.: Make Sense. https://github.com/SkalskiP/make-sense/ (2019)).

Initially, the training set contained 64 examples of each class from the UWZM collection. One issue we encountered was the lack of catfish (genus Noturus) in the training set, which led to a high number of undetected eyes in the testing set. Catfish eyes are difficult even for humans to locate visually, since they are either very close in color to the skin or do not look like typical fish eyes. To address this, 15 catfish images from each image dataset were added to the training set.

Training Procedure

Setup:

  1. Create a COCO JSON training set using the images and labels.
    1. This is currently done using makesense.ai in its polygon object detection mode.
    2. The labels currently used are: fish, ruler, eye, two, three, in that exact order.
    3. After performing manual segmentation and labeling, save the result as a COCO JSON file and place it in the datasets folder.
  2. In the config directory, create a JSON file in which each key is the name of an image directory on your local system and each value is an array of dataset names from the datasets folder.
    1. For multiple image collections, have multiple keys.
    2. For multiple datasets for the same collection, append to the respective value array.
    3. For example: {"image_folder_name": ["dataset_name.json"]}.
  3. In the train script, set the conf variable at the top of the main() function so that it loads the JSON file created in the previous step.
  4. Create a text file named overall_prefix.txt in the config folder. This file should contain the absolute path to the directory in which all the image repositories are stored.
    1. Currently it is /usr/local/bgnn/. There are various image folders like tulane, inhs_filtered, uwzm_filtered, etc.
  5. To edit the learning rate, batch size, or any other base training configuration, edit the base training configurations file.
  6. To edit the number of iterations, dropoffs, or any model specific configurations, edit the model training configurations file.
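
The configuration files from steps 2 and 4 can be sketched as follows. All file and dataset names here are hypothetical, and the way the paths are combined is an assumption about how the key, the dataset names, and the prefix relate; it is not taken from train_model.py.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical image folders (keys) and dataset files (values).
config = {
    "inhs_filtered": ["inhs_train.json"],
    "uwzm_filtered": ["uwzm_train.json", "uwzm_catfish.json"],
}


def resolve_training_sets(config_path, prefix_path, datasets_dir="datasets"):
    """Return (image_directory, dataset_file) pairs described by the config."""
    mapping = json.loads(Path(config_path).read_text())
    prefix = Path(prefix_path).read_text().strip()
    pairs = []
    for image_folder, dataset_names in mapping.items():
        for name in dataset_names:
            pairs.append((Path(prefix) / image_folder, Path(datasets_dir) / name))
    return pairs


with tempfile.TemporaryDirectory() as tmp:
    config_dir = Path(tmp) / "config"
    config_dir.mkdir()
    # Step 2: one key per image collection, one array entry per dataset.
    (config_dir / "training_data.json").write_text(json.dumps(config, indent=2))
    # Step 4: absolute path under which the image folders live.
    (config_dir / "overall_prefix.txt").write_text("/usr/local/bgnn/\n")
    pairs = resolve_training_sets(
        config_dir / "training_data.json", config_dir / "overall_prefix.txt"
    )
    for image_dir, dataset in pairs:
        print(image_dir, "<-", dataset)
```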

Finally, to train the model, run the following command:

pipenv run python3 train_model.py

Preprocessing

  • Manual image preprocessing is not necessary. However, some versions of the code apply contrast enhancement to the images internally (see Citation).

Training Hyperparameters

Evaluation

Testing Data, Factors & Metrics

Testing Data

Factors

Metrics

Results

Summary

Goal

To develop a tool that checks the validity of metadata associated with an image and generates any metadata that is missing. The tool also produces various geometric and statistical properties of the mask generated over the biological specimen.

Metadata Generation

The metadata generated is highly specific to our use case. We also apply additional image processing techniques to improve accuracy, which may not work for other use cases. These include:

  1. Image scaling when a fish is detected but an eye is not, in an attempt to reduce the number of missed eyes.
  2. Selection of the highest-confidence fish bounding box, given our criterion of a single fish per image.
  3. Contrast enhancement (CLAHE).
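
The second technique, combined with the 30% confidence cutoff from the Model Description, can be sketched in a few lines. The detection dictionaries and the function name are hypothetical stand-ins, not the project's data structures. In Detectron2 a test-time score cutoff is commonly set through the MODEL.ROI_HEADS.SCORE_THRESH_TEST config key, though whether this project sets it that way is not stated here.

```python
SCORE_THRESHOLD = 0.30  # objects scoring at least 30% are kept (per this card)


def select_primary_fish(detections, threshold=SCORE_THRESHOLD):
    """Drop low-confidence detections, then return the best-scoring fish or None."""
    kept = [d for d in detections if d["score"] >= threshold]
    fish = [d for d in kept if d["label"] == "fish"]
    return max(fish, key=lambda d: d["score"]) if fish else None


# Hypothetical detections: label, confidence score, (x1, y1, x2, y2) box.
detections = [
    {"label": "fish", "score": 0.97, "bbox": (40, 60, 900, 380)},
    {"label": "fish", "score": 0.41, "bbox": (10, 10, 120, 90)},
    {"label": "fish", "score": 0.12, "bbox": (500, 20, 560, 70)},
    {"label": "ruler", "score": 0.99, "bbox": (30, 420, 950, 480)},
]

best = select_primary_fish(detections)
print(best["score"], best["bbox"])
```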

The generated metadata describes various statistical and geometric properties of a biological specimen image or collection in JSON format. When a single file is passed, the data is printed to the console (stdout). When a directory is passed, the data is stored in a JSON file.

Environmental Impact

Extremely minimal, as a regular workstation computer was used for this paper.

Technical Specifications

Model Architecture and Objective

Compute Infrastructure

  • Desktop computer with an Intel(R) Xeon(R) W-2175 CPU and an Nvidia Quadro RTX 4000 GPU.

Citation

Karnani, K., Pepper, J., Bakiş, Y. et al. Computational metadata generation methods for biological specimen image collections. Int J Digit Libr (2022). https://doi.org/10.1007/s00799-022-00342-1

Associated Publication

J. Pepper, J. Greenberg, Y. Bakiş, X. Wang, H. Bart and D. Breen, "Automatic Metadata Generation for Fish Specimen Image Collections," 2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2021, pp. 31-40, doi: 10.1109/JCDL52503.2021.00015.

BibTeX:

@article{KPB2022,
  title    = "Computational metadata generation methods for biological specimen image collections",
  author   = "Karnani, Kevin and Pepper, Joel and Bak{\i}{\c s}, Yasin and Wang, Xiaojun and Bart, Jr, Henry and
              Breen, David E and Greenberg, Jane",
  journal  = "International Journal on Digital Libraries",
  year     =  2022,
  url      = "https://doi.org/10.1007/s00799-022-00342-1",
  doi      = "10.1007/s00799-022-00342-1"
}

Glossary

Properties Generated

Property               Association    Type       Explanation
has_fish               Overall Image  Boolean    Whether a fish was found in the image.
fish_count             Overall Image  Integer    The quantity of fish present.
has_ruler              Overall Image  Boolean    Whether a ruler was found in the image.
ruler_bbox             Overall Image  4 Tuple    The bounding box of the ruler (if found).
scale*                 Overall Image  Float      The scale of the image in $\frac{\mathrm{pixels}}{\mathrm{cm}}$.
bbox                   Per Fish       4 Tuple    The top left and bottom right coordinates of the bounding box for a fish.
background.mean        Per Fish       Float      The mean intensity of the background within a given fish's bounding box.
background.std         Per Fish       Float      The standard deviation of the background within a given fish's bounding box.
foreground.mean        Per Fish       Float      The mean intensity of the foreground within a given fish's bounding box.
foreground.std         Per Fish       Float      The standard deviation of the foreground within a given fish's bounding box.
contrast*              Per Fish       Float      The contrast between foreground and background intensities within a given fish's bounding box.
centroid               Per Fish       4 Tuple    The centroid of a given fish's bitmask.
primary_axis*          Per Fish       2D Vector  The unit length primary axis (eigenvector) for the bitmask of a given fish.
clock_value*           Per Fish       Integer    Fish's primary axis converted into an integer "clock value" between 1 and 12.
oriented_length*       Per Fish       Float      The length of the fish bounding box in centimeters.
mask                   Per Fish       2D Matrix  The bitmask of a fish in 0's and 1's.
pixel_analysis_failed  Per Fish       Boolean    Whether the pixel analysis process failed for a given fish. If true, detectron's mask and bounding box were used for metadata generation.
score                  Per Fish       Float      The percent confidence score output by detectron for a given fish.
has_eye                Per Fish       Boolean    Whether an eye was found for a given fish.
eye_center             Per Fish       2 Tuple    The centroid of a fish's eye.
side*                  Per Fish       String     The side (i.e. 'left' or 'right') of the fish that is facing the camera (dependent on finding its eye).
area                   Per Fish       Float      Area of fish in $\mathrm{cm^2}$.
cont_length            Per Fish       Float      The longest contiguous length of the fish in centimeters.
cont_width             Per Fish       Float      The longest contiguous width of the fish in centimeters.
convex_area            Per Fish       Float      Area of convex hull image (smallest convex polygon that encloses the fish) in $\mathrm{cm^2}$.
eccentricity           Per Fish       Float      Ratio of the focal distance over the major axis length of the ellipse that has the same second-moments as the fish.
extent                 Per Fish       Float      Ratio of pixels of fish to pixels in the total bounding box. Computed as $\frac{\mathrm{area}}{\mathrm{rows} * \mathrm{cols}}$.
feret_diameter_max     Per Fish       Float      The longest distance between points around the fish’s convex hull contour.
kurtosis               Per Fish       2D Vector  The sharpness of the peaks of the frequency-distribution curve of mask pixel coordinates.
major_axis_length      Per Fish       Float      The length of the major axis of the ellipse that has the same normalized second central moments as the fish.
mask.encoding          Per Fish       String     The 8-way Freeman Encoding of the outline of the fish.
mask.start_coord       Per Fish       2D Vector  The starting coordinate of the Freeman encoded mask.
minor_axis_length      Per Fish       Float      The length of the minor axis of the ellipse that has the same normalized second central moments as the fish.
oriented_width         Per Fish       Float      The width of the fish bounding box in centimeters.
perimeter              Per Fish       Float      The approximation of the contour in centimeters as a line through the centers of border pixels using 8-connectivity.
skew                   Per Fish       2D Vector  The measure of asymmetry of the frequency-distribution curve of mask pixel coordinates.
solidity               Per Fish       Float      The ratio of pixels in the fish to pixels of the convex hull image.
std                    Per Fish       Float      The standard deviation of the mask pixel coordinate distribution.
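
To make two of the definitions above concrete, the sketch below computes area and extent from a small binary mask in plain Python. It is an illustration under stated assumptions, not the project's implementation: the function name is hypothetical, the mask is a toy example, and rows and cols are taken to be the height and width of the mask's tight bounding box. The area in square centimeters follows from dividing the pixel count by the square of the scale (pixels per centimeter).

```python
def mask_properties(mask, scale):
    """Compute area (cm^2) and extent from a binary mask.

    mask  -- list of rows containing 0/1 values
    scale -- image scale in pixels per centimeter
    """
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        raise ValueError("mask contains no foreground pixels")
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    pixel_area = len(coords)
    # Height and width of the tight bounding box around the mask.
    bbox_rows = max(rows) - min(rows) + 1
    bbox_cols = max(cols) - min(cols) + 1
    return {
        "area": pixel_area / scale ** 2,                 # pixels -> cm^2
        "extent": pixel_area / (bbox_rows * bbox_cols),  # area / (rows * cols)
    }


toy_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
props = mask_properties(toy_mask, scale=2.0)
print(props)  # {'area': 1.75, 'extent': 0.875}
```

Here the mask has 7 foreground pixels inside a 2 by 4 bounding box, so the extent is 7/8, and at 2 pixels per centimeter the area is 7/4 square centimeters.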

More Information

Research supported by NSF Office of Advanced Cyberinfrastructure (OAC) #1940233 and #1940322, with additional support from NSF Award #2118240. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Authors and Affiliations:

Computer Science Department, Drexel University, Philadelphia, PA, USA

Kevin Karnani, Joel Pepper (corresponding author) & David E. Breen

Biodiversity Research Institute, Tulane University, New Orleans, LA, USA

Yasin Bakiş, Xiaojun Wang & Henry Bart Jr.

Information Science Department, Drexel University, Philadelphia, PA, USA

Jane Greenberg

Model Card Authors

Elizabeth Campolongo and Joel Pepper
