AI Image Processing Toolkit


A collection of specialized scripts for AI image processing, dataset preparation, and model training workflows.

🛠️ Scripts Overview


wdv3

An image tagging script using the WD V3 tagger models by SmilingWolf, based on this repo. It supports multiple model architectures (ViT, SwinV2, ConvNext) and can process both single images and directories recursively.

Features

  • Multiple model architecture support
  • Batch processing capabilities
  • Adjustable confidence thresholds
  • CUDA acceleration with FP16 support
  • JXL image format support

train_functions

A set of ZSH functions for managing AI model training workflows:

  • Script execution management
  • Training variable setup
  • Git repository state tracking
  • Output directory management
  • Automatic cleanup of empty outputs

git-wrapper

Enhanced Git functionality for dataset management:

  • Automatic submodule handling
  • LFS integration for JXL files
  • Dataset-specific Git attributes management

check4sig

Dataset caption file watermark detection utility:

  • Scans .caption files for watermark-related text
  • Batch processing support
  • Interactive editing with nvim
  • Recursive directory scanning

gallery-dl

Directory-aware wrapper for gallery-dl:

  • Automatically changes to ~/datasets directory
  • Maintains consistent download locations
  • Preserves original command functionality

joy

Advanced image captioning system (JoyCaption, by fancyfeast) that combines CLIP with an LLM:

  • Multiple caption styles (descriptive, training prompts, art critic, etc.)
  • Custom image adapters
  • Tag-based caption generation
  • Batch processing support

png2mp4

Training progress visualization tool:

  • Converts PNG sequences to MP4
  • Customizable frame rates and durations
  • Step counter overlay support
  • Multiple sample handling

xyplot

Image comparison grid generator:

  • Supports multiple image formats
  • Customizable grid layouts
  • Optional row/column labels
  • Automatic image padding and alignment

concat_captions

Utility for combining multiple caption files:

  • Merges .caption and .tags files
  • Maintains original image associations
  • Batch processing support
  • Error handling for missing files

stats

Directory analysis and statistics generation tool that provides detailed file counts and metrics:

  • Detailed file counting by extension with color-coded output for different file types (JXL, PNG, JPG, etc.)
  • Multiple sorting options (by name, count, or specific file types)
  • Recursive directory scanning with aggregated statistics
  • Color-coded thresholds for dataset size evaluation
  • Automatic categorization of files into image and text groups
  • Grand total calculations across all subdirectories

shortcode

Hugo-compatible shortcode generator for image galleries with blurhash integration:

  • Generates Hugo-compatible shortcode blocks for each image
  • Integrates blurhash codes for progressive image loading
  • Automatically extracts and includes image dimensions
  • Preserves and integrates image captions from metadata
  • Supports grid layout configurations
  • Processes directories recursively while maintaining structure
  • Handles relative path resolution for static content

yiffdata

Comprehensive image metadata extraction and JSON generation utility:

  • Extracts precise image dimensions using PIL
  • Combines existing blurhash codes from .bh files
  • Integrates caption data from .caption files
  • Generates consolidated JSON output with all metadata
  • Maintains original filename references
  • Supports batch processing of entire directories
  • Preserves file relationships and metadata hierarchy

txt2tags

Batch file extension conversion utility for dataset management:

  • Converts .txt files to .tags format for ML training compatibility
  • Preserves original file content and structure
  • Supports recursive directory traversal
  • Interactive mode for selective conversion
  • Maintains original file timestamps and permissions
  • Simple command-line interface with directory input

txt2emoji

Advanced text-to-emoji conversion system with context awareness:

  • Sophisticated word-to-emoji mapping with custom dictionaries
  • Context-aware emoji selection to avoid redundancy
  • Detailed conversion explanations with rationale
  • Batch processing with multiple output formats
  • Configurable threshold and filtering options
  • NLTK integration for improved text parsing
  • Extensive customization options for emoji mappings

jtp2

State-of-the-art image classification system using Redrocket's PILOT2 model:

  • Implements Vision Transformer architecture with custom modifications
  • Features GatedHead classifier for improved accuracy
  • CUDA-accelerated inference with FP16 support
  • Configurable confidence thresholds for tag generation
  • Comprehensive batch processing capabilities
  • Automatic tag file generation alongside images
  • Supports multiple image formats including JXL

keyframe

Efficient video keyframe extraction tool using FFmpeg:

  • Extracts high-quality keyframes from video files
  • Creates organized output directories automatically
  • Maintains original frame quality and metadata
  • Intelligent I-frame detection and extraction
  • Sequential frame naming with padding
  • Minimal quality loss during extraction
  • Simple command-line interface

chop_blocks

Advanced LoRA model manipulation tool for fine-grained control, built on code from resize-lora by Gaeros:

  • Precise block-level filtering of LoRA models
  • Sophisticated weight adjustment capabilities
  • Full SafeTensors format support
  • Detailed analysis and reporting of model structure
  • Preserves model metadata during modifications
  • Vector string format for block manipulation
  • Supports both SDXL and SD1 naming conventions

🔧 Core Utilities


File Processing (utils/file_processor.py)

Base framework for file processing operations:

  • Abstract base class for consistent file handling
  • Configurable processing options (recursive, dry-run, debug)
  • Built-in logging and error handling
  • Support for multiple file extensions
  • Hidden file filtering

Example usage:

from pathlib import Path

from utils.file_processor import FileProcessor, ProcessorOptions

class MyProcessor(FileProcessor):
    def process_content(self, content: str) -> str:
        # Add your processing logic here
        return content.replace('old', 'new')

# Initialize with options
options = ProcessorOptions(
    recursive=True,
    dry_run=False,
    file_extensions={'.txt', '.md'}
)

# Process files
processor = MyProcessor(options)
processor.process_directory(Path('path/to/directory'))

Internationalization (utils/i18n_utils.py)

Centralized i18n functionality using Python's gettext:

  • System locale detection and setup
  • Translation file management
  • Fallback handling to English
  • Organized locale structure support
  • Simple integration with setup_i18n() function

Example usage:

from utils.i18n_utils import setup_i18n

# Initialize translations for your script
_ = setup_i18n('my_script')

# Use translations in your code
print(_("Processing files..."))
print(_("Found {} images").format(count))

Logging (utils/logging_utils.py)

Standardized logging setup across the toolkit:

  • Configurable log levels and directories
  • Console and file output support
  • Formatted logging messages
  • Debug mode toggle
  • Clean handler management

Example usage:

from utils.logging_utils import setup_logger
from pathlib import Path

# Setup logger with file output
logger = setup_logger(
    name="my_script",
    log_dir=Path("logs"),
    debug=True
)

# Use logger
logger.debug("Detailed debug info")
logger.info("Processing started")
logger.warning("Missing optional file")
logger.error("Failed to process file")

Image Processing Utilities (caption/imgproc_utils.py)

Common utilities for image processing tasks:

  • Colored logging output
  • File discovery and filtering
  • Batch processing support
  • Output path management
  • Processing validation
  • Multiple image format support

Example usage:

from caption.imgproc_utils import ProcessingOptions, find_images, batch_iterator
from pathlib import Path

# Setup options
opts = ProcessingOptions(
    recursive=True,
    batch_size=32,
    supported_extensions={'.png', '.jpg'}
)

# Find and process images
image_dir = Path('images')
for batch in batch_iterator(find_images(image_dir, opts), opts.batch_size):
    # Process batch of images
    for image_path in batch:
        print(f"Processing {image_path}")

Image Processing Base (caption/imgproc_base.py)

Abstract base class for image processors:

  • CUDA/CPU device management
  • Standard processing workflow
  • Result saving functionality
  • Error handling
  • PIL image support with JXL compatibility

Example usage:

from caption.imgproc_base import ImageProcessor
from caption.imgproc_utils import ProcessingOptions
from PIL import Image
from pathlib import Path

class MyImageProcessor(ImageProcessor):
    def load_models(self) -> None:
        # Load your ML models here
        self.model = load_my_model()
    
    def process_image(self, image: Image.Image, image_path: Path) -> str:
        # Process the image and return result
        return "processed image result"

# Initialize and use
processor = MyImageProcessor(ProcessingOptions())
processor.load_models()
processor.process_file(Path('image.jpg'), Path('output'))

Batch Processing (utils/batch_processor.py)

Generic batch processing framework:

  • Parallel processing support
  • Configurable batch sizes
  • Multi-worker processing
  • CUDA/CPU device management
  • Progress tracking
  • Type-safe generic implementation
  • Automatic worker count optimization

Example usage:

from utils.batch_processor import BatchProcessor, BatchOptions
from pathlib import Path

class MyBatchProcessor(BatchProcessor[Path, str]):
    def process_item(self, item: Path) -> str:
        # Process single item
        return f"Processed {item.name}"
    
    def should_process_item(self, item: Path) -> bool:
        return item.suffix in {'.png', '.jpg'}

# Initialize processor
opts = BatchOptions(
    batch_size=32,
    num_workers=4,
    device="cuda"
)

# Process files
processor = MyBatchProcessor(opts)
files = Path('data').glob('*')
results = list(processor.process_all(files, parallel=True))

πŸ“ Directory Structure

The utility modules are organized as follows:

~/toolkit/
├── utils/
│   ├── file_processor.py
│   ├── i18n_utils.py
│   ├── logging_utils.py
│   ├── batch_processor.py
│   └── locales/
│       └── [language_code]/
│           └── LC_MESSAGES/
│               └── [domain].mo
└── caption/
    ├── imgproc_utils.py
    └── imgproc_base.py

🚀 Installation


  1. Clone the repository (optional):
git clone https://huggingface.co./k4d3/toolkit
  2. Add the repository to your PATH (optional):
export PATH="$PATH:~/path/to/toolkit"
  3. Source the bundled .zshrc from your shell (optional; you will need to adapt it to your setup):
source ~/path/to/toolkit/.zshrc
nano ~/.zshrc

πŸ“ Requirements


  • miniconda with the environment set up for training with sd-scripts, inferring with timm, llama, etc
  • ZSH shell (optional)
  • CUDA-capable GPU (recommended)
  • Required Python packages:
    • torch
    • transformers
    • pillow
    • pillow-jxl
    • opencv-python
    • numpy
    • and a lot more

🔧 Usage


Each script can be used independently or as part of a workflow. Here are some usage examples:

XY Plot

xyplot ./ComfyUI_00341_.png ./ComfyUI_00342_.png ./ComfyUI_00346_.png --column-labels "No LoRA" "minit-v1s6000.safetensors M:1.0 TE:1.0" "minit-v1s6000.safetensors M:1.40 TE:1.0" --rows 1 --output plot1.png

JoyCaption

joy --feed-from-tags=10 --custom_prompt="Write a very long descriptive caption for this image in a formal tone. Do not mention feelings and emotions evoked by the image." .

png2mp4

png2mp4 --repeat 16

inject_to_txt

inject_to_txt 1_honovy "honovy"

replace_comma_with_keep_tags_txt

replace_comma_with_keep_tags_txt 1 1_honovy

📦 Directory Structure


~/
├── datasets/
├── output_dir/
├── models/
└── toolkit/

📄 License


WTFPL - Do what the fuck you want with it.

The included data and models are copyrighted by their respective owners with their own licenses.

🤝 Contributing


Contributions are welcome! For major changes, please open an issue first to discuss what you would like to change.

📚 Documentation


If the documentation of a script is missing, ask a language model about it.
