---
title: AI Video Composer
short_description: Generate video from your assets by asking
emoji: 🏞
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: 5.6.0
app_file: app.py
pinned: false
disable_embedding: true
models:
  - Qwen/Qwen2.5-Coder-32B-Instruct
---

# 🏞 AI Video Composer

AI Video Composer is a media processing application that creates videos from your media assets based on natural language instructions. It uses the Qwen2.5-Coder language model to translate those instructions into FFmpeg commands.

## How It Works

1. **Upload Media Files**:

   - Supports multiple file formats including:
     - Images: .png, .jpg, .jpeg, .tiff, .bmp, .gif, .svg
     - Audio: .mp3, .wav, .ogg
     - Video: .mp4, .avi, .mov, .mkv, .flv, .wmv, .webm, and more
   - File size limit: 10MB per file
   - Video duration limit: 2 minutes

2. **Provide Instructions**:

   - Write natural language instructions describing how you want to process your media
   - Examples:
     - "Convert these images into a slideshow with 1 second per image"
     - "Add this audio track to the video"
     - "Make the video play 2x faster"
     - "Create a waveform visualization for this audio file"

3. **Advanced Parameters**:

   - Top-p (nucleus sampling): Controls the diversity of generated commands (range: 0 to 1)
   - Temperature: Controls the randomness of command generation (range: 0 to 5)

4. **Processing**:
   - The app analyzes your files and instructions
   - Generates an optimized FFmpeg command using Qwen2.5-Coder
   - Executes the command and returns the processed video
   - Displays the generated FFmpeg command for transparency
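
As a rough illustration of the "analyzes your files" step, the sketch below groups uploaded file names by type and builds a plain-text summary that could accompany the user's instruction. The names `classify_asset` and `describe_assets` are hypothetical, the extension sets are copied from the list above, and the real app may inspect files differently.

```python
from pathlib import Path

# Extension groups copied from the upload list above; the real app may accept more.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".gif", ".svg"}
AUDIO_EXTS = {".mp3", ".wav", ".ogg"}
VIDEO_EXTS = {".mp4", ".avi", ".mov", ".mkv", ".flv", ".wmv", ".webm"}


def classify_asset(filename: str) -> str:
    """Return 'image', 'audio', 'video', or 'unknown' based on the file extension."""
    ext = Path(filename).suffix.lower()
    if ext in IMAGE_EXTS:
        return "image"
    if ext in AUDIO_EXTS:
        return "audio"
    if ext in VIDEO_EXTS:
        return "video"
    return "unknown"


def describe_assets(filenames: list[str]) -> str:
    """Build a one-line-per-file summary (hypothetical prompt fragment)."""
    return "\n".join(f"- {name} ({classify_asset(name)})" for name in filenames)


if __name__ == "__main__":
    print(describe_assets(["intro.png", "music.mp3", "clip.mp4"]))
```

A summary like this gives the language model the file names and types it needs to reference in the generated command.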

## Features

- **Smart Command Generation**: Automatically generates FFmpeg commands from natural language input
- **Error Handling**: Validates commands before execution and retries with alternative approaches if needed
- **Multiple Asset Support**: Process multiple media files in a single operation
- **Waveform Visualization**: Special support for audio visualization with customizable parameters
- **Image Sequence Processing**: Efficient handling of image sequences for slideshow creation
- **Format Conversion**: Support for various input/output format conversions
- **Example Gallery**: Built-in examples demonstrating common use cases
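
The validate-then-retry behavior described under Error Handling could look something like the following minimal sketch. The specific checks (command starts with `ffmpeg`, has an `-i` input, writes an `.mp4` file) and the function names are assumptions made for illustration, not the app's actual rules.

```python
import shlex
from typing import Iterable, Optional


def validate_ffmpeg_command(command: str) -> list[str]:
    """Split a generated command and apply basic sanity checks.

    Returns the argument list, or raises ValueError if a check fails.
    The checks here are illustrative assumptions, not the app's real rules.
    """
    args = shlex.split(command)
    if not args or args[0] != "ffmpeg":
        raise ValueError("command must start with 'ffmpeg'")
    if "-i" not in args:
        raise ValueError("command must name at least one input with -i")
    if not args[-1].lower().endswith(".mp4"):
        raise ValueError("output file must be an .mp4")
    return args


def first_valid_command(candidates: Iterable[str]) -> Optional[list[str]]:
    """Try each candidate in order and return the first one that validates."""
    for candidate in candidates:
        try:
            return validate_ffmpeg_command(candidate)
        except ValueError:
            continue  # fall through to the next alternative
    return None
```

Passing an argument list (rather than a shell string) to the process runner avoids shell interpretation of model-generated text.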

## Technical Details

- Built with Gradio for the user interface
- Uses FFmpeg for media processing
- Powered by Qwen2.5-Coder for command generation
- Implements robust error handling and command validation
- Processes files in a temporary directory for safety
- Supports both simple operations and complex media transformations
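
Processing in a temporary directory can be sketched with the standard library alone. The helper below is a hypothetical illustration (`run_in_temp_dir` and the `output.mp4` default are assumed names): it copies the assets into a fresh directory, runs the command there, and reads the result before the directory is deleted.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_in_temp_dir(asset_paths, args, output_name="output.mp4"):
    """Run `args` inside a throwaway directory and return the output file's bytes.

    asset_paths: files to copy into the temporary directory.
    args: command as an argument list (for example, a validated FFmpeg command).
    output_name: file the command is expected to write (assumed name).
    """
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        for asset in asset_paths:
            shutil.copy(asset, tmp_path / Path(asset).name)
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(args, cwd=tmp_path, check=True, capture_output=True)
        return (tmp_path / output_name).read_bytes()
```

Because the directory is removed when the `with` block exits, intermediate files never accumulate and the command cannot overwrite the original uploads.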

## Limitations

- Maximum file size: 10MB per file
- Maximum video duration: 2 minutes
- Output format: Always MP4
- Processing time may vary based on input complexity
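
A pre-flight check against the size and duration limits might be sketched as follows. The constant and function names are hypothetical, and the code assumes "10MB" means 10 × 1024 × 1024 bytes; the app may count megabytes differently.

```python
from typing import Optional

MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024  # assumption: 10MB interpreted as 10 MiB
MAX_VIDEO_DURATION_SECONDS = 2 * 60     # 2 minutes


def check_limits(size_bytes: int, duration_seconds: Optional[float] = None) -> list[str]:
    """Return a list of limit violations; an empty list means the file is acceptable.

    duration_seconds is None for assets without a duration, such as images.
    """
    problems = []
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append("file exceeds the 10MB size limit")
    if duration_seconds is not None and duration_seconds > MAX_VIDEO_DURATION_SECONDS:
        problems.append("video exceeds the 2 minute duration limit")
    return problems
```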

## Contributing

If you have ideas for improvements or bug fixes, please open a PR:

[![Open a Pull Request](https://huggingface.co/datasets/huggingface/badges/raw/main/open-a-pr-lg-light.svg)](https://huggingface.co/spaces/huggingface-projects/video-composer-gpt4/discussions)