Building a MusicGen API to Generate Custom Music Tracks Locally
The way we create and experience music is evolving, thanks to generative AI. Tools like MusicGen are at the forefront of this revolution, enabling developers and creators to produce unique audio from textual descriptions. Imagine generating an inspiring soundtrack or a soothing melody tailored to your exact needs—all with a simple API.
In this article, I’ll guide you through building a MusicGen API using facebook/musicgen-large, combining technical instruction with insights into why generative audio is reshaping the creative landscape.
The Power of MusicGen
MusicGen, developed by Meta, is a powerful text-to-audio model capable of creating diverse musical compositions based on prompts like "relaxing piano music" or "energetic dance beats." Its versatility makes it ideal for:
- Personalized soundtracks for video content.
- Ambient music for apps, games, or experiences.
- Rapid prototyping for music producers.
Unlike traditional music creation, MusicGen doesn’t require advanced composition skills. This democratization of creativity is why generative AI is so transformative.
Setting Up Your Environment
Before we dive into code, let's ensure we have the right tools.
Requirements
To run the API locally, you’ll need:
- Python 3.9+
- A CUDA-compatible GPU for faster processing (though CPU works too).
- Libraries: torch, transformers, fastapi, uvicorn, scipy
Installation
Install the required libraries:
```
pip install torch transformers fastapi uvicorn scipy
```
This setup prepares your machine to run the facebook/musicgen-large model and handle audio processing seamlessly.
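Before going further, it can help to sanity-check that everything installed. A minimal sketch of such a check (the `check_packages` helper is illustrative, not part of the API):

```python
import importlib.util

REQUIRED = ["torch", "transformers", "fastapi", "uvicorn", "scipy"]

def check_packages(names):
    """Report which packages are importable, without actually importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

if __name__ == "__main__":
    for name, ok in check_packages(REQUIRED).items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

If anything prints MISSING, rerun the pip install command above before continuing.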
Introducing the MusicGen API
What Does the API Do?
The API:
- Accepts a prompt describing the desired music style.
- Allows users to specify the duration of the generated audio.
- Returns two unique audio tracks for variety.
We use FastAPI to manage the API endpoints, leveraging its high performance and automatic validation capabilities.
The Code: Building the MusicGen API
Here’s the full implementation of the API:
```python
from fastapi import FastAPI, HTTPException, BackgroundTasks
from pydantic import BaseModel
import uvicorn
import os
import scipy.io.wavfile
import torch
from transformers import pipeline
import random
import traceback

app = FastAPI()

class MusicRequest(BaseModel):
    prompt: str
    duration: int  # Duration in seconds for each track

# Disable tokenizers parallelism warning
os.environ["TOKENIZERS_PARALLELISM"] = "false"

@app.post("/generate-music/")
async def generate_music(request: MusicRequest, background_tasks: BackgroundTasks):
    if request.duration <= 0:
        raise HTTPException(status_code=400, detail="Duration must be greater than zero")

    synthesiser = None
    try:
        # Set device (GPU or CPU)
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        print(f"Using device: {'CUDA' if device.type == 'cuda' else 'CPU'}")

        # Optionally limit GPU memory usage
        if device.type == "cuda":
            try:
                torch.cuda.set_per_process_memory_fraction(0.8, device=0)
                print("Limited GPU memory usage to 80%")
            except Exception as mem_error:
                print(f"Failed to limit GPU memory: {mem_error}")

        # Load the MusicGen Large model
        synthesiser = pipeline(
            "text-to-audio",
            model="facebook/musicgen-large",
            device=0 if device.type == "cuda" else -1,
        )
        print("Model loaded successfully")

        # Generate two audio tracks, reseeding between them for variety.
        # MusicGen generates roughly 50 audio tokens per second, so
        # max_length = duration * 50 approximates the requested length.
        random_seed = random.randint(0, 2**32 - 1)
        torch.manual_seed(random_seed)
        if device.type == "cuda":
            torch.cuda.manual_seed_all(random_seed)
        music1 = synthesiser(
            request.prompt,
            forward_params={"do_sample": True, "max_length": request.duration * 50},
        )

        random_seed += 1
        torch.manual_seed(random_seed)
        if device.type == "cuda":
            torch.cuda.manual_seed_all(random_seed)
        music2 = synthesiser(
            request.prompt,
            forward_params={"do_sample": True, "max_length": request.duration * 50},
        )

        # Save audio files
        output1 = os.path.join(os.getcwd(), "song1.wav")
        scipy.io.wavfile.write(output1, rate=music1["sampling_rate"], data=music1["audio"])
        output2 = os.path.join(os.getcwd(), "song2.wav")
        scipy.io.wavfile.write(output2, rate=music2["sampling_rate"], data=music2["audio"])

        return {"song1": output1, "song2": output2}
    except Exception as e:
        traceback.print_exc()
        raise HTTPException(status_code=500, detail=f"Error generating music: {e}")
    finally:
        if synthesiser:
            del synthesiser
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
        print("Cleaned up resources")

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```
Challenges and Solutions
GPU Memory Management
Problem: Long audio tracks can cause GPU memory issues.
Solution: Limit GPU memory usage to 80%:
```python
torch.cuda.set_per_process_memory_fraction(0.8, device=0)
```
Model Initialization Overhead
Problem: Loading the large model on every request delays response times.
Solution: Load the pipeline once and reuse it across requests; FastAPI’s background tasks can also offload slow follow-up work, such as file cleanup, without blocking the response.
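One way to cut per-request latency is to cache the pipeline with functools.lru_cache, so only the first request pays the loading cost. A minimal sketch of the pattern, with a counter standing in for the expensive pipeline() call (the helper name is hypothetical):

```python
from functools import lru_cache

load_count = 0  # stands in for observing the expensive load

@lru_cache(maxsize=1)
def get_synthesiser(model_name: str):
    """Load the model once; repeated calls return the cached object."""
    global load_count
    load_count += 1
    # In the real API this line would be:
    # return pipeline("text-to-audio", model=model_name)
    return f"pipeline-for-{model_name}"

get_synthesiser("facebook/musicgen-large")
get_synthesiser("facebook/musicgen-large")  # cache hit: no second load
```

Inside the endpoint, you would then call `get_synthesiser(...)` instead of constructing the pipeline each time; note that a cached model stays resident in GPU memory between requests.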
Running the API Locally
Start the API server:
```
uvicorn app:app --host 0.0.0.0 --port 8000
```
Example Request
To generate music, send a POST request to /generate-music/ with a JSON body:
```json
{
  "prompt": "calm and meditative music",
  "duration": 30
}
```
The API will return paths to two generated audio files (song1.wav and song2.wav).
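For reference, here is one way to send that request from Python using only the standard library (the `build_request` helper is illustrative, not part of the API):

```python
import json
import urllib.request

def build_request(prompt, duration, base_url="http://localhost:8000"):
    """Build a POST request for the /generate-music/ endpoint."""
    if duration <= 0:
        raise ValueError("duration must be greater than zero")
    body = json.dumps({"prompt": prompt, "duration": duration}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate-music/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it (requires the server above to be running):
# with urllib.request.urlopen(build_request("calm and meditative music", 30)) as resp:
#     print(json.loads(resp.read()))
```

Expect generation to take a while on CPU; the request blocks until both tracks are written to disk.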
Why Generative Audio Matters
Generative AI like MusicGen empowers creators to experiment with music in ways that were once unimaginable. Whether you’re prototyping a film score or adding background music to your game, this technology removes barriers to entry.
This democratization of music production enables anyone—from hobbyists to professionals—to create something unique and personal.
Next Steps: Elevating Your API
Consider extending this API by:
- Adding Post-Processing: Enhance audio with normalization and filters using libraries like Pydub.
- Frontend Integration: Build an interface for non-technical users to interact with the API.
- Cloud Deployment: Host the API on platforms like AWS or Azure for broader accessibility.
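To give a flavor of the first item: pydub can normalize levels for you, but the underlying idea is simple. A dependency-free sketch of peak normalization on float samples in [-1, 1] (illustrative only, not production audio code):

```python
def peak_normalize(samples, target_peak=0.95):
    """Scale samples so the loudest one lands at target_peak."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]
```

On real files, `pydub.effects.normalize` applied to an `AudioSegment` loaded from song1.wav achieves the equivalent, and also handles sample-width and headroom details for you.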
Conclusion
Generative AI is redefining the boundaries of music production. By combining models like MusicGen with tools like FastAPI, we’re not just building APIs—we’re creating new ways to express creativity.
If you’re interested in exploring how AI can enhance your workflow or want to build custom APIs, feel free to connect with me. Together, we can build the future of music and AI.