yakhyo committed
Commit 1337d7e · 1 Parent(s): 84589d3

Initial commit

.gitignore ADDED
@@ -0,0 +1,171 @@
+ # Byte-compiled / optimized / DLL files
+ __pycache__/
+ *.py[cod]
+ *$py.class
+
+ # C extensions
+ *.so
+
+ # Distribution / packaging
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ share/python-wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+ MANIFEST
+
+ # PyInstaller
+ # Usually these files are written by a python script from a template
+ # before PyInstaller builds the exe, so as to inject date/other infos into it.
+ *.manifest
+ *.spec
+
+ # Installer logs
+ pip-log.txt
+ pip-delete-this-directory.txt
+
+ # Unit test / coverage reports
+ htmlcov/
+ .tox/
+ .nox/
+ .coverage
+ .coverage.*
+ .cache
+ nosetests.xml
+ coverage.xml
+ *.cover
+ *.py,cover
+ .hypothesis/
+ .pytest_cache/
+ cover/
+
+ # Translations
+ *.mo
+ *.pot
+
+ # Django stuff:
+ *.log
+ local_settings.py
+ db.sqlite3
+ db.sqlite3-journal
+
+ # Flask stuff:
+ instance/
+ .webassets-cache
+
+ # Scrapy stuff:
+ .scrapy
+
+ # Sphinx documentation
+ docs/_build/
+
+ # PyBuilder
+ .pybuilder/
+ target/
+
+ # Jupyter Notebook
+ .ipynb_checkpoints
+
+ # IPython
+ profile_default/
+ ipython_config.py
+
+ # pyenv
+ # For a library or package, you might want to ignore these files since the code is
+ # intended to run in multiple environments; otherwise, check them in:
+ # .python-version
+
+ # pipenv
+ # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+ # However, in case of collaboration, if having platform-specific dependencies or dependencies
+ # having no cross-platform support, pipenv may install dependencies that don't work, or not
+ # install all needed dependencies.
+ #Pipfile.lock
+
+ # UV
+ # Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
+ # This is especially recommended for binary packages to ensure reproducibility, and is more
+ # commonly ignored for libraries.
+ #uv.lock
+
+ # poetry
+ # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
+ # This is especially recommended for binary packages to ensure reproducibility, and is more
+ # commonly ignored for libraries.
+ # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
+ #poetry.lock
+
+ # pdm
+ # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
+ #pdm.lock
+ # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
+ # in version control.
+ # https://pdm.fming.dev/latest/usage/project/#working-with-version-control
+ .pdm.toml
+ .pdm-python
+ .pdm-build/
+
+ # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+ __pypackages__/
+
+ # Celery stuff
+ celerybeat-schedule
+ celerybeat.pid
+
+ # SageMath parsed files
+ *.sage.py
+
+ # Environments
+ .env
+ .venv
+ env/
+ venv/
+ ENV/
+ env.bak/
+ venv.bak/
+
+ # Spyder project settings
+ .spyderproject
+ .spyproject
+
+ # Rope project settings
+ .ropeproject
+
+ # mkdocs documentation
+ /site
+
+ # mypy
+ .mypy_cache/
+ .dmypy.json
+ dmypy.json
+
+ # Pyre type checker
+ .pyre/
+
+ # pytype static type analyzer
+ .pytype/
+
+ # Cython debug symbols
+ cython_debug/
+
+ # PyCharm
+ # JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+ # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
+ # and can be added to the global gitignore or merged into this file. For a more nuclear
+ # option (not recommended) you can uncomment the following to ignore the entire idea folder.
+ #.idea/
+
+ # PyPI configuration file
+ .pypirc
README.md CHANGED
@@ -1,14 +1,94 @@
  ---
- title: Kokoro 82m
- emoji:
- colorFrom: blue
- colorTo: gray
- sdk: gradio
- sdk_version: 5.12.0
- app_file: app.py
- pinned: false
- license: mit
- short_description: Kokoro-82m TTS ONNX Runtime Inference
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Kokoro-82M ONNX Runtime Inference
+
+ ![Downloads](https://img.shields.io/github/downloads/yakhyo/kokoro-82m-onnx/total)
+ [![GitHub Repo stars](https://img.shields.io/github/stars/yakhyo/kokoro-82m-onnx)](https://github.com/yakhyo/kokoro-82m-onnx/stargazers)
+ [![GitHub Repository](https://img.shields.io/badge/GitHub-Repository-blue?logo=github)](https://github.com/yakhyo/kokoro-82m-onnx)
+
+ This repository contains minimal code and resources for inference with the **Kokoro-82M** text-to-speech model using **ONNX Runtime**.
+
+ <table>
+ <tr>
+ <td>Machine learning models rely on large datasets and complex algorithms to identify patterns and make predictions.</td>
+ <td>Did you know that honey never spoils? Archaeologists have found pots of honey in ancient Egyptian tombs that are over 3,000 years old and still edible!</td>
+ </tr>
+ <tr>
+ <td align="center">
+ <video controls autoplay loop src="https://github.com/user-attachments/assets/a8e9bfb7-777a-4b44-901c-c79c39c02c6f"></video>
+ </td>
+ <td align="center">
+ <video controls autoplay loop src="https://github.com/user-attachments/assets/358723ad-c0ab-44a3-90cc-64d89c042c9a"></video>
+ </td>
+ </tr>
+ </table>
+
+ ## Features
+
+ - **ONNX Runtime Inference**: Minimal ONNX Runtime inference code for Kokoro-82M (v0_19), supporting `en-us` and `en-gb`.
+
  ---
+
+ ## Installation
+
+ 1. Clone the repository:
+
+ ```bash
+ git clone https://github.com/yakhyo/kokoro-82m.git
+ cd kokoro-82m
+ ```
+
+ 2. Install dependencies:
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ 3. Install `espeak` for text-to-speech functionality. On Linux:
+
+ ```bash
+ apt-get install espeak -y
+ ```
+
  ---

+ ## Usage
+
+ ### Download ONNX Model
+
+ [Click to download](https://github.com/yakhyo/kokoro-82m/releases/download/v0.0.1/kokoro-v0_19.onnx)
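+
+ Or fetch it from the command line (a sketch; the `weights/` directory is the path `inference.py` and `app.py` expect):
+
+ ```bash
+ wget https://github.com/yakhyo/kokoro-82m/releases/download/v0.0.1/kokoro-v0_19.onnx -P weights/
+ ```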
+
+ ### Jupyter Notebook Inference Example
+
+ Run inference using the Jupyter notebook:
+
+ [example.ipynb](example.ipynb)
+
+ ### CLI Inference
+
+ Specify the input text and model weights in `inference.py`, then run:
+
+ ```bash
+ python inference.py
+ ```
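+
+ You can also call the model from your own code. The snippet below is a minimal sketch based on `inference.py`; the weight and voice paths are examples and must exist locally:
+
+ ```python
+ import soundfile as sf
+ from models import Tokenizer, Kokoro
+
+ tokenizer = Tokenizer()
+ tts = Kokoro("weights/kokoro-v0_19.onnx", "voices/af.pt", tokenizer=tokenizer, lang="en-us")
+ audio, sample_rate = tts.generate_audio("Hello from Kokoro!", speed=1.0)
+ sf.write("output.wav", audio, sample_rate)  # audio is generated at 24 kHz
+ ```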
+
+ ### Gradio App
+
+ Run the command below to start the Gradio app:
+
+ ```bash
+ python app.py
+ ```
+
+ <div>
+ <img src="gradio_demo.png" width="100%">
+ </div>
+
+ ---
+
+ ## License
+
+ This project is licensed under the [MIT License](LICENSE).
+ Model weights are licensed under [Apache 2.0](https://huggingface.co/hexgrad/Kokoro-82M).
+
+ ---
+
+ ## Acknowledgments
+
+ - https://huggingface.co/hexgrad/Kokoro-82M
app.py ADDED
@@ -0,0 +1,95 @@
+ import os
+ import gradio as gr
+ import tempfile
+ import soundfile as sf
+ from models import Tokenizer, Kokoro
+
+ # Function to fetch available style vectors dynamically
+ def get_style_vector_choices(directory="voices"):
+     return [file for file in os.listdir(directory) if file.endswith(".pt")]
+
+ # Function to perform TTS using your local model
+ def local_tts(
+     text: str,
+     model_path: str,
+     style_vector: str,
+     output_file_format: str = "wav",
+     speed: float = 1.0
+ ):
+     if len(text) > 0:
+         try:
+             tokenizer = Tokenizer()
+             style_vector_path = os.path.join("voices", style_vector)
+             inference = Kokoro(model_path, style_vector_path, tokenizer=tokenizer, lang='en-us')
+
+             audio, sample_rate = inference.generate_audio(text, speed=speed)
+
+             with tempfile.NamedTemporaryFile(suffix=f".{output_file_format}", delete=False) as temp_file:
+                 sf.write(temp_file.name, audio, sample_rate)
+                 temp_file_path = temp_file.name
+
+             return temp_file_path
+
+         except Exception as e:
+             raise gr.Error(f"An error occurred during TTS inference: {str(e)}")
+     else:
+         raise gr.Error("Input text cannot be empty.")
+
+ # Get the list of available style vectors
+ style_vector_choices = get_style_vector_choices()
+
+ # sample texts and their corresponding audio
+ sample_outputs = [
+     ("Educational Note", "Machine learning models rely on large datasets and complex algorithms to identify patterns and make predictions.", "assets/edu_note.wav"),
+     ("Fun Fact", "Did you know that honey never spoils? Archaeologists have found pots of honey in ancient Egyptian tombs that are over 3,000 years old and still edible!", "assets/fun_fact.wav"),
+     ("Thanks", "Thank you for listening to this audio. It was generated by the Kokoro TTS model.", "assets/thanks.wav")
+ ]
+
+ example_texts = [
+     ["Machine learning models rely on large datasets and complex algorithms to identify patterns and make predictions."],
+     ["Did you know that honey never spoils? Archaeologists have found pots of honey in ancient Egyptian tombs that are over 3,000 years old and still edible!"],
+     ["Thank you for listening to this audio. It was generated by the Kokoro TTS model."]
+ ]
+
+ # Gradio Interface
+ with gr.Blocks() as demo:
+     gr.Markdown("# <center> Kokoro-82m Text-to-Speech with Gradio </center>")
+
+     # Model-specific inputs
+     with gr.Row(variant="panel"):
+         model_path = gr.Textbox(label="Model Path", value="weights/kokoro-v0_19.onnx", interactive=False)
+         style_vector = gr.Dropdown(choices=style_vector_choices, label="Style Vector", value=style_vector_choices[0])
+         output_file_format = gr.Dropdown(choices=["wav", "mp3"], label="Output Format", value="wav")
+         speed = gr.Slider(minimum=0.5, maximum=2.0, value=1.0, step=0.1, label="Speed")
+
+     # Text input and output
+     text = gr.Textbox(
+         label="Input Text",
+         placeholder="Enter text to convert to speech."
+     )
+     btn = gr.Button("Generate Speech")
+     output_audio = gr.Audio(label="Generated Audio", type="filepath")
+
+     # Link inputs and outputs
+     btn.click(
+         fn=local_tts,
+         inputs=[text, model_path, style_vector, output_file_format, speed],
+         outputs=output_audio
+     )
+
+     # Add example texts
+     gr.Examples(
+         examples=example_texts,
+         inputs=[text],
+         label="Click an example to populate the input text"
+     )
+
+     # Add example texts and audios
+     gr.Markdown("### Sample Texts and Audio")
+     for topic, sample_text, sample_audio in sample_outputs:
+         with gr.Row():
+             gr.Textbox(value=sample_text, label=topic, interactive=False)
+             gr.Audio(value=sample_audio, label="Example Audio", type="filepath", interactive=False)
+
+
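+ # Bind to localhost only; pass share=True to launch() for a temporary public Gradio URL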
+ demo.launch(server_name="127.0.0.1")
assets/edu_note.wav ADDED
Binary file (416 kB)
 
assets/fun_fact.wav ADDED
Binary file (497 kB)
 
assets/thanks.wav ADDED
Binary file (288 kB)
 
example.ipynb ADDED
The diff for this file is too large to render.
 
gradio_demo.png ADDED
inference.py ADDED
@@ -0,0 +1,25 @@
+ import soundfile as sf
+
+ from models import Tokenizer, Kokoro
+
+
+ def main():
+     model_path = "weights/kokoro-v0_19.onnx"
+     style_vector_path = "voices/af.pt"
+     output_filename = "test_out.wav"
+     tokenizer = Tokenizer()
+
+     text = (
+         "This approach ensures the entire text is processed without exceeding the token limit and outputs seamless audio for the full input. Let me know if you need further assistance!"
+     )
+
+     inference = Kokoro(model_path, style_vector_path, tokenizer=tokenizer, lang='en-us')
+     audio, sample_rate = inference.generate_audio(text, speed=1.0)
+
+     # Save the audio to a file
+     sf.write(output_filename, audio, sample_rate)
+     print(f"Audio saved to {output_filename}")
+
+
+ if __name__ == "__main__":
+     main()
models/__init__.py ADDED
@@ -0,0 +1,2 @@
+ from .kokoro import Kokoro
+ from .tokenizer import Tokenizer
models/kokoro.py ADDED
@@ -0,0 +1,125 @@
+ import torch
+ import numpy as np
+ import onnxruntime as ort
+
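+ # Kokoro v0_19 accepts at most 510 tokens per forward pass; preprocess() splits
+ # longer inputs into chunks and generate_audio() concatenates the audio segments.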
+ TOKEN_LIMIT = 510
+ SAMPLE_RATE = 24_000
+
+
+ class Kokoro:
+     def __init__(self, model_path: str, style_vector_path: str, tokenizer, lang: str = 'en-us') -> None:
+         """
+         Initializes the Kokoro ONNX inference class.
+
+         Args:
+             model_path (str): Path to the ONNX model file.
+             style_vector_path (str): Path to the style vector file.
+             tokenizer (Tokenizer): Tokenizer used for phonemization and tokenization.
+             lang (str): Language code for the tokenizer.
+         """
+         self.sess = ort.InferenceSession(model_path)
+         self.style_vector_path = style_vector_path
+         self.tokenizer = tokenizer
+         self.lang = lang
+
+     def preprocess(self, text):
+         """
+         Converts input text to tokenized numerical IDs and loads the style vector.
+
+         Args:
+             text (str): Input text to preprocess.
+
+         Returns:
+             tuple: Tokenized input and corresponding style vector.
+         """
+         # Convert text to phonemes and tokenize
+         phonemes = self.tokenizer.phonemize(text, lang=self.lang)
+         tokenized_phonemes = self.tokenizer.tokenize(phonemes)
+
+         if not tokenized_phonemes:
+             raise ValueError("No tokens found after tokenization")
+
+         style_vector = torch.load(self.style_vector_path, weights_only=True)
+
+         if len(tokenized_phonemes) > TOKEN_LIMIT:
+             token_chunks = self.split_into_chunks(tokenized_phonemes)
+
+             tokens_list = []
+             styles_list = []
+
+             for chunk in token_chunks:
+                 token_chunk = [[0, *chunk, 0]]
+                 style_chunk = style_vector[len(chunk)].numpy()
+
+                 tokens_list.append(token_chunk)
+                 styles_list.append(style_chunk)
+
+             return tokens_list, styles_list
+
+         style_vector = style_vector[len(tokenized_phonemes)].numpy()
+         tokenized_phonemes = [[0, *tokenized_phonemes, 0]]
+
+         return tokenized_phonemes, style_vector
+
+     @staticmethod
+     def split_into_chunks(tokens):
+         """
+         Splits a list of tokens into chunks of size TOKEN_LIMIT.
+
+         Args:
+             tokens (list): List of tokens to split.
+
+         Returns:
+             list: List of token chunks.
+         """
+         tokens_chunks = []
+         for i in range(0, len(tokens), TOKEN_LIMIT):
+             tokens_chunks.append(tokens[i:i + TOKEN_LIMIT])
+         return tokens_chunks
+
+     def infer(self, tokens, style_vector, speed=1.0):
+         """
+         Runs inference using the ONNX model.
+
+         Args:
+             tokens (list): Tokenized input for the model.
+             style_vector (numpy.ndarray): Style vector for the model.
+             speed (float): Speed parameter for inference.
+
+         Returns:
+             numpy.ndarray: Generated audio data.
+         """
+         # Perform inference
+         audio = self.sess.run(
+             None,
+             {
+                 'tokens': tokens,
+                 'style': style_vector,
+                 'speed': np.array([speed], dtype=np.float32),
+             }
+         )[0]
+         return audio
+
+     def generate_audio(self, text, speed=1.0):
+         """
+         Full pipeline: preprocess, run inference, and return the generated audio.
+
+         Args:
+             text (str): Input text to generate audio from.
+             speed (float): Speed parameter for inference.
+
+         Returns:
+             tuple: (numpy.ndarray of audio samples, sample rate in Hz).
+         """
+         # Preprocess text
+         tokenized_data, styles_data = self.preprocess(text)
+
+         audio_segments = []
+         if len(tokenized_data) > 1:  # list of token chunks
+             for token_chunk, style_chunk in zip(tokenized_data, styles_data):
+                 audio = self.infer(token_chunk, style_chunk, speed=speed)
+                 audio_segments.append(audio)
+         else:  # a single chunk under the token limit
+             # Run inference
+             audio = self.infer(tokenized_data, styles_data, speed=speed)
+             audio_segments.append(audio)
+
+         full_audio = np.concatenate(audio_segments)
+
+         return full_audio, SAMPLE_RATE
models/tokenizer.py ADDED
@@ -0,0 +1,238 @@
+ import re
+ from phonemizer import backend
+ from typing import List
+
+
+ class Tokenizer:
+     def __init__(self):
+         self.VOCAB = self._get_vocab()
+         self.phonemizers = {
+             'en-us': backend.EspeakBackend(language='en-us', preserve_punctuation=True, with_stress=True),
+             'en-gb': backend.EspeakBackend(language='en-gb', preserve_punctuation=True, with_stress=True),
+         }
+
+     @staticmethod
+     def _get_vocab():
+         """
+         Generates a mapping of symbols to integer indices for tokenization.
+
+         Returns:
+             dict: A dictionary where keys are symbols and values are unique integer indices.
+         """
+         # Define the symbols
+         _pad = "$"
+         _punctuation = ';:,.!?¡¿—…"«»“” '
+         _letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
+         _letters_ipa = (
+             "ɑɐɒæɓʙβɔɕçɗɖðʤəɘɚɛɜɝɞɟʄɡɠɢʛɦɧħɥʜɨɪʝɭɬɫɮʟɱɯɰŋɳɲɴøɵɸθœɶʘɹɺɾɻʀʁɽʂʃʈʧʉʊʋⱱʌɣɤʍχʎʏʑʐʒʔʡʕʢǀǁǂǃˈˌːˑʼʴʰʱʲʷˠˤ˞↓↑→↗↘'̩'ᵻ"
+         )
+         symbols = [_pad] + list(_punctuation) + list(_letters) + list(_letters_ipa)
+
+         # Create a dictionary mapping each symbol to its index
+         return {symbol: index for index, symbol in enumerate(symbols)}
+
+     @staticmethod
+     def split_num(num: re.Match) -> str:
+         """
+         Processes numeric strings, formatting them as time, years, or other representations.
+
+         Args:
+             num (re.Match): A regex match object representing the numeric string.
+
+         Returns:
+             str: A formatted string based on the numeric input.
+         """
+         num = num.group()
+
+         # Handle time (e.g., "12:30")
+         if ':' in num:
+             hours, minutes = map(int, num.split(':'))
+             if minutes == 0:
+                 return f"{hours} o'clock"
+             elif minutes < 10:
+                 return f'{hours} oh {minutes}'
+             return f'{hours} {minutes}'
+
+         # Handle years or general numeric cases
+         year = int(num[:4])
+         if year < 1100 or year % 1000 < 10:
+             return num
+
+         left, right = num[:2], int(num[2:4])
+         suffix = 's' if num.endswith('s') else ''
+
+         # Format years
+         if 100 <= year % 1000 <= 999:
+             if right == 0:
+                 return f'{left} hundred{suffix}'
+             elif right < 10:
+                 return f'{left} oh {right}{suffix}'
+         return f'{left} {right}{suffix}'
+
+     @staticmethod
+     def flip_money(match: re.Match) -> str:
+         """
+         Converts monetary values to a textual representation.
+
+         Args:
+             match (re.Match): A regex match object representing the monetary value.
+
+         Returns:
+             str: A formatted string describing the monetary value.
+         """
+         m = match.group()
+         currency = 'dollar' if m[0] == '$' else 'pound'
+
+         # Handle whole amounts (e.g., "$10", "£20")
+         if '.' not in m:
+             singular = '' if m[1:] == '1' else 's'
+             return f'{m[1:]} {currency}{singular}'
+
+         # Handle amounts with decimals (e.g., "$10.50", "£5.25")
+         whole, cents = m[1:].split('.')
+         singular = '' if whole == '1' else 's'
+         cents = int(cents.ljust(2, '0'))  # Ensure 2 decimal places
+         coins = f"cent{'' if cents == 1 else 's'}" if m[0] == '$' else ('penny' if cents == 1 else 'pence')
+         return f'{whole} {currency}{singular} and {cents} {coins}'
+
+     @staticmethod
+     def point_num(match):
+         # Read decimals digit by digit: "3.14" -> "3 point 1 4"
+         whole, fractional = match.group().split('.')
+         return ' point '.join([whole, ' '.join(fractional)])
+
+     def normalize_text(self, text: str) -> str:
+         """
+         Normalizes input text by replacing special characters, punctuation, and applying custom transformations.
+
+         Args:
+             text (str): Input text to normalize.
+
+         Returns:
+             str: Normalized text.
+         """
+         # Replace specific characters with standardized versions
+         replacements = {
+             chr(8216): "'",  # Left single quotation mark
+             chr(8217): "'",  # Right single quotation mark
+             '«': chr(8220),  # Left double angle quotation mark to left double quotation mark
+             '»': chr(8221),  # Right double angle quotation mark to right double quotation mark
+             chr(8220): '"',  # Left double quotation mark
+             chr(8221): '"',  # Right double quotation mark
+             '(': '«',  # Replace parentheses with angle quotation marks
+             ')': '»'
+         }
+         for old, new in replacements.items():
+             text = text.replace(old, new)
+
+         # Replace punctuation and add spaces
+         punctuation_replacements = {
+             '、': ',',
+             '。': '.',
+             '!': '!',
+             ',': ',',
+             ':': ':',
+             ';': ';',
+             '?': '?',
+         }
+         for old, new in punctuation_replacements.items():
+             text = text.replace(old, new + ' ')
+
+         # Apply regex-based replacements
+         text = re.sub(r'[^\S\n]', ' ', text)
+         text = re.sub(r' +', ' ', text)
+         text = re.sub(r'(?<=\n) +(?=\n)', '', text)
+
+         # Expand abbreviations and handle special cases
+         abbreviation_patterns = [
+             (r'\bD[Rr]\.(?= [A-Z])', 'Doctor'),
+             (r'\b(?:Mr\.|MR\.(?= [A-Z]))', 'Mister'),
+             (r'\b(?:Ms\.|MS\.(?= [A-Z]))', 'Miss'),
+             (r'\b(?:Mrs\.|MRS\.(?= [A-Z]))', 'Mrs'),
+             (r'\betc\.(?! [A-Z])', 'etc'),
+             (r'(?i)\b(y)eah?\b', r"\1e'a"),
+         ]
+         for pattern, replacement in abbreviation_patterns:
+             text = re.sub(pattern, replacement, text)
+
+         # Handle numbers and monetary values
+         text = re.sub(r'\d*\.\d+|\b\d{4}s?\b|(?<!:)\b(?:[1-9]|1[0-2]):[0-5]\d\b(?!:)', self.split_num, text)
+         text = re.sub(r'(?<=\d),(?=\d)', '', text)  # Remove commas from numbers
+         text = re.sub(
+             r'(?i)[$£]\d+(?:\.\d+)?(?: hundred| thousand| (?:[bm]|tr)illion)*\b|[$£]\d+\.\d\d?\b',
+             self.flip_money,
+             text
+         )
+         text = re.sub(r'\d*\.\d+', self.point_num, text)
+         text = re.sub(r'(?<=\d)-(?=\d)', ' to ', text)
+
+         # Handle possessives and specific letter cases
+         text = re.sub(r'(?<=\d)S', ' S', text)
+         text = re.sub(r"(?<=[BCDFGHJ-NP-TV-Z])'?s\b", "'S", text)
+         text = re.sub(r"(?<=X')S\b", 's', text)
+
+         # Handle abbreviations with dots
+         text = re.sub(r'(?:[A-Za-z]\.){2,} [a-z]', lambda m: m.group().replace('.', '-'), text)
+         text = re.sub(r'(?i)(?<=[A-Z])\.(?=[A-Z])', '-', text)
+
+         return text.strip()
+
+     def tokenize(self, phonemes: str) -> List[int]:
+         """
+         Tokenizes a given string into a list of indices based on VOCAB.
+
+         Args:
+             phonemes (str): Input phoneme string to tokenize.
+
+         Returns:
+             list: A list of integer indices corresponding to the characters in the input string.
+         """
+         return [self.VOCAB[x] for x in phonemes if x in self.VOCAB]
+
+     def phonemize(self, text: str, lang: str = 'en-us', normalize: bool = True) -> str:
+         """
+         Converts text to phonemes using the specified language phonemizer and applies normalization.
+
+         Args:
+             text (str): Input text to be phonemized.
+             lang (str): Language identifier ('en-us' or 'en-gb') for selecting the phonemizer.
+             normalize (bool): Whether to normalize the text before phonemization.
+
+         Returns:
+             str: A processed string of phonemes.
+         """
+         # Normalize text if required
+         if normalize:
+             text = self.normalize_text(text)
+
+         # Generate phonemes using the specified phonemizer
+         if lang not in self.phonemizers:
+             print(f"Language '{lang}' not supported. Defaulting to 'en-us'.")
+             lang = 'en-us'
+
+         phonemes = self.phonemizers[lang].phonemize([text])
+         phonemes = phonemes[0] if phonemes else ''
+
+         # Apply custom phoneme replacements
+         replacements = {
+             'kəkˈoːɹoʊ': 'kˈoʊkəɹoʊ',
+             'kəkˈɔːɹəʊ': 'kˈəʊkəɹəʊ',
+             'ʲ': 'j',
+             'r': 'ɹ',
+             'x': 'k',
+             'ɬ': 'l',
+         }
+         for old, new in replacements.items():
+             phonemes = phonemes.replace(old, new)
+
+         # Apply regex-based replacements
+         phonemes = re.sub(r'(?<=[a-zɹː])(?=hˈʌndɹɪd)', ' ', phonemes)
+         phonemes = re.sub(r' z(?=[;:,.!?¡¿—…"«»“” ]|$)', 'z', phonemes)
+
+         # Additional language-specific rules (American English t-flapping in "ninety")
+         if lang == 'en-us':
+             phonemes = re.sub(r'(?<=nˈaɪn)ti(?!ː)', 'di', phonemes)
+
+         # Filter out characters not in VOCAB
+         phonemes = ''.join(filter(lambda p: p in self.VOCAB, phonemes))
+
+         return phonemes.strip()
requirements.txt ADDED
@@ -0,0 +1,5 @@
+ gradio
+ torch
+ onnxruntime  # assumed missing from the original list: imported by models/kokoro.py
+ phonemizer
+ soundfile
voices/af.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fad4192fd8a840f925b0e3fc2be54e20531f91a9ac816a485b7992ca0bd83ebf
+ size 524355
voices/af_bella.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2828c6c2f94275ef3441a2edfcf48293298ee0f9b56ce70fb2e344345487b922
+ size 524449
voices/af_nicole.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9401802fb0b7080c324dec1a75d60f31d977ced600a99160e095dbc5a1172692
+ size 524454
voices/af_sarah.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ba7918c4ace6ace4221e7e01eb3a6d16596cba9729850551c758cd2ad3a4cd08
+ size 524449
voices/af_sky.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9f16f1bb778de36a177ae4b0b6f1e59783d5f4d3bcecf752c3e1ee98299b335e
+ size 524375
voices/am_adam.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1921528b400a553f66528c27899d95780918fe33b1ac7e2a871f6a0de475f176
+ size 524444
voices/am_michael.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a255c9562c363103adc56c09b7daf837139d3bdaa8bd4dd74847ab1e3e8c28be
+ size 524459
voices/bf_emma.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:992e6d8491b8926ef4a16205250e51a21d9924405a5d37e2db6e94adfd965c3b
+ size 524365
voices/bf_isabella.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d0865a03931230100167f7a81d394b143c072efe2d7e4c4a87b5c54d6283f580
+ size 524365
voices/bm_george.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7d763dfe13e934357f4d8322b718787d79e32f2181e29ca0cf6aa637d8092b96
+ size 524464
voices/bm_lewis.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f70d9ea4d65f522f224628f06d86ea74279faae23bd7e765848a374aba916b76
+ size 524449
weights/.gitkeep ADDED
File without changes
weights/kokoro-v0_19.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ebef42457f7efee9b60b4f1d5aec7692f2925923948a0d7a2a49d2c9edf57e49
+ size 345554732