Deploy ChatTTS from scratch and create your own text-to-speech solution.

Basic Environment#

PyTorch 2.1.0

Python 3.10 (Ubuntu 22.04)

CUDA 12.1
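
Before installing, it is worth confirming that your environment actually matches these versions. A quick check (assuming PyTorch is already installed):

# Verify the PyTorch version, the CUDA build it was compiled against, and GPU availability
import torch

print(torch.__version__)          # expect 2.1.0
print(torch.version.cuda)         # expect 12.1
print(torch.cuda.is_available())  # should print True if the GPU is usable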

Installation#

git clone https://github.com/2noise/ChatTTS.git
cd ChatTTS
pip install -r requirements.txt


# If you want to start webui.py, install gradio as well
pip install gradio

Running#

python webui.py
# On first run, the model is downloaded automatically to the Hugging Face cache folder. If huggingface.co is unreachable from your network, you will need a VPN/proxy or a Hugging Face mirror site
# Run on a specified port
python webui.py --server_port 1234
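
If the automatic download fails because huggingface.co is unreachable, one possible workaround (not part of ChatTTS itself) is to point huggingface_hub at a mirror before the model is fetched. A minimal sketch in Python, using hf-mirror.com as an example mirror; for webui.py you can export the same variable in the shell before launching:

# Illustrative: route Hugging Face downloads through a mirror.
# HF_ENDPOINT is read by huggingface_hub; it must be set before the download starts.
import os
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"  # example mirror URL

import ChatTTS  # imported after setting the variable so the model download uses the mirror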

Writing your own API#

# Simple version: api.py. Run with: python api.py
import ChatTTS
import torch
import torchaudio
from IPython.display import Audio  # optional: only needed for inline playback in a notebook

# Initialize the ChatTTS model
chat = ChatTTS.Chat()
chat.load_models(compile=False)  # Set to True for higher performance
rand_spk = chat.sample_random_speaker()  # sample a random voice/speaker
# Text to be converted
texts = ["Hello, I am text to speech"]

# Perform inference
wavs = chat.infer(texts)

# Save audio file
torchaudio.save("output1.wav", torch.from_numpy(wavs[0]), 24000)

###################################
# Advanced version: v-api.py. Run with: python v-api.py
import ChatTTS 
import torch
import torchaudio
from IPython.display import Audio  # optional: only needed for inline playback in a notebook


chat = ChatTTS.Chat()
chat.load_models(compile=False)  # Set to True for higher performance

###################################
# Sample a speaker from Gaussian.

rand_spk = chat.sample_random_speaker()

params_infer_code = {
  'spk_emb': rand_spk, # add sampled speaker 
  'temperature': .3, # using custom temperature
  'top_P': 0.7, # top P decode
  'top_K': 20, # top K decode
}

###################################
# For sentence level manual control.

# use oral_(0-9), laugh_(0-2), break_(0-7)
# to insert special tokens into the text to synthesize.
params_refine_text = {
  'prompt': '[oral_2][laugh_0][break_6]'
} 

###################################
# For word level manual control.
text = 'What is [uv_break]your favorite english food?[laugh][lbreak]'
wav = chat.infer(text, skip_refine_text=True, params_refine_text=params_refine_text,  params_infer_code=params_infer_code)
torchaudio.save("output2.wav", torch.from_numpy(wav[0]), 24000)

Summary#

There are many knobs to tune, such as speech speed, saving a randomly sampled speaker so a voice can be reused, special control tokens, laughter, and pauses. The open-source project has only just been released, but it already sounds very good and has a lot of room to grow.

You can also build your own API on top of it, which is quite convenient.

Source:

This article is synchronized and updated to xLog by Mix Space
The original link is https://sunx.ai/posts/nlp/chattts

