Text to Speech (TTS)

Clients

Spokestack

Manager

This module contains the Spokestack text to speech manager which handles a text to speech client, decodes the returned audio, and writes the audio to the specified output.

class spokestack.tts.manager.SequenceIO(sequence): Wrapper that allows incrementally received audio to be decoded.
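
To illustrate the idea of wrapping incrementally received audio, the sketch below exposes an iterable of byte chunks as a readable file-like object, so a decoder can call read() while chunks are still arriving. The class name ChunkReader is hypothetical and this is not the library's SequenceIO implementation; it only assumes that the wrapped sequence yields bytes.

```python
import io
from typing import Iterable, Iterator


class ChunkReader(io.RawIOBase):
    """Hypothetical file-like view over an iterable of byte chunks."""

    def __init__(self, sequence: Iterable[bytes]) -> None:
        super().__init__()
        self._chunks: Iterator[bytes] = iter(sequence)
        self._pending = b""  # bytes received but not yet handed to the reader

    def readable(self) -> bool:
        return True

    def readinto(self, buffer) -> int:
        # Fetch the next non-empty chunk only when nothing is left over.
        while not self._pending:
            try:
                self._pending = next(self._chunks)
            except StopIteration:
                return 0  # zero bytes signals end of stream
        size = min(len(buffer), len(self._pending))
        buffer[:size] = self._pending[:size]
        self._pending = self._pending[size:]
        return size


# Chunks arrive one at a time, but the reader sees a single stream.
stream = io.BufferedReader(ChunkReader(iter([b"ab", b"cde", b"f"])))
first = stream.read(4)
rest = stream.read()
```

Because the wrapper pulls from the iterator lazily, nothing beyond the requested bytes has to be held in memory before decoding starts.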

class spokestack.tts.manager.TextToSpeechManager(client, output, format_='mp3')

Manages the TTS client and the I/O target.

Parameters

client (Any) – Text to speech client that returns encoded mp3 audio
output (Any) – Audio io target
format_ (str) – Audio format, one of FORMAT_MP3 or FORMAT_PCM16
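
Since client and output are both typed Any, it can help to see the shape of objects that could fill those roles. The sketch below is a stand-in, not the library's code: ToyManager, FakeClient, and ListOutput are hypothetical names, and the write method on the output is an assumption about the io target's interface. It only mirrors the documented flow of synthesizing audio with the client, writing it to the output, and closing both.

```python
from typing import Iterator, List


class FakeClient:
    """Hypothetical client that yields placeholder audio blocks."""

    def synthesize(self, utterance: str, *_args, **_kwargs) -> Iterator[bytes]:
        for word in utterance.split():
            yield word.encode("utf-8")  # stands in for real audio bytes

    def close(self) -> None:
        pass


class ListOutput:
    """Hypothetical io target that records every frame written to it."""

    def __init__(self) -> None:
        self.frames: List[bytes] = []
        self.closed = False

    def write(self, frame: bytes) -> None:  # assumed method name
        self.frames.append(frame)

    def close(self) -> None:
        self.closed = True


class ToyManager:
    """Simplified stand-in for the client-to-output flow."""

    def __init__(self, client, output) -> None:
        self._client = client
        self._output = output

    def synthesize(self, utterance: str) -> None:
        # Forward each block from the client to the output as it arrives.
        for frame in self._client.synthesize(utterance):
            self._output.write(frame)

    def close(self) -> None:
        self._client.close()
        self._output.close()


output = ListOutput()
manager = ToyManager(FakeClient(), output)
manager.synthesize("Hello world!")
manager.close()
```

The real manager additionally decodes the audio according to format_ before writing it, a step this sketch leaves out.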

close()

Closes the client and output.

Return type: None

synthesize(utterance, mode='text', voice='demo-male', profile='default')

Synthesizes the given utterance with the voice and format provided.

Text can be formatted as plain text (mode="text"), SSML (mode="ssml"), or Speech Markdown (mode="markdown").

This method also supports different formats for the synthesized audio via the profile argument. The supported profiles and their associated formats are:

Parameters

utterance (str) – string that needs to be rendered as speech.
mode (str) – synthesis mode to use with utterance. text, ssml, markdown, etc.
voice (str) – name of the tts voice.
profile (str) – name of the audio profile used to create the resulting stream.

Return type: None
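
As a quick illustration of the three modes named above, the sketch below pairs each mode value with a sample utterance and builds the keyword arguments for a synthesize call. The SAMPLES and calls names are hypothetical, and the sample strings are only illustrations: the SSML and Speech Markdown snippets use each format's pause syntax, and this page does not confirm which tags a given voice honors.

```python
# One sample utterance per documented mode.
SAMPLES = {
    "text": "Hello world!",
    "ssml": '<speak>Hello <break time="500ms"/> world!</speak>',
    "markdown": "Hello [500ms] world!",
}

# Keyword arguments matching the documented synthesize() signature,
# with voice and profile left at their documented defaults.
calls = [
    {
        "utterance": utterance,
        "mode": mode,
        "voice": "demo-male",
        "profile": "default",
    }
    for mode, utterance in SAMPLES.items()
]
```

With a manager instance named tts, each entry could then be passed along as tts.synthesize(**call).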

TTS-Lite

Spokestack-Lite Speech Synthesizer

This module contains the SpeechSynthesizer class used to convert text to speech using local TTS models trained on the Spokestack platform. A SpeechSynthesizer instance can be passed to the TextToSpeechManager for playback.

Example

This example assumes that a TTS model was downloaded from the Spokestack platform and extracted to the model directory.

from spokestack.io.pyaudio import PyAudioOutput
from spokestack.tts.manager import TextToSpeechManager, FORMAT_PCM16
from spokestack.tts.lite import SpeechSynthesizer, BLOCK_LENGTH, SAMPLE_RATE

tts = TextToSpeechManager(
    SpeechSynthesizer("./model"),
    PyAudioOutput(sample_rate=SAMPLE_RATE, frames_per_buffer=BLOCK_LENGTH),
    format_=FORMAT_PCM16)

tts.synthesize("Hello world!")

class spokestack.tts.lite.SpeechSynthesizer(model_path)

Initialize a new lightweight speech synthesizer

Parameters: model_path (str) – Path to the extracted TTS model downloaded from the Spokestack platform

synthesize(utterance, *_args, **_kwargs)

Synthesize a text utterance to speech audio

Parameters: utterance (str) – The text string to synthesize
Returns: A generator that yields a sequence of PCM-16 numpy audio blocks for playback, storage, etc.
Return type: Iterator[np.array]
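
For the storage case mentioned above, one option is to stream the yielded blocks into a WAV file. The sketch below is an illustration under stated assumptions, not part of the library: write_wav and fake_blocks are hypothetical names, the audio is assumed to be mono, and the sample rate of 24000 is a placeholder (in practice the SAMPLE_RATE constant from spokestack.tts.lite would be the natural value). To stay self-contained it uses array.array blocks as a stand-in for numpy int16 arrays; both expose tobytes().

```python
import array
import os
import tempfile
import wave
from typing import Iterable


def write_wav(blocks: Iterable, path: str, sample_rate: int) -> int:
    """Write 16-bit mono audio blocks to a WAV file and return the frame count."""
    frames = 0
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)  # assumption: mono audio
        wav.setsampwidth(2)  # PCM-16 means 2 bytes per sample
        wav.setframerate(sample_rate)
        for block in blocks:
            # tobytes() uses native byte order; WAV expects little-endian,
            # so this assumes a little-endian host.
            data = block.tobytes()
            wav.writeframes(data)
            frames += len(data) // 2
    return frames


def fake_blocks():
    """Stand-in for a synthesizer generator: yields three silent int16 blocks."""
    for _ in range(3):
        yield array.array("h", [0] * 160)


path = os.path.join(tempfile.mkdtemp(), "speech.wav")
total = write_wav(fake_blocks(), path, sample_rate=24000)  # placeholder rate
```

Writing block by block keeps memory use flat, since the generator never has to be materialized into a single array first.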