Text to Speech (TTS)
Clients
Manager
This module contains the Spokestack text-to-speech manager, which wraps a text-to-speech client, decodes the audio the client returns, and writes that audio to the specified output.
class spokestack.tts.manager.SequenceIO(sequence)
Wrapper that allows for incrementally received audio to be decoded.
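Decoders usually expect a file-like object, while a TTS client delivers audio as a sequence of byte chunks. One way to bridge the two is to subclass io.RawIOBase so that each read pulls from the chunk iterator on demand. The sketch below is hypothetical: ChunkReader is not part of Spokestack, and SequenceIO may be implemented differently.

```python
import io
from typing import Iterable, Iterator


class ChunkReader(io.RawIOBase):
    """Hypothetical read-only file-like view over an iterable of byte chunks."""

    def __init__(self, chunks: Iterable[bytes]) -> None:
        super().__init__()
        self._chunks: Iterator[bytes] = iter(chunks)
        self._buffer = b""  # leftover bytes from the most recent chunk

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        # Pull chunks until there is data to hand out; empty chunks are skipped.
        while not self._buffer:
            try:
                self._buffer = next(self._chunks)
            except StopIteration:
                return 0  # source exhausted: signal EOF
        n = min(len(b), len(self._buffer))
        b[:n] = self._buffer[:n]
        self._buffer = self._buffer[n:]
        return n
```

Because only readinto is implemented, io.RawIOBase supplies read() and readall(); wrapping the reader in io.BufferedReader adds buffering for decoders that issue many small reads.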
class spokestack.tts.manager.TextToSpeechManager(client, output, format_='mp3')
Manages a text-to-speech client and an audio output target.
- Parameters
client (Any) – text-to-speech client that returns encoded audio in the format given by format_.
output (Any) – audio output target.
format_ – audio format, one of FORMAT_MP3 or FORMAT_PCM16; defaults to 'mp3'.
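The client → decode → output pipeline this class manages can be sketched with duck-typed stand-ins. Everything here is hypothetical: MiniTTSManager, ChunkingClient, and ListOutput are illustrative names, the local FORMAT_PCM16 value is assumed, and the decode step is a pluggable callable because this reference does not document how the real manager decodes MP3.

```python
from typing import Callable, Iterable, Iterator, List, Optional

# Hypothetical stand-ins for the library's FORMAT_MP3 / FORMAT_PCM16 constants.
FORMAT_MP3 = "mp3"
FORMAT_PCM16 = "pcm16"

Decoder = Callable[[Iterable[bytes]], Iterable[bytes]]


class MiniTTSManager:
    """Toy manager: ask a client for audio, optionally decode it, write it out."""

    def __init__(self, client, output, format_: str = FORMAT_MP3,
                 decode: Optional[Decoder] = None) -> None:
        self._client = client
        self._output = output
        self._format = format_
        self._decode = decode

    def synthesize(self, utterance: str, mode: str = "text",
                   voice: str = "demo-male", profile: str = "default") -> None:
        stream = self._client.synthesize(utterance, mode, voice, profile)
        if self._format == FORMAT_PCM16:
            frames = stream  # raw PCM is passed straight through
        elif self._format == FORMAT_MP3:
            if self._decode is None:
                raise ValueError("an MP3 decoder is required for FORMAT_MP3")
            frames = self._decode(stream)
        else:
            raise ValueError(f"unsupported format: {self._format!r}")
        for frame in frames:
            self._output.write(frame)


class ChunkingClient:
    """Hypothetical client that 'synthesizes' by yielding 4-byte chunks of the text."""

    def synthesize(self, utterance: str, mode: str, voice: str,
                   profile: str) -> Iterator[bytes]:
        data = utterance.encode("utf-8")
        for i in range(0, len(data), 4):
            yield data[i:i + 4]


class ListOutput:
    """Hypothetical audio target that records every frame it is given."""

    def __init__(self) -> None:
        self.frames: List[bytes] = []

    def write(self, frame: bytes) -> None:
        self.frames.append(frame)
```

Keeping the client and output as opaque objects (typed Any in the real signature) lets the same manager drive a cloud client or a local model, and any sink with a write method.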
synthesize(utterance, mode='text', voice='demo-male', profile='default')
Synthesizes the given utterance with the provided voice and audio profile, and writes the resulting audio to the manager's output.
Text can be formatted as plain text (mode="text"), SSML (mode="ssml"), or Speech Markdown (mode="markdown").
This method also supports different formats for the synthesized audio via the profile argument. The supported profiles and their associated formats are:
- Parameters
utterance (str) – string that needs to be rendered as speech.
mode (str) – synthesis mode to use with utterance: text, ssml, or markdown.
voice (str) – name of the tts voice.
profile (str) – name of the audio profile used to create the resulting stream.
- Return type
None
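With mode="ssml" the utterance is an SSML document rather than plain text. SSML (a W3C specification) uses a <speak> root element, and any literal &, < or > in user text must be escaped to keep the document well-formed. The helper below is a hypothetical convenience, not a Spokestack API; whether the service also expects attributes such as version or xmlns on <speak> is an assumption to check.

```python
from xml.sax.saxutils import escape


def to_ssml(text: str) -> str:
    """Wrap plain text in a minimal SSML <speak> document (hypothetical helper)."""
    # escape() replaces &, < and > with entities so the text stays well-formed XML.
    return f"<speak>{escape(text)}</speak>"


# Possible usage with a manager constructed as in the example on this page:
# tts.synthesize(to_ssml("Fish & chips"), mode="ssml")
```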
TTS-Lite
Spokestack-Lite Speech Synthesizer
This module contains the SpeechSynthesizer class used to convert text to speech using local TTS models trained on the Spokestack platform. A SpeechSynthesizer instance can be passed to the TextToSpeechManager for playback.
Example
This example assumes that a TTS model was downloaded from the Spokestack platform and extracted to the model directory.
from spokestack.io.pyaudio import PyAudioOutput
from spokestack.tts.manager import TextToSpeechManager, FORMAT_PCM16
from spokestack.tts.lite import SpeechSynthesizer, BLOCK_LENGTH, SAMPLE_RATE

# Local synthesizer + PyAudio playback; the model produces 16-bit PCM audio.
tts = TextToSpeechManager(
    SpeechSynthesizer("./model"),  # path to the extracted TTS model
    # Match the playback stream to the model's sample rate and block size.
    PyAudioOutput(sample_rate=SAMPLE_RATE, frames_per_buffer=BLOCK_LENGTH),
    format_=FORMAT_PCM16,
)
tts.synthesize("Hello world!")
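If you want to keep the 16-bit PCM audio instead of playing it, the standard-library wave module can package it as a WAV file. This is a sketch under stated assumptions: pcm16_to_wav_bytes is a hypothetical helper, not a Spokestack API; it assumes little-endian, mono, 16-bit samples (the layout WAV expects); and how you collect the PCM chunks from your client is left open.

```python
import io
import wave
from typing import Iterable


def pcm16_to_wav_bytes(chunks: Iterable[bytes], sample_rate: int,
                       channels: int = 1) -> bytes:
    """Package raw 16-bit PCM chunks as an in-memory WAV file (hypothetical helper).

    Assumes little-endian signed 16-bit samples, interleaved if channels > 1.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples are 2 bytes wide
        wav.setframerate(sample_rate)
        for chunk in chunks:
            wav.writeframes(chunk)  # raw sample bytes are appended as-is
    return buffer.getvalue()
```

For example, Path("hello.wav").write_bytes(pcm16_to_wav_bytes(chunks, SAMPLE_RATE)) would save the audio, where chunks is an iterable of PCM16 frames and Path comes from pathlib.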