Automatic Speech Recognition (ASR)
spokestack.asr.spokestack.cloud_client module
This module contains the websocket logic used to communicate with Spokestack’s cloud-based ASR service.
-
exception spokestack.asr.spokestack.cloud_client.APIError(response)
Pass-through for errors returned by the Spokestack API.
- Parameters
response (dict) – message from the api service
-
class spokestack.asr.spokestack.cloud_client.CloudClient(key_id, key_secret, socket_url='wss://api.spokestack.io', audio_format='PCM16LE', sample_rate=16000, language='en', limit=10, idle_timeout=None)
Spokestack client for cloud-based speech to text.
- Parameters
key_id (str) – identity from spokestack api credentials
key_secret (str) – secret key from spokestack api credentials
socket_url (str) – url for socket connection
audio_format (str) – format of input audio
sample_rate (int) – audio sample rate (Hz)
language (str) – language for recognition
limit (int) – Limit of messages per api response
idle_timeout (Any) – Time before client timeout. Defaults to None
-
property idle_count
Current counter of idle time.
- Return type
int
-
property idle_timeout
Maximum idle time.
- Return type
Any
-
property is_connected
Status of the socket connection.
- Return type
bool
-
property is_final
Status of the most recent server response.
- Return type
bool
-
property response
Current response message.
- Return type
dict
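As a minimal sketch of how the constructor parameters above fit together, the snippet below collects the documented defaults into a keyword-argument dictionary. The environment variable names and the `make_client` helper are hypothetical, introduced only for illustration; the import is deferred so the snippet loads even where spokestack is not installed.

```python
import os

# Keyword arguments mirroring the documented CloudClient signature.
# SPOKESTACK_KEY_ID / SPOKESTACK_KEY_SECRET are hypothetical variable names.
client_kwargs = {
    "key_id": os.environ.get("SPOKESTACK_KEY_ID", ""),
    "key_secret": os.environ.get("SPOKESTACK_KEY_SECRET", ""),
    "socket_url": "wss://api.spokestack.io",
    "audio_format": "PCM16LE",
    "sample_rate": 16000,  # Hz
    "language": "en",
    "limit": 10,
    "idle_timeout": None,
}


def make_client(**overrides):
    """Hypothetical helper: build a CloudClient from the defaults plus overrides."""
    # Deferred import: only needed when a client is actually created.
    from spokestack.asr.spokestack.cloud_client import CloudClient

    return CloudClient(**{**client_kwargs, **overrides})
```

Keeping the arguments in one dictionary makes it easy to override a single value, for example `make_client(language="en")`, without repeating the rest.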
spokestack.asr.spokestack.speech_recognizer module
This module contains the recognizer for cloud-based ASR in the speech pipeline.
-
class spokestack.asr.spokestack.speech_recognizer.CloudSpeechRecognizer(spokestack_id='', spokestack_secret='', language='en', sample_rate=16000, frame_width=20, idle_timeout=5000, **kwargs)
Speech recognizer for use in the speech pipeline.
- Parameters
spokestack_id (str) – identity under spokestack api credentials
spokestack_secret (str) – secret key from spokestack api credentials
language (str) – language recognized
sample_rate (int) – audio sample rate (Hz)
frame_width (int) – frame width of the audio (ms)
idle_timeout (int) – the number of iterations before the connection times out
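The relationship between sample_rate and frame_width can be worked out with simple arithmetic. The sketch below assumes 16-bit samples (2 bytes each, consistent with the PCM16LE default of the cloud client); the `frame_size` helper is hypothetical and not part of the library.

```python
def frame_size(sample_rate=16000, frame_width=20, bytes_per_sample=2):
    """Return (samples, bytes) in one audio frame.

    sample_rate is in Hz and frame_width in milliseconds.
    bytes_per_sample=2 assumes 16-bit PCM audio.
    """
    samples = sample_rate * frame_width // 1000
    return samples, samples * bytes_per_sample


samples, nbytes = frame_size()
print(samples, nbytes)  # 320 640
```

With the default values, each 20 ms frame holds 320 samples, or 640 bytes of 16-bit audio.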
spokestack.asr.google.speech_recognizer module
This module contains the Google ASR speech recognizer.
-
class spokestack.asr.google.speech_recognizer.GoogleSpeechRecognizer(language, credentials=None, sample_rate=16000, **kwargs)
Transforms speech into text using Google’s ASR.
- Parameters
language (str) – The language of given audio as a [BCP-47](https://www.rfc-editor.org/rfc/bcp/bcp47.txt) language tag. Example: “en-US”
credentials (Union[None, str, dict]) – Dictionary of Google API credentials or path to a credentials file. If set to None, credentials are read from the environment variable GOOGLE_APPLICATION_CREDENTIALS
sample_rate (int) – sample rate of the input audio (Hz)
**kwargs (optional) – additional keyword arguments
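The three accepted forms of the credentials argument can be illustrated with a small classifier. This is only a sketch of the documented behaviour, not the library's implementation; `describe_credentials` is a hypothetical helper.

```python
import os
from typing import Union


def describe_credentials(credentials: Union[None, str, dict]) -> tuple:
    """Hypothetical helper: report which documented credential form was given."""
    if isinstance(credentials, dict):
        # Credentials passed directly as a dictionary.
        return ("dict", credentials)
    if isinstance(credentials, str):
        # A string is treated as a path to a credentials file.
        return ("path", credentials)
    if credentials is None:
        # Fall back to the documented environment variable (may be unset).
        return ("env", os.environ.get("GOOGLE_APPLICATION_CREDENTIALS"))
    raise TypeError("credentials must be None, a str path, or a dict")


print(describe_credentials("/path/to/credentials.json")[0])  # path
```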
spokestack.asr.keyword.tflite module
This module contains the Spokestack KeywordRecognizer, which identifies multiple keywords from an audio stream.
-
class spokestack.asr.keyword.tflite.KeywordRecognizer(classes, pre_emphasis=0.97, sample_rate=16000, fft_window_type='hann', fft_hop_length=10, model_dir='', posterior_threshold=0.5, **kwargs)
Recognizes keywords in an audio stream.
- Parameters
classes (List[str]) – Keyword labels
pre_emphasis (float) – The value of the pre-emphasis filter
sample_rate (int) – The number of audio samples per second (Hz)
fft_window_type (str) – The type of FFT window (only hann is supported)
fft_hop_length (int) – Audio sliding window for STFT calculation (ms)
model_dir (str) – Path to the directory containing .tflite models
posterior_threshold (float) – Probability threshold for detection
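To make the signal-related parameters above more concrete, here is a minimal sketch of a standard first-order pre-emphasis filter, the hop length converted to samples, and a threshold-based keyword decision. All three helpers are hypothetical and written in plain Python; the recognizer's actual implementation (for example its initial filter state or whether the threshold comparison is inclusive) may differ.

```python
def pre_emphasize(samples, coeff=0.97):
    """Standard pre-emphasis: y[n] = x[n] - coeff * x[n-1], assuming x[-1] = 0."""
    out = []
    prev = 0.0
    for x in samples:
        out.append(x - coeff * prev)
        prev = x
    return out


def hop_samples(sample_rate=16000, fft_hop_length=10):
    """Convert a hop length in milliseconds to a number of samples."""
    return sample_rate * fft_hop_length // 1000


def pick_keyword(classes, posteriors, posterior_threshold=0.5):
    """Return the most probable label, or None if it falls below the threshold."""
    best = max(range(len(classes)), key=lambda i: posteriors[i])
    if posteriors[best] >= posterior_threshold:
        return classes[best]
    return None


print(hop_samples())                              # 160
print(pick_keyword(["yes", "no"], [0.2, 0.7]))    # no
print(pick_keyword(["yes", "no"], [0.3, 0.4]))    # None
```

With the defaults, a 10 ms hop at 16000 Hz corresponds to 160 samples, and a keyword is reported only when its posterior reaches 0.5.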