1.2 Design
==========
Current Design
--------------
When each application communicates with each synthesizer directly,
the result is a mess.  For this reason, we wanted Speech Dispatcher
to be a layer separating applications and synthesizers, so that
applications wouldn't have to care about synthesizers and
synthesizers wouldn't have to care about interaction with
applications.

We decided to implement Speech Dispatcher as a server receiving
commands from applications over a protocol called 'SSIP', parsing
them if needed, and calling the appropriate functions of the output
modules that communicate with the different synthesizers.  These
output modules are implemented as plug-ins, so that users can simply
load a new module when they want to use a new synthesizer.
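For example, making a new synthesizer available typically amounts to
registering its output module in the server configuration file
(speechd.conf).  The module and file names below are only
illustrative; the exact names depend on the synthesizer and the
installation:

     # Register an output module: module name, module binary and the
     # module's own configuration file (names here are illustrative).
     AddModule "espeak-ng"  "sd_espeak-ng"  "espeak-ng.conf"

     # Optionally make it the default module for clients that do not
     # explicitly choose a synthesizer.
     DefaultModule espeak-ng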
Each client (an application that wants to speak) opens a socket
connection to Speech Dispatcher and calls functions like say(),
stop(), and pause() provided by a library implementing the protocol.
This shared library lives on the client side and sends SSIP commands
to Speech Dispatcher over the socket.  When the messages arrive,
Speech Dispatcher parses them, reads the text that should be spoken,
and puts it into one of several queues according to the priority of
the message and other criteria.  It then decides when, with which
parameters (set up by the client and the user), and on which
synthesizer the message will be spoken.  These requests are handled
by the output plug-ins (output modules) for the different hardware
and software synthesizers, and the message is finally spoken aloud.
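As an illustration, a minimal client written against the C shared
library (libspeechd) might look roughly as follows.  This is only a
sketch: the client and connection names are arbitrary, and the header
path and error handling may differ between installations.

     /* Minimal Speech Dispatcher client sketch using libspeechd, the
        shared library that implements SSIP on the client side.  On
        older systems the header may be installed as <libspeechd.h>. */
     #include <stdio.h>
     #include <speech-dispatcher/libspeechd.h>

     int main(void)
     {
         /* Open a socket connection to Speech Dispatcher. */
         SPDConnection *conn =
             spd_open("my-app", "main", NULL, SPD_MODE_SINGLE);
         if (conn == NULL) {
             fprintf(stderr, "Could not connect to Speech Dispatcher\n");
             return 1;
         }

         /* Queue a message; its priority decides which queue it enters
            and how it competes with other messages. */
         spd_say(conn, SPD_MESSAGE, "Hello from Speech Dispatcher");

         /* spd_stop(conn) and spd_pause(conn) would stop or pause what
            this client is currently saying. */

         spd_close(conn);
         return 0;
     }

The same operations can also be performed by sending the
corresponding SSIP commands directly over the socket.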
[Speech Dispatcher architecture]
See also the detailed description of the ⇒Client Programming
interfaces and the ⇒Server Programming documentation.
Future Design
-------------
Speech Dispatcher currently mixes two important features: a common
low-level interface to multiple speech synthesizers and message
management (including priorities and history).  This became even more
evident when we started thinking about handling messages intended for
output on braille devices.  Such messages of course need to be
synchronized with speech messages, and there is little reason why
accessibility tools should send the same message twice for these two
kinds of output, which blind people often use simultaneously.

Outside the world of accessibility, applications also want either to
have full control over the sound (bypassing prioritization) or to
only retrieve the synthesized data without playing it immediately.
We eventually want to split Speech Dispatcher into two independent
components: one providing a low-level interface to speech synthesis
drivers, which we call the TTS API Provider and which is already
largely implemented in the Free(b)Soft project, and a second one
doing message management, called Message Dispatcher.  This will allow
Message Dispatcher to output to braille as well, and it will make it
possible to use the TTS API Provider separately.
From the implementation point of view, the opportunity to create a
new design based on our previous experience allowed us to remove
several bottlenecks in speed (responsiveness), ease of use, and ease
of implementing extensions (particularly output modules for new
synthesizers).  From the point of view of architecture and the
possibilities for new developments, we are entirely convinced that
both the new design in general and the inner design of the new
components are much better.
While a good API and its implementation for braille already exist in
the form of BrlAPI, the API for speech is still under development.
Please see another architecture diagram showing how we imagine
Message Dispatcher in the future.
[Speech Dispatcher architecture]
References: <http://www.freebsoft.org/tts-api/>
<http://www.freebsoft.org/tts-api-provider/>