Bavarian Archive for Speech Signals -
Services Schiel
|
BASSS System Evaluation
|
Unlike other software, speech technology applications cannot be
verified by simple testing alone. In most cases, quality assurance
for applications involving speech recognition, dialogue handling or
speech synthesis requires a usability test with a representative
group of test users.
|
What is system evaluation?
|
The evaluation of a speech application consists of a series of
controlled tests performed by a selected group of test users, a
test of the application in an artificial test environment
('test bed'), or both.
Usually all tests are monitored (recorded) and later analysed
with regard to pre-defined test criteria. The test results are
interpreted and summarized in an evaluation report that may form
the basis of the customer's strategic decisions (e.g. whether or
not to use speech recognition in a product).
An objective and independent system evaluation is therefore
of paramount importance.
|
System Evaluation Standards?
|
Since the field of speech-driven applications is rather
young, and such applications differ widely in the techniques
they use and the procedures they apply, there are no
recognized evaluation standards for speech applications. Each
case must be analysed very carefully to determine the test
criteria that really enable the customer to reach a sound
judgement of the new technology.
Walker et al. (1997) proposed the PARADISE framework for evaluating
spoken dialogue systems. Unfortunately, we found in several
evaluations that PARADISE is applicable only to very simple
and specialized forms of dialogue system.
For multi-modal speech applications, the BAS has developed
another evaluation scheme called PROMISE (Beringer et al. 2002),
which was successfully applied in the evaluation of the
SMARTKOM systems.
|
What can BASSS do for you?
|
Members of BASSS have extensive experience with system
evaluations (VERBMOBIL, SMARTKOM) and are therefore ideal
partners for a pragmatic, objective and independent evaluation
of all kinds of speech applications. We distinguish between
holistic (or black-box) evaluations, where the overall
performance of the complete system is measured in usability
tests, and analytic evaluations, where selected components of
a speech application undergo objective tests on suitable
reference language resources (LRs) and/or psycho-physical tests.
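As an illustration of what an objective test of a single component on reference data can look like, the sketch below computes the word error rate of a speech recognizer's output against a reference transcript. It is a generic, minimal example and not a description of the procedure BASSS applies; the function name `word_error_rate` and the sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical reference transcript and recognizer output.
    reference = "switch the light on"
    hypothesis = "switch light off"
    print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
```

In an analytic evaluation such a measure would be averaged over a whole reference corpus; a single sentence pair is used here only to keep the example short.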
|
Copyright 2005 BAS Services Schiel
Impressum: Florian Schiel, Moltkestr. 1, D-80803 München,
Germany, schiel@bas-services.de
|