Speech Corpora Validation

Technical Validation of Language Resources (LRs) is a core competence of BAS Services Schiel (BASSS).

What is validation?

In the context of LRs, the validation process comprises all quality checks of the LR against either its specification or its documentation.

For example, the validation of a speech corpus will include roughly the following quality checks:

  • Documentation: completeness, freedom from errors
  • Speech signals: quality checks, formal checks, completeness
  • Annotations: sample checks, formal checks, correctness, completeness
  • Metadata: formal checks, correctness, completeness

The results of the validation are summarized in a validation report that enables the customer to evaluate the current value of the LR.
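
To make the formal and completeness checks on speech signals more concrete, the following is a minimal sketch in Python. It is not the BASSS procedure: the function names, the report layout, and the assumed specification (16 kHz, mono, 16-bit WAV files named after their recording IDs) are all hypothetical and would have to be replaced by the values in the corpus specification.

```python
import wave
from pathlib import Path


def check_signal_file(path, expected_rate=16000, expected_channels=1, expected_width=2):
    """Return a list of formal problems found in one WAV file (empty list = passed).

    The default values are assumptions for illustration, not a real specification.
    """
    problems = []
    try:
        with wave.open(str(path), "rb") as wav:
            if wav.getframerate() != expected_rate:
                problems.append(f"sampling rate {wav.getframerate()} != {expected_rate}")
            if wav.getnchannels() != expected_channels:
                problems.append(f"channels {wav.getnchannels()} != {expected_channels}")
            if wav.getsampwidth() != expected_width:
                problems.append(f"sample width {wav.getsampwidth()} != {expected_width}")
            if wav.getnframes() == 0:
                problems.append("file contains no samples")
    except (wave.Error, EOFError, OSError) as exc:
        # Header damaged, file truncated, or file not readable at all.
        problems.append(f"unreadable WAV file: {exc}")
    return problems


def validate_signals(corpus_dir, expected_ids, **spec):
    """Check completeness against the expected recording IDs, then run formal checks."""
    found = {p.stem: p for p in sorted(Path(corpus_dir).glob("*.wav"))}
    report = {
        "missing": sorted(set(expected_ids) - set(found)),      # specified but absent
        "unexpected": sorted(set(found) - set(expected_ids)),   # present but not specified
        "formal_errors": {},
    }
    for rec_id, path in found.items():
        problems = check_signal_file(path, **spec)
        if problems:
            report["formal_errors"][rec_id] = problems
    return report
```

A call such as validate_signals("corpus/signals", ["rec001", "rec002"]) returns a dictionary that could feed the signal section of a validation report. Checks of signal quality (clipping, noise level) and of annotations and metadata need further, corpus-specific logic and are not covered by this sketch.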

Why validate an LR?

There are several reasons to commission BASSS with an LR validation:
The first and most obvious reason is independent quality assurance. This is best guaranteed by pre-validation checks during the production process and a final validation after completion.
The second reason is to obtain an independent evaluation of the quality of an existing LR.

Validation Guidelines

Although individual LR validations depend on the nature of the LR and, of course, on the intentions of the customer, BASSS has compiled a vademecum for the 'best practice' of LR validation, which can serve as a basis for any speech corpus validation. BASSS also offers 1-day or 3-day tutorials on this topic.

Validation Techniques

BASSS has developed WebTranscribe, a web-based tool for the manual validation of large speech samples, which enables us to minimize logistical effort and errors during an LR validation. Please refer to our scientific publications for a closer look at WebTranscribe.

