Understanding Speech-To-Text

To trial our Speech-To-Text (STT) service, please contact us through our Support Desk.

Speech-To-Text is a service designed specifically to transcribe children’s speech. Given an audio file containing children’s speech, the service returns a text transcription of the words spoken, a confidence score for each word, and the start and end times of each word.
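As a rough illustration of how a client might consume this output, the sketch below parses a per-word result and flags words with a low confidence score. The payload shape and every field name in it (`transcription`, `words`, `word`, `confidence`, `start`, `end`), along with the 0.6 threshold, are assumptions made for this example only; the actual response format is described in the technical documentation.

```python
import json
from dataclasses import dataclass

# Hypothetical payload: the structure and field names are assumptions
# for illustration, not the documented response format.
SAMPLE_RESPONSE = """
{
  "transcription": "the cat sat",
  "words": [
    {"word": "the", "confidence": 0.97, "start": 0.10, "end": 0.25},
    {"word": "cat", "confidence": 0.58, "start": 0.30, "end": 0.62},
    {"word": "sat", "confidence": 0.91, "start": 0.70, "end": 1.05}
  ]
}
"""


@dataclass
class Word:
    text: str
    confidence: float  # assumed to be in the range 0.0 to 1.0
    start: float       # assumed to be seconds from the start of the audio
    end: float


def parse_words(payload: str) -> list[Word]:
    """Turn the (assumed) JSON payload into a list of Word records."""
    data = json.loads(payload)
    return [
        Word(w["word"], w["confidence"], w["start"], w["end"])
        for w in data["words"]
    ]


def low_confidence(words: list[Word], threshold: float = 0.6) -> list[Word]:
    """Return the words whose confidence falls below the threshold."""
    return [w for w in words if w.confidence < threshold]


words = parse_words(SAMPLE_RESPONSE)
flagged = low_confidence(words)
for w in flagged:
    print(f"{w.text}: confidence {w.confidence} ({w.start}s to {w.end}s)")
```

With the sample payload above, only "cat" falls below the threshold. An application could use the word timings to replay just that part of the audio to the child or a teacher.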

When to Use Speech-To-Text


Speech-To-Text can be used in a wide variety of scenarios. From dictation to language learning, and from keyword spotting to search, there are many use cases into which our Speech-To-Text service can be integrated.

SoapBox uses its proprietary acoustic models (AM), a pronunciation dictionary and a custom language model (CLM) to create a speech-to-text system that underpins the Fluency service. Our acoustic models are trained on thousands of hours of child speech data recorded in real-world conditions and cover a broad range of accents, dialects and settings, to ensure the most equitable and accurate speech recognition performance on children’s voices.

Custom Language Model (CLM)


A language model is a core part of a speech recognition engine. It is used to transcribe or recognise speech present in an audio file. It contains a large list of words, word combinations and their probability of occurrence.
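To make the idea of "word combinations and their probability of occurrence" concrete, here is a minimal sketch of a toy bigram language model built from three sentences. It is a simplified teaching example, not a description of how SoapBox builds its language models; the corpus, the `<s>` start-of-sentence marker, and the function name `train_bigram_model` are all hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Estimate P(next word | previous word) from raw counts.

    Toy maximum-likelihood estimate with no smoothing; real speech
    recognition language models are far larger and more sophisticated.
    """
    pair_counts = defaultdict(Counter)
    for sentence in sentences:
        # "<s>" is a hypothetical marker for the start of a sentence.
        tokens = ["<s>"] + sentence.lower().split()
        for prev, curr in zip(tokens, tokens[1:]):
            pair_counts[prev][curr] += 1

    model = {}
    for prev, counter in pair_counts.items():
        total = sum(counter.values())
        model[prev] = {word: count / total for word, count in counter.items()}
    return model


# Hypothetical training text standing in for a customer's content.
corpus = ["the cat sat", "the cat ran", "the dog sat"]
model = train_bigram_model(corpus)

# Probability of each word that can follow "the" in this tiny corpus.
print(model["the"])
```

In this corpus "cat" follows "the" twice and "dog" once, so the model rates "the cat" as twice as likely as "the dog". A recogniser can use probabilities like these to choose between candidate words that sound similar.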

You can find more information on language modelling here.

For child speech recognition, accuracy is important. Most off-the-shelf speech-to-text providers use a generic language model that covers most use cases. This works well in some scenarios, but such off-the-shelf solutions tend not to perform well on children's speech. For this reason, we give every customer the ability to customise their language model, creating a bespoke solution with high accuracy for children's speech and your content.

SoapBox Speech-To-Text and Fluency require a CLM for every client to ensure the best results for their use case. We will work with you to deliver your customised solution quickly and efficiently.

More details about Custom Language Models can be found at Understanding Custom Language Models.

To find out more about creating a CLM for your use case, please contact support via our Support Desk or at api@soapboxlabs.com.

Technical Docs


Further documentation regarding our Speech-To-Text service is available here.