Custom Words and Pronunciations Examples
Overview
The <custom-word> tag is used for custom or made-up words and for custom pronunciations. The pronunciation attribute is compulsory: it tells the SoapBox engine what the phoneme breakdown is, since custom word pronunciations may not be part of our pre-trained recognition models.
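As a minimal sketch of how such markup might be generated programmatically, the Python helper below builds a <custom-word> target string and rejects an empty pronunciation, mirroring the rule that the attribute is compulsory. The function name custom_word_target is hypothetical and is not part of any SoapBox SDK; the phone symbols are passed through without being checked against the engine's phone set.

```python
from xml.sax.saxutils import escape, quoteattr


def custom_word_target(word: str, pronunciation: str) -> str:
    """Build a <custom-word> target string (hypothetical helper, not a SoapBox API)."""
    phones = pronunciation.split()
    if not phones:
        # The pronunciation attribute is compulsory, so reject an empty one early.
        raise ValueError("pronunciation must contain at least one phone")
    # Normalise whitespace so the phones are separated by single spaces.
    attr = quoteattr(" ".join(phones))
    return f"<custom-word pronunciation = {attr}>{escape(word)}</custom-word>"


print(custom_word_target("voos", "v uw s"))
```

Failing early on a missing pronunciation keeps the mistake on the client side instead of sending a target the engine cannot break down into phonemes.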
Use cases for <custom-word> markup
Three common use cases for the <custom-word> tag are:
Nonsense and out-of-vocabulary words
Proper nouns
Defining a non-default pronunciation for a word
Nonsense and out-of-vocabulary words
Example: “Say the word voos”
The nonsense word voos is not in the engine's vocabulary, so without the <custom-word> markup this target returns an error.
Input targets:
<custom-word pronunciation = "v uw s">voos</custom-word>
JSON output:
One quality_score value per target word
token_type: "custom-word"
Example [audio was "/v uw s/"]:
"results": [
  {
    "hypothesis_score": 98,
    "duration": 2.04,
    "hypothesis_duration": 0.93,
    "category": "voos",
    "end": 1.5,
    "start": 0.57,
    "word_breakdown": [
      {
        "duration": 0.93,
        "quality_score": 98,
        "token_type": "custom-word",
        "end": 1.5,
        "start": 0.57,
        "phone_breakdown": [
          { "duration": 0.24, "quality_score": 97, "end": 0.81, "start": 0.57, "phone": "v" },
          { "duration": 0.12, "quality_score": 100, "end": 0.93, "start": 0.81, "phone": "uw" },
          { "duration": 0.57, "quality_score": 95, "end": 1.5, "start": 0.93, "phone": "s" }
        ],
        "word": "voos",
        "target_transcription": "v uw s"
      }
    ]
  }
]
Proper nouns
Example: “Can you say the main character’s name, Phonzy”
Proper names are often out-of-vocabulary words. The <custom-word> markup can be used to enter these targets.
Input targets:
<custom-word pronunciation = "f ao n z iy">Phonzy</custom-word>
JSON output:
One quality_score value per target word
token_type: "custom-word"
Example [audio was "/f ao n z iy/"]:
"results": [
  {
    "hypothesis_score": 88,
    "duration": 2.28,
    "hypothesis_duration": 0.9,
    "category": "phonzy",
    "end": 1.47,
    "start": 0.57,
    "word_breakdown": [
      {
        "duration": 0.9,
        "quality_score": 88,
        "token_type": "custom-word",
        "end": 1.47,
        "start": 0.57,
        "phone_breakdown": [
          { "duration": 0.21, "quality_score": 98, "end": 0.78, "start": 0.57, "phone": "f" },
          { "duration": 0.12, "quality_score": 71, "end": 0.9, "start": 0.78, "phone": "ao" },
          { "duration": 0.06, "quality_score": 95, "end": 0.96, "start": 0.9, "phone": "n" },
          { "duration": 0.21, "quality_score": 66, "end": 1.17, "start": 0.96, "phone": "z" },
          { "duration": 0.3, "quality_score": 97, "end": 1.47, "start": 1.17, "phone": "iy" }
        ],
        "word": "phonzy",
        "target_transcription": "f ao n z iy"
      }
    ]
  }
]
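The per-phone scores in phone_breakdown can be used to see which sounds lowered the word score. The Python sketch below walks one result entry and lists the phones scoring under a cut-off. The function name weak_phones and the threshold of 75 are assumptions made for illustration, not SoapBox recommendations, and the sample data is a trimmed copy of the Phonzy result above.

```python
def weak_phones(result: dict, threshold: int = 75) -> list:
    """Return (word, phone, quality_score) for phones scoring below threshold.

    The default threshold is an arbitrary illustration, not a recommended value.
    """
    flagged = []
    for word in result["word_breakdown"]:
        for phone in word["phone_breakdown"]:
            if phone["quality_score"] < threshold:
                flagged.append((word["word"], phone["phone"], phone["quality_score"]))
    return flagged


# Trimmed copy of the Phonzy result above (timing fields omitted for brevity).
phonzy = {
    "hypothesis_score": 88,
    "category": "phonzy",
    "word_breakdown": [
        {
            "quality_score": 88,
            "token_type": "custom-word",
            "word": "phonzy",
            "target_transcription": "f ao n z iy",
            "phone_breakdown": [
                {"phone": "f", "quality_score": 98},
                {"phone": "ao", "quality_score": 71},
                {"phone": "n", "quality_score": 95},
                {"phone": "z", "quality_score": 66},
                {"phone": "iy", "quality_score": 97},
            ],
        }
    ],
}

print(weak_phones(phonzy))  # [('phonzy', 'ao', 71), ('phonzy', 'z', 66)]
```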
Defining a non-default pronunciation for a word
The <custom-word> tag can be used to distinguish or specify particular pronunciations of a word. Since languages have different dialects, one use case for the <custom-word> markup is to ensure that a child is using the correct pronunciation for a specific dialect.
Example: “Say the word tomato”
For example, the word tomato can have two pronunciations: /t ah m aa t ow/ and /t ah m ey t ow/.
Input targets:
Note that it can be useful to add a suffix to the content word in the markup tag (e.g., tomato can become tomatoaa and tomatoey) to differentiate the two pronunciations more easily in the JSON output. Underscores or other special characters can be used, but they will be deleted in the JSON output.
<custom-word pronunciation = "t ah m aa t ow">tomatoaa</custom-word>
<custom-word pronunciation = "t ah m ey t ow">tomatoey</custom-word>
JSON output:
One quality_score value per target word
token_type: "custom-word"
Example [audio was "/t ah m aa t ow/"]:
"results": [
  {
    "hypothesis_score": 87,
    "duration": 2.82,
    "hypothesis_duration": 1.14,
    "category": "tomatoaa",
    "end": 1.92,
    "start": 0.78,
    "word_breakdown": [
      {
        "duration": 1.14,
        "quality_score": 87,
        "token_type": "custom-word",
        "end": 1.92,
        "start": 0.78,
        "phone_breakdown": [
          { "duration": 0.12, "quality_score": 92, "end": 0.9, "start": 0.78, "phone": "t" },
          { "duration": 0.03, "quality_score": 87, "end": 0.93, "start": 0.9, "phone": "ah" },
          { "duration": 0.06, "quality_score": 100, "end": 0.99, "start": 0.93, "phone": "m" },
          { "duration": 0.24, "quality_score": 61, "end": 1.23, "start": 0.99, "phone": "aa" },
          { "duration": 0.12, "quality_score": 95, "end": 1.35, "start": 1.23, "phone": "t" },
          { "duration": 0.57, "quality_score": 63, "end": 1.92, "start": 1.35, "phone": "ow" }
        ],
        "word": "tomatoaa",
        "target_transcription": "t ah m aa t ow"
      }
    ]
  },
  {
    "hypothesis_score": 50,
    "duration": 2.82,
    "hypothesis_duration": 1.14,
    "category": "tomatoey",
    "end": 1.92,
    "start": 0.78,
    "word_breakdown": [
      {
        "duration": 1.14,
        "quality_score": 50,
        "token_type": "custom-word",
        "end": 1.92,
        "start": 0.78,
        "phone_breakdown": [
          { "duration": 0.12, "quality_score": 92, "end": 0.9, "start": 0.78, "phone": "t" },
          { "duration": 0.03, "quality_score": 87, "end": 0.93, "start": 0.9, "phone": "ah" },
          { "duration": 0.06, "quality_score": 100, "end": 0.99, "start": 0.93, "phone": "m" },
          { "duration": 0.24, "quality_score": 5, "end": 1.23, "start": 0.99, "phone": "ey" },
          { "duration": 0.12, "quality_score": 95, "end": 1.35, "start": 1.23, "phone": "t" },
          { "duration": 0.57, "quality_score": 63, "end": 1.92, "start": 1.35, "phone": "ow" }
        ],
        "word": "tomatoey",
        "target_transcription": "t ah m ey t ow"
      }
    ]
  }
]
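When several pronunciation variants are submitted as separate targets, one way to decide which variant was spoken is to compare the hypothesis_score of each result entry. The Python sketch below does this for the tomato example. It assumes that a higher hypothesis_score indicates a closer match, which the example above suggests (the audio was /t ah m aa t ow/ and tomatoaa scores 87 against 50) but which this page does not state outright. The function name best_variant and the variants lookup are hypothetical.

```python
def best_variant(results: list) -> dict:
    """Return the result entry with the highest hypothesis_score."""
    return max(results, key=lambda r: r["hypothesis_score"])


# Suffixed labels and pronunciations as entered in the input targets above.
variants = {
    "tomatoaa": "t ah m aa t ow",
    "tomatoey": "t ah m ey t ow",
}

# Only the fields used here are copied from the JSON output above.
results = [
    {"category": "tomatoaa", "hypothesis_score": 87},
    {"category": "tomatoey", "hypothesis_score": 50},
]

best = best_variant(results)
print(best["category"], "->", variants[best["category"]])  # tomatoaa -> t ah m aa t ow
```

Keeping the suffixed label as the lookup key is what makes the mapping back to a pronunciation straightforward, which is the reason for adding the suffix in the first place.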