Letter Sound Examples
Overview
This page covers how to create targets when using the <sound-out> markup:
Individual letter sounds
Multiple letter sounds
All the letter sounds in a word
Letter sounds that require 2 phonemes
Individual letter sounds
A common use case of the SoapBox voice engine is when a student repeats, reads, or calls out individual letter sounds (i.e., phonemes).
The <sound-out> markup is used to create these targets.
The pronunciation attribute is always required when using isolated phonemes as targets, e.g., <sound-out pronunciation="sh">sh</sound-out>.
See Working With Short Sounds for more details about the pronunciation attribute.
Example: “Tell me the SOUND of the letter G”
Input target
<sound-out pronunciation="g">g</sound-out>
JSON output:
One quality_score value for the letter “g”; token_type: “sound-out”.
Example [audio was /g/]:
"results": [{ "hypothesis_score": 98.0, "duration": 4.2, "hypothesis_duration": 0.18, "category": "g", "end": 2.7, "start": 2.52, "word_breakdown": [{ "duration": 0.18, "quality_score": 98.0, "token_type": "sound-out", "end": 2.7, "start": 2.52, "phone_breakdown": [{ "duration": 0.18, "quality_score": 96.0, "end": 2.7, "start": 2.52, "phone": "g" }], "word": "g", "target_transcription": "g" }] }]
Working with multiple letter sounds as targets
There are two ways to input multiple letter sounds as targets:
When the order of the response does NOT matter
When the order of the response DOES matter
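The difference between the two approaches lies in how the markup is grouped into targets. The sketch below is illustrative only: the helper `sound_out` and the variable names are hypothetical, and how targets are submitted to the voice engine is not shown.

```python
# Letter / pronunciation pairs for the letters P S I L M.
letters = [("p", "p"), ("s", "s"), ("i", "ih"), ("l", "l"), ("m", "m")]


def sound_out(text: str, pronunciation: str) -> str:
    """Format one <sound-out> element (hypothetical helper)."""
    return f'<sound-out pronunciation="{pronunciation}">{text}</sound-out>'


# Order does NOT matter: one separate target per letter sound.
separate_targets = [sound_out(text, pron) for text, pron in letters]

# Order DOES matter: all letter sounds combined in a single target.
single_target = " ".join(separate_targets)

print(len(separate_targets))
print(single_target)
```

With separate targets, each letter sound is reported as its own result; with a single target, the sounds are reported together in one result.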
When the order of the response does NOT matter
This applies when the order of the student's response is not important, so saying the sounds in any order is a valid response.
Example: “Tell me the sound of each of the letters: P S I L M”
Input target: use multiple targets, one per letter sound:
<sound-out pronunciation="p">p</sound-out>
<sound-out pronunciation="s">s</sound-out>
<sound-out pronunciation="ih">i</sound-out>
<sound-out pronunciation="l">l</sound-out>
<sound-out pronunciation="m">m</sound-out>
JSON output:
Several quality_score values, one per letter; token_type: “sound-out”.
Example [audio was /p/ /ih/ /s/ /l/ /m/]:
Note: the student said /ih/ and /s/ in a different order from the prompt.
"results": [ { "hypothesis_score": 89, "duration": 8.61, "hypothesis_duration": 0.45, "category": "p", "end": 0.99, "start": 0.54, "word_breakdown": [ { "duration": 0.45, "quality_score": 89, "token_type": "sound-out", "end": 0.99, "start": 0.54, "phone_breakdown": [ { "duration": 0.45, "quality_score": 85, "end": 0.99, "start": 0.54, "phone": "p" } ], "word": "p", "target_transcription": "p" } ] }, { "hypothesis_score": 87, "duration": 8.61, "hypothesis_duration": 1.2, "category": "s", "end": 3.36, "start": 2.16, "word_breakdown": [ { "duration": 1.2, "quality_score": 87, "token_type": "sound-out", "end": 3.36, "start": 2.16, "phone_breakdown": [ { "duration": 1.2, "quality_score": 81, "end": 3.36, "start": 2.16, "phone": "s" } ], "word": "s", "target_transcription": "s" } ] }, { "hypothesis_score": 78, "duration": 8.61, "hypothesis_duration": 0.33, "category": "i", "end": 4.71, "start": 4.38, "word_breakdown": [ { "duration": 0.33, "quality_score": 78, "token_type": "sound-out", "end": 4.71, "start": 4.38, "phone_breakdown": [ { "duration": 0.33, "quality_score": 70, "end": 4.71, "start": 4.38, "phone": "ih" } ], "word": "i", "target_transcription": "ih" } ] }, { "hypothesis_score": 73, "duration": 8.61, "hypothesis_duration": 0.66, "category": "l", "end": 6.18, "start": 5.52, "word_breakdown": [ { "duration": 0.66, "quality_score": 73, "token_type": "sound-out", "end": 6.18, "start": 5.52, "phone_breakdown": [ { "duration": 0.66, "quality_score": 64, "end": 6.18, "start": 5.52, "phone": "l" } ], "word": "l", "target_transcription": "l" } ] }, { "hypothesis_score": 77, "duration": 8.61, "hypothesis_duration": 1.32, "category": "m", "end": 8.34, "start": 7.02, "word_breakdown": [ { "duration": 1.32, "quality_score": 77, "token_type": "sound-out", "end": 8.34, "start": 7.02, "phone_breakdown": [ { "duration": 1.32, "quality_score": 69, "end": 8.34, "start": 7.02, "phone": "m" } ], "word": "m", "target_transcription": "m" }] }]
When the order of the response DOES matter
This applies when the order of the student's response is important.
The sounds are expected to be produced in the order given in the target. If the student says them in a different order, misplaced sounds are marked as deleted.
Example: “Tell me the sounds of the letters P S I L M”
Input target: Use a single target:
<sound-out pronunciation="p">p</sound-out> <sound-out pronunciation="s">s</sound-out> <sound-out pronunciation="ih">i</sound-out> <sound-out pronunciation="l">l</sound-out> <sound-out pronunciation="m">m</sound-out>
JSON output:
Several quality_score values, one per letter; token_type: “sound-out”.
Example [audio was /p/ /s/ /ih/ /l/ /m/]:
"results": [{ "hypothesis_score": 79.0, "duration": 8.61, "hypothesis_duration": 7.8, "category": "p s i l m", "end": 8.34, "start": 0.54, "word_breakdown": [{ "duration": 7.8, "quality_score": 87.0, "token_type": "sound-out", "end": 8.34, "start": 0.54, "phone_breakdown": [{ "duration": 0.57, "quality_score": 82.0, "end": 1.11, "start": 0.54, "phone": "p" }], "word": "p", "target_transcription": "p" }, { "duration": 7.8, "quality_score": 86.0, "token_type": "sound-out", "end": 8.34, "start": 0.54, "phone_breakdown": [{ "duration": 1.2, "quality_score": 81.0, "end": 3.36, "start": 2.16, "phone": "s" }], "word": "s", "target_transcription": "s" }, { "duration": 7.8, "quality_score": 77.0, "token_type": "sound-out", "end": 8.34, "start": 0.54, "phone_breakdown": [{ "duration": 0.33, "quality_score": 68.0, "end": 4.71, "start": 4.38, "phone": "ih" }], "word": "i", "target_transcription": "ih" }, { "duration": 7.8, "quality_score": 73.0, "token_type": "sound-out", "end": 8.34, "start": 0.54, "phone_breakdown": [{ "duration": 0.66, "quality_score": 64.0, "end": 6.18, "start": 5.52, "phone": "l" }], "word": "l", "target_transcription": "l" }, { "duration": 7.8, "quality_score": 73.0, "token_type": "sound-out", "end": 8.34, "start": 0.54, "phone_breakdown": [{ "duration": 1.32, "quality_score": 64.0, "end": 8.34, "start": 7.02, "phone": "m" }], "word": "m", "target_transcription": "m" }] }]
All the letter sounds in a word
In a use case where a student is required to read or call out all the letter sounds (i.e., phonemes) in a word, the word can be used as the target.
The <sound-out> markup is used to create these targets: <sound-out>dog</sound-out>.
Note: The pronunciation attribute is required only when using a target that is not in the dictionary.
Example: “Tell me ALL the SOUNDS in the word dog”
Input target:
<sound-out>dog</sound-out>
JSON output:
Phonemes are scored as if they were produced individually (not as part of a word).
One quality_score value for the word “dog”, which is a combination of the isolated phoneme quality_score values; token_type: “sound-out”.
Example [audio was /d/ /ao/ /g/]:
"results": [ { "hypothesis_score": 71, "duration": 3.48, "hypothesis_duration": 1.86, "category": "dog", "end": 2.55, "start": 0.69, "word_breakdown": [ { "duration": 1.86, "quality_score": 71, "token_type": "sound-out", "end": 2.55, "start": 0.69, "phone_breakdown": [ { "duration": 0.24, "quality_score": 64, "end": 0.93, "start": 0.69, "phone": "d" }, { "duration": 0.3, "quality_score": 56, "end": 1.89, "start": 1.59, "phone": "ao" }, { "duration": 0.15, "quality_score": 61, "end": 2.55, "start": 2.4, "phone": "g" } ], "word": "dog", "target_transcription": "d ao g" } ] } ]
Working with letter sounds that require 2 phonemes
Some letter sounds, like those for “x” and “q”, require 2 or more phonemes to be pronounced. The <custom-word> markup must be used in this use case, as no pause is expected between the phonemes.
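The rule above can be expressed as a small helper that picks the markup from the number of phonemes in the pronunciation. This is a sketch under stated assumptions: the function name `letter_sound_target` is hypothetical, phonemes are assumed to be space-separated as in "k s", and whitespace around the letter inside the element is assumed not to matter.

```python
from xml.sax.saxutils import escape, quoteattr


def letter_sound_target(letter: str, pronunciation: str) -> str:
    """Build a target for one letter sound (hypothetical helper).

    One phoneme  -> <sound-out>
    2+ phonemes  -> <custom-word>, since no pause is expected between them
    """
    phonemes = pronunciation.split()
    if not phonemes:
        raise ValueError("pronunciation must contain at least one phoneme")
    tag = "custom-word" if len(phonemes) > 1 else "sound-out"
    attr = quoteattr(" ".join(phonemes))  # adds the surrounding quotes
    return f"<{tag} pronunciation={attr}>{escape(letter)}</{tag}>"


print(letter_sound_target("x", "k s"))
print(letter_sound_target("g", "g"))
```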
Example: “Tell me the sound of the letter X”
Input target:
<custom-word pronunciation="k s"> x </custom-word>
JSON output:
One quality_score value for “x”; token_type: “custom-word”.
Example [audio was /k s/]:
"results": [{ "category": "x", "hypothesis_score": 93, "word_breakdown": [{ "quality_score": 93, "target_transcription": "k s", "word": "x", "token_type": "custom-word", "phone_breakdown": [{ "phone": "k", "quality_score": 91, "end": 0.51, "start": 0.39, "duration": 0.12 },{ "phone": "s", "quality_score": 88, "end": 1.14, "start": 0.51, "duration": 0.63 }], "end": 1.14, "start": 0.39, "pitch": { "values": [] }, "duration": 0.75 }], "end": 1.14, "start": 0.39, "duration": 1.53, "hypothesis_duration": 0.75 }]
Short sounds guide and best practices
Additional details for working with short sounds can be found in Working With Short Sounds.