Letter Name Examples
Overview
The <letter> markup is used to create letter name targets. The pronunciation attribute can also be used when a specific letter pronunciation is required.
Individual letter names
This is for a use case where a student repeats, reads, or calls out individual letter names (i.e., the alphabet letter names).
Example: “Say the name of the letter G”
Input target:
<letter>g</letter>
JSON output:
One quality_score value for the letter “g”
token_type: “letter”
Example [audio was g]:
"results": [{ "hypothesis_score": 93.0, "duration": 2.04, "hypothesis_duration": 0.75, "category": "g", "end": 1.44, "start": 0.69, "word_breakdown": [{ "duration": 0.75, "quality_score": 93.0, "token_type": "letter", "end": 1.44, "start": 0.69, "phone_breakdown": [{ "duration": 0.18, "quality_score": 88.0, "end": 0.87, "start": 0.69, "phone": "jh" }, { "duration": 0.57, "quality_score": 92.0, "end": 1.44, "start": 0.87, "phone": "iy" }], "word": "g", "target_transcription": "jh iy" }] }]
Working with multiple pronunciations
Example: “Say the name of the letter A”
Multiple pronunciations are available for some letters (e.g., “a” and “z”). To ensure the SoapBox voice engine verifies the correct pronunciation, the pronunciation attribute is used with the <letter> tag.
Input target:
<letter pronunciation="ey"> a </letter>
or
<letter pronunciation="ah"> a </letter>
JSON output:
One quality_score value for “a” (with /ey/ pronunciation)
token_type: “letter”
Example [audio was /ey/]:
"results": [{ "hypothesis_score": 94, "duration": 2.46, "hypothesis_duration": 0.69, "category": "a", "end": 1.83, "start": 1.14, "word_breakdown": [ { "duration": 0.69, "quality_score": 94, "token_type": "letter", "end": 1.83, "start": 1.14, "phone_breakdown": [ { "duration": 0.69, "quality_score": 91, "end": 1.83, "start": 1.14, "phone": "ey" } ], "word": "a", "target_transcription": "ey" }] }]
Working with multiple letters as targets
There are two ways to input multiple letter names as targets:
When the order of the response does NOT matter
When the order of the response DOES matter
When the order of the response does NOT matter
This is when the order of the response from the student is not important and saying the letter names in any order is a valid response.
Example: “Say the name of the letters C E K H R”
Input target: Use multiple targets
<letter>c</letter> <letter>k</letter> <letter>e</letter> <letter>h</letter> <letter>r</letter>
JSON output:
Several quality_score values, one per letter
token_type: “letter”
Example [audio was c k e h r]:
"results": [ { "hypothesis_score": 95, "duration": 13.02, "hypothesis_duration": 0.78, "category": "c", "end": 1.32, "start": 0.54, "word_breakdown": [ { "duration": 0.78, "quality_score": 95, "token_type": "letter", "end": 1.32, "start": 0.54, "phone_breakdown": [ { "duration": 0.24, "quality_score": 89, "end": 0.78, "start": 0.54, "phone": "s" }, { "duration": 0.54, "quality_score": 95, "end": 1.32, "start": 0.78, "phone": "iy" } ], "word": "c", "target_transcription": "s iy" } ] }, { "hypothesis_score": 95, "duration": 13.02, "hypothesis_duration": 0.69, "category": "k", "end": 3.63, "start": 2.94, "word_breakdown": [ { "duration": 0.69, "quality_score": 95, "token_type": "letter", "end": 3.63, "start": 2.94, "phone_breakdown": [ { "duration": 0.15, "quality_score": 99, "end": 3.09, "start": 2.94, "phone": "k" }, { "duration": 0.54, "quality_score": 86, "end": 3.63, "start": 3.09, "phone": "ey" } ], "word": "k", "target_transcription": "k ey" } ] }, { "hypothesis_score": 97, "duration": 13.02, "hypothesis_duration": 0.54, "category": "e", "end": 1.32, "start": 0.78, "word_breakdown": [ { "duration": 0.54, "quality_score": 97, "token_type": "letter", "end": 1.32, "start": 0.78, "phone_breakdown": [ { "duration": 0.54, "quality_score": 95, "end": 1.32, "start": 0.78, "phone": "iy" } ], "word": "e", "target_transcription": "iy" } ] }, { "hypothesis_score": 91, "duration": 13.02, "hypothesis_duration": 0.75, "category": "h", "end": 9.69, "start": 8.94, "word_breakdown": [ { "duration": 0.75, "quality_score": 91, "token_type": "letter", "end": 9.69, "start": 8.94, "phone_breakdown": [ { "duration": 0.33, "quality_score": 88, "end": 9.27, "start": 8.94, "phone": "ey" }, { "duration": 0.42, "quality_score": 87, "end": 9.69, "start": 9.27, "phone": "ch" } ], "word": "h", "target_transcription": "ey ch" } ] }, { "hypothesis_score": 57, "duration": 13.02, "hypothesis_duration": 0.6, "category": "r", "end": 12.09, "start": 11.49, "word_breakdown": [ { "duration": 0.6, 
"quality_score": 57, "token_type": "letter", "end": 12.09, "start": 11.49, "phone_breakdown": [ { "duration": 0.45, "quality_score": 57, "end": 11.94, "start": 11.49, "phone": "aa" }, { "duration": 0.15, "quality_score": 36, "end": 12.09, "start": 11.94, "phone": "r" } ], "word": "r", "target_transcription": "aa r" }] }]
When the order of the response DOES matter
This is when the order of the response from the student is important.
Letters are expected to be produced in the order given in the target. If the student says them in a different order, misplaced letters are marked as deleted.
Example: “Say the name of the letters C E K H R”
Input target: use a single target:
<letter>c</letter> <letter>e</letter> <letter>K</letter> <letter>h</letter> <letter>r</letter>
JSON output:
Several quality_score values, one per letter
token_type: “letter”
Example [audio was c k e h r] (different order, so e is marked as a deletion):
"results": [ { "hypothesis_score": 68, "duration": 13.02, "hypothesis_duration": 11.55, "category": "c e k h r", "end": 12.09, "start": 0.54, "word_breakdown": [ { "duration": 0.78, "quality_score": 95, "token_type": "letter", "end": 1.32, "start": 0.54, "phone_breakdown": [ { "duration": 0.24, "quality_score": 89, "end": 0.78, "start": 0.54, "phone": "s" }, { "duration": 0.54, "quality_score": 95, "end": 1.32, "start": 0.78, "phone": "iy" } ], "word": "c", "target_transcription": "s iy" }, { "duration": 0, "quality_score": 0, "token_type": "letter", "end": -1, "start": -1, "phone_breakdown": [ { "duration": 0, "quality_score": 0, "end": 0, "start": 0, "phone": "iy" } ], "word": "e", "target_transcription": "iy" }, { "duration": 0.69, "quality_score": 95, "token_type": "letter", "end": 3.63, "start": 2.94, "phone_breakdown": [ { "duration": 0.15, "quality_score": 99, "end": 3.09, "start": 2.94, "phone": "k" }, { "duration": 0.54, "quality_score": 86, "end": 3.63, "start": 3.09, "phone": "ey" } ], "word": "k", "target_transcription": "k ey" }, { "duration": 0.75, "quality_score": 91, "token_type": "letter", "end": 9.69, "start": 8.94, "phone_breakdown": [ { "duration": 0.33, "quality_score": 88, "end": 9.27, "start": 8.94, "phone": "ey" }, { "duration": 0.42, "quality_score": 87, "end": 9.69, "start": 9.27, "phone": "ch" } ], "word": "h", "target_transcription": "ey ch" }, { "duration": 0.6, "quality_score": 57, "token_type": "letter", "end": 12.09, "start": 11.49, "phone_breakdown": [ { "duration": 0.45, "quality_score": 57, "end": 11.94, "start": 11.49, "phone": "aa" }, { "duration": 0.15, "quality_score": 36, "end": 12.09, "start": 11.94, "phone": "r" } ], "word": "r", "target_transcription": "aa r" } ] } ]
Working with repeated targets (e.g., in a DIBELS-type test)
In a test (e.g., a DIBELS-type test), a student may be presented with multiple letters where order is important and the same letter appears more than once in the sequence. To achieve the desired results, a single target has to be used and every expected repetition has to be included in the target (such as the double “o” in the example below).
Example: “Tell me the name of each of these letters: E L h g x t m S O o”
Input target: use a single target
<letter>E</letter> <letter>L</letter> <letter>h</letter> <letter>g</letter> <letter>x</letter> <letter>t</letter> <letter>m</letter> <letter>S</letter> <letter>O</letter> <letter>o</letter>
JSON output:
One word-level quality_score per letter (capitalization is normalized to lower case)
token_type: “letter”
Example [audio was e l h g x t m s o o]:
"results": [ { "hypothesis_score": 87, "duration": 12.3, "hypothesis_duration": 11.58, "category": "e l h g x t m s o o", "end": 11.94, "start": 0.36, "word_breakdown": [ { "duration": 0.66, "quality_score": 94, "token_type": "letter", "end": 1.02, "start": 0.36, "phone_breakdown": [ { "duration": 0.66, "quality_score": 90, "end": 1.02, "start": 0.36, "phone": "iy" } ], "word": "e", "target_transcription": "iy" }, { "duration": 0.69, "quality_score": 66, "token_type": "letter", "end": 2.07, "start": 1.38, "phone_breakdown": [ { "duration": 0.42, "quality_score": 49, "end": 1.8, "start": 1.38, "phone": "eh" }, { "duration": 0.27, "quality_score": 62, "end": 2.07, "start": 1.8, "phone": "l" } ], "word": "l", "target_transcription": "eh l" }, { "duration": 0.72, "quality_score": 95, "token_type": "letter", "end": 3.03, "start": 2.31, "phone_breakdown": [ { "duration": 0.36, "quality_score": 92, "end": 2.67, "start": 2.31, "phone": "ey" }, { "duration": 0.36, "quality_score": 93, "end": 3.03, "start": 2.67, "phone": "ch" } ], "word": "h", "target_transcription": "ey ch" }, { "duration": 0.66, "quality_score": 96, "token_type": "letter", "end": 4.08, "start": 3.42, "phone_breakdown": [ { "duration": 0.21, "quality_score": 94, "end": 3.63, "start": 3.42, "phone": "jh" }, { "duration": 0.45, "quality_score": 94, "end": 4.08, "start": 3.63, "phone": "iy" } ], "word": "g", "target_transcription": "jh iy" }, { "duration": 0.75, "quality_score": 93, "token_type": "letter", "end": 5.37, "start": 4.62, "phone_breakdown": [ { "duration": 0.27, "quality_score": 81, "end": 4.89, "start": 4.62, "phone": "eh" }, { "duration": 0.18, "quality_score": 93, "end": 5.07, "start": 4.89, "phone": "k" }, { "duration": 0.3, "quality_score": 96, "end": 5.37, "start": 5.07, "phone": "s" } ], "word": "x", "target_transcription": "eh k s" }, { "duration": 0.66, "quality_score": 98, "token_type": "letter", "end": 6.45, "start": 5.79, "phone_breakdown": [ { "duration": 0.15, "quality_score": 96, 
"end": 5.94, "start": 5.79, "phone": "t" }, { "duration": 0.51, "quality_score": 97, "end": 6.45, "start": 5.94, "phone": "iy" } ], "word": "t", "target_transcription": "t iy" }, { "duration": 0.63, "quality_score": 73, "token_type": "letter", "end": 7.62, "start": 6.99, "phone_breakdown": [ { "duration": 0.33, "quality_score": 56, "end": 7.32, "start": 6.99, "phone": "eh" }, { "duration": 0.3, "quality_score": 73, "end": 7.62, "start": 7.32, "phone": "m" } ], "word": "m", "target_transcription": "eh m" }, { "duration": 0.66, "quality_score": 84, "token_type": "letter", "end": 8.91, "start": 8.25, "phone_breakdown": [ { "duration": 0.3, "quality_score": 63, "end": 8.55, "start": 8.25, "phone": "eh" }, { "duration": 0.36, "quality_score": 98, "end": 8.91, "start": 8.55, "phone": "s" } ], "word": "s", "target_transcription": "eh s" }, { "duration": 0.69, "quality_score": 88, "token_type": "letter", "end": 10.44, "start": 9.75, "phone_breakdown": [ { "duration": 0.69, "quality_score": 83, "end": 10.44, "start": 9.75, "phone": "ow" } ], "word": "o", "target_transcription": "ow" }, { "duration": 0.69, "quality_score": 88, "token_type": "letter", "end": 11.94, "start": 11.25, "phone_breakdown": [ { "duration": 0.69, "quality_score": 82, "end": 11.94, "start": 11.25, "phone": "ow" } ], "word": "o", "target_transcription": "ow" }] }]
Differentiating between letter names and letter sounds
A common scenario is for an educator to check whether a student is saying a letter name rather than a letter sound. In some cases this is straightforward; in others it is more complex. See this explanation here for more details.