Short Sounds Feature
Overview
Short Sounds V1 is the first release of SoapBox Labs' Verification API feature optimized to tackle the most common short sound use cases.
The Short Sounds feature adds markup that supports multiple use cases on the literacy journey (e.g., individual letter sounds and letter names).
Note: Short Sounds V1 is only compatible with the Verification endpoint.
Short Sounds input targets
Specific markup is required in the input targets to enable the correct analysis of short sounds.
To use the Short Sounds feature, the input target to the Verification API needs to be formatted using these specifically-designed markup tags:
<letter> is used to verify letter names
<sound-out> is used to verify letter sounds
<custom-word> is used for verifying out-of-vocabulary and nonsense words
More information about the syntax and attributes of these and other tags can be found at Customizing Text Using Markup.
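As a rough illustration, the three tags can be assembled with small string helpers. The Python function names below (letter, sound_out, custom_word) are hypothetical and not part of any SoapBox SDK; only the tag names and the pronunciation attribute come from this page.

```python
from typing import Optional


def letter(name: str) -> str:
    """Target for verifying a letter name, e.g. the name of the letter W."""
    return f"<letter>{name}</letter>"


def sound_out(text: str, pronunciation: Optional[str] = None) -> str:
    """Target for verifying letter sounds; the pronunciation is optional here."""
    if pronunciation is None:
        return f"<sound-out>{text}</sound-out>"
    return f'<sound-out pronunciation="{pronunciation}">{text}</sound-out>'


def custom_word(text: str, pronunciation: str) -> str:
    """Target for out-of-vocabulary or nonsense words; pronunciation is mandatory."""
    if not pronunciation.strip():
        raise ValueError("custom-word targets need a pronunciation")
    return f'<custom-word pronunciation="{pronunciation}">{text}</custom-word>'


print(letter("w"))
print(sound_out("dog"))
print(custom_word("foojag", "f uw jh ae g"))
```

The helpers do no escaping or validation beyond the empty-pronunciation check, so they are only suitable for simple alphabetic content like the examples on this page.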
letter markup
If the pronunciation of the letter name needs to be scored, use the <letter> tag. For example, the pronunciation of the letter W (i.e., /d/ /ah/ /b/ /ah/ /l/ /y/ /uw/) would be verified using the markup <letter> w </letter>.
For more details on using the <letter> tag, go to: Letter Name Examples
sound-out markup
The sounded-out version of a word can be verified with the <sound-out> tag. For example, the sounded-out version of “dog” (i.e., /d/ /ao/ /g/) would be verified using the markup <sound-out>dog</sound-out>.
An optional pronunciation attribute can be used with the <sound-out> markup tag to override the standard dictionary pronunciation.
For more details on using the <sound-out> tag, go to: Letter Sound Examples
custom-word markup
The <custom-word> markup tag is used for verifying out-of-vocabulary and nonsense words. The pronunciation attribute is compulsory.
For example, the not-in-dictionary blended word "foojag" would be verified using this markup:
<custom-word pronunciation="f uw jh ae g"> foojag </custom-word>
Custom pronunciation must be provided using the ARPABET notation.
For more details on using the <custom-word> tag, go to Custom Words and Pronunciations Examples.
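Since a custom pronunciation must be written in ARPABET, a client-side check can catch typos before a request is sent. The sketch below assumes the 39-phoneme set used by the CMU Pronouncing Dictionary (without stress digits); the exact symbol set accepted by the SoapBox engine is not stated on this page and may differ. The function name unknown_symbols is hypothetical.

```python
# Assumption: the 39 ARPABET phonemes of the CMU Pronouncing Dictionary,
# lowercase and without stress markers. The engine's accepted set may differ.
ARPABET = frozenset("""
    aa ae ah ao aw ay b ch d dh eh er ey f g hh ih iy jh k l m n ng
    ow oy p r s sh t th uh uw v w y z zh
""".split())


def unknown_symbols(pronunciation: str) -> list:
    """Return the space-separated symbols that are not in the assumed set."""
    return [s for s in pronunciation.lower().split() if s not in ARPABET]


print(unknown_symbols("f uw jh ae g"))  # pronunciation of "foojag" from this page
print(unknown_symbols("f oo jh ae g"))  # "oo" is not an ARPABET symbol
```

An empty list means every symbol was recognized by this local check; it does not guarantee that the engine will accept the pronunciation.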
Using the pronunciation attribute
Optional
When using the <sound-out> tag, the pronunciation attribute is optional. If not used, the SoapBox Engine will provide a default pronunciation automatically.
Required
When using the <sound-out> tag, the pronunciation attribute is required if the word is not in the system.
When using the <custom-word> tag, the pronunciation attribute is always required. If no pronunciation attribute is entered, an error is returned.
Learn more about the pronunciation attribute here: Working With Short Sounds
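The optional and required cases above can be summarized as a small decision function. This is only a sketch of the rules as stated on this page: the function name and the in_dictionary flag are hypothetical, and the caller has to know by other means whether the engine already has a pronunciation for the word.

```python
def pronunciation_required(tag: str, in_dictionary: bool = True) -> bool:
    """Return True when the pronunciation attribute must be supplied.

    tag           -- "sound-out" or "custom-word"
    in_dictionary -- hypothetical flag: does the engine already know the word?
    """
    if tag == "custom-word":
        # Always required; the API returns an error without it.
        return True
    if tag == "sound-out":
        # Optional for known words, required for words not in the system.
        return not in_dictionary
    raise ValueError(f"no pronunciation rule described here for tag: {tag}")


print(pronunciation_required("sound-out"))
print(pronunciation_required("sound-out", in_dictionary=False))
print(pronunciation_required("custom-word"))
```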
Using multiple targets
The use of multiple concurrent targets is suggested when your task requires the SoapBox voice engine to differentiate between phonetically similar targets or between isolated and blended sounds (e.g., isolated /d/ vs. /d/ in dog).
If multiple targets in <sound-out> have the same content but different pronunciations, the content might need to be differentiated too. For example, <sound-out pronunciation="aa">a</sound-out> and <sound-out pronunciation="ah">a</sound-out> are better handled by the system as <sound-out pronunciation="aa">aa</sound-out> and <sound-out pronunciation="ah">ah</sound-out>.
For more examples, see the Working With Short Sounds section.
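One possible way to apply the differentiation advice above automatically is sketched here. It assumes that reusing the pronunciation (with spaces removed) as the tag content is acceptable, which mirrors the single-phoneme "aa"/"ah" example but is not confirmed for longer pronunciations. The function name differentiate is hypothetical, and the sketch only builds the individual target strings; how multiple targets are combined in a request is covered in Working With Short Sounds.

```python
from collections import defaultdict


def differentiate(targets):
    """Build <sound-out> targets from (content, pronunciation) pairs.

    When the same content appears with more than one pronunciation, the
    content is replaced by the pronunciation itself (spaces removed) so
    that each target is distinct.
    """
    pronunciations = defaultdict(set)
    for content, pron in targets:
        pronunciations[content].add(pron)

    result = []
    for content, pron in targets:
        if len(pronunciations[content]) > 1:
            content = pron.replace(" ", "")  # assumed disambiguation scheme
        result.append(f'<sound-out pronunciation="{pron}">{content}</sound-out>')
    return result


for target in differentiate([("a", "aa"), ("a", "ah"), ("d", "d")]):
    print(target)
```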
Guidelines and use cases
The following links outline some common use cases for the Short Sounds feature, together with the suggested syntax for creating single or multiple targets and the corresponding JSON outputs.
The reported JSON examples are templates only. Durations, start times, end times, and scores should not be taken as standards.
Letter names
A student is asked to repeat, read, or call out individual letter names (isolated or in a sequence).
Read more and see examples here.
Letter sounds
A student is asked to repeat, read, or call out individual letter sounds (isolated or in a sequence).
Read more and see examples here.
Phoneme isolation
A student is asked to isolate a single sound from within a word (e.g., “What is the starting sound in the word dog?”).
Read more and see examples here.
Phoneme blending
A student is asked to blend individual sounds into a real or nonsense word.
Read more and see examples here.
Phoneme segmentation
A student is asked to break a word (either real or nonsense) into its individual speech sounds.
Read more and see examples here.
Phoneme manipulation
A student is asked to modify, change, or move the individual sounds in a word, often to create a new word.
Read more and see examples here.
JSON outputs
The output format of Short Sounds V1 is the standard JSON Verification format augmented by the extra attribute token_type at the word level.
The default token_type value is word. When a markup tag is used in the target, the token_type value changes to the same value as the input markup tag (i.e., letter, sound-out, or custom-word). Therefore, the quality_score values might need to be regarded differently depending on the token_type:
If letter, the quality_score values in word_breakdown and phone_breakdown refer to the whole letter name and to the phonemes in the letter name, respectively.
If sound-out, the phonemes in the phone_breakdown are scored as they are pronounced in isolation, with a pause between them. The quality_score in word_breakdown is the combination of the isolated phoneme scores.
If custom-word, the quality_score values in the word_breakdown and phone_breakdown are to be interpreted as they are for word. The phonemes in the phone_breakdown are scored as they are pronounced as part of a word.
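The interpretation rules above can be attached to each word-level entry when post-processing a response. The sketch below assumes that the word entries are available as a list of dictionaries carrying token_type and quality_score keys; the surrounding response layout is not shown on this page, and the function name describe_scores and the sample values are invented for illustration.

```python
# How quality_score should be read for each token_type, paraphrased from this page.
SCORE_NOTES = {
    "word": "scores refer to a regular word and its phonemes",
    "letter": "scores refer to the whole letter name and its phonemes",
    "sound-out": "phonemes are scored in isolation; the word score combines them",
    "custom-word": "scores are read as for a regular word",
}


def describe_scores(word_entries):
    """Return (token_type, quality_score, note) for each word-level entry."""
    described = []
    for entry in word_entries:
        # "word" is the documented default when no markup tag is used.
        token_type = entry.get("token_type", "word")
        note = SCORE_NOTES.get(token_type, "unrecognized token_type")
        described.append((token_type, entry.get("quality_score"), note))
    return described


# Invented sample entries; the scores are placeholders, not reference values.
sample = [
    {"token_type": "sound-out", "quality_score": 72},
    {"token_type": "letter", "quality_score": 85},
]
for row in describe_scores(sample):
    print(row)
```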
API command
Short Sounds V1 can be accessed via API calls that use the specific target design mentioned above.
Templates for such requests are:
For letter names
curl -s -k -H x-app-key:<app-key> -F 'file=@<path-to-audio-file>' -F 'category="<letter>g</letter>"' -F user_token=<user-token> -F model_id=<model-id> https://api.soapboxlabs.com/v1/speech/verification
For letter sounds
curl -s -k -H x-app-key:<app-key> -F 'file=@<path-to-audio-file>' -F 'category="<sound-out>dog</sound-out>"' -F user_token=<user-token> -F model_id=<model-id> https://api.soapboxlabs.com/v1/speech/verification
or
curl -s -k -H x-app-key:<app-key> -F 'file=@<path-to-audio-file>' -F 'category="<sound-out pronunciation="d">d</sound-out>"' -F user_token=<user-token> -F model_id=<model-id> https://api.soapboxlabs.com/v1/speech/verification
For custom words
curl -s -k -H x-app-key:<app-key> -F 'file=@<path-to-audio-file>' -F 'category="<custom-word pronunciation="f uw jh ae g">foojag</custom-word>"' -F user_token=<user-token> -F model_id=<model-id> https://api.soapboxlabs.com/v1/speech/verification
Please note that the double quotation mark " character in the curl request must be the simple ASCII one. Moreover, it may need to be escaped in the pronunciation attribute. For example, pronunciation="d" may need to be coded as pronunciation=\"d\".
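The escaping described in the note above can be handled programmatically when the curl command is assembled from a script. The following sketch builds the argument list without running it. The flags, field names, and endpoint are copied from the templates on this page, while the function names category_field and curl_args are hypothetical. Whether the escaping is needed can depend on the shell and curl version, so treat this as a starting point.

```python
import shlex

ENDPOINT = "https://api.soapboxlabs.com/v1/speech/verification"


def category_field(target: str) -> str:
    """Wrap the target in double quotes, backslash-escaping any inner ones."""
    escaped = target.replace('"', '\\"')
    return f'category="{escaped}"'


def curl_args(app_key, audio_path, target, user_token, model_id):
    """Return the curl argument list for a Verification request."""
    return [
        "curl", "-s", "-k",  # flags copied from the templates on this page
        "-H", f"x-app-key:{app_key}",
        "-F", f"file=@{audio_path}",
        "-F", category_field(target),
        "-F", f"user_token={user_token}",
        "-F", f"model_id={model_id}",
        ENDPOINT,
    ]


args = curl_args(
    "<app-key>", "<path-to-audio-file>",
    '<sound-out pronunciation="d">d</sound-out>',
    "<user-token>", "<model-id>",
)
# shlex.join quotes each argument so the line can be pasted into a POSIX shell.
print(shlex.join(args))
```

Passing the list directly to subprocess.run avoids shell quoting altogether, which leaves only the inner escaping of the category value to deal with.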