Short Sounds Feature

Overview

Short Sounds V1 is the first release of SoapBox Labs' Verification API feature optimized to tackle the most common short sound use cases.

The Short Sounds feature adds markup that supports multiple use cases on the literacy journey (e.g., individual letter sounds and letter names).

Note: Short Sounds V1 is only compatible with the Verification endpoint.

Short Sounds input targets

Specific markup is required in the input targets to enable the correct analysis of short sounds.

To use the Short Sounds feature, format the input target to the Verification API with these markup tags:

  • <letter> is used to verify letter names

  • <sound-out> is used to verify letter sounds

  • <custom-word> is used for verifying out-of-vocabulary and nonsense words

More information about the syntax and attributes of these and other tags can be found at Customizing Text Using Markup.

letter markup

If the pronunciation of the letter name needs to be scored, use the <letter> tag. For example, the pronunciation of the letter W (i.e., /d/ /ah/ /b/ /ah/ /l/ /y/ /uw/) would be verified using the markup <letter> w </letter>.

For more details on using the <letter> tag, go to: Letter Name Examples

sound-out markup

The sounded-out version of a word can be verified with the <sound-out> tag. For example, the sounded-out version of “dog” (i.e., /d/ /ao/ /g/) would be verified using the markup <sound-out>dog</sound-out>.

An optional pronunciation attribute can be used with the <sound-out> markup tag to override the standard dictionary pronunciation.

For more details on using the <sound-out> tag, go to: Letter Sound Examples

custom-word markup

The <custom-word> markup tag is used for verifying out-of-vocabulary and nonsense words. The pronunciation attribute is compulsory.

  • For example, the not-in-dictionary blended word "foojag" would be verified using this markup: <custom-word pronunciation="f uw jh ae g"> foojag </custom-word>.

  • Custom pronunciation must be provided using the ARPABET notation.
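Before sending a custom pronunciation, it can help to catch misspelled symbols locally. The following is a minimal sketch that checks a pronunciation string against the 39-symbol ARPABET inventory used by the CMU Pronouncing Dictionary. That inventory is an assumption: the set accepted by the SoapBox engine may differ, and the names CMUDICT_ARPABET and unknown_symbols are hypothetical helpers, not part of the API.

```python
# ARPABET symbols as used in the CMU Pronouncing Dictionary (stress digits omitted).
# Assumption: the SoapBox engine may accept a different inventory.
CMUDICT_ARPABET = frozenset(
    "aa ae ah ao aw ay b ch d dh eh er ey f g hh ih iy jh k l m n ng "
    "ow oy p r s sh t th uh uw v w y z zh".split()
)


def unknown_symbols(pronunciation: str) -> list[str]:
    """Return the space-separated symbols that are not in CMUDICT_ARPABET."""
    return [s for s in pronunciation.lower().split() if s not in CMUDICT_ARPABET]


print(unknown_symbols("f uw jh ae g"))  # []
print(unknown_symbols("f oo jh ae g"))  # ['oo']
```

An empty list means every symbol was recognized by this local check only; the API remains the authority on what it accepts.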

For more details on using the <custom-word> tag, go to Custom Words and Pronunciations Examples.

Using the pronunciation attribute

Optional
When using the <sound-out> tag, the pronunciation attribute is optional. If not used, the SoapBox Engine will provide a default pronunciation automatically.

Required
When using the <sound-out> tag, the pronunciation attribute is required if the word is not in the system.

When using the <custom-word> tag, the pronunciation attribute is always required. If no pronunciation attribute is provided, an error is returned.
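The rules above can be sketched as two small target builders. This is an illustration only: sound_out and custom_word are hypothetical helper names, and the local ValueError merely mirrors the rule that <custom-word> always needs a pronunciation; it is not the error the API itself returns.

```python
from typing import Optional


def sound_out(content: str, pronunciation: Optional[str] = None) -> str:
    """Build a <sound-out> target; the pronunciation attribute is optional."""
    if pronunciation is None:
        return f"<sound-out>{content}</sound-out>"
    return f'<sound-out pronunciation="{pronunciation}">{content}</sound-out>'


def custom_word(content: str, pronunciation: str) -> str:
    """Build a <custom-word> target; the pronunciation attribute is required."""
    if not pronunciation or not pronunciation.strip():
        raise ValueError("<custom-word> requires a pronunciation attribute")
    return f'<custom-word pronunciation="{pronunciation}">{content}</custom-word>'


print(sound_out("dog"))
print(sound_out("d", "d"))
print(custom_word("foojag", "f uw jh ae g"))
```

Note that the sketch cannot tell whether a word is in the system, so for <sound-out> it is up to the caller to supply a pronunciation when the word is out of vocabulary.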

Learn more about the pronunciation attribute here: Working With Short Sounds

Using multiple targets

The use of multiple concurrent targets is suggested when your task requires the SoapBox voice engine to differentiate between phonetically similar targets or between isolated and blended sounds (e.g., isolated /d/ vs. /d/ in dog).

If multiple targets in <sound-out> have the same content but different pronunciations, the content might need to be differentiated too. For example, <sound-out pronunciation="aa">a</sound-out> and <sound-out pronunciation="ah">a</sound-out> are better handled by the system as <sound-out pronunciation="aa">aa</sound-out> and <sound-out pronunciation="ah">ah</sound-out>.
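One way to apply this differentiation automatically is sketched below. It assumes, by generalizing from the single a/aa/ah example above, that the pronunciation with spaces removed is an acceptable replacement for the content; that rule and the function name differentiate are assumptions, not documented behavior.

```python
from collections import defaultdict


def differentiate(targets: list[tuple[str, str]]) -> list[str]:
    """Build <sound-out> targets from (content, pronunciation) pairs.

    When the same content appears with more than one pronunciation,
    the content is replaced by the pronunciation with spaces removed.
    """
    prons_by_content = defaultdict(set)
    for content, pron in targets:
        prons_by_content[content].add(pron)

    markup = []
    for content, pron in targets:
        if len(prons_by_content[content]) > 1:
            content = pron.replace(" ", "")
        markup.append(f'<sound-out pronunciation="{pron}">{content}</sound-out>')
    return markup


for target in differentiate([("a", "aa"), ("a", "ah"), ("d", "d")]):
    print(target)
```

Targets whose content is already unique, such as the "d" entry, pass through unchanged.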

For more examples, see the Working With Short Sounds section.

Guidelines and use cases

The links below outline common use cases for the Short Sounds feature, with suggested syntax for creating single or multiple targets and the corresponding JSON outputs.

The reported JSON examples are templates only. Durations, start times, end times, and scores should not be taken as standards.

Letter names

A student is asked to repeat, read, or call out individual letter names (isolated or in a sequence).
Read more and see examples here.

Letter sounds

A student is asked to repeat, read, or call out individual letter sounds (isolated or in a sequence).
Read more and see examples here.

Phoneme isolation

A student is asked to isolate a single sound from within a word (e.g., “What is the starting sound in the word dog?”).
Read more and see examples here.

Phoneme blending

A student is asked to blend individual sounds into a real or nonsense word.
Read more and see examples here.

Phoneme segmentation

A student is asked to break a word (either real or nonsense) into its individual speech sounds.
Read more and see examples here.

Phoneme manipulation

A student is asked to modify, change, or move the individual sounds in a word, often to create a new word.
Read more and see examples here.

JSON outputs

The output format of Short Sounds V1 is the standard JSON Verification format, augmented by the extra attribute token_type at the word level.

The default token_type value is word. When a markup tag is used in the target, token_type takes the name of that tag (i.e., letter, sound-out, or custom-word). The quality_score values may therefore need to be interpreted differently depending on the token_type:

  • If letter, the quality_score values in word_breakdown and phone_breakdown refer to the whole letter name and the phonemes in the letter name.

  • If sound-out, the phonemes in the phone_breakdown are scored as isolated sounds, pronounced with a pause between them. The quality_score in word_breakdown is the combination of the isolated phoneme scores.

  • If custom-word, the quality_score values in the word_breakdown and phone_breakdown are interpreted in the same way as for word. The phonemes in the phone_breakdown are scored as they are pronounced as part of a word.
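The interpretation rules above can be sketched as a lookup keyed on token_type. The response layout used here (a top-level "results" list whose items carry "word_breakdown") is an assumption for illustration, as are the names SCORE_MEANING and describe_scores; only token_type, word_breakdown, phone_breakdown, and quality_score come from this page. The sample response is a placeholder with no real scores.

```python
# How quality_score should be read for each token_type, per the rules above.
SCORE_MEANING = {
    "word": "scores refer to the word and its phonemes in word context",
    "letter": "scores refer to the whole letter name and its phonemes",
    "sound-out": "phonemes scored in isolation; word score combines them",
    "custom-word": "interpreted as for word; phonemes scored in word context",
}


def describe_scores(response: dict) -> list[tuple[str, str]]:
    """Return (token_type, interpretation) for every word-level entry.

    Assumed layout: response["results"][i]["word_breakdown"][j]["token_type"].
    """
    rows = []
    for result in response.get("results", []):
        for word in result.get("word_breakdown", []):
            token_type = word.get("token_type", "word")  # word is the default
            rows.append((token_type, SCORE_MEANING.get(token_type, "unknown token_type")))
    return rows


# Placeholder response: structure only, no real scores.
sample = {"results": [{"word_breakdown": [{"token_type": "sound-out"}, {}]}]}
for token_type, meaning in describe_scores(sample):
    print(f"{token_type}: {meaning}")
```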

API command

Short Sounds V1 can be accessed via API calls whose targets use the markup described above.

Templates for such requests are:

For letter names

curl -s -k -H x-app-key:<app-key> \
    -F 'file=@<path-to-audio-file>' \
    -F 'category="<letter>g</letter>"' \
    -F user_token=<user-token> \
    -F model_id=<model-id> \
    https://api.soapboxlabs.com/v1/speech/verification

For letter sounds

curl -s -k -H x-app-key:<app-key> \
    -F 'file=@<path-to-audio-file>' \
    -F 'category="<sound-out>dog</sound-out>"' \
    -F user_token=<user-token> \
    -F model_id=<model-id> \
    https://api.soapboxlabs.com/v1/speech/verification

or

curl -s -k -H x-app-key:<app-key> \
    -F 'file=@<path-to-audio-file>' \
    -F 'category="<sound-out pronunciation="d">d</sound-out>"' \
    -F user_token=<user-token> \
    -F model_id=<model-id> \
    https://api.soapboxlabs.com/v1/speech/verification

For custom words

curl -s -k -H x-app-key:<app-key> \
    -F 'file=@<path-to-audio-file>' \
    -F 'category="<custom-word pronunciation="f uw jh ae g">foojag</custom-word>"' \
    -F user_token=<user-token> \
    -F model_id=<model-id> \
    https://api.soapboxlabs.com/v1/speech/verification

Please note that the double quotation mark " in the curl request must be the plain ASCII character, not a typographic (curly) quote. It may also need to be escaped inside the pronunciation attribute. For example, pronunciation="d" may need to be written as pronunciation=\"d\".
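When the request is assembled from a script, the escaping can be done programmatically. The sketch below builds the curl argument list for the templates above and backslash-escapes the inner double quotes of the target, matching the pronunciation=\"d\" form. The helper names quote_form_value and build_curl_args are hypothetical, and the sketch assumes curl treats a backslash inside a double-quoted -F value as an escape character.

```python
def quote_form_value(value: str) -> str:
    """Wrap value in double quotes, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_curl_args(app_key: str, audio_path: str, target: str,
                    user_token: str, model_id: str) -> list[str]:
    """Return the curl command as an argument list (flags copied from the templates)."""
    return [
        "curl", "-s", "-k",
        "-H", f"x-app-key:{app_key}",
        "-F", f"file=@{audio_path}",
        "-F", f"category={quote_form_value(target)}",
        "-F", f"user_token={user_token}",
        "-F", f"model_id={model_id}",
        "https://api.soapboxlabs.com/v1/speech/verification",
    ]


args = build_curl_args(
    "APP_KEY", "audio.wav",
    '<sound-out pronunciation="d">d</sound-out>',
    "USER_TOKEN", "MODEL_ID",
)
print(args[8])  # category="<sound-out pronunciation=\"d\">d</sound-out>"
```

Passing the list to subprocess.run(args) without shell=True avoids a second layer of shell quoting, so only the escaping shown here applies.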