Letter Name Examples

Overview

Use the <letter> markup to create letter name targets. Add the pronunciation attribute when a specific pronunciation of a letter is required.

Individual letter names

Use this when a student repeats, reads, or calls out individual letter names (i.e., the names of the letters of the alphabet).

Example: “Say the name of the letter G”

Input target:

 <letter>g</letter>

JSON output: 

  • One quality_score value for the letter “g”

  • token_type: “letter”

  • Example [audio was g]:

 "results": [{
    "hypothesis_score": 93.0,
    "duration": 2.04,
    "hypothesis_duration": 0.75,
    "category": "g",
    "end": 1.44,
    "start": 0.69,
    "word_breakdown": [{
      "duration": 0.75,
      "quality_score": 93.0,
      "token_type": "letter",
      "end": 1.44,
      "start": 0.69,
      "phone_breakdown": [{
        "duration": 0.18,
        "quality_score": 88.0,
        "end": 0.87,
        "start": 0.69,
        "phone": "jh"
      }, {
        "duration": 0.57,
        "quality_score": 92.0,
        "end": 1.44,
        "start": 0.87,
        "phone": "iy"
      }],
      "word": "g",
      "target_transcription": "jh iy"
    }]
  }]
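
A minimal sketch of reading this output in Python. It assumes the response body is a JSON object whose top-level "results" key holds the array shown above (the sample omits the enclosing braces). The function name letter_scores is illustrative and not part of the API.

```python
import json


def letter_scores(response_text):
    """Return (word, quality_score) pairs for every letter token in a response."""
    response = json.loads(response_text)
    pairs = []
    for result in response.get("results", []):
        for word in result.get("word_breakdown", []):
            # Keep only tokens that were scored as letter names.
            if word.get("token_type") == "letter":
                pairs.append((word["word"], word["quality_score"]))
    return pairs


# Trimmed copy of the sample above, wrapped in braces (an assumption).
sample = """
{"results": [{
    "hypothesis_score": 93.0,
    "category": "g",
    "word_breakdown": [{
        "quality_score": 93.0,
        "token_type": "letter",
        "word": "g",
        "target_transcription": "jh iy"
    }]
}]}
"""

print(letter_scores(sample))
```

Each letter contributes one word-level entry, so a single-letter target yields a single pair.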

Working with multiple pronunciations

Example: “Say the name of the letter A”

Some letters have more than one pronunciation (e.g., “a” and “z”). To make sure the SoapBox voice engine checks for the intended pronunciation, add the pronunciation attribute to the <letter> tag.

Input target:

<letter pronunciation="ey"> a </letter>

or

<letter pronunciation="ah"> a </letter>

JSON output:

  • One quality_score value for “a” (with the /ey/ pronunciation)

  • token_type: “letter”

  • Example [audio was /ey/]:

"results": [{
      "hypothesis_score": 94,
      "duration": 2.46,
      "hypothesis_duration": 0.69,
      "category": "a",
      "end": 1.83,
      "start": 1.14,
      "word_breakdown": [
        {
          "duration": 0.69,
          "quality_score": 94,
          "token_type": "letter",
          "end": 1.83,
          "start": 1.14,
          "phone_breakdown": [
            {
              "duration": 0.69,
              "quality_score": 91,
              "end": 1.83,
              "start": 1.14,
              "phone": "ey"
            }
          ],
          "word": "a",
          "target_transcription": "ey"
        }]
    }]
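
Targets like the ones in this section can be assembled with a small helper. The sketch below is illustrative only: letter_target is a hypothetical name, and the helper writes the letter without the surrounding spaces shown in the sample, on the assumption that they are not significant.

```python
def letter_target(letter, pronunciation=None):
    """Build a <letter> target, optionally pinning one pronunciation."""
    if pronunciation is None:
        return f"<letter>{letter}</letter>"
    # The attribute value is a phone string such as "ey" or "ah".
    return f'<letter pronunciation="{pronunciation}">{letter}</letter>'


print(letter_target("g"))
print(letter_target("a", "ey"))
```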

Working with multiple letters as targets

There are two ways to input multiple letter names as targets:

  • When the order of the response does NOT matter

  • When the order of the response DOES matter

When the order of the response does NOT matter

Use this when the student may say the letter names in any order and every order counts as a valid response.

Example: “Say the names of the letters C E K H R”

Input target: use multiple targets, one per letter

<letter>c</letter> 
<letter>k</letter> 
<letter>e</letter> 
<letter>h</letter> 
<letter>r</letter>

JSON output:

  • Several quality_score values, one per letter

  • token_type: “letter”

  • Example [audio was c k e h r]:

"results": [
    {
      "hypothesis_score": 95,
      "duration": 13.02,
      "hypothesis_duration": 0.78,
      "category": "c",
      "end": 1.32,
      "start": 0.54,
      "word_breakdown": [
        {
          "duration": 0.78,
          "quality_score": 95,
          "token_type": "letter",
          "end": 1.32,
          "start": 0.54,
          "phone_breakdown": [
            {
              "duration": 0.24,
              "quality_score": 89,
              "end": 0.78,
              "start": 0.54,
              "phone": "s"
            },
            {
              "duration": 0.54,
              "quality_score": 95,
              "end": 1.32,
              "start": 0.78,
              "phone": "iy"
            }
          ],
          "word": "c",
          "target_transcription": "s iy"
        }
      ]
    },
    {
      "hypothesis_score": 95,
      "duration": 13.02,
      "hypothesis_duration": 0.69,
      "category": "k",
      "end": 3.63,
      "start": 2.94,
      "word_breakdown": [
        {
          "duration": 0.69,
          "quality_score": 95,
          "token_type": "letter",
          "end": 3.63,
          "start": 2.94,
          "phone_breakdown": [
            {
              "duration": 0.15,
              "quality_score": 99,
              "end": 3.09,
              "start": 2.94,
              "phone": "k"
            },
            {
              "duration": 0.54,
              "quality_score": 86,
              "end": 3.63,
              "start": 3.09,
              "phone": "ey"
            }
          ],
          "word": "k",
          "target_transcription": "k ey"
        }
      ]
    },
    {
      "hypothesis_score": 97,
      "duration": 13.02,
      "hypothesis_duration": 0.54,
      "category": "e",
      "end": 1.32,
      "start": 0.78,
      "word_breakdown": [
        {
          "duration": 0.54,
          "quality_score": 97,
          "token_type": "letter",
          "end": 1.32,
          "start": 0.78,
          "phone_breakdown": [
            {
              "duration": 0.54,
              "quality_score": 95,
              "end": 1.32,
              "start": 0.78,
              "phone": "iy"
            }
          ],
          "word": "e",
          "target_transcription": "iy"
        }
      ]
    },
    {
      "hypothesis_score": 91,
      "duration": 13.02,
      "hypothesis_duration": 0.75,
      "category": "h",
      "end": 9.69,
      "start": 8.94,
      "word_breakdown": [
        {
          "duration": 0.75,
          "quality_score": 91,
          "token_type": "letter",
          "end": 9.69,
          "start": 8.94,
          "phone_breakdown": [
            {
              "duration": 0.33,
              "quality_score": 88,
              "end": 9.27,
              "start": 8.94,
              "phone": "ey"
            },
            {
              "duration": 0.42,
              "quality_score": 87,
              "end": 9.69,
              "start": 9.27,
              "phone": "ch"
            }
          ],
          "word": "h",
          "target_transcription": "ey ch"
        }
      ]
    },
    {
      "hypothesis_score": 57,
      "duration": 13.02,
      "hypothesis_duration": 0.6,
      "category": "r",
      "end": 12.09,
      "start": 11.49,
      "word_breakdown": [
        {
          "duration": 0.6,
          "quality_score": 57,
          "token_type": "letter",
          "end": 12.09,
          "start": 11.49,
          "phone_breakdown": [
            {
              "duration": 0.45,
              "quality_score": 57,
              "end": 11.94,
              "start": 11.49,
              "phone": "aa"
            },
            {
              "duration": 0.15,
              "quality_score": 36,
              "end": 12.09,
              "start": 11.94,
              "phone": "r"
            }
          ],
          "word": "r",
          "target_transcription": "aa r"
        }]
    }]
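
With separate targets, each result carries its own start and end times, and two results can cover the same stretch of audio. In the sample above, the "e" result (0.78 to 1.32) coincides with the "iy" phone of the "c" result. The sketch below flags such overlaps. The function name overlapping_results is illustrative, and how an overlap should be handled is left to the application.

```python
def overlapping_results(results):
    """Return pairs of categories whose [start, end] intervals overlap in time."""
    pairs = []
    for i, first in enumerate(results):
        for second in results[i + 1:]:
            # Two intervals overlap when each one starts before the other ends.
            if first["start"] < second["end"] and second["start"] < first["end"]:
                pairs.append((first["category"], second["category"]))
    return pairs


# category, start and end values copied from the sample above
results = [
    {"category": "c", "start": 0.54, "end": 1.32},
    {"category": "k", "start": 2.94, "end": 3.63},
    {"category": "e", "start": 0.78, "end": 1.32},
    {"category": "h", "start": 8.94, "end": 9.69},
    {"category": "r", "start": 11.49, "end": 12.09},
]

print(overlapping_results(results))  # [('c', 'e')]
```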

When the order of the response DOES matter

Use this when the student must say the letter names in a specific order.

Letters are expected to be produced in the order given in the target. If the student says them in a different order, misplaced letters are marked as deleted.

Example: “Say the names of the letters C E K H R”

Input target: use a single target

<letter>c</letter> <letter>e</letter> <letter>K</letter> <letter>h</letter> <letter>r</letter>

JSON output:

  • Several quality_score values, one per letter

  • token_type: “letter”

  • Example [audio was c k e h r] (the order differs from the target, so “e” is marked as a deletion):

"results": [
    {
      "hypothesis_score": 68,
      "duration": 13.02,
      "hypothesis_duration": 11.55,
      "category": "c e k h r",
      "end": 12.09,
      "start": 0.54,
      "word_breakdown": [
        {
          "duration": 0.78,
          "quality_score": 95,
          "token_type": "letter",
          "end": 1.32,
          "start": 0.54,
          "phone_breakdown": [
            {
              "duration": 0.24,
              "quality_score": 89,
              "end": 0.78,
              "start": 0.54,
              "phone": "s"
            },
            {
              "duration": 0.54,
              "quality_score": 95,
              "end": 1.32,
              "start": 0.78,
              "phone": "iy"
            }
          ],
          "word": "c",
          "target_transcription": "s iy"
        },
        {
          "duration": 0,
          "quality_score": 0,
          "token_type": "letter",
          "end": -1,
          "start": -1,
          "phone_breakdown": [
            {
              "duration": 0,
              "quality_score": 0,
              "end": 0,
              "start": 0,
              "phone": "iy"
            }
          ],
          "word": "e",
          "target_transcription": "iy"
        },
        {
          "duration": 0.69,
          "quality_score": 95,
          "token_type": "letter",
          "end": 3.63,
          "start": 2.94,
          "phone_breakdown": [
            {
              "duration": 0.15,
              "quality_score": 99,
              "end": 3.09,
              "start": 2.94,
              "phone": "k"
            },
            {
              "duration": 0.54,
              "quality_score": 86,
              "end": 3.63,
              "start": 3.09,
              "phone": "ey"
            }
          ],
          "word": "k",
          "target_transcription": "k ey"
        },
        {
          "duration": 0.75,
          "quality_score": 91,
          "token_type": "letter",
          "end": 9.69,
          "start": 8.94,
          "phone_breakdown": [
            {
              "duration": 0.33,
              "quality_score": 88,
              "end": 9.27,
              "start": 8.94,
              "phone": "ey"
            },
            {
              "duration": 0.42,
              "quality_score": 87,
              "end": 9.69,
              "start": 9.27,
              "phone": "ch"
            }
          ],
          "word": "h",
          "target_transcription": "ey ch"
        },
        {
          "duration": 0.6,
          "quality_score": 57,
          "token_type": "letter",
          "end": 12.09,
          "start": 11.49,
          "phone_breakdown": [
            {
              "duration": 0.45,
              "quality_score": 57,
              "end": 11.94,
              "start": 11.49,
              "phone": "aa"
            },
            {
              "duration": 0.15,
              "quality_score": 36,
              "end": 12.09,
              "start": 11.94,
              "phone": "r"
            }
          ],
          "word": "r",
          "target_transcription": "aa r"
        }
      ]
    }
  ]
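
A minimal sketch for picking out deleted letters from an ordered result. It relies on an assumption drawn from the sample above: a deleted letter appears in word_breakdown with start and end of -1 (and a quality_score of 0). The function name deleted_letters is illustrative.

```python
def deleted_letters(result):
    """Return the words in one result that look like deletions."""
    return [
        word["word"]
        for word in result.get("word_breakdown", [])
        # Assumption from the sample: deleted letters have start and end of -1.
        if word.get("start") == -1 and word.get("end") == -1
    ]


# Word-level fields copied from the sample above (phone_breakdown omitted)
result = {
    "category": "c e k h r",
    "word_breakdown": [
        {"word": "c", "quality_score": 95, "start": 0.54, "end": 1.32},
        {"word": "e", "quality_score": 0, "start": -1, "end": -1},
        {"word": "k", "quality_score": 95, "start": 2.94, "end": 3.63},
        {"word": "h", "quality_score": 91, "start": 8.94, "end": 9.69},
        {"word": "r", "quality_score": 57, "start": 11.49, "end": 12.09},
    ],
}

print(deleted_letters(result))  # ['e']
```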

Working with repeated targets (e.g., in a DIBELS-type test)

In a test (e.g., a DIBELS-type test), a student may be presented with a sequence of letters in which order matters and some letters appear more than once. To get the desired results, use a single target and include every expected repetition in it (such as the two “o”s in the example below).

Example: “Tell me the name of each of these letters: E L h g x t m S O o”

Input target: use a single target

<letter>E</letter> <letter>L</letter> <letter>h</letter> <letter>g</letter> <letter>x</letter> <letter>t</letter> <letter>m</letter> <letter>S</letter> <letter>O</letter> <letter>o</letter>

JSON output:

  • One word-level quality_score per letter (capitalization is normalized to lower case)

  • token_type: “letter”

  • Example [audio was e l h g x t m s o o]:

"results": [
    {
      "hypothesis_score": 87,
      "duration": 12.3,
      "hypothesis_duration": 11.58,
      "category": "e l h g x t m s o o",
      "end": 11.94,
      "start": 0.36,
      "word_breakdown": [
        {
          "duration": 0.66,
          "quality_score": 94,
          "token_type": "letter",
          "end": 1.02,
          "start": 0.36,
          "phone_breakdown": [
            {
              "duration": 0.66,
              "quality_score": 90,
              "end": 1.02,
              "start": 0.36,
              "phone": "iy"
            }
          ],
          "word": "e",
          "target_transcription": "iy"
        },
        {
          "duration": 0.69,
          "quality_score": 66,
          "token_type": "letter",
          "end": 2.07,
          "start": 1.38,
          "phone_breakdown": [
            {
              "duration": 0.42,
              "quality_score": 49,
              "end": 1.8,
              "start": 1.38,
              "phone": "eh"
            },
            {
              "duration": 0.27,
              "quality_score": 62,
              "end": 2.07,
              "start": 1.8,
              "phone": "l"
            }
          ],
          "word": "l",
          "target_transcription": "eh l"
        },
        {
          "duration": 0.72,
          "quality_score": 95,
          "token_type": "letter",
          "end": 3.03,
          "start": 2.31,
          "phone_breakdown": [
            {
              "duration": 0.36,
              "quality_score": 92,
              "end": 2.67,
              "start": 2.31,
              "phone": "ey"
            },
            {
              "duration": 0.36,
              "quality_score": 93,
              "end": 3.03,
              "start": 2.67,
              "phone": "ch"
            }
          ],
          "word": "h",
          "target_transcription": "ey ch"
        },
        {
          "duration": 0.66,
          "quality_score": 96,
          "token_type": "letter",
          "end": 4.08,
          "start": 3.42,
          "phone_breakdown": [
            {
              "duration": 0.21,
              "quality_score": 94,
              "end": 3.63,
              "start": 3.42,
              "phone": "jh"
            },
            {
              "duration": 0.45,
              "quality_score": 94,
              "end": 4.08,
              "start": 3.63,
              "phone": "iy"
            }
          ],
          "word": "g",
          "target_transcription": "jh iy"
        },
        {
          "duration": 0.75,
          "quality_score": 93,
          "token_type": "letter",
          "end": 5.37,
          "start": 4.62,
          "phone_breakdown": [
            {
              "duration": 0.27,
              "quality_score": 81,
              "end": 4.89,
              "start": 4.62,
              "phone": "eh"
            },
            {
              "duration": 0.18,
              "quality_score": 93,
              "end": 5.07,
              "start": 4.89,
              "phone": "k"
            },
            {
              "duration": 0.3,
              "quality_score": 96,
              "end": 5.37,
              "start": 5.07,
              "phone": "s"
            }
          ],
          "word": "x",
          "target_transcription": "eh k s"
        },
        {
          "duration": 0.66,
          "quality_score": 98,
          "token_type": "letter",
          "end": 6.45,
          "start": 5.79,
          "phone_breakdown": [
            {
              "duration": 0.15,
              "quality_score": 96,
              "end": 5.94,
              "start": 5.79,
              "phone": "t"
            },
            {
              "duration": 0.51,
              "quality_score": 97,
              "end": 6.45,
              "start": 5.94,
              "phone": "iy"
            }
          ],
          "word": "t",
          "target_transcription": "t iy"
        },
        {
          "duration": 0.63,
          "quality_score": 73,
          "token_type": "letter",
          "end": 7.62,
          "start": 6.99,
          "phone_breakdown": [
            {
              "duration": 0.33,
              "quality_score": 56,
              "end": 7.32,
              "start": 6.99,
              "phone": "eh"
            },
            {
              "duration": 0.3,
              "quality_score": 73,
              "end": 7.62,
              "start": 7.32,
              "phone": "m"
            }
          ],
          "word": "m",
          "target_transcription": "eh m"
        },
        {
          "duration": 0.66,
          "quality_score": 84,
          "token_type": "letter",
          "end": 8.91,
          "start": 8.25,
          "phone_breakdown": [
            {
              "duration": 0.3,
              "quality_score": 63,
              "end": 8.55,
              "start": 8.25,
              "phone": "eh"
            },
            {
              "duration": 0.36,
              "quality_score": 98,
              "end": 8.91,
              "start": 8.55,
              "phone": "s"
            }
          ],
          "word": "s",
          "target_transcription": "eh s"
        },
        {
          "duration": 0.69,
          "quality_score": 88,
          "token_type": "letter",
          "end": 10.44,
          "start": 9.75,
          "phone_breakdown": [
            {
              "duration": 0.69,
              "quality_score": 83,
              "end": 10.44,
              "start": 9.75,
              "phone": "ow"
            }
          ],
          "word": "o",
          "target_transcription": "ow"
        },
        {
          "duration": 0.69,
          "quality_score": 88,
          "token_type": "letter",
          "end": 11.94,
          "start": 11.25,
          "phone_breakdown": [
            {
              "duration": 0.69,
              "quality_score": 82,
              "end": 11.94,
              "start": 11.25,
              "phone": "ow"
            }
          ],
          "word": "o",
          "target_transcription": "ow"
        }]
    }]
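
A single ordered target like the one above can be built from the list of letters on the test sheet, keeping every repetition. The sketch below is illustrative: ordered_target and expected_category are hypothetical helper names, and the lower-casing mirrors the normalization seen in the sample output.

```python
def ordered_target(letters):
    """Join the letters into one target, preserving order and repetitions."""
    return " ".join(f"<letter>{letter}</letter>" for letter in letters)


def expected_category(letters):
    """Lower-cased, space-separated letters, as seen in the "category" field."""
    return " ".join(letter.lower() for letter in letters)


letters = ["E", "L", "h", "g", "x", "t", "m", "S", "O", "o"]

print(ordered_target(letters))
print(expected_category(letters))  # e l h g x t m s o o
```

Comparing expected_category(letters) with the returned "category" value is one way to confirm that the target was parsed as intended.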

Differentiating between letter names and letter sounds

An educator may often want to check whether a student is saying a letter name rather than a letter sound. In some cases this is straightforward; in others it is more complex. See the separate explanation of letter names and letter sounds for more details.