Custom Words and Pronunciations Examples

Overview

The <custom-word> tag is used for custom or made-up words and for custom pronunciations. The pronunciation attribute is required: it tells the SoapBox engine the phoneme breakdown of the word, since custom words and pronunciations may not be part of our pre-trained recognition models.

Use cases for <custom-word> markup

Three common use cases for the <custom-word> tag are:

  • Nonsense and out-of-vocabulary words

  • Proper nouns

  • Defining a non-default pronunciation for a word

Nonsense and out-of-vocabulary words

Example:  “Say the word voos”

The nonsense word voos is not in the engine's vocabulary, so without the <custom-word> markup this target returns an error.

Input targets:

  • <custom-word pronunciation = "v uw s">voos</custom-word>

JSON output: 

  • One quality_score

  • token_type: “custom-word”

  • Example [audio was “/v uw s/”]:

"results": [
  {
    "hypothesis_score": 98,
    "duration": 2.04,
    "hypothesis_duration": 0.93,
    "category": "voos",
    "end": 1.5,
    "start": 0.57,
    "word_breakdown": [
      {
        "duration": 0.93,
        "quality_score": 98,
        "token_type": "custom-word",
        "end": 1.5,
        "start": 0.57,
        "phone_breakdown": [
          {
            "duration": 0.24,
            "quality_score": 97,
            "end": 0.81,
            "start": 0.57,
            "phone": "v"
          },
          {
            "duration": 0.12,
            "quality_score": 100,
            "end": 0.93,
            "start": 0.81,
            "phone": "uw"
          },
          {
            "duration": 0.57,
            "quality_score": 95,
            "end": 1.5,
            "start": 0.93,
            "phone": "s"
          }
        ],
        "word": "voos",
        "target_transcription": "v uw s"
      }
    ]
  }
]

Proper nouns

Example:  “Can you say the main character’s name, Phonzy”

Proper names are often out-of-vocabulary words. The <custom-word> markup can be used to enter them as targets.

Input targets:

  • <custom-word pronunciation = "f ao n z iy">Phonzy</custom-word>

JSON output:

  • One quality_score

  • token_type: “custom-word”

  • Example [audio was “/f ao n z iy/”]:

"results": [
    {
      "hypothesis_score": 88,
      "duration": 2.28,
      "hypothesis_duration": 0.9,
      "category": "phonzy",
      "end": 1.47,
      "start": 0.57,
      "word_breakdown": [
        {
          "duration": 0.9,
          "quality_score": 88,
          "token_type": "custom-word",
          "end": 1.47,
          "start": 0.57,
          "phone_breakdown": [
            {
              "duration": 0.21,
              "quality_score": 98,
              "end": 0.78,
              "start": 0.57,
              "phone": "f"
            },
            {
              "duration": 0.12,
              "quality_score": 71,
              "end": 0.9,
              "start": 0.78,
              "phone": "ao"
            },
            {
              "duration": 0.06,
              "quality_score": 95,
              "end": 0.96,
              "start": 0.9,
              "phone": "n"
            },
            {
              "duration": 0.21,
              "quality_score": 66,
              "end": 1.17,
              "start": 0.96,
              "phone": "z"
            },
            {
              "duration": 0.3,
              "quality_score": 97,
              "end": 1.47,
              "start": 1.17,
              "phone": "iy"
            }
          ],
          "word": "phonzy",
          "target_transcription": "f ao n z iy"
        }
      ]
    }
  ]
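
The per-phone scores in a result like the one above can be walked programmatically, for instance to see which sounds pulled the word score down. The following is a minimal Python sketch, not part of any SoapBox SDK: the function name weak_phones and the threshold of 75 are illustrative assumptions, and the sample data is a trimmed copy of the Phonzy result wrapped in an enclosing JSON object so that it parses on its own.

```python
import json

# Trimmed copy of the "phonzy" result shown above (only the fields used here).
# Wrapping "results" in an enclosing object is an assumption made so the
# fragment parses as standalone JSON.
response = json.loads("""
{
  "results": [
    {
      "category": "phonzy",
      "hypothesis_score": 88,
      "word_breakdown": [
        {
          "word": "phonzy",
          "quality_score": 88,
          "phone_breakdown": [
            {"phone": "f", "quality_score": 98},
            {"phone": "ao", "quality_score": 71},
            {"phone": "n", "quality_score": 95},
            {"phone": "z", "quality_score": 66},
            {"phone": "iy", "quality_score": 97}
          ]
        }
      ]
    }
  ]
}
""")


def weak_phones(response, threshold=75):
    """Return (word, phone, score) for every phone scoring below threshold.

    The threshold is an arbitrary illustrative value, not a documented cutoff.
    """
    weak = []
    for result in response["results"]:
        for word in result["word_breakdown"]:
            for phone in word["phone_breakdown"]:
                if phone["quality_score"] < threshold:
                    weak.append(
                        (word["word"], phone["phone"], phone["quality_score"])
                    )
    return weak


flagged = weak_phones(response)
print(flagged)  # [('phonzy', 'ao', 71), ('phonzy', 'z', 66)]
```

Here the two flagged phones, ao and z, are the ones with the lowest quality_score values in the sample.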

Defining a non-default pronunciation for a word

The <custom-word> tag can be used to distinguish or specify particular pronunciations of a word. Since languages have different dialects, one use case for the <custom-word> markup is to check that a child is using the expected pronunciation for a specific dialect.

Example: “Say the word tomato”

For example, the word tomato can have two pronunciations: /t ah m aa t ow/ and /t ah m ey t ow/.

Input targets:

Note: adding a suffix to the content word in the markup tag (e.g., tomato can become tomatoaa and tomatoey) makes it easier to tell the two pronunciations apart in the JSON output. Underscores or other special characters can be used, but they will be deleted in the JSON output.

  • <custom-word pronunciation = "t ah m aa t ow">tomatoaa</custom-word>

  • <custom-word pronunciation = "t ah m ey t ow">tomatoey</custom-word>
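
The suffix convention used in these two targets can be generated rather than typed by hand. This is a minimal Python sketch under stated assumptions: variant_targets is a hypothetical helper, not a SoapBox API, and it only builds the markup strings in the same form as the targets above. Suffixes are kept purely alphabetic so that the label comes back unchanged in the JSON output.

```python
def variant_targets(word, pronunciations):
    """Return {label: markup} with one <custom-word> target per pronunciation.

    `pronunciations` maps a plain alphabetic suffix to a phoneme string.
    The label (word + suffix) is what is expected to appear as "category"
    and "word" in the JSON output.
    """
    targets = {}
    for suffix, phones in pronunciations.items():
        label = f"{word}{suffix}"
        targets[label] = (
            f'<custom-word pronunciation = "{phones}">{label}</custom-word>'
        )
    return targets


targets = variant_targets(
    "tomato",
    {"aa": "t ah m aa t ow", "ey": "t ah m ey t ow"},
)
for markup in targets.values():
    print(markup)
```

The printed lines match the two input targets listed above.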

JSON output: 

  • One quality_score value per target word

  • token_type: “custom-word”

  • Example [audio was “/t ah m aa t ow/”]:

"results": [
    {
      "hypothesis_score": 87,
      "duration": 2.82,
      "hypothesis_duration": 1.14,
      "category": "tomatoaa",
      "end": 1.92,
      "start": 0.78,
      "word_breakdown": [
        {
          "duration": 1.14,
          "quality_score": 87,
          "token_type": "custom-word",
          "end": 1.92,
          "start": 0.78,
          "phone_breakdown": [
            {
              "duration": 0.12,
              "quality_score": 92,
              "end": 0.9,
              "start": 0.78,
              "phone": "t"
            },
            {
              "duration": 0.03,
              "quality_score": 87,
              "end": 0.93,
              "start": 0.9,
              "phone": "ah"
            },
            {
              "duration": 0.06,
              "quality_score": 100,
              "end": 0.99,
              "start": 0.93,
              "phone": "m"
            },
            {
              "duration": 0.24,
              "quality_score": 61,
              "end": 1.23,
              "start": 0.99,
              "phone": "aa"
            },
            {
              "duration": 0.12,
              "quality_score": 95,
              "end": 1.35,
              "start": 1.23,
              "phone": "t"
            },
            {
              "duration": 0.57,
              "quality_score": 63,
              "end": 1.92,
              "start": 1.35,
              "phone": "ow"
            }
          ],
          "word": "tomatoaa",
          "target_transcription": "t ah m aa t ow"
        }
      ]
    },
    {
      "hypothesis_score": 50,
      "duration": 2.82,
      "hypothesis_duration": 1.14,
      "category": "tomatoey",
      "end": 1.92,
      "start": 0.78,
      "word_breakdown": [
        {
          "duration": 1.14,
          "quality_score": 50,
          "token_type": "custom-word",
          "end": 1.92,
          "start": 0.78,
          "phone_breakdown": [
            {
              "duration": 0.12,
              "quality_score": 92,
              "end": 0.9,
              "start": 0.78,
              "phone": "t"
            },
            {
              "duration": 0.03,
              "quality_score": 87,
              "end": 0.93,
              "start": 0.9,
              "phone": "ah"
            },
            {
              "duration": 0.06,
              "quality_score": 100,
              "end": 0.99,
              "start": 0.93,
              "phone": "m"
            },
            {
              "duration": 0.24,
              "quality_score": 5,
              "end": 1.23,
              "start": 0.99,
              "phone": "ey"
            },
            {
              "duration": 0.12,
              "quality_score": 95,
              "end": 1.35,
              "start": 1.23,
              "phone": "t"
            },
            {
              "duration": 0.57,
              "quality_score": 63,
              "end": 1.92,
              "start": 1.35,
              "phone": "ow"
            }
          ],
          "word": "tomatoey",
          "target_transcription": "t ah m ey t ow"
        }
      ]
    }
  ]
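
When several pronunciation variants are submitted as targets, one way to decide which variant the speaker produced is to compare the hypothesis_score of each result. The sketch below assumes that the highest score indicates the closest match; that reading is an interpretation of the sample output, not a documented rule, and best_match is a hypothetical helper name.

```python
# Scores copied from the two "tomato" results above (other fields omitted).
results = [
    {"category": "tomatoaa", "hypothesis_score": 87},
    {"category": "tomatoey", "hypothesis_score": 50},
]


def best_match(results):
    """Return the result with the highest hypothesis_score.

    Assumption: a higher hypothesis_score means the audio is closer to
    that target's pronunciation.
    """
    return max(results, key=lambda r: r["hypothesis_score"])


winner = best_match(results)
print(winner["category"])  # tomatoaa
```

In the sample, the audio was /t ah m aa t ow/, and the tomatoaa target scores 87 against 50 for tomatoey, with the difference coming almost entirely from the aa versus ey phone.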