Verification (Online)

This page demonstrates how to run Verification on a short audio file using synchronous speech verification.

Synchronous Speech Verification returns the analysed targets for short audio. The maximum supported audio length is 60 seconds; however, we recommend supplying shorter files.
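The 60-second limit can be checked on the client before uploading. The following is a minimal sketch, assuming the audio is an uncompressed PCM WAV file readable by Python's standard `wave` module; the helper names `wav_duration_seconds` and `is_within_limit` are hypothetical and not part of the SoapBox API.

```python
import wave

# Maximum audio length stated in this documentation, in seconds.
MAX_SECONDS = 60.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # Duration = number of frames / frames per second.
        return wav.getnframes() / wav.getframerate()


def is_within_limit(path: str, max_seconds: float = MAX_SECONDS) -> bool:
    """Return True if the file is no longer than max_seconds."""
    return wav_duration_seconds(path) <= max_seconds
```

Other encodings would need a different way of measuring duration; see the Audio Encoding documentation for the permitted formats.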

Cloud Verification v1 is officially released and generally available at the https://api.soapboxlabs.com/v1/speech/verification endpoint.

Getting Started

Permitted Audio

Make sure to review the documentation around Audio Encoding and Best Practices.

RESTful Web Service

Review the SoapBox Cloud Web Service documentation to familiarise yourself with the concepts involved.

Verification Concepts

Familiarise yourself with the concepts of Speech Verification.

Authentication

Review the Web Service Authentication documentation.

Sending a Cloud Speech Verification Request

Once authenticated, Speech Verification requests should be sent via HTTPS to:

https://api.soapboxlabs.com/v1/speech/verification

Each request must be an HTTPS POST. The following parameters should be specified for a successful response:

file

The audio file to be analysed.

category

The target utterance/prompt to check for within the audio file.
We try to align the speech contained in the audio file to this text and return a score.

* Multiple “category” parameters may be specified to search for multiple targets/prompts.

user_token

A unique ID that represents the speaker in the audio file. This should be a non-human-readable alphanumeric identifier (such as a UUID) that has meaning to you but not to SoapBox Labs. This token can be used to request deletion of a specific user's data in line with our Data Privacy commitments.
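One way to produce such an identifier is a random UUID. This is a minimal sketch using Python's standard `uuid` module; the function name `new_user_token` is hypothetical, and storing the mapping between your own user records and the token is left to your application.

```python
import uuid


def new_user_token() -> str:
    """Return a random, non-human-readable alphanumeric token.

    uuid4() is random, so the token carries no personal information;
    .hex gives the 32-character form without hyphens.
    """
    return uuid.uuid4().hex
```

The token should be generated once per speaker and reused for that speaker's later requests, so that a deletion request can cover all of their data.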

CURL Example (single target)

curl -X POST \
    -H "x-app-key:$APP_KEY" \
    -F "file=@$AUDIO_FILE" \
    -F "category=right" \
    -F "user_token=$USER_TOKEN" \
    https://api.soapboxlabs.com/v1/speech/verification

CURL Example (multiple targets)

curl -X POST \
    -H "x-app-key:$APP_KEY" \
    -F "file=@$AUDIO_FILE" \
    -F "category=right" \
    -F "category=left" \
    -F "category=up" \
    -F "category=down" \
    -F "category=this is a test" \
    -F "user_token=$USER_TOKEN" \
    https://api.soapboxlabs.com/v1/speech/verification
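The repeated `category` fields in the multiple-target example above can also be built in Python. A dictionary cannot hold the same key twice, so this sketch builds a list of (name, value) pairs instead, which the `requests` package accepts as its `data` argument. The helper name `build_form_fields` is hypothetical.

```python
from typing import List, Tuple


def build_form_fields(categories: List[str], user_token: str) -> List[Tuple[str, str]]:
    """Build form fields with one "category" entry per target/prompt."""
    # One ("category", text) pair per target, in the order given.
    fields = [("category", text) for text in categories]
    # The speaker token is sent once.
    fields.append(("user_token", user_token))
    return fields
```

The returned list could then be passed as `requests.post(url, headers=..., data=fields, files={"file": audio_handle})`, mirroring the `-F` options of the curl command.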

Cloud Speech Verification Responses

Once Verification has processed the supplied audio file and targets, a result object using JSON format will be returned. At the root of the result object are the following fields:

user_id	The user_id specified in the request.
language_code	The language contained in the audio file being analysed.
result_id	A unique identifier for the request.
time	The UTC time at which the request was processed.
results	An array of result objects, one for each target/category specified. See results below for further details.

The following is an example of the JSON structure you can expect from Verification.

{
  "user_id": "abc123",
  "results": [{
    "hypothesis_score": 88.0,
    "category": "i like stripes",
    "end": 4.62,
    "start": 1.17,
    "word_breakdown": [{
      "quality_score": 87.0,
      "end": 1.8,
      "start": 1.17,
      "phone_breakdown": [{
        "phone": "ay",
        "quality_score": 83.0,
        "start": 1.17,
        "end": 1.8
      }],
      "word": "i",
      "target_transcription": "ay"
    }, {
      "quality_score": 83.0,
      "end": 2.79,
      "start": 2.04,
      "phone_breakdown": [{
        "phone": "l",
        "quality_score": 92.0,
        "start": 2.04,
        "end": 2.25
      }, {
        "phone": "ay",
        "quality_score": 70.0,
        "start": 2.25,
        "end": 2.46
      }, {
        "phone": "k",
        "quality_score": 74.0,
        "start": 2.46,
        "end": 2.79
      }],
      "word": "like",
      "target_transcription": "l ay k"
    }, {
      "quality_score": 93.0,
      "end": 4.62,
      "start": 3.0,
      "phone_breakdown": [{
        "phone": "s",
        "quality_score": 92.0,
        "start": 3.0,
        "end": 3.21
      }, {
        "phone": "t",
        "quality_score": 96.0,
        "start": 3.21,
        "end": 3.3
      }, {
        "phone": "r",
        "quality_score": 98.0,
        "start": 3.3,
        "end": 3.39
      }, {
        "phone": "ay",
        "quality_score": 96.0,
        "start": 3.39,
        "end": 3.51
      }, {
        "phone": "p",
        "quality_score": 99.0,
        "start": 3.51,
        "end": 3.72
      }, {
        "phone": "s",
        "quality_score": 67.0,
        "start": 3.72,
        "end": 4.62
      }],
      "word": "stripes",
      "target_transcription": "s t r ay p s"
    }]
  }],
  "language_code": "en-GB",
  "result_id": "abc123-282_1638878192718",
  "time": "2021-12-07T11:56:33.108Z"
}
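The `time` value in the example response above is a UTC timestamp with a trailing `Z`. The following is a minimal sketch for turning it into a timezone-aware `datetime`, assuming the format always matches the example; the helper name `parse_response_time` is hypothetical.

```python
from datetime import datetime


def parse_response_time(value: str) -> datetime:
    """Parse a timestamp such as "2021-12-07T11:56:33.108Z" as UTC."""
    # Older Python versions of fromisoformat() reject a trailing "Z",
    # so it is rewritten as an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)
```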

JSON Breakdown

The following are snippets from the full JSON response above with some additional information for each key.

results

Results contains an array with an entry for each of the targets specified in the request.

"results": [{
    "hypothesis_score": 88.0,
    "category": "i like stripes",
    "end": 4.62,
    "start": 1.17,
    "word_breakdown": []
  }]

category	The target/category(s) specified in the request.
hypothesis_score	The overall score for that category.
start / end	The start/end times the target was detected in the audio file.
word_breakdown	A further breakdown of each of the words contained in the target(s).
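When several targets are sent, the fields described above can be used to pick the target that matched best. This is a minimal sketch, assuming the response body has already been decoded into a Python dictionary (for example with `response.json()` from the `requests` package); `best_match` is a hypothetical helper name.

```python
from typing import Any, Dict, Optional


def best_match(response: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the result entry with the highest hypothesis_score, or None."""
    results = response.get("results", [])
    if not results:
        return None
    # Each entry carries the category text and its overall score.
    return max(results, key=lambda entry: entry["hypothesis_score"])
```

What score counts as an acceptable match depends on your application and is not defined here.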

results \ word_breakdown

"word_breakdown": [{
      "quality_score": 83.0,
      "end": 2.79,
      "start": 2.04,
      "phone_breakdown": [],
      "word": "like",
      "target_transcription": "l ay k"
    }]

word	Each word in the target.
target_transcription	The phonetic pronunciation of the word.
quality_score	The score associated with this word.
start / end	The start/end times the word was detected in the audio file.
phone_breakdown	A further phonetic breakdown of the word. Each phone in the target_transcription is also scored.

results \ word_breakdown \ phone_breakdown

"phone_breakdown": [{
        "phone": "l",
        "quality_score": 92.0,
        "start": 2.04,
        "end": 2.25
      }, {
        "phone": "ay",
        "quality_score": 70.0,
        "start": 2.25,
        "end": 2.46
      }, {
        "phone": "k",
        "quality_score": 74.0,
        "start": 2.46,
        "end": 2.79
      }]

phone	The constituent phone of the word being analysed.
quality_score	How well the phone was pronounced or articulated.
start / end	The start/end times the phone was detected in the audio file.
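The nested structure described above can be walked to find the phones a speaker struggled with. This is a minimal sketch, assuming the response has been decoded into a dictionary; the helper name `weak_phones` and the threshold of 75 are illustrative assumptions, not values recommended by the service.

```python
from typing import Any, Dict, List, Tuple


def weak_phones(response: Dict[str, Any], threshold: float = 75.0) -> List[Tuple[str, str, float]]:
    """Return (word, phone, quality_score) for phones scoring below threshold."""
    flagged = []
    # results -> word_breakdown -> phone_breakdown
    for result in response.get("results", []):
        for word in result.get("word_breakdown", []):
            for phone in word.get("phone_breakdown", []):
                if phone["quality_score"] < threshold:
                    flagged.append((word["word"], phone["phone"], phone["quality_score"]))
    return flagged
```

Applied to the word "like" from the example response, this would flag "ay" (70.0) and "k" (74.0) but not "l" (92.0).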

Arpabet & IPA

The phonetic breakdown currently uses a subset of the Arpabet. A conversion table to IPA is available here.
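As an illustration of such a conversion, the sketch below maps only the phones that appear in the example response, using the standard Arpabet-to-IPA correspondences. It is a partial, hypothetical mapping and does not replace the conversion table referenced above; `to_ipa` is an illustrative helper name.

```python
# Partial mapping covering only the phones in the example response.
ARPABET_TO_IPA = {
    "ay": "aɪ",
    "k": "k",
    "l": "l",
    "p": "p",
    "r": "ɹ",
    "s": "s",
    "t": "t",
}


def to_ipa(target_transcription: str) -> str:
    """Convert a space-separated Arpabet string to IPA symbols.

    Phones missing from the partial mapping are left unchanged.
    """
    phones = target_transcription.split()
    return " ".join(ARPABET_TO_IPA.get(phone, phone) for phone in phones)
```

For example, the target_transcription "l ay k" becomes "l aɪ k".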

Sample Code

Sample code for calling the Verification Web Service synchronously.

Python

"""
This sample Python 3.6 program makes a single HTTP request to the Soapbox API,
and prints the response content

Requirements:
    - The widely-used 'requests' package is required, which can be installed with pip:
        $ pip install requests
    - A sample audio file saved next to this script as "AUDIO_FILE.wav" (for example
      "i_like_stripes.wav"), with TARGET_TEXT below set to the matching prompt


Usage:
    $ python3 sample_api_request.py

    ---Soapbox Verification service---
    Status Code : 200
    Response Body : {
      "results": [{
        "hypothesis_score": 87.0,
        "duration": 1536,
        ...
    }
    ---------------------------------
"""

import requests
import sys

SOAPBOX_VERIFICATION_URI = "https://api.soapboxlabs.com/v1/speech/verification"
SOAPBOX_APP_KEY = "API_KEY_HERE"
SOAPBOX_USER_TOKEN = "ADD_USER_TOKEN_HERE"

def verification_request() -> requests.Response:
    """
    POST a voice recording to the Soapbox API Verification service, and return the response object.
    """
    # The app key authenticates the request
    headers = {
        "X-App-Key": SOAPBOX_APP_KEY
    }
    # Form fields: the speaker's token and the target text to verify
    form_data = {
        "user_token": SOAPBOX_USER_TOKEN,
        "category": "TARGET_TEXT"
    }
    # Open the audio file located next to this script; the context manager
    # closes the file handle once the request has completed
    with open(f'{sys.path[0]}/AUDIO_FILE.wav', 'rb') as file_data:
        response = requests.post(
            url=SOAPBOX_VERIFICATION_URI,
            headers=headers,
            data=form_data,
            files={'file': file_data}
        )
    return response

def main():
    verification_result = verification_request()
    print("\n\n---Soapbox Verification service---")
    print(f"Status Code : {verification_result.status_code}")
    print(f"Response Body : {verification_result.content.decode('utf8')}")
    print("---------------------------------")

if __name__ == '__main__':
    main()