Verification (Online)
This page shows how to verify speech in a short audio file using synchronous Speech Verification.
Synchronous Speech Verification returns results for the requested targets in short audio files. The maximum supported audio length is 60 seconds, though shorter files are recommended.
Cloud Verification v1 has been officially released and is generally available at the https://api.soapboxlabs.com/v1/speech/verification endpoint.
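The maximum supported length is 60 seconds, so it can be worth checking duration on the client before uploading. The Python sketch below assumes uncompressed PCM WAV input, which the standard-library wave module can read. Other encodings accepted by the service (see the Audio Encoding documentation) need a different reader. MAX_SECONDS, wav_duration_seconds and check_length are illustrative names, not part of any SoapBox SDK.

```python
import wave

# Documented maximum audio length for synchronous verification, in seconds.
MAX_SECONDS = 60.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds (assumes WAV input)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def check_length(path: str, max_seconds: float = MAX_SECONDS) -> float:
    """Raise ValueError if the file is longer than max_seconds; else return its duration."""
    duration = wav_duration_seconds(path)
    if duration > max_seconds:
        raise ValueError(f"{path} is {duration:.2f}s; the limit is {max_seconds:.0f}s")
    return duration
```

Because shorter files are recommended, you might also pass a smaller max_seconds than the hard limit.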
Getting Started
Permitted Audio
Review the Audio Encoding and Best Practices documentation before sending audio.
RESTful Web Service
Review the SoapBox Cloud Web Service documentation to familiarise yourself with the concepts involved.
Verification Concepts
Familiarise yourself with the concepts of Speech Verification.
Authentication
Review the Web Service Authentication documentation.
Sending a Cloud Speech Verification Request
Once authenticated, send Speech Verification requests as HTTPS POST requests to:
https://api.soapboxlabs.com/v1/speech/verification
The request body is multipart form data and must include the following parameters for a successful response:
Parameter | Description |
---|---|
file | The audio file to be analysed. |
category | The target utterance/prompt to check for within the audio file. Multiple "category" parameters may be specified to search for multiple targets/prompts. |
user_token | A unique ID that represents the speaker in the audio file. This should be a non-human-readable alphanumeric identifier (such as a UUID) that has meaning to you but not to SoapBox Labs. The token can be used to request deletion of a specific user's data in line with our Data Privacy commitments. |
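The user_token requirements above can be met with a random UUID. This sketch assumes an in-memory dictionary standing in for your own user store. The helper user_token_for is hypothetical and is not a SoapBox API. It hands out one stable token per internal user, so the same speaker is always identified consistently and the token can later be used in a deletion request.

```python
import uuid
from typing import Dict

# Hypothetical local store mapping your internal user IDs to opaque tokens.
# In a real application this mapping would live in your own database.
_tokens: Dict[str, str] = {}


def user_token_for(internal_user_id: str) -> str:
    """Return a stable, non-human-readable alphanumeric token for a user."""
    token = _tokens.get(internal_user_id)
    if token is None:
        # uuid4().hex is 32 lowercase hex characters with no hyphens.
        token = uuid.uuid4().hex
        _tokens[internal_user_id] = token
    return token
```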
CURL Example (single target)
curl -X POST \
  -H "x-app-key:$APP_KEY" \
  -F "file=@$AUDIO_FILE" \
  -F "category=right" \
  -F "user_token=$USER_TOKEN" \
  https://api.soapboxlabs.com/v1/speech/verification
CURL Example (multiple targets)
curl -X POST \
  -H "x-app-key:$APP_KEY" \
  -F "file=@$AUDIO_FILE" \
  -F "category=right" \
  -F "category=left" \
  -F "category=up" \
  -F "category=down" \
  -F "category=this is a test" \
  -F "user_token=$USER_TOKEN" \
  https://api.soapboxlabs.com/v1/speech/verification
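The multiple-target curl call can also be reproduced with Python's standard library. This is a minimal sketch, not an official client. build_request and verify are hypothetical helpers. The multipart body is encoded by hand because urllib has no built-in form encoder, and the application/octet-stream content type for the audio part is an assumption. Sending the category field once per target mirrors the repeated -F "category=..." options.

```python
import json
import urllib.request
import uuid
from typing import List

ENDPOINT = "https://api.soapboxlabs.com/v1/speech/verification"


def build_request(app_key: str, audio: bytes, filename: str,
                  categories: List[str], user_token: str) -> urllib.request.Request:
    """Build a multipart/form-data POST equivalent to the curl -F examples."""
    boundary = uuid.uuid4().hex
    parts = []
    # One form field per target: repeating the "category" name sends multiple targets.
    fields = [("category", c) for c in categories] + [("user_token", user_token)]
    for name, value in fields:
        part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        parts.append(part.encode("utf-8"))
    # The audio part; the content type used here is an assumption.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + audio + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
    return urllib.request.Request(
        ENDPOINT,
        data=b"".join(parts),
        method="POST",
        headers={
            "x-app-key": app_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def verify(req: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request synchronously and decode the JSON result object."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

To make the synchronous call, pass the audio bytes and targets to build_request and hand the result to verify. urlopen raises urllib.error.HTTPError for error status codes, so wrap verify accordingly.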
Cloud Speech Verification Responses
Once Verification has processed the supplied audio file and targets, it returns a JSON result object with the following fields at its root:
Field | Description |
---|---|
user_id | The user identifier supplied in the request. |
language_code | The language of the audio file being analysed (for example, en-GB). |
result_id | A unique identifier for the request. |
time | The UTC time at which the request was processed, as an ISO 8601 timestamp. |
results | An array containing one result object for each target/category specified in the request. See results below for details. |
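Client code will usually want the root fields in typed form. The sketch below parses the JSON body into a hypothetical VerificationResult dataclass and converts time into a timezone-aware datetime. It assumes Python 3.7 or later and that the timestamp always carries fractional seconds, as in the example. Both are assumptions.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List


@dataclass
class VerificationResult:
    user_id: str
    language_code: str
    result_id: str
    time: datetime                  # parsed from the UTC timestamp string
    results: List[Dict[str, Any]]   # one entry per requested target


def parse_result(body: str) -> VerificationResult:
    data = json.loads(body)
    # Example: "2021-12-07T11:56:33.108Z". Assumes fractional seconds are present;
    # %z accepts a trailing "Z" as UTC on Python 3.7+.
    ts = datetime.strptime(data["time"], "%Y-%m-%dT%H:%M:%S.%f%z")
    return VerificationResult(
        user_id=data["user_id"],
        language_code=data["language_code"],
        result_id=data["result_id"],
        time=ts,
        results=data["results"],
    )
```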
The following is an example of the JSON structure you can expect from Verification.
{
  "user_id": "abc123",
  "results": [
    {
      "hypothesis_score": 88.0,
      "category": "i like stripes",
      "end": 4.62,
      "start": 1.17,
      "word_breakdown": [
        {
          "quality_score": 87.0,
          "end": 1.8,
          "start": 1.17,
          "phone_breakdown": [
            { "phone": "ay", "quality_score": 83.0, "start": 1.17, "end": 1.8 }
          ],
          "word": "i",
          "target_transcription": "ay"
        },
        {
          "quality_score": 83.0,
          "end": 2.79,
          "start": 2.04,
          "phone_breakdown": [
            { "phone": "l", "quality_score": 92.0, "start": 2.04, "end": 2.25 },
            { "phone": "ay", "quality_score": 70.0, "start": 2.25, "end": 2.46 },
            { "phone": "k", "quality_score": 74.0, "start": 2.46, "end": 2.79 }
          ],
          "word": "like",
          "target_transcription": "l ay k"
        },
        {
          "quality_score": 93.0,
          "end": 4.62,
          "start": 3.0,
          "phone_breakdown": [
            { "phone": "s", "quality_score": 92.0, "start": 3.0, "end": 3.21 },
            { "phone": "t", "quality_score": 96.0, "start": 3.21, "end": 3.3 },
            { "phone": "r", "quality_score": 98.0, "start": 3.3, "end": 3.39 },
            { "phone": "ay", "quality_score": 96.0, "start": 3.39, "end": 3.51 },
            { "phone": "p", "quality_score": 99.0, "start": 3.51, "end": 3.72 },
            { "phone": "s", "quality_score": 67.0, "start": 3.72, "end": 4.62 }
          ],
          "word": "stripes",
          "target_transcription": "s t r ay p s"
        }
      ]
    }
  ],
  "language_code": "en-GB",
  "result_id": "abc123-282_1638878192718",
  "time": "2021-12-07T11:56:33.108Z"
}
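The response is nested three levels deep: results, then word_breakdown, then phone_breakdown. The sketch below flattens it into one row per scored phone, which is convenient for tables or CSV export. flatten_phones is an illustrative helper name, and the function expects the response already decoded with json.loads.

```python
from typing import Any, Dict, List, Tuple

# (category, word, phone, quality_score, start, end)
Row = Tuple[str, str, str, float, float, float]


def flatten_phones(response: Dict[str, Any]) -> List[Row]:
    """Flatten results -> word_breakdown -> phone_breakdown into one row per phone."""
    rows: List[Row] = []
    for result in response["results"]:
        for word in result["word_breakdown"]:
            for phone in word["phone_breakdown"]:
                rows.append((
                    result["category"], word["word"], phone["phone"],
                    phone["quality_score"], phone["start"], phone["end"],
                ))
    return rows
```

Applied to the example response above, it yields 10 rows: one for "i", three for "like" and six for "stripes".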
JSON Breakdown
The following are snippets from the full JSON response above with some additional information for each key.
results
The results field contains an array with one entry for each target specified in the request.
"results": [
  {
    "hypothesis_score": 88.0,
    "category": "i like stripes",
    "end": 4.62,
    "start": 1.17,
    "word_breakdown": []
  }
]
Field | Description |
---|---|
category | The target/category from the request that this entry reports on. |
hypothesis_score | The overall score for this target. |
start / end | The start and end times at which the target was detected in the audio file. |
word_breakdown | A breakdown of each word in the target. |
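Since results holds one entry per requested target, a client checking several prompts at once typically wants them ranked. This is a minimal sketch using a hypothetical summarise_targets helper. It pairs each category with its hypothesis_score and the span between start and end, and sorts them highest score first. The span uses the same time units as start and end.

```python
from typing import Any, Dict, List, Tuple


def summarise_targets(results: List[Dict[str, Any]]) -> List[Tuple[str, float, float]]:
    """Return (category, hypothesis_score, detected_span) per target, best first."""
    summary = [
        # Round the span to avoid floating-point noise such as 3.4500000000000006.
        (r["category"], r["hypothesis_score"], round(r["end"] - r["start"], 2))
        for r in results
    ]
    return sorted(summary, key=lambda s: s[1], reverse=True)
```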
results \ word_breakdown
"word_breakdown": [
  {
    "quality_score": 83.0,
    "end": 2.79,
    "start": 2.04,
    "phone_breakdown": [],
    "word": "like",
    "target_transcription": "l ay k"
  }
]
Field | Description |
---|---|
word | A single word from the target. |
target_transcription | The expected phonetic pronunciation of the word, as a space-separated sequence of phones. |
quality_score | The score for this word. |
start / end | The start and end times at which the word was detected in the audio file. |
phone_breakdown | A per-phone breakdown of the word: each phone in target_transcription is scored individually. |
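Two common checks on word_breakdown are sketched below in Python with hypothetical helper names. check_word confirms that phone_breakdown scores the phones listed in target_transcription, and weakest_word picks out the word with the lowest quality_score. The check_word comparison assumes phones appear in the same order as in the transcription. That matches the example but is not stated explicitly above.

```python
from typing import Any, Dict, List


def check_word(word: Dict[str, Any]) -> bool:
    """True if phone_breakdown scores exactly the transcription's phones, in order (assumed)."""
    expected = word["target_transcription"].split()
    scored = [p["phone"] for p in word["phone_breakdown"]]
    return expected == scored


def weakest_word(words: List[Dict[str, Any]]) -> str:
    """Return the word with the lowest quality_score."""
    return min(words, key=lambda w: w["quality_score"])["word"]
```

In the example response, weakest_word returns "like" (83.0, against 87.0 for "i" and 93.0 for "stripes").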
results \ word_breakdown \ phone_breakdown
"phone_breakdown": [
  { "phone": "l", "quality_score": 92.0, "start": 2.04, "end": 2.25 },
  { "phone": "ay", "quality_score": 70.0, "start": 2.25, "end": 2.46 },
  { "phone": "k", "quality_score": 74.0, "start": 2.46, "end": 2.79 }
]
Field | Description |
---|---|
phone | A constituent phone of the word being analysed. |
quality_score | How well the phone was pronounced or articulated. |
start / end | The start and end times at which the phone was detected in the audio file. |
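This page does not say what quality_score value counts as poor pronunciation, so any cut-off is application-specific. The sketch below uses a hypothetical threshold of 75 to list the phones in a phone_breakdown that fall below it.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical cut-off; no threshold is defined on this page.
LOW_SCORE_THRESHOLD = 75.0


def low_scoring_phones(phone_breakdown: List[Dict[str, Any]],
                       threshold: float = LOW_SCORE_THRESHOLD) -> List[Tuple[str, float]]:
    """Return (phone, quality_score) for each phone scoring below the threshold."""
    return [(p["phone"], p["quality_score"])
            for p in phone_breakdown if p["quality_score"] < threshold]
```

For the "like" breakdown above, it returns [("ay", 70.0), ("k", 74.0)].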
Arpabet & IPA
The phonetic breakdown currently uses a subset of Arpabet. A conversion table from Arpabet to IPA is available here.
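Where IPA output is needed, a target_transcription can be mapped phone by phone. The table below is a hypothetical, partial lookup that covers only the phones in the example response. It uses the common Arpabet-to-IPA correspondence. Prefer the official conversion table, since SoapBox's rendering of some phones (for example r) may differ. Unknown phones raise KeyError rather than being silently dropped.

```python
# Hypothetical partial lookup covering only the phones in the example response.
ARPABET_TO_IPA = {
    "ay": "aɪ",
    "k": "k",
    "l": "l",
    "p": "p",
    "r": "ɹ",
    "s": "s",
    "t": "t",
}


def to_ipa(target_transcription: str) -> str:
    """Convert a space-separated Arpabet transcription to an IPA string.

    Raises KeyError for phones missing from the partial table.
    """
    return "".join(ARPABET_TO_IPA[phone] for phone in target_transcription.split())
```

With this table, to_ipa("s t r ay p s") gives "stɹaɪps".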
Sample Code
Sample code for calling the Verification web service synchronously.