Fluency - Last Word Feature
Overview
This feature lets customers define the point in an audio file at which fluency data calculations stop.
The Different Last Word Types
Choosing the point at which to stop calculating can have a significant impact on a reader's accuracy ratings.
This feature offers three options, each of which changes the data counts returned.
Type | Description |
---|---|
Whole | This will return data for the entire file (default setting) |
Read | This will return data up to the last word read correctly |
Heard | This will return data up to the last word said by the reader and heard by the system |
The user can choose between the different types by adding the form-data field `last_word_type` and setting it to `whole` (the default if no type is specified), `read`, or `heard`.
Depending on the type selected, the number of deletions returned in the JSON will change.
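The selection rule in the table can be sketched in Python as follows. This is a minimal illustration, not the API's implementation: it assumes a hypothetical list of aligned words, each labelled `correct`, `substitution`, `insertion` or `deletion`, and the names `WordResult`, `cutoff_index` and `count_labels` are invented for this example.

```python
from dataclasses import dataclass

# Hypothetical per-word labels; the real response schema may differ.
CORRECT = "correct"
SUBSTITUTION = "substitution"
INSERTION = "insertion"
DELETION = "deletion"


@dataclass
class WordResult:
    text: str
    label: str  # one of the labels above


def cutoff_index(words: list[WordResult], last_word_type: str = "whole") -> int:
    """Return the index of the last word to include, or -1 if none qualifies."""
    if last_word_type == "whole":
        return len(words) - 1  # score the entire file
    if last_word_type == "read":
        allowed = {CORRECT}  # stop at the last correctly read word
    elif last_word_type == "heard":
        allowed = {CORRECT, SUBSTITUTION, INSERTION}  # stop at the last spoken word
    else:
        raise ValueError(f"unknown last_word_type: {last_word_type!r}")
    # Scan backwards for the last word whose label qualifies.
    for i in range(len(words) - 1, -1, -1):
        if words[i].label in allowed:
            return i
    return -1


def count_labels(words: list[WordResult], last_word_type: str = "whole") -> dict:
    """Count each label, ignoring everything after the cut-off point."""
    end = cutoff_index(words, last_word_type)
    counts = {CORRECT: 0, SUBSTITUTION: 0, INSERTION: 0, DELETION: 0}
    for word in words[: end + 1]:
        counts[word.label] += 1
    return counts
```

For a sequence of two correct words, one extra spoken word and two unread words, `whole` counts 2 deletions, `read` stops after the second correct word (0 insertions, 0 deletions), and `heard` stops after the extra word (1 insertion, 0 deletions).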
Example
Suppose a child reads 50 of 100 words in 60 seconds. All 50 words are read correctly, but the passage is incomplete because the reader reached the 1-minute cut-off. There are therefore potentially 50 deletions to count.
With `whole`, the system counts all errors in the entire file (regardless of where the reader stopped) and returns 50 deletions, which leads to 50% accuracy.
With `read`, the system only returns metrics up to the last word detected as correct. Because the child correctly reads 50 of 100 words and then stops, the result is 50 correct and 0 deletions, which leads to 100% accuracy.
With `heard`, the system only returns metrics up to the last spoken word transcribed in the audio file, whether that word is correct, a substitution or an insertion. If the child correctly reads 50 of 100 words and then speaks 5 extra words that are not in the reference text, the system takes those into account: 50 correct out of 55 words spoken in total, which leads to 90.9% accuracy.
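The percentages in this example can be reproduced with one possible accuracy definition: correct words divided by all scored events (correct words plus substitutions, insertions and deletions). The endpoint does not return accuracy itself, so the `accuracy` function below is a hypothetical client-side sketch, not the API's formula.

```python
def accuracy(correct: int, substitutions: int = 0, insertions: int = 0, deletions: int = 0) -> float:
    """Hypothetical accuracy: correct words as a share of all scored events."""
    total = correct + substitutions + insertions + deletions
    if total == 0:
        return 0.0
    return correct / total


# Counts for the example, as returned under each last word type.
scenarios = {
    "whole": accuracy(correct=50, deletions=50),  # 50 / 100
    "read": accuracy(correct=50),                 # 50 / 50
    "heard": accuracy(correct=50, insertions=5),  # 50 / 55
}

for name, value in scenarios.items():
    print(f"{name}: {value:.1%}")
```

This prints 50.0%, 100.0% and 90.9% for `whole`, `read` and `heard`. Clients who weight substitutions or insertions differently would adjust the denominator accordingly.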
Last Word JSON
The Last Word JSON below is added to the results JSON returned by our API.
"last_word": { "end_timestamp": 5.135, "text_score_index": 2, "type": "WHOLE" }
Key | Description |
---|---|
type | This is the last_word_type submitted with the request. Depending on the type selected, the counts returned in the root JSON will differ. |
text_score_index | This is the index of the last word's entry in the text_score array in the root JSON. |
end_timestamp | This is the timestamp in the audio file at which the last detected word ends. |
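The `text_score_index` can be used to look up the matching entry in the root `text_score` array. A minimal Python sketch, assuming the results arrive as a JSON string with `last_word` and `text_score` at the top level; the contents of each `text_score` entry are not described here, so any field inside an entry (such as the `word` field in the usage below) is hypothetical.

```python
import json


def last_word_entry(response_json: str):
    """Return (type, end_timestamp, text_score entry) for the last word."""
    result = json.loads(response_json)
    last_word = result["last_word"]
    # text_score_index points into the root-level text_score array.
    entry = result["text_score"][last_word["text_score_index"]]
    return last_word["type"], last_word["end_timestamp"], entry
```

For a response whose `last_word` is the object shown above, the function returns `"WHOLE"`, `5.135` and the third element of `text_score`.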
Making a request
To select a type, add the `last_word_type` form-data field to your request. The following request uses `read`:
curl -X POST \
  -H "x-app-key:YOUR_API_KEY_HERE" \
  -F "file=@AudioFile.wav" \
  -F "user_token=abc123" \
  -F "reference_text=@ReferenceText.txt" \
  -F "model_id=XXXX" \
  -F "last_word_type=read" \
  https://api.soapboxlabs.com/v1/async/request/speech/fluency
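When building the same request from code, it can help to validate `last_word_type` before sending and to omit it when the default (`whole`) is wanted. The helper below is a hypothetical sketch covering only the non-file form fields from the curl example; the names `build_form_fields` and `VALID_LAST_WORD_TYPES` are invented for illustration.

```python
from typing import Dict, Optional

# Values documented for the last_word_type form-data field.
VALID_LAST_WORD_TYPES = {"whole", "read", "heard"}


def build_form_fields(user_token: str, model_id: str,
                      last_word_type: Optional[str] = None) -> Dict[str, str]:
    """Build the non-file form fields; omitting last_word_type means 'whole'."""
    fields = {"user_token": user_token, "model_id": model_id}
    if last_word_type is not None:
        if last_word_type not in VALID_LAST_WORD_TYPES:
            raise ValueError(f"last_word_type must be one of {sorted(VALID_LAST_WORD_TYPES)}")
        fields["last_word_type"] = last_word_type
    return fields
```

With the `requests` library, the resulting dictionary could be passed as `data=` to `requests.post`, with the audio and reference text files passed as `files=` and the API key in the `x-app-key` header, which sends the fields as multipart form data in the same way as `curl -F`.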
Calculations
Because accuracy is defined differently by clients across products and use cases, the Fluency endpoint does not return “accuracy” as part of the response.
Similarly, the WCPM (words correct per minute) score depends on your definition of a “reading error” and can be further affected by the last word type chosen.
The Fluency endpoint provides the data clients need to make these calculations on a case-by-case basis. Key metrics such as Accuracy, Read Percentage and WCPM can all be calculated from the JSON output.
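As an illustration, WCPM and Read Percentage could be derived from the returned counts as shown below. These are common definitions assumed for the sketch rather than values the endpoint computes: WCPM scales correct words to a 60-second rate, and Read Percentage divides the words attempted by the number of words in the reference text. Which duration to use (for example the full recording length, or the `end_timestamp` of the last word) is a client choice.

```python
def wcpm(correct_words: int, duration_seconds: float) -> float:
    """Words correct per minute over the chosen reading duration."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return correct_words * 60.0 / duration_seconds


def read_percentage(words_attempted: int, reference_word_count: int) -> float:
    """Share of the reference text the reader got through (0.0 to 1.0)."""
    if reference_word_count <= 0:
        raise ValueError("reference_word_count must be positive")
    return words_attempted / reference_word_count
```

For the earlier example (50 of 100 words read correctly in 60 seconds), `wcpm(50, 60)` gives 50.0 and `read_percentage(50, 100)` gives 0.5.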
More detailed information can be found here:
Fluency - Example Calculations