Fluency - Last Word Feature

Overview

This feature lets customers define the end point for fluency data calculations in an audio file.

The Different Last Word Types

Choosing the point to stop calculating can have a significant impact on accuracy ratings for the reader.

This feature lets you choose between three options, each of which changes the counts returned.

Type	Description
Whole	Returns data for the entire file (default setting)
Read	Returns data up to the last word read correctly
Heard	Returns data up to the last word said by the reader and heard by the system

To choose a type, add the form-data field last_word_type to the request and set it to whole, read or heard. If no type is specified, whole is used.

Depending on the type selected, the number of deletions returned in the JSON will change.

Example

Suppose a child reads 50 of the 100 words in a passage in 60 seconds. All 50 words are read correctly, but the reading is incomplete because it reached the 1 minute cut off, so there are potentially 50 deletions to be counted.

With whole, the system counts all errors in the entire file (regardless of whether the reader stopped) and returns 50 deletions, which leads to 50% accuracy.
With read, the system only returns metrics up to the last word detected as correct. The child correctly reads 50 of 100 words and stops, so what is returned is 50 correct and 0 deletions, leading to 100% accuracy.
With heard, the system only returns metrics up to the last spoken word transcribed in the audio file, whether it is correct, a substitution or an insertion. If the child correctly reads 50 of 100 words and then speaks an extra 5 words that are not in the reference text, the system takes those into account: 50 correct out of 55 spoken in total, which leads to 90.9% accuracy.
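
The three results in the example can be reproduced with a short calculation. The following is a minimal sketch in Python and is not part of the API: the accuracy function and its formula (correct words divided by all counted words) are assumptions chosen to match the figures in the example, and your own definition of accuracy may differ.

```python
def accuracy(correct, substitutions=0, deletions=0, insertions=0):
    """Percentage of counted words that were read correctly.

    Assumed definition: correct / (correct + substitutions + deletions + insertions).
    """
    total = correct + substitutions + deletions + insertions
    if total == 0:
        return 0.0
    return round(100 * correct / total, 1)


# Counts taken from the example, one set per last word type.
whole = accuracy(correct=50, deletions=50)   # 50 unread words counted as deletions
read = accuracy(correct=50)                  # stops at the last correct word
heard = accuracy(correct=50, insertions=5)   # 5 extra spoken words counted

print(whole, read, heard)  # prints: 50.0 100.0 90.9
```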

Last Word JSON

The Last Word JSON below is added to the results JSON returned by our API.

"last_word": {
	"end_timestamp": 5.135,
	"text_score_index": 2,
	"type": "WHOLE"
}

Key	Description
type	The user-submitted type. Depending on the type selected, the counts returned in the root JSON will differ.
text_score_index	An index into the text_score array in the root JSON. If the type is whole, it points to the last item. If the type is read, it points to the last correct word. If the type is heard, it points to the last spoken word.
end_timestamp	The timestamp in the audio file where the last detected word occurred. If the type is whole, it is the total audio duration of the file. If the type is read, it is the end time of the last detected correct word. If the type is heard, it is the end time of the last spoken word, which could be a correct word, an insertion or a substitution.
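
One way to use text_score_index is to trim the text_score array to the words that fall within the chosen last word type. The sketch below is illustrative only. The response body is a hypothetical, trimmed sample: the "word" field inside each text_score item is a placeholder, the upper-case "READ" value simply mirrors the sample JSON, and the index is assumed to be zero-based and inclusive.

```python
import json

# Hypothetical, trimmed response body. Only "last_word", its three keys and
# the root-level "text_score" array come from the documentation; the contents
# of each text_score item are placeholders.
response_body = """
{
  "text_score": [
    {"word": "the"},
    {"word": "cat"},
    {"word": "sat"},
    {"word": "down"}
  ],
  "last_word": {"end_timestamp": 5.135, "text_score_index": 2, "type": "READ"}
}
"""


def words_up_to_last_word(result):
    """Return the text_score items up to and including the last word."""
    last_word = result["last_word"]
    # Assumption: text_score_index is zero-based and inclusive.
    end = last_word["text_score_index"] + 1
    return result["text_score"][:end]


result = json.loads(response_body)
counted = words_up_to_last_word(result)
print(len(counted), result["last_word"]["end_timestamp"])  # prints: 3 5.135
```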

Making a request

For example, the following request sets last_word_type to read:

curl -X POST \
      -H "x-app-key:YOUR_API_KEY_HERE" \
      -F "file=@AudioFile.wav" \
      -F "user_token=abc123" \
      -F "reference_text=@ReferenceText.txt" \
      -F "model_id=XXXX" \
      -F "last_word_type=read" \
https://api.soapboxlabs.com/v1/async/request/speech/fluency
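
If requests are built in code rather than with curl, it can help to validate last_word_type before sending. The helper below is a hypothetical sketch, not part of any SDK: build_form_fields and ALLOWED_LAST_WORD_TYPES are names invented for illustration, and the returned dictionary only covers the plain form fields shown in the curl command (the audio file and reference text would be attached separately by your HTTP client).

```python
# The three values documented for last_word_type.
ALLOWED_LAST_WORD_TYPES = ("whole", "read", "heard")


def build_form_fields(user_token, model_id, last_word_type="whole"):
    """Build the plain form fields for a fluency request.

    Hypothetical helper: falls back to "whole" when no type is given and
    rejects any value other than the three documented ones.
    """
    value = last_word_type.lower()
    if value not in ALLOWED_LAST_WORD_TYPES:
        raise ValueError(
            "last_word_type must be one of: " + ", ".join(ALLOWED_LAST_WORD_TYPES)
        )
    return {
        "user_token": user_token,
        "model_id": model_id,
        "last_word_type": value,
    }


fields = build_form_fields("abc123", "XXXX", last_word_type="read")
print(fields["last_word_type"])  # prints: read
```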

Calculations

Because accuracy is defined differently by clients across different products and use cases, the Fluency endpoint does not provide “accuracy” as part of the response.

Similarly, the WCPM score depends on your definition of a “reading error”, and it is further affected by the last word type chosen.

The Fluency endpoint provides the data clients need to make these calculations on a case by case basis. Key metrics like Accuracy, Read Percentage and WCPM can all be calculated from the JSON output.
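
As an illustration, WCPM (words correct per minute) can be derived from a count of correct words and an elapsed time. The sketch below is one possible definition and not the endpoint's own: the wcpm function is hypothetical, and it assumes the elapsed time is given in seconds. Whether you use the total audio duration or the last_word end_timestamp as the elapsed time depends on your use case.

```python
def wcpm(correct_words, elapsed_seconds):
    """Words correct per minute, assuming elapsed_seconds is in seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be greater than zero")
    return correct_words * 60.0 / elapsed_seconds


# 50 correct words over a full 60 second recording.
full_minute = wcpm(50, 60.0)
# The same 50 correct words if the last word ended at 30 seconds.
half_minute = wcpm(50, 30.0)

print(full_minute, half_minute)  # prints: 50.0 100.0
```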

More detailed information can be found here:
Fluency - Example Calculations