Fluency - Example Calculations
Overview
The examples below show how a customer can use the results from our service to calculate metrics such as WCPM (Words Correct Per Minute) or Accuracy.
What we provide
The following information is provided in the results JSON object. Customers can use it to calculate WCPM and other metrics with ease.
Field | Description |
---|---|
num_differences | The number of differences between the reference and transcription text. |
substitution_count | The number of times a word has been substituted for another. |
insertion_count | The number of times a word has been inserted. |
correct_count | The number of times the child correctly said a word from the reference text. |
deletion_count | The number of times a word from the reference text was not said. |
word_count | The number of words in the reference_text. |
repetition_count | The number of times the child repeated a word. |
Because accuracy is defined differently by clients across products and use cases, the Fluency endpoint does not return “accuracy” as part of the response.
Similarly, the WCPM score depends on your definition of a “reading error”, and it can be further affected by the last word type chosen.
Read more: Fluency - Last Word Feature
The following calculations can be used to assess how well a child reads a particular passage. By using the different last word types, you can measure more accurately how a child performed and avoid situations where many deletions at the end of an audio file unfairly penalize the child.
Sample Equations
The following equations use the sample figures from the table below. These are example values of the kind typically returned by the Fluency endpoint.
correct_count | insertion_count | substitution_count | deletion_count | word_count | repetition_count | end_timestamp |
---|---|---|---|---|---|---|
42 | 10 | 1 | 8 | 78 | 4 | 36.14 |
Note:
The formulas use a mixture of data returned by the engine. The variables in the Name column are not returned by the endpoint: because mistakes and accuracy are defined by the customer, the table below is a guide to how they can be calculated for your use case.
Name | Formula | Example | Result |
---|---|---|---|
mistakes | deletion_count + substitution_count | 8 + 1 | 9 |
words_said | correct_count + insertion_count + substitution_count | 42 + 10 + 1 | 53 |
passage_read | correct_count + deletion_count + substitution_count | 42 + 8 + 1 | 51 |
total_words | correct_count + deletion_count + insertion_count + substitution_count | 42 + 8 + 10 + 1 | 61 |
reading_time | end_timestamp / 60 seconds | 36.14 / 60 | ≈ 0.6 |
WCPM | correct_count / reading_time | 42 / 0.6 | ≈ 70 |
Accuracy #1 | correct_count / passage_read × 100 | 42 / 51 × 100 | 82.35% |
Accuracy #2 | correct_count / total_words × 100 | 42 / 61 × 100 | 68.85% |
Read percentage #1 | passage_read / word_count × 100 | 51 / 78 × 100 | 65.38% |
Read percentage #2 | words_said / word_count × 100 | 53 / 78 × 100 | 67.95% |
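The table above can be reproduced in a few lines of Python. This is a minimal sketch: the dictionary keys mirror the field names listed above, but how you read them out of the results JSON depends on your integration, and the helper name `derive_metrics` is hypothetical.

```python
def derive_metrics(counts: dict) -> dict:
    """Compute the derived variables and metrics from the Sample Equations table."""
    correct = counts["correct_count"]
    insertions = counts["insertion_count"]
    substitutions = counts["substitution_count"]
    deletions = counts["deletion_count"]
    word_count = counts["word_count"]

    mistakes = deletions + substitutions
    words_said = correct + insertions + substitutions
    passage_read = correct + deletions + substitutions
    total_words = correct + deletions + insertions + substitutions
    reading_time = counts["end_timestamp"] / 60  # seconds -> minutes

    return {
        "mistakes": mistakes,
        "words_said": words_said,
        "passage_read": passage_read,
        "total_words": total_words,
        "reading_time": reading_time,
        "wcpm": correct / reading_time,
        "accuracy_1": 100 * correct / passage_read,
        "accuracy_2": 100 * correct / total_words,
        "read_percentage_1": 100 * passage_read / word_count,
        "read_percentage_2": 100 * words_said / word_count,
    }


# Sample figures from the table above
sample = {
    "correct_count": 42,
    "insertion_count": 10,
    "substitution_count": 1,
    "deletion_count": 8,
    "word_count": 78,
    "repetition_count": 4,
    "end_timestamp": 36.14,
}

for name, value in derive_metrics(sample).items():
    print(f"{name}: {value:.2f}")
```

Because reading_time is not rounded here, WCPM comes out at about 69.73 rather than exactly 70; the table rounds reading_time to 0.6 before dividing.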
Calculating Accuracy
There are a number of ways to calculate Accuracy. This will be determined by your learning goals and activity type. Two sample options are listed below:
Option #1
correct_count / passage_read × 100
This ignores any insertions or repetitions the child may have said while reading; it is concerned only with how accurately the child read the prompt text.
Option #2
correct_count / total_words × 100
This also counts any insertions the child may have said while reading the prompt, so extra words lower the score and give a fuller picture of how well the child read the prompt.
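The two options differ only by insertion_count in the denominator. Since insertion_count is never negative, total_words is always at least passage_read, so Option #2 can never exceed Option #1, and the two agree when the child made no insertions. A minimal sketch with the sample figures (the function names are illustrative, not part of the API):

```python
def accuracy_option_1(correct: int, deletions: int, substitutions: int) -> float:
    """Accuracy against the prompt text only; insertions are ignored."""
    passage_read = correct + deletions + substitutions
    return 100 * correct / passage_read


def accuracy_option_2(correct: int, deletions: int, substitutions: int, insertions: int) -> float:
    """Accuracy that also counts insertions against the child."""
    total_words = correct + deletions + insertions + substitutions
    return 100 * correct / total_words


# Sample figures from the Sample Equations section
print(f"{accuracy_option_1(42, 8, 1):.2f}%")      # 82.35%
print(f"{accuracy_option_2(42, 8, 1, 10):.2f}%")  # 68.85%

# With no insertions, the two options give the same result
assert accuracy_option_1(42, 8, 1) == accuracy_option_2(42, 8, 1, 0)
```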
Calculating Read Percentage
As with Accuracy, there are a number of ways to calculate this. Two sample options are listed below:
Option #1
passage_read / word_count × 100
This ignores any insertions or repetitions the child may have said while reading. It also counts deletions (words the child skipped), so it indicates how far the child got through the passage rather than how much of the passage the child actually read.
Option #2
words_said / word_count × 100
This excludes deletions but includes everything the child said (including insertions), giving an indication of how much of the passage the child actually said.
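Subtracting the two formulas shows what separates them: passage_read - words_said = deletion_count - insertion_count. Option #1 is higher when the child skipped more words than they added, and Option #2 is higher when the child added more words than they skipped. A small sketch using the sample figures (the function name is illustrative):

```python
def read_percentages(correct, insertions, substitutions, deletions, word_count):
    """Return (option_1, option_2) read percentages for one result."""
    passage_read = correct + deletions + substitutions
    words_said = correct + insertions + substitutions
    option_1 = 100 * passage_read / word_count
    option_2 = 100 * words_said / word_count
    return option_1, option_2


opt1, opt2 = read_percentages(correct=42, insertions=10, substitutions=1, deletions=8, word_count=78)
print(f"Option #1: {opt1:.2f}%  Option #2: {opt2:.2f}%")  # Option #1: 65.38%  Option #2: 67.95%

# The gap between the options is driven entirely by deletions versus insertions
gap = (8 - 10) * 100 / 78
assert abs((opt1 - opt2) - gap) < 1e-9
```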
Calculating WCPM (Words Correct Per Minute)
There are a number of different methods for calculating WCPM; one possible approach is outlined below. The response returned from the Fluency endpoint contains all the data points required to calculate a Words Correct Per Minute metric. For example:
Name | Formula | Example | Result |
---|---|---|---|
reading_time | end_timestamp / 60 seconds | 36.14 / 60 | ≈ 0.6 |
WCPM | correct_count / reading_time | 42 / 0.6 | ≈ 70 |
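The same calculation as a small function. This is a sketch, assuming end_timestamp is in seconds (as the division by 60 implies); the function name is illustrative. Rounding reading_time before dividing changes the result slightly: the table rounds 36.14 / 60 to 0.6 and gets 70, while the unrounded value gives about 69.73.

```python
def words_correct_per_minute(correct_count: int, end_timestamp: float) -> float:
    """WCPM = correct_count / reading_time, with reading_time in minutes."""
    if end_timestamp <= 0:
        raise ValueError("end_timestamp must be positive to compute WCPM")
    reading_time = end_timestamp / 60  # seconds -> minutes
    return correct_count / reading_time


print(round(words_correct_per_minute(42, 36.14), 2))  # 69.73
print(round(42 / round(36.14 / 60, 1), 2))            # 70.0, using the rounded reading_time
```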
With the calculations above, it's best to use the correct_count returned by the engine rather than calculating it as num_correct = word_count - mistakes. This is because word_count does not change for the READ or HEARD types, so calculating it this way gives an incorrect correct count.
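To see why, take the READ column from the Example Results table further down this page (word_count 194, deletion_count 8, substitution_count 0, correct_count 42). Because word_count still covers the whole passage, the subtraction produces a number far larger than the number of words the child actually read correctly:

```python
# READ-type figures from the Example Results table
read_result = {
    "correct_count": 42,
    "deletion_count": 8,
    "substitution_count": 0,
    "word_count": 194,
}

mistakes = read_result["deletion_count"] + read_result["substitution_count"]

# READ stops counting deletions after the last correct word, but word_count still
# covers the whole passage, so every unread word ends up counted as "correct"
num_correct_derived = read_result["word_count"] - mistakes

print(num_correct_derived)           # 186
print(read_result["correct_count"])  # 42, the value to use for WCPM
```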
Additional Features
The following features provide additional data points that can improve the results above.
Last Word Type
The last word feature provides additional information that can be used to improve the results. It mainly affects WCPM: instead of using the audio duration as the timestamp, you can, depending on which last word type is used, use the timestamp of when the child actually stopped speaking.
Read more: Fluency - Last Word Feature
JSON
The last_word JSON object below is added to the root JSON returned by our API. With it, you can get a more accurate timestamp of when the child either stopped reading the passage (with READ) or stopped talking (with HEARD).
```json
"last_word": {
  "end_timestamp": 5.135,
  "text_score_index": 2,
  "type": "WHOLE"
}
```
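A minimal sketch of reading this object in Python and using its end_timestamp (in seconds) as the reading time for WCPM. It assumes the response has been parsed into a dictionary with last_word at the root, as described above; the helper name `reading_time_minutes` and the surrounding response are illustrative, and correct_count would come from the same response.

```python
import json

# Illustrative response; only the last_word object is taken from the example above
response = json.loads("""
{
  "last_word": {"end_timestamp": 5.135, "text_score_index": 2, "type": "WHOLE"}
}
""")


def reading_time_minutes(result: dict) -> float:
    """Use last_word.end_timestamp (seconds) as the reading time, in minutes."""
    return result["last_word"]["end_timestamp"] / 60


last_word = response["last_word"]
print(last_word["type"], reading_time_minutes(response))
```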
This feature lets you choose between three options, each of which changes the returned data counts.
Type | Description |
---|---|
Whole | Returns data for the entire file (default setting) |
Read | Returns data up to the last spoken word read correctly |
Heard | Returns data up to the last spoken word heard by the system |
Example Results
The following table shows the results returned by the engine for the three different types, together with the metrics calculated from them using the formulas above.
In this example, the child stops reading from the passage after 31.14 seconds, then speaks a few more words, which are flagged as insertions, before stopping. The last 24 seconds of audio are silence.
Notice that the number of deletions for WHOLE is 151, most of which occur after the child has finished speaking; this leads to poor Accuracy and WCPM.
For READ, where results stop being counted after the last correct word, the number of deletions drops to 8, which indicates that the child skipped some words while reading. Because the deletions at the end are no longer counted as they were for WHOLE, WCPM and Accuracy are much higher and closer to what actually occurred.
In contrast to READ, the HEARD option also counts any insertions or substitutions detected after the last correct word but, like READ, does not count deletions after the child stops speaking. The extra words spoken reduce Accuracy somewhat compared to READ, but Accuracy is still much improved compared to WHOLE.
Please consult this Word Breakdown Table to see how the information is stored in the Word objects.
TYPE | WHOLE | READ | HEARD |
---|---|---|---|
Difference | 155 | 12 | 12 |
correct_count | 42 | 42 | 42 |
insertion_count | 6 | 4 | 6 |
deletion_count | 151 | 8 | 8 |
substitution_count | 1 | 0 | 1 |
repetition_count | 4 | 4 | 4 |
word_count | 194 | 194 | 194 |
text_score_index | 196 | 53 | 56 |
end_timestamp | 60 | 31.14 | 36.14 |
WCPM | 42 | 80.92 | 69.73 |
Accuracy Option # 1 | 21.65% | 84.00% | 82.35% |
Accuracy Option # 2 | 21.00% | 77.78% | 73.68% |
Read Percentage Option # 1 | 100.00% | 25.77% | 26.29% |
Read Percentage Option # 2 | 25.26% | 23.71% | 25.26% |
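The derived rows of this table can be recomputed from the counts with the formulas from the Sample Equations section, which is a quick way to check a table like this one. A sketch in Python (the dictionary layout and function name are illustrative):

```python
# Counts for each last word type, copied from the table above
results = {
    "WHOLE": dict(correct=42, insertions=6, deletions=151, substitutions=1, word_count=194, end_timestamp=60),
    "READ": dict(correct=42, insertions=4, deletions=8, substitutions=0, word_count=194, end_timestamp=31.14),
    "HEARD": dict(correct=42, insertions=6, deletions=8, substitutions=1, word_count=194, end_timestamp=36.14),
}


def metrics(correct, insertions, deletions, substitutions, word_count, end_timestamp):
    """Apply the Sample Equations formulas to one column of counts."""
    passage_read = correct + deletions + substitutions
    words_said = correct + insertions + substitutions
    total_words = passage_read + insertions
    return {
        "wcpm": correct / (end_timestamp / 60),
        "accuracy_1": 100 * correct / passage_read,
        "accuracy_2": 100 * correct / total_words,
        "read_percentage_1": 100 * passage_read / word_count,
        "read_percentage_2": 100 * words_said / word_count,
    }


table = {name: metrics(**counts) for name, counts in results.items()}
for name, row in table.items():
    print(name, {key: round(value, 2) for key, value in row.items()})
```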
Self Corrections
With self corrections the customer will have to decide if the word was a self correction or not. This can be done by going through the results and checking each word that was marked as a self correction and comparing the time_since_previous
with the last word to ensure that it is within your time frame for what is a self correction.
Read more: Fluency - Self Corrections Feature
JSON
```json
"sub_types": {
  "self_correction": {
    "reparandums": [1]
  }
}
```
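The code below counts the number of self corrections. The field names follow the JSON above; text_scores is the list of word objects in the response, and SELF_CORRECTION_THRESHOLD is a value you choose (1.0 is only a placeholder).
```python
SELF_CORRECTION_THRESHOLD = 1.0  # placeholder: your own time frame (seconds assumed)


def count_self_corrections(text_scores, threshold=SELF_CORRECTION_THRESHOLD):
    self_correction_count = 0
    for item in text_scores:
        # Words flagged as self corrections carry sub_types.self_correction
        self_correction = item.get('sub_types', {}).get('self_correction')
        if self_correction:
            # reparandums holds indexes into text_scores; use the first one
            index_of_reparandum = self_correction['reparandums'][0]
            details = text_scores[index_of_reparandum]['transcription_details']
            if details['time_since_previous'] < threshold:
                # We have a self correction!
                self_correction_count += 1
    return self_correction_count
```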
Calculations
With this information, you can discard reparandums from mistakes if you do not want to penalize students for their attempts. You can then compute a true_insertion_count, which subtracts the number of self corrections (or reparandums, if you prefer) from insertion_count.
true_insertion_count = insertion_count - self_correction_count
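Applied to the sample figures, with the four self corrections shown in the Example Results table below, the change only touches the metrics that use insertion_count. A sketch; in practice self_correction_count would come from counting self corrections as described above:

```python
# Sample figures; self_correction_count of 4 matches "number of reparandums" below
correct_count, insertion_count, substitution_count, deletion_count = 42, 10, 1, 8
word_count = 78
self_correction_count = 4

true_insertion_count = insertion_count - self_correction_count

# Only the formulas that include insertions change
total_words = correct_count + deletion_count + true_insertion_count + substitution_count
words_said = correct_count + true_insertion_count + substitution_count

accuracy_2 = 100 * correct_count / total_words
read_percentage_2 = 100 * words_said / word_count

print(true_insertion_count)         # 6
print(f"{accuracy_2:.2f}%")         # 73.68%
print(f"{read_percentage_2:.2f}%")  # 62.82%
```

Accuracy Option #1 and Read Percentage Option #1 do not use insertion_count, so they are unchanged.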
Example Results
With this change, any metric that relies on insertion_count is affected, and omitting the self corrections improves accuracy.
TYPE | WHOLE | WHOLE (self corrections removed) |
---|---|---|
Difference | 12 | 12 |
correct_count | 42 | 42 |
insertion_count | 10 | 10 |
deletion_count | 8 | 8 |
substitution_count | 1 | 1 |
repetition_count | 4 | 4 |
word_count | 78 | 78 |
text_score_index | 53 | 53 |
end_timestamp | 36.14 | 36.14 |
number of reparandums | 0 | 4 |
WCPM | 70 | 70 |
Accuracy Option # 1 | 82.35% | 82.35% |
Accuracy Option # 2 | 68.85% | 73.68% |
Read Percentage Option # 1 | 65.38% | 65.38% |
Read Percentage Option # 2 | 67.95% | 62.82% |