Fluency - Example Calculations

Overview

Below is a set of examples showing how a customer can use our service to calculate metrics such as WCPM or Accuracy.

What we provide

The following information is provided in the results JSON object. Using this information, customers can make WCPM and other calculations with ease.

Field | Description
num_differences | The number of differences between the reference and transcription text.
substitution_count | The number of times a word has been substituted for another.
insertion_count | The number of times a word has been inserted.
correct_count | The number of times the child correctly said a word from the reference text.
deletion_count | The number of times a word from the reference text was not said.
word_count | The number of words in the reference_text.
repetition_count | The number of times the child repeated a word.
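
For illustration, these counters might appear together in the results object roughly as follows. This is only a sketch using the sample figures from later on this page; the full response contains additional fields and structure not shown here.

{
	"correct_count": 42,
	"insertion_count": 10,
	"substitution_count": 1,
	"deletion_count": 8,
	"repetition_count": 4,
	"word_count": 78
}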

As accuracy is something that is defined differently by clients across different products and use cases, the Fluency endpoint does not provide “accuracy” as part of the response.

Similarly, the WCPM score depends on your definition of a “reading error”, and it can be further refined depending on the last word type chosen.
Read more: Fluency - Last Word Feature

The following calculations can be used to assess how well a child reads a particular passage. Using the different types, you can more accurately measure how a child performed and avoid situations where a large number of deletions at the end of an audio file unfairly punishes the child.

Sample Equations

The following equations use sample figures from the table below. These figures are example values typically returned by the Fluency endpoint.

correct_count | insertion_count | substitution_count | deletion_count
42 | 10 | 1 | 8

word_count | repetition_count | end_timestamp
78 | 4 | 36.14

Note:

The formulas below use a mixture of data returned by the engine. The variables in the Name column are not returned by the endpoint. As mistakes and accuracy are defined by the customer, the table below is a guide to how they can be calculated depending on your use case.

Name | Formula | Example | Result
mistakes | deletion_count + substitution_count | 8 + 1 | 9
words_said | correct_count + insertion_count + substitution_count | 42 + 10 + 1 | 53
passage_read | correct_count + deletion_count + substitution_count | 42 + 8 + 1 | 51
total_words | correct_count + deletion_count + insertion_count + substitution_count | 42 + 8 + 10 + 1 | 61
reading_time | end_timestamp / 60 seconds | 36.14 / 60 | 0.6
WCPM | correct_count / reading_time | 42 / 0.6 | 70
Accuracy #1 | correct_count / passage_read | 42 / 51 | 82.35%
Accuracy #2 | correct_count / total_words | 42 / 61 | 68.85%
Read percentage #1 | passage_read / word_count | 51 / 78 | 65.38%
Read percentage #2 | words_said / word_count | 53 / 78 | 67.95%
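
As a worked sketch, the derived counts in the table above can be computed directly from the endpoint fields. The variable names below are our own conventions (as noted, they are not returned by the endpoint), and the figures are the sample values from the table:

# Sample figures as returned by the Fluency endpoint
correct_count = 42
insertion_count = 10
substitution_count = 1
deletion_count = 8
word_count = 78
end_timestamp = 36.14

# Derived counts; these names are our own and are not returned by the endpoint
mistakes = deletion_count + substitution_count                                        # 8 + 1 = 9
words_said = correct_count + insertion_count + substitution_count                     # 42 + 10 + 1 = 53
passage_read = correct_count + deletion_count + substitution_count                    # 42 + 8 + 1 = 51
total_words = correct_count + deletion_count + insertion_count + substitution_count   # 42 + 8 + 10 + 1 = 61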

Calculating Accuracy

There are a number of ways to calculate Accuracy. This will be determined by your learning goals and activity type. Two sample options are listed below:

Option #1

correct_count / passage_read.

This ignores any insertions or repetitions the child may have said while reading; it is only concerned with how accurately the child read the prompt text.

Option #2

correct_count / total_words.

This includes any insertions the child may have said while reading the prompt, which leads to a more accurate score of how well the child read the prompt.
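
Reusing the derived counts from the sketch above, the two options differ only in the denominator:

accuracy_option_1 = correct_count / passage_read   # 42 / 51 ≈ 0.8235 → 82.35%
accuracy_option_2 = correct_count / total_words    # 42 / 61 ≈ 0.6885 → 68.85%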

Calculating Read Percentage

Similar to Accuracy, there are a number of different ways to calculate this. Two sample options are listed below:

Option #1

passage_read / word_count.

This ignores any insertions or repetitions the child may have said while reading, but it does account for deletions (any words the child skipped). It gives an indication of how far the child got in the passage, not of how much of the passage the child actually read.

Option #2

words_said / word_count.

This does not include deletions but does include everything the child said, which more accurately tells how much of the passage the child actually said.
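
Again reusing the derived counts from the earlier sketch:

read_percentage_option_1 = passage_read / word_count   # 51 / 78 ≈ 0.6538 → 65.38%
read_percentage_option_2 = words_said / word_count     # 53 / 78 ≈ 0.6795 → 67.95%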

Calculating WCPM (Words Correct Per Minute)

There are a number of different methods for calculating WCPM; below we have outlined one possible way. The response returned from the Fluency endpoint contains all the data points required to calculate a Words Correct Per Minute metric. For example:

Name | Formula | Example | Result
reading_time | end_timestamp / 60 seconds | 36.14 / 60 | 0.6
WCPM | correct_count / reading_time | 42 / 0.6 | 70

With the calculations above, it's best to use the correct_count returned by the engine rather than calculating it as num_correct = word_count - mistakes. This is because word_count does not change for the read or heard types, so calculating it that way will produce an incorrect correct count.
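
Putting this together, a minimal sketch of the WCPM calculation with the sample figures, using the correct_count returned by the engine as recommended above:

reading_time = end_timestamp / 60        # 36.14 / 60 ≈ 0.60 minutes
wcpm = correct_count / reading_time      # 42 / 0.60 ≈ 70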

Additional Features

The following features provide additional datapoints which can enhance and improve the results above.

Last Word Type

With the last word feature, we get some additional information which can be used to improve the results. This mainly affects the WCPM: instead of using the audio duration for the timestamp, we can now, depending on which last word type is used, use a timestamp of when the child actually stopped speaking.
Read more: Fluency - Last Word Feature

JSON

The Last Word JSON below is added to the root JSON returned by our API. With this we can get a more accurate timestamp of when the child either stopped reading the passage with READ or stopped talking with HEARD.

"last_word": {
	"end_timestamp": 5.135,
	"text_score_index": 2,
	"type": "WHOLE"
}

This feature allows you to choose between 3 options which will change the returned data counts.

Type | Description
Whole | This will return data for the entire file (default setting)
Read | This will return data up to the last spoken word read correctly
Heard | This will return data up to the last spoken word heard by the system
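
For example, when the read or heard type is used, the last_word end_timestamp can replace the overall audio duration when calculating reading_time. A minimal sketch, assuming the parsed response is held in a dict named response (the name is ours) and correct_count has been read out as in the earlier sketches:

# end_timestamp of the last word, e.g. 31.14 for READ in the example below
last_word_end = response["last_word"]["end_timestamp"]

reading_time = last_word_end / 60        # minutes up to when the child stopped reading/speaking
wcpm = correct_count / reading_time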

Example Results

The following tables show results directly from the engine, for the three different types.

In this example the child stopped reading from the passage after 31.14 seconds. They then spoke a few more words, which are flagged as insertions, before stopping, meaning the last 24 seconds of audio are silence.

You will notice the number of deletions for whole is 151, the majority of these occurring after the child has finished speaking; this leads to a poor Accuracy and WCPM.

Alternatively, for read, where we stop counting results after the last correct word, the number of deletions drops to 8, indicating that the child skipped some words while reading. Because we do not count all the deletions at the end as we did with whole, the WCPM and Accuracy are now much higher and more accurate to what actually occurred.

In contrast to read, the heard option considers any insertions or substitutions detected after the last correct word but, like read, does not consider deletions after the child stops speaking. This reduces the Accuracy somewhat compared to read, as there were extra words spoken, but the Accuracy is still much improved in comparison to whole.

Please consult this Word Breakdown Table to see how the information is stored in the Word objects.

TYPE | WHOLE | READ | HEARD
Difference | 155 | 12 | 12
correct_count | 42 | 42 | 42
insertion_count | 6 | 4 | 6
deletion_count | 151 | 8 | 8
substitution_count | 1 | 0 | 1
repetition_count | 4 | 4 | 4
word_count | 194 | 194 | 194
text_score_index | 196 | 53 | 56
end_timestamp | 60 | 31.14 | 36.14
WCPM | 42 | 80.25 | 80.25
Accuracy Option #1 | 21.65% | 84.00% | 82.35%
Accuracy Option #2 | 21.00% | 77.78% | 73.68%
Read Percentage Option #1 | 100.00% | 25.77% | 26.29%
Read Percentage Option #2 | 25.26% | 23.71% | 25.26%

Self Corrections

With self corrections, the customer will have to decide whether a word was a self correction or not. This can be done by going through the results, checking each word that was marked as a self correction, and comparing its time_since_previous with the last word to ensure that it falls within your time frame for what counts as a self correction.
Read more: Fluency - Self Corrections Feature

JSON

"sub_types": {
	"self_correction": {
		"reparandums": [1]
	}
}

Example code to calculate the number of self corrections:

SELF_CORRECTION_THRESHOLD = 1.0  # seconds; set this to whatever time frame suits your use case

self_correction_count = 0

for item in text_scores:
    self_correction = item.get('sub_types', {}).get('self_correction')

    if self_correction:
        # Index of the reparandum (the abandoned first attempt) in text_scores
        index_of_reparandum = self_correction['reparandums'][0]

        time_since_previous = text_scores[index_of_reparandum]['transcription_details']['time_since_previous']

        if time_since_previous < SELF_CORRECTION_THRESHOLD:
            # We have a self correction!
            self_correction_count += 1

Calculations

With this information, users can discard reparandums from mistakes if they do not wish to punish students for attempts. We can derive a true_insertion_count by subtracting the number of self corrections (reparandums) from the insertion_count:

true_insertion_count = insertion_count - self_correction_count
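
Anything derived from insertion_count can then be recalculated with the adjusted value. A sketch using the sample figures, where the loop above found 4 reparandums:

self_correction_count = 4                                        # from the loop above
true_insertion_count = insertion_count - self_correction_count   # 10 - 4 = 6

# Recalculate any figure that uses insertion_count, e.g. Accuracy Option #2
total_words = correct_count + deletion_count + true_insertion_count + substitution_count   # 42 + 8 + 6 + 1 = 57
accuracy_option_2 = correct_count / total_words                                             # 42 / 57 ≈ 0.7368 → 73.68%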

Example Results

With this change, anything which relies on insertion_count will have its score affected, leading to improved accuracy if the self corrections are omitted.

TYPE | WHOLE | With Self Corrections
Difference | 12 | 12
correct_count | 42 | 42
insertion_count | 10 | 10
deletion_count | 8 | 8
substitution_count | 1 | 1
repetition_count | 4 | 4
word_count | 78 | 78
text_score_index | 53 | 53
end_timestamp | 36.14 | 36.14
number of reparandums | 0 | 4
WCPM | 70 | 70
Accuracy Option #1 | 82.35% | 82.35%
Accuracy Option #2 | 68.85% | 73.68%
Read Percentage Option #1 | 65.38% | 65.38%
Read Percentage Option #2 | 67.95% | 62.82%