Speech-to-text training and evaluation platform
Measure, compare, and improve transcription accuracy across 40+ languages using industry-standard evaluation metrics. Our platform provides transparent benchmarks to help you choose the best speech recognition model for your needs.
Live Precision Data
Click on any language row to view detailed model comparisons and error analysis.
Learn More
A beginner-friendly guide to the evaluation metrics we use.
Word Error Rate (WER) is the percentage of words that were incorrectly transcribed. It's calculated by comparing the transcribed text to the reference text and counting substitutions, insertions, and deletions at the word level.
WER = (Substitutions + Insertions + Deletions) / Total Words × 100
Lower WER is better. A WER of 5% means roughly 5 errors for every 100 words in the reference text.
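To make the formula concrete, here is a minimal Python sketch of the calculation. It assumes a plain Levenshtein (edit-distance) alignment over whitespace-separated words and skips the text normalization (lowercasing, punctuation handling) that a production evaluation pipeline would typically apply.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, insertions, and deletions needed
    to turn the reference sequence into the hypothesis sequence."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)]

def wer(reference: str, hypothesis: str) -> float:
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    return edit_distance(ref_words, hyp_words) / len(ref_words) * 100

# 1 substitution ("cat" -> "bat") + 1 deletion ("big") over 5 reference words = 40.0
print(wer("the big cat sat here", "the bat sat here"))
```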
Character Error Rate (CER) is similar to WER, but measures errors at the character level instead of the word level. This metric is particularly useful for languages without clear word boundaries or for detecting minor spelling errors.
CER = (Character Substitutions + Insertions + Deletions) / Total Characters × 100
Lower CER is better. CER is typically lower than WER since a single word error might only affect a few characters.
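CER is the same calculation applied to character sequences instead of word sequences. The snippet below is a sketch that reuses the edit_distance helper from the WER example above.

```python
def cer(reference: str, hypothesis: str) -> float:
    # Sketch only: reuses the edit_distance helper defined in the WER example
    # above, applied to character lists (spaces included) instead of word lists.
    return edit_distance(list(reference), list(hypothesis)) / len(reference) * 100

# 4 character edits over 11 reference characters ≈ 36.4
print(cer("the big cat", "the bat"))
```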
Accuracy is the complement of WER, representing the percentage of words correctly transcribed. It provides an intuitive measure of transcription quality.
Accuracy = 100% - WER
Higher accuracy is better. An accuracy of 95% means 95 out of every 100 words are correct.
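Using the wer function from the sketch above, the conversion is a single subtraction:

```python
# Sketch only: converts the WER from the earlier example into word accuracy.
accuracy = 100 - wer("the big cat sat here", "the bat sat here")
print(accuracy)  # 60.0 (a 40% WER corresponds to 60% accuracy)
```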
After initial transcription, we apply GPT-4.1-mini to enhance the text by fixing common transcription errors, improving punctuation, and correcting obvious mistakes. The "improved" score reflects accuracy after this AI enhancement step.
Post-processed with GPT-4.1-mini text enhancement
The improved score shows how much AI post-processing can enhance raw transcription quality.
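The exact prompts and pipeline we use aren't shown here, but the sketch below illustrates what such a post-processing step can look like with the OpenAI Python SDK. The prompt and parameters are illustrative assumptions, not our production configuration.

```python
# A sketch of LLM post-processing using the OpenAI Python SDK. The prompt,
# temperature, and wiring here are illustrative assumptions only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def enhance_transcript(raw_transcript: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4.1-mini",
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": "Fix obvious transcription errors and punctuation in the "
                           "following speech-to-text output. Do not add, remove, or "
                           "reorder content.",
            },
            {"role": "user", "content": raw_transcript},
        ],
    )
    return response.choices[0].message.content

print(enhance_transcript("the quick brown fox jump over the lazy dog"))
```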
Substitution: a word was replaced with a different word. Example: "cat" transcribed as "bat".
Insertion: an extra word was added that wasn't in the original audio. Example: "the cat" transcribed as "the big cat".
Deletion: a word from the original audio was missing in the transcription. Example: "the big cat" transcribed as "the cat".
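For intuition, the sketch below counts these three error types for the examples above using Python's difflib. Note that difflib's alignment is heuristic and may not always match the minimal-edit alignment used when scoring WER, so treat the breakdown as illustrative.

```python
import difflib

def count_errors(reference: str, hypothesis: str) -> dict:
    """Illustrative breakdown of substitutions, insertions, and deletions."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    counts = {"substitutions": 0, "insertions": 0, "deletions": 0}
    matcher = difflib.SequenceMatcher(None, ref_words, hyp_words)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "replace":
            # An unequal span counts as substitutions plus leftover insertions/deletions.
            counts["substitutions"] += min(i2 - i1, j2 - j1)
            counts["insertions"] += max(0, (j2 - j1) - (i2 - i1))
            counts["deletions"] += max(0, (i2 - i1) - (j2 - j1))
        elif op == "insert":
            counts["insertions"] += j2 - j1
        elif op == "delete":
            counts["deletions"] += i2 - i1
    return counts

print(count_errors("the cat", "the bat"))      # one substitution: "cat" -> "bat"
print(count_errors("the cat", "the big cat"))  # one insertion: extra "big"
print(count_errors("the big cat", "the cat"))  # one deletion: missing "big"
```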
Get access to our benchmarking platform and evaluate your speech recognition models against industry standards. Contact us to learn more about our enterprise evaluation solutions.