# Confidence #

A measure of the probability that the characters of the individual words match an order-5 [n-gram model](https://en.wikipedia.org/wiki/N-gram) over a reference corpus. The reference corpus is selected, after automatic language detection for each page, from those listed [here](https://charmodel.vls.io/ui).

# Correction Metrics #

- `cer` (Character Error Rate): `LevenshteinDistance / (len(hyp) + len(ref))`. Equivalent to `100% - LevenshteinRatio`.
- `wer` (Word Error Rate): split the text into words/tokens and compute the Levenshtein distance on token level. The result is `LevenshteinDistanceTokens / (#tokens(hyp) + #tokens(ref))`, i.e. the same as `cer`, but on word/token level.
- `bow_ratio` (Bag of Words Intersection Ratio): count how many items appear in both bags of words, including their frequencies (the intersection). The ratio is then `2 * Intersection / (bowsize(hyp) + bowsize(ref))`. This ratio is comparable to `100% - CER` and `100% - WER`.
- `bow_precision` (Bag of Words Precision): how many items in `hyp` are relevant/correct with respect to all items in `hyp` ("How much of the detection is correct?"): `#TruePositive(hyp) / bowsize(hyp)`.
- `bow_recall` (Bag of Words Recall): how many relevant items are detected in `hyp` with respect to all items in `ref` ("How much of the reference is detected?"): `#TruePositive(hyp) / bowsize(ref)`.
- `bow_f1score` (Bag of Words F1-Score): the harmonic mean of precision and recall. Identical to `bow_ratio`.
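The metrics above can be sketched in Python roughly as follows. This is a minimal illustration of the formulas as defined here, not the actual implementation; all function names (`levenshtein`, `cer`, `wer`, `bow_metrics`) are hypothetical.

```python
from collections import Counter

def levenshtein(a, b):
    # Classic dynamic-programming edit distance (insert/delete/substitute,
    # cost 1 each); works on strings as well as token lists.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def cer(hyp, ref):
    # Character Error Rate as defined above: distance over the summed lengths.
    return levenshtein(hyp, ref) / (len(hyp) + len(ref))

def wer(hyp, ref):
    # Same formula, but on word/token level.
    h, r = hyp.split(), ref.split()
    return levenshtein(h, r) / (len(h) + len(r))

def bow_metrics(hyp, ref):
    # Bags of words keep token frequencies; the multiset intersection (&)
    # respects them, so its size is the number of true positives.
    bow_h, bow_r = Counter(hyp.split()), Counter(ref.split())
    tp = sum((bow_h & bow_r).values())
    size_h, size_r = sum(bow_h.values()), sum(bow_r.values())
    precision = tp / size_h
    recall = tp / size_r
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return {"bow_ratio": 2 * tp / (size_h + size_r),
            "bow_precision": precision,
            "bow_recall": recall,
            "bow_f1score": f1}
```

Note that `bow_f1score` and `bow_ratio` come out identical by construction, since the Dice coefficient `2 * TP / (size_h + size_r)` is algebraically the harmonic mean of `TP / size_h` and `TP / size_r`.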