# Confidence #

A measure of the probability that the characters of the individual words match an order-5 [n-gram model](https://en.wikipedia.org/wiki/N-gram) over a reference corpus. The reference corpus is selected, after automatic language detection for each page, from those listed [here](https://charmodel.vls.io/ui).

# Correction Metrics #

- `cer` (Character Error Rate): `LevenshteinDistance / (len(hyp) + len(ref))`. Equivalent to `100% - LevenshteinRatio`.
- `wer` (Word Error Rate): split the text into words/tokens and compute the Levenshtein distance on token level. The result is `LevenshteinDistanceTokens / (#tokens(hyp) + #tokens(ref))`, i.e. the same as `cer`, but on word/token level.
- `bow_ratio` (Bag of Words Intersection Ratio): count how many items appear in both bags of words, including their frequencies (the intersection). The ratio is then `2 * Intersection / (bowsize(hyp) + bowsize(ref))`. This ratio is comparable to `100% - CER` and `100% - WER`.
- `bow_precision` (Bag of Words Precision): how many items in `hyp` are relevant/correct with respect to all items in `hyp` ("How much of the detection is correct?"): `#TruePositive(hyp) / bowsize(hyp)`.
- `bow_recall` (Bag of Words Recall): how many relevant items are detected in `hyp` with respect to all items in `ref` ("How much of the reference is detected?"): `#TruePositive(hyp) / bowsize(ref)`.
- `bow_f1score` (Bag of Words F1-Score): the harmonic mean of precision and recall. Identical to `bow_ratio`.
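The metrics above can be sketched in Python roughly as follows. This is a minimal illustration of the formulas as defined here, not the actual implementation; all function names (`levenshtein`, `cer`, `wer`, `bow_metrics`) are hypothetical.

```python
from collections import Counter

def levenshtein(a, b):
    # Classic dynamic-programming edit distance (insert/delete/substitute,
    # cost 1 each); works on strings as well as token lists.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def cer(hyp, ref):
    # Character Error Rate as defined above: distance over the summed lengths.
    return levenshtein(hyp, ref) / (len(hyp) + len(ref))

def wer(hyp, ref):
    # Same formula, but on word/token level.
    h, r = hyp.split(), ref.split()
    return levenshtein(h, r) / (len(h) + len(r))

def bow_metrics(hyp, ref):
    # Bags of words keep token frequencies; the multiset intersection (&)
    # respects them, so its size is the number of true positives.
    bow_h, bow_r = Counter(hyp.split()), Counter(ref.split())
    tp = sum((bow_h & bow_r).values())
    size_h, size_r = sum(bow_h.values()), sum(bow_r.values())
    precision = tp / size_h
    recall = tp / size_r
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return {"bow_ratio": 2 * tp / (size_h + size_r),
            "bow_precision": precision,
            "bow_recall": recall,
            "bow_f1score": f1}
```

Note that `bow_f1score` and `bow_ratio` come out identical by construction, since the Dice coefficient `2 * TP / (size_h + size_r)` is algebraically the harmonic mean of `TP / size_h` and `TP / size_r`.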