stringalign.error_classification.confusable_error

stringalign.error_classification.confusable_error.count_confusable_errors(reference: str, predicted: str, tokenizer: Tokenizer, consider_confusables: Literal['confusables', 'intentional'] | dict[str, str]) -> int

Count the number of errors that are solely due to characters being replaced with a confusable (e.g. I and 1).

This function counts the number of edits we can avoid if we resolve the confusable characters in the strings before aligning them.
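The idea of "edits we can avoid" can be illustrated with a small, self-contained sketch. It does not use stringalign and is not the library's implementation: the helper names (`levenshtein`, `resolve`, `sketch_count_confusable_errors`) and the one-entry confusable mapping are hypothetical, and the sketch assumes a plain character-level Levenshtein distance in place of the library's tokenizer and alignment.

```python
def levenshtein(a: list[str], b: list[str]) -> int:
    """Plain edit distance (insert, delete, substitute) between two token lists."""
    previous = list(range(len(b) + 1))
    for i, token_a in enumerate(a, start=1):
        current = [i]
        for j, token_b in enumerate(b, start=1):
            cost = 0 if token_a == token_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete token_a
                    current[j - 1] + 1,      # insert token_b
                    previous[j - 1] + cost,  # keep or substitute
                )
            )
        previous = current
    return previous[-1]


def resolve(text: str, mapping: dict[str, str]) -> str:
    """Replace each character by its mapped form (hypothetical confusable map)."""
    return "".join(mapping.get(character, character) for character in text)


def sketch_count_confusable_errors(
    reference: str, predicted: str, mapping: dict[str, str]
) -> int:
    """Edits saved by resolving confusables before comparing (illustration only)."""
    raw_edits = levenshtein(list(reference), list(predicted))
    resolved_edits = levenshtein(
        list(resolve(reference, mapping)), list(resolve(predicted, mapping))
    )
    return raw_edits - resolved_edits


# Assumed mapping for the example: treat the digit 1 as the letter l.
example_mapping = {"1": "l"}
saved = sketch_count_confusable_errors("Hello", "He1lo!", example_mapping)
```

Here the raw comparison needs two edits (the `1` substitution and the extra `!`), while the resolved comparison needs only one, so a single error is attributed to a confusable.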

Parameters:

reference

The reference text.

predicted

The predicted text.

tokenizer: Tokenizer

The tokenizer used to split the reference and predicted texts into tokens.

consider_confusables

Which confusable list to use: 'confusables', 'intentional', or a custom dict[str, str] mapping. See stringalign.normalize.StringNormalizer() or Confusables for more information.

Returns:

int

The number of confusable errors.