stringalign.evaluate

stringalign.evaluate#

class stringalign.evaluate.AlignmentAnalyzer(reference: str, predicted: str, combined_alignment: tuple[AlignmentOperation, ...], raw_alignment: tuple[AlignmentOperation, ...], unique_alignment: bool, heuristic_edit_classifications: FrozenDict, metadata: FrozenDict | None, tokenizer: Tokenizer)[source]#

Utility data class that represents the errors for a single sample (reference/predicted pair).

Parameters:
  • reference (str) – The reference string, also known as gold standard and ground truth

  • predicted (str) – The predicted string

  • combined_alignment (tuple[stringalign.align.AlignmentOperation, ...]) – The combined alignment for the reference and predicted string

  • raw_alignment (tuple[stringalign.align.AlignmentOperation, ...]) – The uncombined alignment for the reference and predicted string

  • unique_alignment (bool) – Boolean flag that is True if the alignment is unique.

  • horisontal_segmentation_errors – The alignment operations that are likely wrong due to segmentation errors. Corresponds to edits at the start or end of the string.

  • token_duplication_errors – Alignment operations that correspond to tokens that were repeated more times in the prediction than in the reference.

  • removed_duplicate_token_errors – Alignment operations that correspond to tokens that were repeated fewer times in the prediction than in the reference.

  • diacritic_errors – Alignment operations that correspond to diacritics being added or removed (e.g. "ë" -> "e").

  • confusable_errors – Alignment operations that correspond to confusable tokens being predicted.

  • case_errors – Alignment operations that correspond to case errors (i.e. errors that are resolved by casefolding the strings).

  • metadata (stringalign.evaluate.FrozenDict | None) – Optional metadata to include with the line error, useful if you e.g. want to include a text line ID.

  • tokenizer (stringalign.tokenize.Tokenizer) – The tokenizer used prior to alignment. Included for reproducibility purposes.

combined_alignment: tuple[AlignmentOperation, ...]#
compute_ter() float[source]#

Compute the token error rate (a generalisation of CER and WER).

The token error rate is the number of token edits divided by the total number of tokens in the reference.

If the tokenizer tokenizes the string into characters, this is equivalent to the character error rate (CER).

Returns:

token_error_rate – The token error rate.

Return type:

float
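
For intuition, the token error rate can be sketched as the Levenshtein distance between the token sequences divided by the number of reference tokens. The following is a minimal illustration of that definition, not stringalign's actual implementation:

```python
def token_error_rate(reference_tokens: list[str], predicted_tokens: list[str]) -> float:
    """Levenshtein distance between token sequences, divided by the reference length."""
    m, n = len(reference_tokens), len(predicted_tokens)
    # Single-row dynamic programming table: dp[j] holds the edit distance between
    # the first i reference tokens and the first j predicted tokens.
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev_diag, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cost = 0 if reference_tokens[i - 1] == predicted_tokens[j - 1] else 1
            prev_diag, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev_diag + cost)
    return dp[n] / m

# Character tokens give the CER; whitespace tokens give a word error rate.
cer = token_error_rate(list("Hi there"), list("He there"))      # one edit over 8 characters
wer = token_error_rate("Hi there".split(), "He there".split())  # one edit over 2 words
```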

property confusion_matrix: StringConfusionMatrix[source]#

The string confusion matrix for this string pair.

Returns:

string_confusion_matrix

Return type:

StringConfusionMatrix

classmethod from_strings(reference: str, predicted: str, tokenizer: Tokenizer | None, metadata: Mapping[Hashable, Hashable] | None = None, randomize_alignment: bool = False, random_state: Generator | int | None = None) Self[source]#

Create an AlignmentAnalyzer from a reference string and a predicted string, given a tokenizer.

Parameters:
  • reference – The reference string, also known as gold standard or ground truth.

  • predicted – The string to align with the reference.

  • tokenizer (optional) – A tokenizer that turns a string into an iterable of tokens. For this function, it is sufficient that it is a callable that turns a string into an iterable of tokens. If not provided, then stringalign.tokenize.DEFAULT_TOKENIZER is used instead, which by default is a grapheme cluster (character) tokenizer.

  • metadata – Additional metadata about the sample, e.g. sample id.

  • randomize_alignment – If True, then a random optimal alignment is chosen (slightly slower if enabled)

  • random_state – The NumPy RNG or a seed to create a NumPy RNG used for picking the optimal alignment. If None, then the default RNG will be used instead.

Returns:

alignment_analyzer – The AlignmentAnalyzer object.

Return type:

AlignmentAnalyzer

heuristic_edit_classifications: FrozenDict#
metadata: FrozenDict | None#
predicted: str#
raw_alignment: tuple[AlignmentOperation, ...]#
reference: str#
summarise() dict[Hashable, Hashable][source]#

Convert this utility class to a dictionary, where the error classifications are converted to booleans.

This is useful if we, for example, want to know how many samples have at least one suspected diacritic error. However, it removes information about what the errors might be.

Returns:

summary

Return type:

dict[Hashable, Hashable]

tokenizer: Tokenizer#
unique_alignment: bool#
visualize(which: Literal['raw', 'combined'] = 'raw', space_alignment_ops: bool = False) HtmlString[source]#

Visualize the alignment (for Jupyter Notebooks).

This is a simple wrapper around stringalign.visualize.create_alignment_html(). Use that function if you want to customise the visualisation further.

See Visualizing alignments for an example.

Parameters:
  • which – If which="raw", then the raw alignment is visualised and if which="combined" then the combined alignment is visualised.

  • space_alignment_ops – If this is True, then there will be a small space between each alignment operation.

Returns:

A special string type that is interpreted as HTML by Jupyter. It contains HTML for visualising the specified alignment.

Return type:

HtmlString

class stringalign.evaluate.ErrorType(*values)[source]#

Enum representing different edit types.

CASE_ERROR = 'case_error'#
CONFUSABLE_ERROR = 'confusable_error'#
DIACRITIC_ERROR = 'diacritic_error'#
HORISONTAL_SEGMENTATION_ERROR = 'horisontal_segmentation_error'#
REMOVED_DUPLICATE_TOKEN_ERROR = 'removed_duplicate_token_error'#
TOKEN_DUPLICATION_ERROR = 'token_duplication_error'#
class stringalign.evaluate.FrozenDict(data: Mapping[Hashable, Any] | None = None)[source]#

An immutable and hashable dictionary.

Pickle is used to create hashes for non-hashable values.
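
The idea can be sketched as follows; the hypothetical FrozenDictSketch class below is a minimal illustration of an immutable, hashable mapping with a pickle fallback for unhashable values, not stringalign's actual implementation:

```python
import pickle
from collections.abc import Mapping


class FrozenDictSketch(Mapping):
    """Immutable, hashable mapping; pickles non-hashable values for hashing."""

    def __init__(self, data=None):
        self._data = dict(data or {})

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __hash__(self):
        def hash_value(value):
            try:
                return hash(value)
            except TypeError:
                # Fall back to hashing the pickled bytes of the value.
                return hash(pickle.dumps(value))

        return hash(frozenset((key, hash_value(value)) for key, value in self._data.items()))
```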

class stringalign.evaluate.MultiAlignmentAnalyzer(references: tuple[str, ...], predictions: tuple[str, ...], alignment_analyzers: tuple[AlignmentAnalyzer, ...], tokenizer: Tokenizer)[source]#

Utility class for evaluating all samples in a dataset.

Parameters:
  • references (tuple[str, ...]) – The reference strings.

  • predictions (tuple[str, ...]) – The predicted strings.

  • alignment_analyzers (tuple[AlignmentAnalyzer, ...]) – One AlignmentAnalyzer per reference/prediction pair.

  • tokenizer (stringalign.tokenize.Tokenizer) – The tokenizer used prior to alignment.

alignment_analyzers: tuple[AlignmentAnalyzer, ...]#
property alignment_operation_counts: dict[Literal['raw', 'combined'], Counter[AlignmentOperation]][source]#

Count the number of times each alignment operation occurs.

This is useful to identify common mistakes for a transcription model.

Returns:

The number of times each alignment operation occurs

Return type:

Counter[AlignmentOperation]

See also

edit_counts

property alignment_operator_index: dict[Literal['raw', 'combined'], dict[AlignmentOperation, frozenset[AlignmentAnalyzer]]][source]#

Mapping from alignment operations to sets of AlignmentAnalyzer instances that contain that operation.

This property is used to find all samples that contain specific alignment operations. It can, for example, be used to identify all lines that contain a specific error made by a transcription model, which in turn can be useful for finding mistakes in the references.

compute_ter() float[source]#
property confusion_matrix: StringConfusionMatrix[source]#

The micro-averaged confusion matrix for all samples.

dump() list[dict[Hashable, Hashable]][source]#

Convert the alignment errors to dictionaries, where the error classifications are converted to booleans.

This is useful if we, for example, want to know how many samples have at least one suspected diacritic error. However, it removes information about what the errors might be.

Returns:

summary

Return type:

list[dict[Hashable, Hashable]]

property edit_counts: dict[Literal['raw', 'combined'], Counter[AlignmentOperation]]#

Count the number of times each alignment operation representing edits occurs.

This is useful to identify common mistakes for a transcription model.

Returns:

The number of times each edit operation (i.e. alignment operations that represent edits: stringalign.align.Deleted, stringalign.align.Inserted, or stringalign.align.Replaced) occurs.

Return type:

Counter[AlignmentOperation]

property error_type_index: dict[ErrorType, Generator[AlignmentAnalyzer, None, None]]#

Mapping from error type to generators yielding AlignmentAnalyzer with at least one edit of that type.

Horizontal segmentation errors

An alignment is said to contain a horizontal segmentation error if there is an edit at the start or end of the alignment. See check_operation_for_horizontal_segmentation_error() for more information.

Token duplication errors

An alignment is said to contain a duplication error if at least one token is duplicated in the prediction where it is not duplicated in the reference. For example, transcribing "hello" as "helllo" would correspond to a duplication error. See check_operation_for_ngram_duplication_error() for more information.

Removed duplicate token errors

An alignment is said to contain a removed duplicate token error if at least one token is duplicated in the reference where it is not duplicated in the prediction. For example, transcribing "hello" as "helo" would correspond to a removed duplicate token error. See check_operation_for_ngram_duplication_error() and stringalign.error_classification.duplication_error.check_ngram_duplication_errors() for more information.

Diacritic errors

An alignment is said to contain a diacritic error if at least one of the edits would change into a Kept if we remove all diacritics. Note that this function also resolves confusables to be able to correctly remove diacritics. See check_operation_for_diacritic_error() and stringalign.error_classification.diacritic_error.count_diacritic_errors() for more information.

Confusable character errors

An alignment is said to contain a confusable error if at least one of the edits would change into a Kept if we resolve confusables. See check_operation_for_confusable_error() and stringalign.error_classification.confusable_error.count_confusable_errors() for more information.

Case errors

An alignment is said to contain a case error if at least one of the edits would change into a Kept if we case fold the contents. See check_operation_for_case_error() and stringalign.error_classification.case_error.count_case_errors() for more information.

Return type:

dict[ErrorType, Generator[AlignmentAnalyzer, None, None]]

property false_negative_index: dict[str, frozenset[AlignmentAnalyzer]][source]#

Mapping from tokens to sets of AlignmentAnalyzer instances containing that false negative token.

property false_positive_index: dict[str, frozenset[AlignmentAnalyzer]][source]#

Mapping from tokens to sets of AlignmentAnalyzer instances containing that false positive token.

classmethod from_strings(references: Iterable[str], predictions: Iterable[str], tokenizer: Tokenizer | None = None, metadata: Iterable[Mapping[Hashable, Hashable] | None] | None = None, randomize_alignment: bool = False, random_state: Generator | int | None = None) Self[source]#

Create a MultiAlignmentAnalyzer from iterables containing references and predictions.

Parameters:
  • references – Iterable containing the reference strings.

  • predictions – Iterable containing the strings to align with the references.

  • tokenizer (optional) – A tokenizer that turns a string into an iterable of tokens. For this function, it is sufficient that it is a callable that turns a string into an iterable of tokens. If not provided, then stringalign.tokenize.DEFAULT_TOKENIZER is used instead, which by default is a grapheme cluster (character) tokenizer.

  • metadata – Additional metadata about the sample, e.g. sample id.

  • randomize_alignment – If True, then a random optimal alignment is chosen (slightly slower if enabled)

  • random_state – The NumPy RNG or a seed to create a NumPy RNG used for picking the optimal alignment. If None, then the default RNG will be used instead.

Returns:

transcription_evaluator

Return type:

MultiAlignmentAnalyzer

property not_unique_alignments: Generator[AlignmentAnalyzer]#

AlignmentAnalyzer instances whose alignments are not unique.

This is useful to assess why alignments might not be unique. For example, whether the non-uniqueness stems from duplicated or transposed tokens.

Yields:

AlignmentAnalyzer

predictions: tuple[str, ...]#
references: tuple[str, ...]#
tokenizer: Tokenizer#
stringalign.evaluate.check_operation_for_case_error(previous_operation: AlignmentOperation | None, current_operation: AlignmentOperation, next_operation: AlignmentOperation | None) int[source]#

Check if this alignment operation is an edit due to mistaken casing.

This function resolves case errors by casefolding. This means that certain characters are changed even if they are already lowercase (like 'ß' being changed into 'ss').

Note

Error classification should be performed on combined alignment operations so edit operations and kept operations alternate.

Parameters:
  • previous_operation – The previous alignment operation. If current_operation is the first alignment operation in an alignment, this is None.

  • current_operation – The alignment operation to check for case errors.

  • next_operation – The next alignment operation. If current_operation is the last alignment operation in an alignment, this is None.

Returns:

The number of edits that are due to mistaken casing.

Return type:

int
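
The core idea can be sketched with a hypothetical is_case_error helper; the real function operates on alignment operations rather than raw strings, so this is an illustration of the casefolding check only:

```python
def is_case_error(reference_window: str, predicted_window: str) -> bool:
    """An edit is a case error if casefolding makes the two windows equal."""
    return (
        reference_window != predicted_window
        and reference_window.casefold() == predicted_window.casefold()
    )


# Casefolding is more aggressive than lowercasing: 'ß' casefolds to 'ss',
# so "Straße" vs "STRASSE" is classified as a case error too.
```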

stringalign.evaluate.check_operation_for_confusable_error(previous_operation: AlignmentOperation | None, current_operation: AlignmentOperation, next_operation: AlignmentOperation | None, *, tokenizer: Tokenizer) int[source]#

Check if this alignment operation is an edit due to confusable characters.

This function uses the "confusables"-list. If you want to check with a different set of confusables, then you should use stringalign.error_classification.confusable_error.count_confusable_errors() directly.

Note

Error classification should be performed on combined alignment operations so edit operations and kept operations alternate.

Parameters:
  • previous_operation – The previous alignment operation. If current_operation is the first alignment operation in an alignment, this is None

  • current_operation – The alignment operation to check for confusable errors.

  • next_operation – The next alignment operation. If current_operation is the last alignment operation in an alignment, this is None

  • tokenizer – The tokenizer used for the original alignment.

Returns:

The number of edits that are due to confusable characters.

Return type:

int

stringalign.evaluate.check_operation_for_diacritic_error(previous_operation: AlignmentOperation | None, current_operation: AlignmentOperation, next_operation: AlignmentOperation | None) int[source]#

Check if this alignment operation is an edit due to mistaken diacritics.

This function resolves confusables with the "confusables"-list as well (otherwise it would not be possible to remove the diacritics).

Note

Error classification should be performed on combined alignment operations so edit operations and kept operations alternate.

Parameters:
  • previous_operation – The previous alignment operation. If current_operation is the first alignment operation in an alignment, this is None

  • current_operation – The alignment operation to check for diacritic errors.

  • next_operation – The next alignment operation. If current_operation is the last alignment operation in an alignment, this is None

Returns:

The number of edits that are due to mistaken diacritics.

Return type:

int
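
The core of the check can be sketched by stripping combining marks after Unicode decomposition; the hypothetical helpers below illustrate the idea only and omit the confusable resolution that the real function performs:

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks (diacritics) after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def is_diacritic_error(reference_window: str, predicted_window: str) -> bool:
    """An edit is a diacritic error if removing diacritics makes the windows equal."""
    return (
        reference_window != predicted_window
        and strip_diacritics(reference_window) == strip_diacritics(predicted_window)
    )
```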

stringalign.evaluate.check_operation_for_horizontal_segmentation_error(previous_operation: AlignmentOperation | None, current_operation: AlignmentOperation, next_operation: AlignmentOperation | None) bool[source]#

Check if the alignment error is likely due to a horizontal segmentation error.

This is checked by seeing if the alignment operation is an edit at the start or end of the string.

Note

Error classification should be performed on combined alignment operations so edit operations and kept operations alternate.

Parameters:
  • previous_operation – The previous alignment operation. If current_operation is the first alignment operation in an alignment, this is None

  • current_operation – The alignment operation to check for horizontal segmentation errors.

  • next_operation – The next alignment operation. If current_operation is the last alignment operation in an alignment, this is None

Returns:

True if the alignment error is likely due to a horizontal segmentation error; otherwise False.

Return type:

bool
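
The logic can be sketched with plain (kind, text) pairs standing in for stringalign's AlignmentOperation objects; this hypothetical helper only illustrates the "edit at the boundary" rule:

```python
def is_horizontal_segmentation_error(previous_op, current_op, next_op) -> bool:
    """Flag an edit with no neighbour on one side as a likely segmentation error.

    Operations are modelled as (kind, text) pairs, where kind is "kept" for
    matching tokens and an edit kind ("inserted", "deleted", "replaced") otherwise.
    """
    kind, _text = current_op
    if kind == "kept":
        return False  # Only edits can be segmentation errors.
    # An edit at the very start or very end of the alignment has no
    # previous or next operation, respectively.
    return previous_op is None or next_op is None
```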

stringalign.evaluate.check_operation_for_ngram_duplication_error(previous_operation: AlignmentOperation | None, current_operation: AlignmentOperation, next_operation: AlignmentOperation | None, *, n: int, error_type: Literal['inserted', 'deleted'] = 'inserted', tokenizer: Tokenizer) bool[source]#

Check if this alignment operation is an n-gram duplication error or missing duplicate n-gram error.

This function checks whether the only reason for the alignment operation is an n-gram duplication or a missing duplicate n-gram.

Note

Error classification should be performed on combined alignment operations so edit operations and kept operations alternate.

Parameters:
  • previous_operation – The previous alignment operation. If current_operation is the first alignment operation in an alignment, this is None.

  • current_operation – The alignment operation to check for n-gram duplication errors.

  • next_operation – The next alignment operation. If current_operation is the last alignment operation in an alignment, this is None.

  • n – The number of tokens in the n-grams we evaluate. For single token duplication errors, this should be 1.

  • error_type – "inserted" if we are checking for inserted duplicates and "deleted" if we are checking for deleted duplicates.

  • tokenizer – The tokenizer used for the original alignment.

Returns:

True if the only reason for the alignment operation is an n-gram duplication or a missing duplicate n-gram; otherwise False.

Return type:

bool
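
For the n=1 inserted-duplicate case, the idea can be sketched by checking whether the inserted text merely repeats the adjacent kept text; this hypothetical helper illustrates the concept on plain strings rather than alignment operations:

```python
def is_token_duplication_error(previous_kept, inserted, next_kept):
    """True if the inserted tokens merely repeat adjacent kept tokens.

    E.g. reference "hello" vs prediction "helllo": the alignment keeps "hel",
    inserts "l", and keeps "lo". The inserted "l" repeats the end of the
    previous kept segment, so it is a duplication error.
    """
    if previous_kept and previous_kept.endswith(inserted):
        return True
    if next_kept and next_kept.startswith(inserted):
        return True
    return False
```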

stringalign.evaluate.compute_cer(reference: str, predicted: str) tuple[float, AlignmentAnalyzer][source]#

Compute the CER for two strings.

This is just a convenience function that creates an AlignmentAnalyzer with a stringalign.tokenize.GraphemeClusterTokenizer and computes the CER with the stringalign.statistics.StringConfusionMatrix.compute_token_error_rate() method of the AlignmentAnalyzer’s stringalign.statistics.StringConfusionMatrix.

For more information about the CER, see The character, word and token error rate.

Parameters:
  • reference – The reference string, also known as gold standard and ground truth

  • predicted – The predicted string

Returns:

  • float – The CER

  • AlignmentAnalyzer – The alignment analyzer used to compute the CER (via the token error rate)

Examples

>>> ter, analyzer = compute_cer("Hi there", "He there")
>>> ter
0.125
>>> analyzer.confusion_matrix.compute_token_error_rate()
0.125
>>> analyzer
AlignmentAnalyzer(
    reference='Hi there',
    predicted='He there',
    metadata=None,
    tokenizer=GraphemeClusterTokenizer(
        pre_tokenization_normalizer=StringNormalizer(
            normalization='NFC',
            case_insensitive=False,
            normalize_whitespace=False,
            remove_whitespace=False,
            remove_non_word_characters=False,
            resolve_confusables=None,
        ),
        post_tokenization_normalizer=StringNormalizer(
            normalization='NFC',
            case_insensitive=False,
            normalize_whitespace=False,
            remove_whitespace=False,
            remove_non_word_characters=False,
            resolve_confusables=None,
        )
    )
)
stringalign.evaluate.compute_ter(reference: str, predicted: str, tokenizer: Tokenizer) tuple[float, AlignmentAnalyzer][source]#

Compute the token error rate (TER) for two strings.

This is just a convenience function that creates an AlignmentAnalyzer and computes the TER with the stringalign.statistics.StringConfusionMatrix.compute_token_error_rate() method of the AlignmentAnalyzer’s stringalign.statistics.StringConfusionMatrix.

For more information about the TER, see The character, word and token error rate.

Parameters:
  • reference – The reference string, also known as gold standard and ground truth

  • predicted – The predicted string

  • tokenizer – Tokenizer to split the string into an iterable of tokens.

Returns:

  • float – The TER

  • AlignmentAnalyzer – The alignment analyzer used to compute the TER (token error rate)

Examples

If we use a stringalign.tokenize.GraphemeClusterTokenizer, we compute the character error rate:

>>> tokenizer = stringalign.tokenize.GraphemeClusterTokenizer()
>>> ter, analyzer = compute_ter("Hi there", "He there", tokenizer=tokenizer)
>>> ter
0.125
>>> analyzer.confusion_matrix.compute_token_error_rate()
0.125
>>> cer, _analyzer = compute_cer("Hi there", "He there")
>>> cer
0.125

And if we use a stringalign.tokenize.SplitAtWhitespaceTokenizer, we compute a word error rate:

>>> tokenizer = stringalign.tokenize.SplitAtWhitespaceTokenizer()
>>> ter, analyzer = compute_ter("Hi there", "He there", tokenizer=tokenizer)
>>> ter
0.5
>>> analyzer.confusion_matrix.compute_token_error_rate()
0.5
>>> wer, wer_analyzer = compute_wer("Hi there", "He there", word_definition="whitespace")
>>> wer
0.5
>>> wer_analyzer
AlignmentAnalyzer(
    reference='Hi there',
    predicted='He there',
    metadata=None,
    tokenizer=SplitAtWhitespaceTokenizer(
        pre_tokenization_normalizer=StringNormalizer(
            normalization='NFC',
            case_insensitive=False,
            normalize_whitespace=False,
            remove_whitespace=False,
            remove_non_word_characters=False,
            resolve_confusables=None,
        ),
        post_tokenization_normalizer=StringNormalizer(
            normalization='NFC',
            case_insensitive=False,
            normalize_whitespace=False,
            remove_whitespace=False,
            remove_non_word_characters=False,
            resolve_confusables=None,
        )
    )
)
stringalign.evaluate.compute_wer(reference: str, predicted: str, word_definition: Literal['whitespace', 'unicode', 'unicode_word_boundary'] = 'whitespace') tuple[float, AlignmentAnalyzer][source]#

Compute the WER for two strings.

This is just a convenience function that creates an AlignmentAnalyzer with an appropriate tokenizer and computes the WER with the stringalign.statistics.StringConfusionMatrix.compute_token_error_rate() method of the AlignmentAnalyzer’s stringalign.statistics.StringConfusionMatrix.

For more information about the WER, see The character, word and token error rate.

Parameters:
  • reference – The reference string, also known as gold standard and ground truth

  • predicted – The predicted string

  • word_definition – How words are delimited: "whitespace" (the default), "unicode", or "unicode_word_boundary".

Returns:

  • float – The WER

  • AlignmentAnalyzer – The alignment analyzer used to compute the WER (via the token error rate)

Examples

>>> wer, analyzer = compute_wer("Hello world!", "Hello world")
>>> wer
0.5
>>> analyzer.confusion_matrix.compute_token_error_rate()
0.5
>>> analyzer
AlignmentAnalyzer(
    reference='Hello world!',
    predicted='Hello world',
    metadata=None,
    tokenizer=SplitAtWhitespaceTokenizer(
        pre_tokenization_normalizer=StringNormalizer(
            normalization='NFC',
            case_insensitive=False,
            normalize_whitespace=False,
            remove_whitespace=False,
            remove_non_word_characters=False,
            resolve_confusables=None,
        ),
        post_tokenization_normalizer=StringNormalizer(
            normalization='NFC',
            case_insensitive=False,
            normalize_whitespace=False,
            remove_whitespace=False,
            remove_non_word_characters=False,
            resolve_confusables=None,
        )
    )
)
stringalign.evaluate.join_windows(center_string: str, previous_operation: Kept | None, next_operation: Kept | None) str[source]#

Join the text from the center alignment operation with the text from the previous and next operations (if possible).

Parameters:
  • center_string – The text in the center string; must come from an edit operation.

  • previous_operation – The previous alignment operation. Since the error classification algorithms use combined alignments, this is guaranteed to be a stringalign.align.Kept operation or None.

  • next_operation – The next alignment operation. Since the error classification algorithms use combined alignments, this is guaranteed to be a stringalign.align.Kept operation or None.