stringalign.evaluate#
- class stringalign.evaluate.AlignmentAnalyzer(reference: str, predicted: str, combined_alignment: tuple[AlignmentOperation, ...], raw_alignment: tuple[AlignmentOperation, ...], unique_alignment: bool, heuristic_edit_classifications: FrozenDict, metadata: FrozenDict | None, tokenizer: Tokenizer)[source]#
Utility data class that represents the errors for a single sample (reference/predicted pair)
- Parameters:
reference (str) – The reference string, also known as gold standard and ground truth
predicted (str) – The predicted string
combined_alignment (tuple[stringalign.align.AlignmentOperation, ...]) – The combined alignment for the reference and predicted string
raw_alignment (tuple[stringalign.align.AlignmentOperation, ...]) – The uncombined alignment for the reference and predicted string
unique_alignment (bool) – Boolean flag that is True if the alignment is unique.
horisontal_segmentation_errors – The alignment operations that are likely wrong due to segmentation errors. Corresponds to edits at the start or end of the string.
token_duplication_errors – Alignment operations that correspond to tokens that were repeated more times in the prediction than in the reference.
removed_duplicate_token_errors – Alignment operations that correspond to tokens that were repeated fewer times in the prediction than in the reference.
diacritic_errors – Alignment operations that correspond to diacritics being added or removed (e.g. "ë" -> "e").
confusable_errors – Alignment operations that correspond to confusable tokens being predicted.
case_errors – Alignment operations that correspond to case errors (i.e. errors that are resolved by casefolding the strings).
metadata (stringalign.evaluate.FrozenDict | None) – Optional metadata to include with the line error, useful if you e.g. want to include a text line ID.
tokenizer (stringalign.tokenize.Tokenizer) – The tokenizer used prior to alignment. Included for reproducibility purposes.
- combined_alignment: tuple[AlignmentOperation, ...]#
- compute_ter() float[source]#
Compute the token error rate (a generalisation of CER and WER).
The token error rate is the number of token edits divided by the total number of tokens in the reference.
If the tokenizer tokenizes the string into characters, this is equivalent to the character error rate (CER).
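As a worked illustration of this definition (a standalone sketch, not the library's implementation), the token error rate is the Levenshtein distance between the two token sequences divided by the number of reference tokens:

```python
# Illustrative sketch only: the token error rate is the minimum number of
# token edits (insertions, deletions, substitutions) needed to turn the
# prediction into the reference, divided by the number of reference tokens.
def token_error_rate(reference_tokens: list[str], predicted_tokens: list[str]) -> float:
    n, m = len(reference_tokens), len(predicted_tokens)
    # Classic single-row dynamic-programming Levenshtein distance over tokens
    dist = list(range(m + 1))
    for i in range(1, n + 1):
        prev_diag, dist[0] = dist[0], i
        for j in range(1, m + 1):
            cost = 0 if reference_tokens[i - 1] == predicted_tokens[j - 1] else 1
            prev_diag, dist[j] = dist[j], min(
                dist[j] + 1,       # deletion
                dist[j - 1] + 1,   # insertion
                prev_diag + cost,  # substitution (or match)
            )
    return dist[m] / n

# With a character tokenizer this is the CER; with a whitespace tokenizer, a WER.
assert token_error_rate(list("Hi there"), list("He there")) == 0.125
assert token_error_rate("Hi there".split(), "He there".split()) == 0.5
```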
- property confusion_matrix: StringConfusionMatrix[source]#
The string confusion matrix for this string pair.
- Returns:
string_confusion_matrix
- Return type:
- classmethod from_strings(reference: str, predicted: str, tokenizer: Tokenizer | None, metadata: Mapping[Hashable, Hashable] | None = None, randomize_alignment: bool = False, random_state: Generator | int | None = None) Self[source]#
Create an AlignmentAnalyzer based on a reference string and a predicted string, given a tokenizer.
- Parameters:
reference – The reference string, also known as gold standard or ground truth.
predicted – The string to align with the reference.
tokenizer (optional) – A tokenizer that turns a string into an iterable of tokens; any callable with that behaviour is sufficient. If not provided, then stringalign.tokenize.DEFAULT_TOKENIZER is used instead, which by default is a grapheme cluster (character) tokenizer.
metadata – Additional metadata about the sample, e.g. a sample ID.
randomize_alignment – If True, then a random optimal alignment is chosen (slightly slower if enabled).
random_state – The NumPy RNG or a seed to create a NumPy RNG used for picking the optimal alignment. If None, then the default RNG will be used instead.
- Returns:
alignment_analyzer – The AlignmentAnalyzer object.
- Return type:
- heuristic_edit_classifications: FrozenDict#
- metadata: FrozenDict | None#
- raw_alignment: tuple[AlignmentOperation, ...]#
- summarise() dict[Hashable, Hashable][source]#
Convert this utility class to a dictionary, where the error classifications are converted to booleans.
This is useful if we, for example, want to know the number of samples with at least one suspected diacritic error. However, it removes information about what the errors might be.
- Returns:
summary
- Return type:
dict[Hashable, Hashable]
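Because summarise() flattens the classifications to booleans, dataset-level questions reduce to simple sums. A hypothetical sketch (the key names below are assumptions, chosen to mirror the attribute names above):

```python
# Hypothetical summaries, shaped like AlignmentAnalyzer.summarise() output;
# the key names here are illustrative assumptions, not the library's exact keys.
summaries = [
    {"diacritic_errors": True, "case_errors": False},
    {"diacritic_errors": False, "case_errors": False},
    {"diacritic_errors": True, "case_errors": True},
]

# Number of samples with at least one suspected diacritic error
num_diacritic = sum(s["diacritic_errors"] for s in summaries)
assert num_diacritic == 2
```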
- visualize(which: Literal['raw', 'combined'] = 'raw', space_alignment_ops: bool = False) HtmlString[source]#
Visualize the alignment (for Jupyter Notebooks).
This is a simple wrapper around stringalign.visualize.create_alignment_html(). Use that function if you want to customise the visualisation further. See Visualizing alignments for an example.
- Parameters:
which – If which="raw", then the raw alignment is visualised, and if which="combined", then the combined alignment is visualised.
space_alignment_ops – If this is True, then there will be a small space between each alignment operation.
- Returns:
A special string type that is interpreted as HTML by Jupyter. It contains HTML for visualising the specified alignment.
- Return type:
- class stringalign.evaluate.ErrorType(*values)[source]#
Enum representing different edit types.
- CASE_ERROR = 'case_error'#
- CONFUSABLE_ERROR = 'confusable_error'#
- DIACRITIC_ERROR = 'diacritic_error'#
- HORISONTAL_SEGMENTATION_ERROR = 'horisontal_segmentation_error'#
- REMOVED_DUPLICATE_TOKEN_ERROR = 'removed_duplicate_token_error'#
- TOKEN_DUPLICATION_ERROR = 'token_duplication_error'#
- class stringalign.evaluate.FrozenDict(data: Mapping[Hashable, Any] | None = None)[source]#
An immutable and hashable dictionary.
Pickle is used to create hashes for non-hashable values.
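The idea can be sketched in a few lines (an illustration, not stringalign's actual implementation): wrap a mapping, expose it read-only, and hash unhashable values via their pickled bytes:

```python
import pickle
from collections.abc import Mapping

class MiniFrozenDict(Mapping):
    """Sketch of an immutable, hashable dict; not stringalign's implementation."""

    def __init__(self, data=None):
        self._data = dict(data or {})

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __hash__(self):
        def value_hash(value):
            try:
                return hash(value)
            except TypeError:  # e.g. lists: pickle to get hashable bytes
                return hash(pickle.dumps(value))
        return hash(frozenset((key, value_hash(value)) for key, value in self._data.items()))

# Equal contents (including an unhashable list value) hash equally
d = MiniFrozenDict({"line_id": 42, "tags": ["header"]})
assert hash(d) == hash(MiniFrozenDict({"line_id": 42, "tags": ["header"]}))
```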
- class stringalign.evaluate.MultiAlignmentAnalyzer(references: tuple[str, ...], predictions: tuple[str, ...], alignment_analyzers: tuple[AlignmentAnalyzer, ...], tokenizer: Tokenizer)[source]#
Utility class for evaluating all samples in a dataset.
- alignment_analyzers: tuple[AlignmentAnalyzer, ...]#
- property alignment_operation_counts: dict[Literal['raw', 'combined'], Counter[AlignmentOperation]][source]#
Count the number of times each alignment operation occurs.
This is useful to identify common mistakes for a transcription model.
- Returns:
The number of times each alignment operation occurs
- Return type:
Counter[AlignmentOperation]
- property alignment_operator_index: dict[Literal['raw', 'combined'], dict[AlignmentOperation, frozenset[AlignmentAnalyzer]]][source]#
Mapping from alignment operations to sets of
AlignmentAnalyzer with that operation in the combined alignment.
This function is used to find all samples that contain specific alignment operations. It can, for example, be used to identify all lines that contain a specific error a transcription model makes, which in turn can be useful for finding mistakes in the references.
- property confusion_matrix: StringConfusionMatrix[source]#
The micro-averaged confusion matrix for all samples.
- dump() list[dict[Hashable, Hashable]][source]#
Convert the alignment errors to dictionaries, where the error classifications are converted to booleans.
This is useful if we, for example, want to know the number of samples with at least one suspected diacritic error. However, it removes information about what the errors might be.
- property edit_counts: dict[Literal['raw', 'combined'], Counter[AlignmentOperation]]#
Count the number of times each alignment operation representing edits occurs.
This is useful to identify common mistakes for a transcription model.
- Returns:
The number of times each edit operation (i.e. alignment operations that represent edits: stringalign.align.Deleted, stringalign.align.Inserted, or stringalign.align.Replaced) occurs.
- Return type:
Counter[AlignmentOperation]
- property error_type_index: dict[ErrorType, Generator[AlignmentAnalyzer, None, None]]#
Mapping from error type to generators yielding
AlignmentAnalyzer with at least one edit of that type.
Horisontal segmentation errors
An alignment is said to contain a horisontal segmentation error if there is an edit at the start or end of the alignment. See
check_operation_for_horizontal_segmentation_error() for more information.
Token duplication errors
An alignment is said to contain a duplication error if at least one token is duplicated in the prediction where it is not duplicated in the reference. For example, transcribing
"hello" as "helllo" would correspond to a duplication error. See check_operation_for_ngram_duplication_error() for more information.
Missed duplicated token errors
An alignment is said to contain a removed duplicate token error if at least one token is duplicated in the reference but not in the prediction. For example, transcribing
"hello" as "helo" would correspond to a removed duplicate token error. See check_operation_for_ngram_duplication_error() and stringalign.error_classification.duplication_error.check_ngram_duplication_errors() for more information.
Missing diacritic errors
An alignment is said to contain a diacritic error if at least one of the edits would change into a Kept if we remove all diacritics. Note that this function also resolves confusables to be able to correctly remove diacritics. See
check_operation_for_diacritic_error() and stringalign.error_classification.diacritic_error.count_diacritic_errors() for more information.
Confusable character errors
An alignment is said to contain a confusable error if at least one of the edits would change into a Kept if we resolve confusables. See
check_operation_for_confusable_error() and stringalign.error_classification.confusable_error.count_confusable_errors() for more information.
Case errors
An alignment is said to contain a case error if at least one of the edits would change into a Kept if we case fold the contents. See
check_operation_for_case_error() and stringalign.error_classification.case_error.count_case_errors() for more information.
- Return type:
dict[ErrorType, Generator[AlignmentAnalyzer, None, None]]
- property false_negative_index: dict[str, frozenset[AlignmentAnalyzer]][source]#
Mapping from tokens to sets of
AlignmentAnalyzer with that false negative token.
- property false_positive_index: dict[str, frozenset[AlignmentAnalyzer]][source]#
Mapping from tokens to sets of
AlignmentAnalyzer with that false positive token.
- classmethod from_strings(references: Iterable[str], predictions: Iterable[str], tokenizer: Tokenizer | None = None, metadata: Iterable[Mapping[Hashable, Hashable] | None] | None = None, randomize_alignment: bool = False, random_state: Generator | int | None = None) Self[source]#
Create a MultiAlignmentAnalyzer from iterables containing references and predictions.
- Parameters:
references – Iterable containing the reference strings.
predictions – Iterable containing the strings to align with the references.
tokenizer (optional) – A tokenizer that turns a string into an iterable of tokens; any callable with that behaviour is sufficient. If not provided, then stringalign.tokenize.DEFAULT_TOKENIZER is used instead, which by default is a grapheme cluster (character) tokenizer.
metadata – Additional metadata about each sample, e.g. a sample ID.
randomize_alignment – If True, then a random optimal alignment is chosen (slightly slower if enabled).
random_state – The NumPy RNG or a seed to create a NumPy RNG used for picking the optimal alignment. If None, then the default RNG will be used instead.
- Returns:
transcription_evaluator
- Return type:
- property not_unique_alignments: Generator[AlignmentAnalyzer]#
AlignmentAnalyzer instances whose alignments are not unique.
This is useful to assess why alignments might not be unique, for example whether the non-uniqueness stems from duplicated or transposed tokens.
- Yields:
AlignmentAnalyzer
- stringalign.evaluate.check_operation_for_case_error(previous_operation: AlignmentOperation | None, current_operation: AlignmentOperation, next_operation: AlignmentOperation | None) int[source]#
Check if this alignment operation is an edit due to mistaken casing.
This function resolves case errors by casefolding. This means that certain characters are changed even if they are already lowercase (like 'ß' being changed into 'ss').
Note
Error classification should be performed on combined alignment operations so edit operations and kept operations alternate.
- Parameters:
previous_operation – The previous alignment operation. If current_operation is the first alignment operation in an alignment, this is None.
current_operation – The alignment operation to check for case errors.
next_operation – The next alignment operation. If current_operation is the last alignment operation in an alignment, this is None.
- Returns:
The number of edits that are due to mistaken casing.
- Return type:
- stringalign.evaluate.check_operation_for_confusable_error(previous_operation: AlignmentOperation | None, current_operation: AlignmentOperation, next_operation: AlignmentOperation | None, *, tokenizer: Tokenizer) int[source]#
Check if this alignment operation is an edit due to confusable characters.
This function uses the
"confusables" list. If you want to check with a different set of confusables, you should use stringalign.error_classification.confusable_error.count_confusable_errors() directly.
Note
Error classification should be performed on combined alignment operations so edit operations and kept operations alternate.
- Parameters:
previous_operation – The previous alignment operation. If current_operation is the first alignment operation in an alignment, this is None.
current_operation – The alignment operation to check for confusable errors.
next_operation – The next alignment operation. If current_operation is the last alignment operation in an alignment, this is None.
tokenizer – The tokenizer used for the original alignment.
- Returns:
The number of edits that are due to confusable characters.
- Return type:
- stringalign.evaluate.check_operation_for_diacritic_error(previous_operation: AlignmentOperation | None, current_operation: AlignmentOperation, next_operation: AlignmentOperation | None) int[source]#
Check if this alignment operation is an edit due to mistaken diacritics.
This function resolves confusables with the
"confusables" list as well (otherwise it would not be possible to remove the diacritics).
Note
Error classification should be performed on combined alignment operations so edit operations and kept operations alternate.
- Parameters:
previous_operation – The previous alignment operation. If current_operation is the first alignment operation in an alignment, this is None.
current_operation – The alignment operation to check for diacritic errors.
next_operation – The next alignment operation. If current_operation is the last alignment operation in an alignment, this is None.
- Returns:
The number of edits that are due to mistaken diacritics.
- Return type:
- stringalign.evaluate.check_operation_for_horizontal_segmentation_error(previous_operation: AlignmentOperation | None, current_operation: AlignmentOperation, next_operation: AlignmentOperation | None) bool[source]#
Check if the alignment error is likely due to a horisontal segmentation error.
This is checked by seeing if the alignment operation is an edit at the start or end of the string.
Note
Error classification should be performed on combined alignment operations so edit operations and kept operations alternate.
- Parameters:
previous_operation – The previous alignment operation. If current_operation is the first alignment operation in an alignment, this is None.
current_operation – The alignment operation to check for horisontal segmentation errors.
next_operation – The next alignment operation. If current_operation is the last alignment operation in an alignment, this is None.
- Returns:
True if the alignment error is likely due to a horisontal segmentation error, otherwise False.
- Return type:
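The heuristic amounts to: the operation is an edit, and it sits at either boundary of the alignment. A minimal sketch with stand-in operation types (stringalign defines its own operation classes):

```python
from dataclasses import dataclass

# Stand-in operation types for illustration; these are NOT stringalign's classes.
@dataclass
class Kept:
    substring: str

@dataclass
class Replaced:
    reference: str
    predicted: str

def looks_like_segmentation_error(previous_op, current_op, next_op) -> bool:
    """Sketch of the heuristic: an edit at the start or end of the alignment."""
    is_edit = not isinstance(current_op, Kept)
    at_boundary = previous_op is None or next_op is None
    return is_edit and at_boundary

# An edit at the very start of the alignment: suspected segmentation error
assert looks_like_segmentation_error(None, Replaced("a", "b"), Kept("rest"))
# The same edit in the middle of the string: not flagged
assert not looks_like_segmentation_error(Kept("x"), Replaced("a", "b"), Kept("y"))
```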
- stringalign.evaluate.check_operation_for_ngram_duplication_error(previous_operation: AlignmentOperation | None, current_operation: AlignmentOperation, next_operation: AlignmentOperation | None, *, n: int, error_type: Literal['inserted', 'deleted'] = 'inserted', tokenizer: Tokenizer) bool[source]#
Check if this alignment operation is an n-gram duplication error or missing duplicate n-gram error.
This function checks whether the only reason for the alignment operation is an n-gram duplication or a missing duplicate n-gram.
Note
Error classification should be performed on combined alignment operations so edit operations and kept operations alternate.
- Parameters:
previous_operation – The previous alignment operation. If current_operation is the first alignment operation in an alignment, this is None.
current_operation – The alignment operation to check for n-gram duplication errors.
next_operation – The next alignment operation. If current_operation is the last alignment operation in an alignment, this is None.
n – The number of tokens in the n-grams we evaluate. For single-token duplication errors, this should be 1.
error_type – "inserted" if we are checking for inserted duplicates and "deleted" if we are checking for deleted duplicates.
tokenizer – The tokenizer used for the original alignment.
- Returns:
True if the only reason for the alignment operation is an n-gram duplication or missing duplicate n-gram, otherwise False.
- Return type:
- stringalign.evaluate.compute_cer(reference: str, predicted: str) tuple[float, AlignmentAnalyzer][source]#
Compute the CER for two strings.
This is just a convenience function that creates an
AlignmentAnalyzer with a stringalign.tokenize.GraphemeClusterTokenizer and computes the CER with the stringalign.statistics.StringConfusionMatrix.compute_token_error_rate() method of the AlignmentAnalyzer’s stringalign.statistics.StringConfusionMatrix.
For more information about the CER, see The character, word and token error rate.
- Parameters:
reference – The reference string, also known as gold standard and ground truth
predicted – The predicted string
- Returns:
float – The CER
AlignmentAnalyzer – The alignment analyzer used to compute the CER (via the token error rate)
See also
stringalign.evaluate.compute_ter, stringalign.evaluate.compute_wer, stringalign.evaluate.AlignmentAnalyzer, stringalign.statistics.StringConfusionMatrix
Examples
>>> tokenizer = stringalign.tokenize.GraphemeClusterTokenizer()
>>> ter, analyzer = compute_cer("Hi there", "He there")
>>> ter
0.125
>>> analyzer.confusion_matrix.compute_token_error_rate()
0.125
>>> analyzer
AlignmentAnalyzer(
    reference='Hi there',
    predicted='He there',
    metadata=None,
    tokenizer=GraphemeClusterTokenizer(
        pre_tokenization_normalizer=StringNormalizer(
            normalization='NFC',
            case_insensitive=False,
            normalize_whitespace=False,
            remove_whitespace=False,
            remove_non_word_characters=False,
            resolve_confusables=None,
        ),
        post_tokenization_normalizer=StringNormalizer(
            normalization='NFC',
            case_insensitive=False,
            normalize_whitespace=False,
            remove_whitespace=False,
            remove_non_word_characters=False,
            resolve_confusables=None,
        )
    )
)
- stringalign.evaluate.compute_ter(reference: str, predicted: str, tokenizer: Tokenizer) tuple[float, AlignmentAnalyzer][source]#
Compute the token error rate (TER) for two strings.
This is just a convenience function that creates an
AlignmentAnalyzer and computes the TER with the stringalign.statistics.StringConfusionMatrix.compute_token_error_rate() method of the AlignmentAnalyzer’s stringalign.statistics.StringConfusionMatrix.
For more information about the TER, see The character, word and token error rate.
- Parameters:
reference – The reference string, also known as gold standard and ground truth
predicted – The predicted string
tokenizer – Tokenizer to split the string into an iterable of tokens.
- Returns:
float – The TER
AlignmentAnalyzer – The alignment analyzer used to compute the TER (token error rate)
See also
stringalign.evaluate.compute_cer, stringalign.evaluate.compute_wer, stringalign.evaluate.AlignmentAnalyzer, stringalign.statistics.StringConfusionMatrix
Examples
If we use a
stringalign.tokenize.GraphemeClusterTokenizer, we compute the character error rate:
>>> tokenizer = stringalign.tokenize.GraphemeClusterTokenizer()
>>> ter, analyzer = compute_ter("Hi there", "He there", tokenizer=tokenizer)
>>> ter
0.125
>>> analyzer.confusion_matrix.compute_token_error_rate()
0.125
>>> cer, _analyzer = compute_cer("Hi there", "He there")
>>> cer
0.125
And if we use a
stringalign.tokenize.SplitAtWhitespaceTokenizer, we compute a word error rate:
>>> tokenizer = stringalign.tokenize.SplitAtWhitespaceTokenizer()
>>> ter, analyzer = compute_ter("Hi there", "He there", tokenizer=tokenizer)
>>> ter
0.5
>>> analyzer.confusion_matrix.compute_token_error_rate()
0.5
>>> wer, wer_analyzer = compute_wer("Hi there", "He there", word_definition="whitespace")
>>> wer
0.5
>>> wer_analyzer
AlignmentAnalyzer(
    reference='Hi there',
    predicted='He there',
    metadata=None,
    tokenizer=SplitAtWhitespaceTokenizer(
        pre_tokenization_normalizer=StringNormalizer(
            normalization='NFC',
            case_insensitive=False,
            normalize_whitespace=False,
            remove_whitespace=False,
            remove_non_word_characters=False,
            resolve_confusables=None,
        ),
        post_tokenization_normalizer=StringNormalizer(
            normalization='NFC',
            case_insensitive=False,
            normalize_whitespace=False,
            remove_whitespace=False,
            remove_non_word_characters=False,
            resolve_confusables=None,
        )
    )
)
- stringalign.evaluate.compute_wer(reference: str, predicted: str, word_definition: Literal['whitespace', 'unicode', 'unicode_word_boundary'] = 'whitespace') tuple[float, AlignmentAnalyzer][source]#
Compute the WER for two strings.
This is just a convenience function that creates an
AlignmentAnalyzer with an appropriate tokenizer and computes the WER with the stringalign.statistics.StringConfusionMatrix.compute_token_error_rate() method of the AlignmentAnalyzer’s stringalign.statistics.StringConfusionMatrix.
For more information about the WER, see The character, word and token error rate.
- Parameters:
reference – The reference string, also known as gold standard and ground truth
predicted – The predicted string
word_definition –
How words are defined for the WER. Used to select tokenizer:
"whitespace": stringalign.tokenize.SplitAtWhitespaceTokenizer (default)
"unicode": stringalign.tokenize.UnicodeWordTokenizer
"unicode_word_boundary": stringalign.tokenize.SplitAtWordBoundaryTokenizer
- Returns:
float – The WER
AlignmentAnalyzer – The alignment analyzer used to compute the WER (via the token error rate)
See also
stringalign.evaluate.compute_ter, stringalign.evaluate.compute_cer, stringalign.evaluate.AlignmentAnalyzer, stringalign.statistics.StringConfusionMatrix
Examples
>>> wer, analyzer = compute_wer("Hello world!", "Hello world")
>>> wer
0.5
>>> analyzer.confusion_matrix.compute_token_error_rate()
0.5
>>> analyzer
AlignmentAnalyzer(
    reference='Hello world!',
    predicted='Hello world',
    metadata=None,
    tokenizer=SplitAtWhitespaceTokenizer(
        pre_tokenization_normalizer=StringNormalizer(
            normalization='NFC',
            case_insensitive=False,
            normalize_whitespace=False,
            remove_whitespace=False,
            remove_non_word_characters=False,
            resolve_confusables=None,
        ),
        post_tokenization_normalizer=StringNormalizer(
            normalization='NFC',
            case_insensitive=False,
            normalize_whitespace=False,
            remove_whitespace=False,
            remove_non_word_characters=False,
            resolve_confusables=None,
        )
    )
)
- stringalign.evaluate.join_windows(center_string: str, previous_operation: Kept | None, next_operation: Kept | None) str[source]#
Join the text from the center alignment operation with the text of the previous and next operations (if possible).
- Parameters:
center_string – The text in the center string; must come from an edit operation.
previous_operation – The previous alignment operation. Since the error classification algorithms use combined alignments, this is guaranteed to be a stringalign.align.Kept operation or None.
next_operation – The next alignment operation. Since the error classification algorithms use combined alignments, this is guaranteed to be a stringalign.align.Kept operation or None.