Visualizing alignments

Visualizing the alignment of the predicted and reference strings is a good way to gain insight beyond summary metrics. To aid this type of analysis, Stringalign can quickly display a lightweight visualization of an alignment.

import stringalign
from stringalign.evaluate import AlignmentAnalyzer

reference = "Hello world!"
predicted = "Hello w0rld!!"

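# Split both strings into grapheme clusters (user-perceived characters)
tokenizer = stringalign.tokenize.GraphemeClusterTokenizer()
# Align the predicted string against the reference
analyzer = AlignmentAnalyzer.from_strings(reference=reference, predicted=predicted, tokenizer=tokenizer)
# Display the alignment as HTML
analyzer.visualize()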
Reference:  Predicted:
H           H
e           e
l           l
l           l
o           o
w           w
o           0
r           r
l           l
d           d
            !
!           !
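Conceptually, the visualization above shows a sequence of (reference token, predicted token) pairs, where an empty token marks an insertion or a deletion. The sketch below builds such pairs for single characters with the standard-library `difflib.SequenceMatcher`. It is only a stand-in for illustration. `difflib` uses a matching-block heuristic rather than a minimum edit-distance search, and we make no claim that Stringalign computes its alignments this way. The helper name `align_characters` is hypothetical.

```python
import difflib
import itertools


def align_characters(reference: str, predicted: str) -> list[tuple[str, str]]:
    """Pair the characters of two strings; "" marks a gap (an insertion or deletion).

    Illustrative stand-in only: difflib's matching-block heuristic is not
    guaranteed to give a minimum edit-distance alignment.
    """
    matcher = difflib.SequenceMatcher(a=reference, b=predicted, autojunk=False)
    pairs: list[tuple[str, str]] = []
    for _tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "equal" and same-length "replace" chunks pair up one-to-one; inserts,
        # deletes and uneven replacements are padded with "" on the shorter side.
        pairs.extend(itertools.zip_longest(reference[i1:i2], predicted[j1:j2], fillvalue=""))
    return pairs


pairs = align_characters("Hello world!", "Hello w0rld!!")
# Keep only the pairs where the prediction differs from the reference.
print([pair for pair in pairs if pair[0] != pair[1]])  # [('o', '0'), ('', '!')]
```

Here `difflib` places the extra "!" after the matching "!", whereas the Stringalign output above shows it before. Both alignments contain exactly one insertion, so they are equally good.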


The visualization is built with HTML and CSS, so it can be displayed in a notebook, in dashboard frameworks that support HTML, or in a web application.
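To show what an HTML and CSS visualization of an alignment can look like, here is a hypothetical, library-independent helper, `pairs_to_html`. It renders (reference, predicted) pairs as stacked cells and includes an inline style sheet. The function name, the class names and the CSS are our own assumptions for illustration. They are not Stringalign's markup.

```python
from html import escape


def pairs_to_html(
    pairs: list[tuple[str, str]],
    reference_label: str = "Reference:",
    predicted_label: str = "Predicted:",
) -> str:
    """Render aligned token pairs as stacked HTML cells (hypothetical helper)."""
    cells = []
    for ref, pred in pairs:
        # Tag each cell so the CSS can highlight mismatches; gaps become &nbsp;.
        css_class = "match" if ref == pred else "mismatch"
        cells.append(
            f'<div class="op {css_class}">'
            f"<div>{escape(ref) or '&nbsp;'}</div>"
            f"<div>{escape(pred) or '&nbsp;'}</div>"
            "</div>"
        )
    # The first column holds the two row labels.
    labels = (
        f'<div class="op"><div>{escape(reference_label)}</div>'
        f"<div>{escape(predicted_label)}</div></div>"
    )
    # Lay the cells out left to right; keep spaces visible with white-space: pre.
    style = (
        "<style>"
        ".alignment{display:flex;flex-wrap:wrap;font-family:monospace}"
        ".op{display:flex;flex-direction:column;white-space:pre;padding:0 2px}"
        ".mismatch{background:#fdd}"
        "</style>"
    )
    return f'{style}<div class="alignment">{labels}{"".join(cells)}</div>'
```

In Jupyter, `IPython.display.HTML(pairs_to_html(pairs))` renders the returned string. A web application can embed the same string in a page template.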

Sometimes it helps to add extra spacing between the alignment operations. This can be the case if your tokenizer removes spaces or if your text contains non-spacing tokens such as combining marks. To add a space between each token, set the space_alignment_ops flag to True.

# Split at whitespace: each word becomes one token and the spaces themselves are dropped
tokenizer = stringalign.tokenize.SplitAtWhitespaceTokenizer()
analyzer = AlignmentAnalyzer.from_strings(reference=reference, predicted=predicted, tokenizer=tokenizer)
analyzer.visualize()
Reference:  Predicted:
Hello       Hello
world!      w0rld!!


analyzer.visualize(space_alignment_ops=True)
Reference:  Predicted:
Hello       Hello
world!      w0rld!!
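The reason for extra spacing can be illustrated without Stringalign. The `join_tokens` helper below is hypothetical. Its parameter only borrows the flag's name and is our own simplification, not the library's implementation. `str.split()` stands in for a whitespace tokenizer, and the combining acute accent (U+0301) serves as an example of a non-spacing token.

```python
import unicodedata


def join_tokens(tokens: list[str], space_alignment_ops: bool = False) -> str:
    """Concatenate tokens for display, optionally with a space between them.

    Hypothetical helper: it only borrows the flag's name and does not reproduce
    how Stringalign lays out its HTML.
    """
    separator = " " if space_alignment_ops else ""
    return separator.join(tokens)


# str.split() stands in for a whitespace tokenizer: the spaces are dropped.
word_tokens = "Hello world!".split()
print(join_tokens(word_tokens))  # Helloworld!
print(join_tokens(word_tokens, space_alignment_ops=True))  # Hello world!

# U+0301 COMBINING ACUTE ACCENT is a nonspacing mark (category "Mn"):
# without a separator it is drawn on top of the preceding "e", so the
# two tokens look like a single character.
mark_tokens = ["e", "\u0301"]
print(unicodedata.category(mark_tokens[1]))  # Mn
print(len(join_tokens(mark_tokens)))  # 2
```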


Customize the visualization

The stringalign.evaluate.AlignmentAnalyzer.visualize() method is a convenience wrapper around stringalign.visualize.create_alignment_html(). For more control, call stringalign.visualize.create_alignment_html() directly. This lets you, for example, change the text labels:

stringalign.visualize.create_alignment_html(
    alignment=analyzer.raw_alignment,
    reference_label="Gold standard:",
    predicted_label="Model estimate:",
    space_alignment_ops=True,
)
Gold standard:  Model estimate:
Hello           Hello
world!          w0rld!!


Customize the styling (advanced)

You can also supply your own style sheet, which we demonstrate in this short example.
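The following is a library-independent illustration of what a custom style sheet does. The sketch wraps an HTML fragment in a standalone page together with user-supplied CSS. The fragment, its class names (`alignment`, `op`, `mismatch`) and the `standalone_page` helper are all hypothetical. We make no assumption about how Stringalign accepts a style sheet or which class names its markup uses, so inspect the generated HTML before writing selectors.

```python
from html import escape

# Hypothetical markup: these class names are our own assumptions and must be
# replaced by the classes that the real HTML uses.
FRAGMENT = (
    '<div class="alignment">'
    '<div class="op"><div>o</div><div>o</div></div>'
    '<div class="op mismatch"><div>o</div><div>0</div></div>'
    "</div>"
)

# A user-supplied style sheet: larger monospace text and bold, highlighted mismatches.
CUSTOM_CSS = """
.alignment { display: flex; font-family: monospace; font-size: 1.5em; }
.op { display: flex; flex-direction: column; padding: 0 0.2em; }
.mismatch { background: #ffe08a; font-weight: bold; }
"""


def standalone_page(fragment: str, css: str, title: str = "Alignment") -> str:
    """Wrap an HTML fragment in a complete page that applies the given style sheet."""
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{escape(title)}</title>'
        f"<style>{css}</style></head>"
        f"<body>{fragment}</body></html>\n"
    )
```

To view the result, write it to a file with `pathlib.Path("alignment.html").write_text(standalone_page(FRAGMENT, CUSTOM_CSS), encoding="utf-8")` and open that file in a browser.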
