Levenshtein distance
The Levenshtein distance [Nav01], sometimes also called edit distance, is a metric for measuring the difference between two strings. It is the minimum number of edits (insertions, deletions or replacements/substitutions) needed to transform one string into the other. The higher the number, the more the two strings differ.
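To make the definition concrete, here is a minimal sketch of the classic dynamic-programming approach to computing this minimum. The function name `edit_distance` is hypothetical and chosen for illustration only. It is not Stringalign's implementation, and it compares the items of its inputs as they are, without any normalization or grapheme cluster segmentation.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b.

    Illustrative helper (hypothetical name), not part of Stringalign.
    """
    # previous[j] is the distance between the tokens of `a` processed so far
    # and the first j tokens of `b`. Initially no tokens of `a` are processed,
    # so reaching the first j tokens of `b` takes j insertions.
    previous = list(range(len(b) + 1))
    for i, token_a in enumerate(a, start=1):
        # Turning the first i tokens of `a` into an empty sequence takes i deletions.
        current = [i]
        for j, token_b in enumerate(b, start=1):
            substitution_cost = 0 if token_a == token_b else 1
            current.append(
                min(
                    previous[j] + 1,  # delete token_a
                    current[j - 1] + 1,  # insert token_b
                    previous[j - 1] + substitution_cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]


print(edit_distance("kitten", "sitting"))  # 3
```

Only two rows of the table are kept at a time, so memory use grows with the length of `b` rather than with the product of both lengths.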
Example
The Levenshtein distance between sun and sand is two: to turn sun into sand, you need to replace one character (u -> a) and insert one (d at the end).
import stringalign
print(stringalign.align.levenshtein_distance("sun", "sand"))
2
See the API documentation for stringalign.align.levenshtein_distance() for more information about this function.
Note
What constitutes “one edit” depends on how the string is tokenized. In general, the Levenshtein distance is defined as the smallest number of “single-character” edits, so to get the most accurate Levenshtein distance you might need to normalize your strings and/or segment them into grapheme clusters (Stringalign does both by default).
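The effect of tokenization and normalization can be sketched with the standard library alone. The helper `count_edits` below is a hypothetical name used only for this illustration and is not Stringalign's API. It counts edits between whatever items it is given: code points for plain strings, words for lists of words. Grapheme cluster segmentation needs a third-party library and is not shown here.

```python
import unicodedata


def count_edits(a, b):
    """Levenshtein distance between two sequences of tokens (illustrative helper)."""
    previous = list(range(len(b) + 1))
    for i, token_a in enumerate(a, start=1):
        current = [i]
        for j, token_b in enumerate(b, start=1):
            cost = 0 if token_a == token_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


# The same visible character, encoded in two different ways.
composed = "\u00e9"  # "é" as a single code point
decomposed = "e\u0301"  # "e" followed by a combining acute accent

# Compared code point by code point, the two strings look two edits apart.
print(count_edits(composed, decomposed))  # 2

# After normalizing both to the same form (NFC here), they are identical.
normalized_a = unicodedata.normalize("NFC", composed)
normalized_b = unicodedata.normalize("NFC", decomposed)
print(count_edits(normalized_a, normalized_b))  # 0

# The choice of token also changes the count: characters versus words.
print(count_edits("the cat sat", "the dog sat"))  # 3
print(count_edits("the cat sat".split(), "the dog sat".split()))  # 1
```

Without normalization, two strings that render identically are reported as different, which is why normalizing before comparing usually gives a distance closer to what a reader would expect.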