levenshtein:   Levenshtein Distance Metric
1 Introduction
2 Basic Comparisons
vector-levenshtein/  predicate/  get-scratch
vector-levenshtein/  predicate
vector-levenshtein/  eq
vector-levenshtein/  eqv
vector-levenshtein/  equal
vector-levenshtein
list-levenshtein/  predicate
list-levenshtein/  eq
list-levenshtein/  eqv
list-levenshtein/  equal
list-levenshtein
string-levenshtein
3 Type-Coercing Comparisons
levenshtein/  predicate
levenshtein
4 History
5 Legal
2:0

levenshtein: Levenshtein Distance MetricπŸ”—β„Ή

Neil Van Dyke

1 IntroductionπŸ”—β„Ή

This package provides a Racket implementation of the Levenshtein Distance algorithm, which is an edit distance metric of string similarity, due to Vladimir Levenshtein. This package’s implementation started as a transliteration of Jorge Mas Trullenque’s space-efficient Perl implementation, to R5RS Scheme.
The Levenshtein Distance is a function of two strings that represents a count of single-character insertions, deletions,and substitions that will change the first string to the second. More information is available in NIST DADS and the Michael Gilleland article, β€œLevenshtein Distance in Three Flavors.”
This implementation supports heterogeneous combinations of Scheme types (strings, lists, vectors), user-supplied predicate functions, and optionally reusable scratch vectors. Because this is an old Scheme library, the API isn’t entirely idiomatic for modern Racket.

2 Basic ComparisonsπŸ”—β„Ή

In the current implementation, all comparisons are done internally via vectors.

procedure

(vector-levenshtein/predicate/get-scratch a 
  b 
  pred 
  get-scratch) 
 β†’ exact-nonnegative-integer?
  a : vector?
  b : vector?
  pred : (-> any/c any/c boolean?)
  get-scratch : (-> exact-positive-integer? vector?)
Few, if any, programs will use this procedure directly. This is like vector-levenshtein/predicate, but allows get-scratch to be specified. get-scratch is a procedure of one term, n, that yields a vector of length n or greater, which is used for record-keeping during execution of the Levenshtein algorithm. make-vector can be used for get-scratch, although some programs comparing a large size or quantity of vectors may wish to reuse a record-keeping vector, rather than each time allocating a new one that will need to be garbage-collected.

procedure

(vector-levenshtein/predicate a b pred)

 β†’ exact-nonnegative-integer?
  a : vector?
  b : vector?
  pred : (-> any/c any/c boolean?)
(vector-levenshtein/eq a b) β†’ exact-nonnegative-integer?
  a : vector?
  b : vector?
(vector-levenshtein/eqv a b) β†’ exact-nonnegative-integer?
  a : vector?
  b : vector?
(vector-levenshtein/equal a b) β†’ exact-nonnegative-integer?
  a : vector?
  b : vector?
(vector-levenshtein a b) β†’ exact-nonnegative-integer?
  a : vector?
  b : vector?
Calculate the Levenshtein Distance of vectors a and b. pred is the predicate procedure for determining if two elements are equal. The /eq, /eqv, and /equal variants correspond to the standard equivalence predicates, eq?, eqv?, and equal?. vector-levenshtein is an alias for vector-levenshtein/equal.
> (vector-levenshtein '#(6 6 6) '#(6 35 6 24 6 32))
  3

procedure

(list-levenshtein/predicate a b pred)

 β†’ exact-nonnegative-integer?
  a : list?
  b : list?
  pred : (-> any/c any/c boolean?)
(list-levenshtein/eq a b) β†’ exact-nonnegative-integer?
  a : list?
  b : list?
(list-levenshtein/eqv a b) β†’ exact-nonnegative-integer?
  a : list?
  b : list?
(list-levenshtein/equal a b) β†’ exact-nonnegative-integer?
  a : list?
  b : list?
(list-levenshtein a b) β†’ exact-nonnegative-integer?
  a : list?
  b : list?
Calculate the Levenshtein Distance of lists a and b. pred is the predicate procedure for determining if two elements are equal. The /eq, /eqv, and /equal variants correspond to the standard equivalence predicates, eq?, eqv?, and equal?. list-levenshtein is an alias for list-levenshtein/equal. Note that comparison of lists is less efficient than comparison of vectors.
> (list-levenshtein/eq '(b c e x f y) '(a b c d e f))
  4

procedure

(string-levenshtein a b) β†’ exact-nonnegative-integer?

  a : string?
  b : string?
Calculate the Levenshtein Distance of strings a and b.
> (string-levenshtein "adresse" "address")
  2

3 Type-Coercing ComparisonsπŸ”—β„Ή

Procedures levenshtein and levenshtein/predicate provide a convenient interface for comparing a combination of vectors, lists, and strings, the types of which might not be known until runtime.

procedure

(levenshtein/predicate a b pred) β†’ exact-nonnegative-integer?

  a : (or/c vector? string? list?)
  b : (or/c vector? string? list?)
  pred : (-> any/c any/c boolean?)
Calculates the Levenshtein Distance of two objects a and b,which are vectors, lists, or strings. a and b need not be of the same type. pred is the element equivalence predicate used.
> (levenshtein/predicate '#(#\A #\B #\C #\D) "aBXcD" char-ci=?)
  1

procedure

(levenshtein a b) β†’ exact-nonnegative-integer?

  a : (or/c vector? string? list?)
  b : (or/c vector? string? list?)
Calculate the levenshtein distance of a and b, in a similar manner as using levenshtein/predicate with equal? as the predicate.

> (define g '#(#\g #\u #\m #\b #\o))

> (levenshtein g "gambol")
  2
> (levenshtein g "dumbo")
  1
> (levenshtein g "umbrage")
  5

4 HistoryπŸ”—β„Ή

5 LegalπŸ”—β„Ή

Copyright 2004, 2005, 2009, 2016 Neil Van Dyke. This program is Free Software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See http://www.gnu.org/licenses/ for details. For other licenses and consulting, please contact the author.