LP#1893997: Did you mean? Single word, single class

This commit embodies the first stage of a larger search suggestion
project. The bulk of the code is dedicated to providing an
implementation of the SymSpell[1] algorithm as the basis for very fast
word similarity testing for spelling suggestions as well as alternate
search suggestions.
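For reviewers unfamiliar with SymSpell, its core "symmetric delete" idea can be sketched in a few lines of Python. This is an illustrative in-memory version only, not the Evergreen code, and the function names are hypothetical. Delete variants of every dictionary word are precomputed once. At query time only the deletes of the search term are generated, so no inserts, substitutions or transpositions need to be enumerated.

```python
from collections import defaultdict


def deletes(word: str, max_distance: int) -> set[str]:
    """Every string reachable from `word` by deleting up to
    `max_distance` characters, including `word` itself."""
    results, frontier = {word}, {word}
    for _ in range(max_distance):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        results |= frontier
    return results


def build_index(words, max_distance=2):
    """Precompute: map each delete variant to the dictionary words
    that produce it (done once, up front)."""
    index = defaultdict(set)
    for word in words:
        for variant in deletes(word, max_distance):
            index[variant].add(word)
    return index


def candidates(index, term, max_distance=2):
    """Lookup: generate deletes of the search term only and collect
    dictionary words sharing any variant. Candidates should still be
    verified with a real edit-distance function afterwards."""
    found = set()
    for variant in deletes(term, max_distance):
        found |= index.get(variant, set())
    return found


index = build_index(["library", "literary", "liberty"])
print(sorted(candidates(index, "libary")))  # ['liberty', 'library']
```

"literary" is never considered. Every delete variant it can produce still has at least six characters and no "b", so none of them match a variant of "libary".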
The reference SymSpell algorithm performs hash table lookups against a
dictionary built in memory at runtime. Creating and maintaining a
separate in-memory data structure is untenable in the distributed
environment that OpenSRF provides, and would add significantly to the
administrative complexity of such a configuration, so we instead
maintain the dictionary in the authoritative Postgres database used by
Evergreen. This dictionary is built directly from the indexed terms used
for general search, and aims to avoid zero-hit suggestions wherever
possible while imposing as little performance impact as can be managed.
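The general shape of a database-backed dictionary can be sketched in Python. The standard-library sqlite3 module stands in for Postgres here, and the `suggestion_dictionary` table, its columns and the helper functions are invented for illustration. They do not describe Evergreen's actual schema. Delete variants of each indexed term are written to a table once, together with the term's frequency. A lookup then becomes a single indexed query instead of a probe into an in-process hash.

```python
import sqlite3


def deletes(word, max_distance):
    """All strings reachable from `word` by up to `max_distance` deletions."""
    results, frontier = {word}, {word}
    for _ in range(max_distance):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        results |= frontier
    return results


def load_dictionary(conn, term_counts, max_distance=2):
    """Persist the delete variants of each indexed term, with the term's
    frequency, in a table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS suggestion_dictionary ("
        " delete_key TEXT NOT NULL,"
        " term TEXT NOT NULL,"
        " frequency INTEGER NOT NULL,"
        " PRIMARY KEY (delete_key, term))"
    )
    rows = [
        (variant, term, count)
        for term, count in term_counts.items()
        for variant in deletes(term, max_distance)
    ]
    conn.executemany(
        "INSERT OR REPLACE INTO suggestion_dictionary VALUES (?, ?, ?)", rows
    )
    conn.commit()


def lookup(conn, search_term, max_distance=2):
    """Return (term, frequency) candidates sharing any delete key,
    most frequent first."""
    keys = sorted(deletes(search_term, max_distance))
    placeholders = ",".join("?" for _ in keys)
    cur = conn.execute(
        "SELECT DISTINCT term, frequency FROM suggestion_dictionary"
        f" WHERE delete_key IN ({placeholders}) ORDER BY frequency DESC",
        keys,
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
load_dictionary(conn, {"library": 120, "liberty": 15, "literary": 40})
print(lookup(conn, "libary"))  # [('library', 120), ('liberty', 15)]
```

Because the dictionary lives in the shared database, every OpenSRF service instance sees the same data. No per-process structure has to be built or kept in sync.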
In addition to the core SymSpell similarity metric, Damerau-Levenshtein
edit distance, we provide Soundex, Trigram, and QWERTY Keyboard
similarity measures. The importance of these can be adjusted relative
to one another, or turned off individually.
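The sketch below shows one way in Python that the four measures could be computed and blended with adjustable weights. The edit distance is the restricted (optimal string alignment) form of Damerau-Levenshtein. The trigram padding follows the convention documented for Postgres pg_trgm. The weights, the `1 / (1 + distance)` conversion, the simplified QWERTY grid and all names are illustrative assumptions rather than Evergreen's actual scoring.

```python
# Hypothetical relative weights; setting a weight to 0 turns that measure off.
WEIGHTS = {"edit": 1.0, "soundex": 0.5, "trigram": 0.5, "keyboard": 0.25}

# Rough QWERTY model: (row, column) per key, ignoring row stagger.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
KEY_POS = {ch: (r, c) for r, row in enumerate(QWERTY_ROWS) for c, ch in enumerate(row)}


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance:
    insertions, deletions, substitutions and adjacent transpositions."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def soundex(word: str) -> str:
    """American Soundex code, e.g. 'Robert' -> 'R163'."""
    groups = ["aeiouyhw", "bfpv", "cgjkqsxz", "dt", "l", "mn", "r"]
    codes = {ch: str(n) for n, letters in enumerate(groups) for ch in letters}
    word = "".join(ch for ch in word.lower() if ch in codes)
    if not word:
        return ""
    result, prev = word[0].upper(), codes[word[0]]
    for ch in word[1:]:
        code = codes[ch]
        if code != "0" and code != prev:
            result += code
        if ch not in "hw":  # h and w do not separate equal codes
            prev = code
    return (result + "000")[:4]


def trigrams(word: str) -> set[str]:
    """Trigrams of the lower-cased word, padded with two leading spaces
    and one trailing space."""
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def _adjacent(x: str, y: str) -> bool:
    """True if two letters sit next to each other on the rough grid."""
    if x not in KEY_POS or y not in KEY_POS:
        return False
    (r1, c1), (r2, c2) = KEY_POS[x], KEY_POS[y]
    return abs(r1 - r2) <= 1 and abs(c1 - c2) <= 1


def keyboard_similarity(a: str, b: str) -> float:
    """Share of mismatched positions whose keys are neighbours.
    Only compares equal-length words; returns 0.0 otherwise."""
    if len(a) != len(b):
        return 0.0
    pairs = [(x, y) for x, y in zip(a.lower(), b.lower()) if x != y]
    if not pairs:
        return 1.0
    return sum(_adjacent(x, y) for x, y in pairs) / len(pairs)


def similarity_score(term: str, candidate: str, weights=WEIGHTS) -> float:
    """Weighted sum of the measures; higher means a better suggestion."""
    measures = {
        "edit": 1.0 / (1 + osa_distance(term, candidate)),
        "soundex": 1.0 if soundex(term) == soundex(candidate) else 0.0,
        "trigram": trigram_similarity(term, candidate),
        "keyboard": keyboard_similarity(term, candidate),
    }
    return sum(weights.get(name, 0.0) * value for name, value in measures.items())


print(round(similarity_score("libary", "library"), 2))   # 0.75
print(round(similarity_score("libary", "literary"), 2))  # 0.42
```

Keeping each measure as a separate function makes it easy to tune or disable one. Changing its weight leaves the others untouched.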
Global term frequency data is captured for each of the Evergreen search
classes and is used to help decide when to use specific terms, and which
terms to use as suggestions.
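The exact heuristics are not spelled out here, so the following Python sketch shows only one plausible way that per-class frequency data could drive both decisions. The `Candidate` type and the `min_frequency` threshold are hypothetical. The ordering by edit distance and then by frequency follows SymSpell's usual ranking. Terms with zero frequency are dropped so that a suggestion never leads to an empty result.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    term: str
    distance: int   # edit distance from the user's term
    frequency: int  # how often the term occurs in this search class


def choose_suggestions(user_term, user_frequency, candidates,
                       min_frequency=5, limit=3):
    """Pick suggestions for one search class (hypothetical policy).

    - If the user's own term is already common enough, suggest nothing.
    - Drop candidates that would produce zero hits.
    - Prefer closer matches, then more frequent terms.
    """
    if user_frequency >= min_frequency:
        return []
    usable = [c for c in candidates if c.frequency > 0 and c.term != user_term]
    usable.sort(key=lambda c: (c.distance, -c.frequency, c.term))
    return [c.term for c in usable[:limit]]


cands = [Candidate("library", 1, 120), Candidate("liberty", 2, 15),
         Candidate("libri", 2, 0)]
print(choose_suggestions("libary", 0, cands))  # ['library', 'liberty']
```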
Suggestions are provided in the OPAC, including the staff-embedded OPAC
view, the KPAC, and the Angular catalog.
Later development will add the ability to perform multi-word and
phrase-oriented suggestions, to suggest searching requested terms in
other search classes, and to provide local thesaurus values and
exclusion term lists.
[1] https://medium.com/@wolfgarbe/1000x-faster-spelling-correction-algorithm-2012-8701fcd87a5f
NOTE: This development adds two new Perl module dependencies, and will
therefore require a dependency update at upgrade time.
Signed-off-by: Mike Rylander <mrylander@gmail.com>
Signed-off-by: Gina Monti <gmonti@biblio.org>
Signed-off-by: Jason Stephenson <jason@sigio.com>
Signed-off-by: Galen Charlton <gmc@equinoxinitiative.org>