LP#1893997: Did you mean? Single word, single class
author Mike Rylander <mrylander@gmail.com>
Thu, 24 Sep 2020 16:48:41 +0000 (12:48 -0400)
committer Galen Charlton <gmc@equinoxinitiative.org>
Mon, 15 Mar 2021 13:41:04 +0000 (09:41 -0400)
commit 692121da2bef54919433cfc740f52d848e445f54
tree 43227e35753c1d1cba0742ad8d06b1773d05cca8
parent 3bc270e7f08fa03361a99be8e722e24100bd7bd5

This commit embodies the first stage of a larger search suggestion
project.  The bulk of the code is dedicated to providing an
implementation of the SymSpell[1] algorithm as the basis for very fast
word similarity testing for spelling suggestions as well as alternate
search suggestions.
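
The core idea of SymSpell can be sketched as follows. This is a minimal
in-memory illustration in Python, not the code added by this commit: the
function names (deletes, build_index, candidates) and the sample terms
are hypothetical. Every dictionary term is indexed under all strings
reachable by deleting up to max_distance characters, and a query is
looked up by generating its own deletes.

```python
from collections import defaultdict
from itertools import combinations


def deletes(word, max_distance):
    """Return word plus every string made by deleting up to max_distance characters."""
    results = {word}
    for k in range(1, min(max_distance, len(word)) + 1):
        # Choose which character positions to keep; the rest are deleted.
        for keep in combinations(range(len(word)), len(word) - k):
            results.add("".join(word[i] for i in keep))
    return results


def build_index(terms, max_distance=1):
    """Map each delete variant to the set of dictionary terms that produce it."""
    index = defaultdict(set)
    for term in terms:
        for variant in deletes(term, max_distance):
            index[variant].add(term)
    return index


def candidates(index, query, max_distance=1):
    """Collect dictionary terms sharing at least one delete variant with query."""
    found = set()
    for variant in deletes(query, max_distance):
        found |= index.get(variant, set())
    return found


# Hypothetical sample dictionary, for illustration only.
index = build_index(["harry", "hairy", "potter"])
print(sorted(candidates(index, "hary")))   # ['hairy', 'harry']
print(sorted(candidates(index, "pottr")))  # ['potter']
```

Sharing a delete variant is necessary but not sufficient for two words
to be within the target edit distance, so the candidates returned here
would still need to be checked with a true edit distance measure.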

The native in-memory algorithm specifies a hash table lookup using a
runtime-created dictionary.  Creating and maintaining a separate
in-memory data structure is untenable in the distributed environment
that OpenSRF provides, and would add significantly to the administrative
complexity of such a configuration, so we instead maintain a dictionary in the
authoritative Postgres database used by Evergreen.  This dictionary is
based directly on indexed terms used for general search, and aims to
avoid zero-hit suggestions wherever possible while imposing as little
performance impact as can be managed.

In addition to the core SymSpell similarity metric, Damerau-Levenshtein
edit distance, we provide Soundex, Trigram, and QWERTY Keyboard
similarity measures.  The importance of these can be adjusted relative
to one another, or turned off individually.
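
A rough Python sketch of how several similarity measures might be
weighted against one another follows. It is an assumption-laden
illustration, not the SQL implementation in this commit: the function
names, the weight keys ("edit", "trigram"), and the normalization are
all hypothetical, only two of the four measures are shown, and the
edit distance is the restricted (optimal string alignment) variant of
Damerau-Levenshtein. A weight of 0 turns a measure off.

```python
def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[-1][-1]


def trigrams(word):
    """Set of 3-character windows over the word padded with spaces."""
    padded = "  " + word + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a, b):
    """Jaccard similarity of the two trigram sets, between 0 and 1."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def combined_score(query, term, weights):
    """Weighted average of the enabled measures; hypothetical weighting scheme."""
    longest = max(len(query), len(term)) or 1
    measures = {
        "edit": 1 - osa_distance(query, term) / longest,
        "trigram": trigram_similarity(query, term),
    }
    total = sum(weights.get(name, 0) for name in measures)
    if total == 0:
        return 0.0
    return sum(weights.get(name, 0) * value for name, value in measures.items()) / total


print(osa_distance("harry", "hrary"))  # 1
print(combined_score("harry", "harry", {"edit": 1, "trigram": 1}))  # 1.0
```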

Global term frequency data is captured for each of the Evergreen search
classes and is used to help decide when to use specific terms, and which
terms to use as suggestions.
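
One way term frequency could drive these decisions is sketched below in
Python. The rule shown (suggest only terms that occur more often than
the query term, most frequent first) is an assumption made for
illustration, as are the function name, the limit parameter, and the
sample counts; the commit's actual selection logic may differ.

```python
def pick_suggestions(query, candidates, frequency, limit=3):
    """Keep candidates more frequent than the query term, most frequent first."""
    query_count = frequency.get(query, 0)
    better = [t for t in candidates if frequency.get(t, 0) > query_count]
    # Sort by descending frequency, breaking ties alphabetically.
    return sorted(better, key=lambda t: (-frequency[t], t))[:limit]


# Hypothetical per-class term counts, for illustration only.
title_frequency = {"harry": 120, "hairy": 7, "hary": 1}
print(pick_suggestions("hary", {"harry", "hairy"}, title_frequency))  # ['harry', 'hairy']
print(pick_suggestions("harry", {"hairy"}, title_frequency))          # []
```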

Suggestions are provided in the OPAC, including the staff-embedded OPAC
view, the KPAC, and the Angular catalog.

Later development will add the ability to perform multi-word and
phrase-oriented suggestions, to suggest searching requested terms in
other search classes, and to provide local thesaurus values and exclusion
term lists.

[1] https://medium.com/@wolfgarbe/1000x-faster-spelling-correction-algorithm-2012-8701fcd87a5f

NOTE: This development adds two new Perl module dependencies, and will
therefore require a dependency update at upgrade time.

Signed-off-by: Mike Rylander <mrylander@gmail.com>
Signed-off-by: Gina Monti <gmonti@biblio.org>
Signed-off-by: Jason Stephenson <jason@sigio.com>
Signed-off-by: Galen Charlton <gmc@equinoxinitiative.org>
20 files changed:
Open-ILS/src/eg2/src/app/staff/catalog/search-form.component.html
Open-ILS/src/extras/install/Makefile.debian-buster
Open-ILS/src/extras/install/Makefile.debian-jessie
Open-ILS/src/extras/install/Makefile.debian-stretch
Open-ILS/src/extras/install/Makefile.fedora
Open-ILS/src/extras/install/Makefile.ubuntu-bionic
Open-ILS/src/extras/install/Makefile.ubuntu-focal
Open-ILS/src/perlmods/lib/OpenILS/Application/Search/Biblio.pm
Open-ILS/src/perlmods/lib/OpenILS/WWW/EGCatLoader/Search.pm
Open-ILS/src/sql/Pg/000.functions.general.sql
Open-ILS/src/sql/Pg/300.schema.staged_search.sql
Open-ILS/src/sql/Pg/950.data.seed-values.sql
Open-ILS/src/sql/Pg/create_database_extensions.sql
Open-ILS/src/sql/Pg/upgrade/XXXX.schema.symspell.sql [new file with mode: 0644]
Open-ILS/src/support-scripts/symspell-sideload.pl [new file with mode: 0755]
Open-ILS/src/templates-bootstrap/opac/parts/result/lowhits.tt2
Open-ILS/src/templates-bootstrap/opac/parts/searchbar.tt2
Open-ILS/src/templates/kpac/parts/searchbox.tt2
Open-ILS/src/templates/opac/parts/result/lowhits.tt2
Open-ILS/src/templates/opac/parts/searchbar.tt2