From 17e6420adc000b8a344519e4288d0d0cf09ae18a Mon Sep 17 00:00:00 2001 From: Lebbeous Fogle-Weekley Date: Fri, 27 Jan 2012 14:15:26 -0500 Subject: [PATCH] AutoSuggest Get an autocompleting dialog underneath basic search boxes. If activated, the dialog appears in both the JavaScript OPAC and the Template Toolkit OPAC. Mike Rylander and Bill Erickson largely designed and spec'ed out this feature, especially at the database level. Lebbeous Fogle-Weekley and Mike Rylander did the programming. How it works, for site administrators ------------------------------------- Once the code is installed, including database stored procedures, (re)ingesting your bibliographic records will populate two new tables needed by AutoSuggest: metabib.browse_entry and metabib.browse_entry_def_map. Yes, for now this means that the size of your database in per-record terms will be greater. The browse_entry data we generate is extremely similar to facet data, so it's possible that in the future we can blend them into the same thing at the database level, and save some space. To control what gets indexed in the metabib.browse_* tables, set the boolean 'browse_field' column for each row in config.metabib_field as desired Numbers 15 (keyword|keyword) and 16 (subject|complete) should generally be set to false. The config.metabib_class table has a new 'bouyant' column. Setting this to true means that when a user is typing in the OPAC search box with a search class selector set to, say, 'Subject', and the cmc row for 'Subject' has 'bouyant' set to true, the top suggestions will definitely come from the Subject fields of MARC records (if any). 'restrict' is like 'bouyant', only stronger. In that case suggestions /only/ come from the selected class. This is the idea behind the 'restrict' column (in both the config.metabib_class table and the config.metabib_field table): For any given class or field, if restrict is true AND the user at the OPAC has selected that same class or field from the search class dropdown, then autosuggestions will only come from that class or field in the record. You never want this column to be true for keyword, because we don't index keywords for browse/autosuggest purposes (we would get giant blobs of information as autosuggestions, and you don't want that (really)). You don't really want it to be true for most classes and fields. If the user sets the dropdown to "subject" and start typing "harry potter", the user really wants matches from the "title" class too (although other things being equal, matches from the "subject" class will rise to the top). If you have a speciality index, you *may* want to set restrict to true for that particular field. For a song title index, for example. To take advantage of this, you would also have to customize your OPAC to have an entry in the search class dropdown for 'title|songtitle' (or whatever's appropriate for your specialty index). This is easy to do and should be covered somewhere in documentation for both the JSPAC and the TPAC. The order of suggestions otherwise has mostly to do with how well what users have typed matches what's in your bib records, and the value of the weight column of the relevant config.metabib_field row. Examine the code if you're curious to know more or want to make adjustments. The master on/off switch for AutoSuggest is a new row in in config.global_flag named 'opac.use_autosuggest'. Set its 'enabled' column to false to turn off AutoSuggest. If you don't want your suggestions to be limited to whatever's OPAC visible at the search org, set the 'value' column to anything that does not contain the string 'opac_visible'. This could be a good idea if AutoSuggest seems slow on a large site. Turning this on means introducing Dojo in your TPAC. I tried to keep it minimal, for load time's sake. There's no fieldmapper or IDL stuff play. For now, turning this feature off will avoid loading Dojo in your TPAC. Managing the little stuff ------------------------- CSS classes that affect the look of suggestions are in this file for the TPAC: Open-ILS/web/css/skin/default/opac/style.css and in this file for the JSPAC: Open-ILS/web/opac/skin/default/css/layout.css At bottom, autosuggest is shaped by PostgreSQL's Full Text Search features. Some of the arguments that are passed to TS_RANK_CD() and TS_HEADLINE() are expoposed via options to the openils.widget.AutoSuggest and openils.AutoSuggestStore modules, which have comments. You can customize your OPAC files to invoke these with your desired options without touching much actual code. Signed-off-by: Lebbeous Fogle-Weekley Signed-off-by: Mike Rylander --- Open-ILS/examples/apache/eg_vhost.conf | 8 + Open-ILS/examples/fm_IDL.xml | 28 + .../src/perlmods/lib/OpenILS/WWW/AutoSuggest.pm | 206 ++++++ .../src/perlmods/lib/OpenILS/WWW/EGCatLoader.pm | 1 + .../perlmods/lib/OpenILS/WWW/EGCatLoader/Util.pm | 40 +- Open-ILS/src/sql/Pg/000.functions.general.sql | 5 + Open-ILS/src/sql/Pg/002.schema.config.sql | 21 +- Open-ILS/src/sql/Pg/030.schema.metabib.sql | 369 ++++++++++- Open-ILS/src/sql/Pg/950.data.seed-values.sql | 22 +- .../sql/Pg/upgrade/YYYY.schema.bib_autosuggest.sql | 711 +++++++++++++++++++++ Open-ILS/src/templates/opac/parts/base.tt2 | 8 +- Open-ILS/src/templates/opac/parts/header.tt2 | 11 + Open-ILS/src/templates/opac/parts/js.tt2 | 27 + .../src/templates/opac/parts/qtype_selector.tt2 | 2 +- Open-ILS/src/templates/opac/parts/searchbar.tt2 | 17 +- Open-ILS/web/css/skin/default/opac/style.css | 6 + Open-ILS/web/js/dojo/openils/AutoSuggestStore.js | 378 +++++++++++ Open-ILS/web/js/dojo/openils/Util.js | 10 + Open-ILS/web/js/dojo/openils/widget/AutoSuggest.js | 92 +++ Open-ILS/web/opac/skin/default/css/layout.css | 7 + Open-ILS/web/opac/skin/default/js/search_bar.js | 35 + docs/TechRef/AutoSuggest/README | 89 +++ 22 files changed, 2039 insertions(+), 54 deletions(-) create mode 100644 Open-ILS/src/perlmods/lib/OpenILS/WWW/AutoSuggest.pm create mode 100644 Open-ILS/src/sql/Pg/upgrade/YYYY.schema.bib_autosuggest.sql create mode 100644 Open-ILS/web/js/dojo/openils/AutoSuggestStore.js create mode 100644 Open-ILS/web/js/dojo/openils/widget/AutoSuggest.js create mode 100644 docs/TechRef/AutoSuggest/README diff --git a/Open-ILS/examples/apache/eg_vhost.conf b/Open-ILS/examples/apache/eg_vhost.conf index 82872dacba..a2ba0b2cf4 100644 --- a/Open-ILS/examples/apache/eg_vhost.conf +++ b/Open-ILS/examples/apache/eg_vhost.conf @@ -51,6 +51,14 @@ OSRFGatewayConfig /openils/conf/opensrf_core.xml allow from all +# Autosuggest for searches + + SetHandler perl-script + PerlHandler OpenILS::WWW::AutoSuggest + PerlSendHeader On + Allow from All + + # ---------------------------------------------------------------------------------- # Replace broken cover images with a transparent GIF by default # ---------------------------------------------------------------------------------- diff --git a/Open-ILS/examples/fm_IDL.xml b/Open-ILS/examples/fm_IDL.xml index c4c5daaaa2..1894f7759c 100644 --- a/Open-ILS/examples/fm_IDL.xml +++ b/Open-ILS/examples/fm_IDL.xml @@ -2063,6 +2063,8 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA + + @@ -2090,6 +2092,9 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA + + + @@ -2850,6 +2855,29 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA + + + + + + + + + + + + + + + + + + + + + + + SELECT * FROM metabib.author_field_entry diff --git a/Open-ILS/src/perlmods/lib/OpenILS/WWW/AutoSuggest.pm b/Open-ILS/src/perlmods/lib/OpenILS/WWW/AutoSuggest.pm new file mode 100644 index 0000000000..3e4d18247d --- /dev/null +++ b/Open-ILS/src/perlmods/lib/OpenILS/WWW/AutoSuggest.pm @@ -0,0 +1,206 @@ +package OpenILS::WWW::AutoSuggest; + +use strict; +use warnings; + +use Apache2::Log; +use Apache2::Const -compile => qw( + OK HTTP_NOT_ACCEPTABLE HTTP_INTERNAL_SERVER_ERROR :log +); +use XML::LibXML; +use Text::Glob; +use CGI qw(:all -utf8); + +use OpenSRF::Utils::JSON; +use OpenILS::Utils::CStoreEditor qw/:funcs/; + +# BEGIN package globals + +# We'll probably never need this fanciness for autosuggest, but +# you can add handlers for different requested content-types here, and +# you can weight them to control what matches requests for things like +# 'application/*' + +my $_output_handler_dispatch = { + "application/xml" => { + "prio" => 0, + "code" => sub { + my ($r, $data) = @_; + $r->content_type("application/xml; charset=utf-8"); + print suggestions_to_xml($data); + return Apache2::Const::OK; + } + }, + "application/json" => { + "prio" => 1, + "code" => sub { + my ($r, $data) = @_; + $r->content_type("application/json; charset=utf-8"); + print suggestions_to_json($data); + return Apache2::Const::OK; + } + } +}; + +my @_output_handler_types = sort { + $_output_handler_dispatch->{$a}->{prio} <=> + $_output_handler_dispatch->{$b}->{prio} +} keys %$_output_handler_dispatch; + +# END package globals + +# Given a string such as a user might type into a search box, prepare +# it for to_tsquery(). See +# http://www.postgresql.org/docs/9.0/static/textsearch-controls.html +sub prepare_for_tsquery { + my ($str) = shift; + + $str =~ s/[^\w\s]/ /ig; + $str .= ":*" unless $str =~ /\s$/; + + return join(" & ", split(/\s+/, $str)); +} + +# The third argument to our stored procedure, metabib.suggest_browse_entries(), +# is passed through directly to ts_headline() as the 'options' arugment. +sub prepare_headline_opts { + my ($css_prefix, $highlight_min, $highlight_max, $short_word_length) = @_; + + $css_prefix =~ s/[^\w]//g; + + my @parts = ( + qq{StartSel=""}, + "StopSel=" + ); + + push @parts, "MinWords=$highlight_min" if $highlight_min > 0; + push @parts, "MaxWords=$highlight_max" if $highlight_max > 0; + push @parts, "ShortWord=$short_word_length" if defined $short_word_length; + + return join(", ", @parts); +} + +# Get raw autosuggest data (rows returned from a stored procedure) from the DB. +sub get_suggestions { + my $editor = shift; + my $query = shift; + my $search_class = shift; + my $org_unit = shift; + my $css_prefix = shift || 'oils_AS'; + my $highlight_min = int(shift || 0); + my $highlight_max = int(shift || 0); + my $short_word_length = shift; + + my $normalization = int(shift || 14); # 14 is not totally arbitrary. + # See http://www.postgresql.org/docs/9.0/static/textsearch-controls.html#TEXTSEARCH-RANKING + + my $limit = int(shift || 10); + + $limit = 10 unless $limit > 0; + + my $headline_opts = prepare_headline_opts( + $css_prefix, $highlight_min, $highlight_max, + defined $short_word_length ? int($short_word_length) : undef + ); + + return $editor->json_query({ + "from" => [ + "metabib.suggest_browse_entries", + prepare_for_tsquery($query), + $search_class, + $headline_opts, + $org_unit, + $limit, + $normalization + ] + }); +} + +sub suggestions_to_xml { + my ($suggestions) = @_; + + my $dom = new XML::LibXML::Document("1.0", "UTF-8"); + my $as = $dom->createElement("as"); + $dom->setDocumentElement($as); + + foreach (@$suggestions) { + my $val = $dom->createElement("val"); + $val->setAttribute("term", $_->{value}); + $val->setAttribute("field", $_->{field}); + $val->appendText($_->{match}); + $as->addChild($val); + } + + # XML::LibXML::Document::toString() returns an encoded byte string, which + # is why we don't need to binmode STDOUT, ':utf8'. + return $dom->toString(); +} + +sub suggestions_to_json { + my ($suggestions) = @_; + + return OpenSRF::Utils::JSON->perl2JSON({ + "val" => [ + map { + +{ term => $_->{value}, field => $_->{field}, + match => $_->{match} } + } @$suggestions + ] + }); +} + +# Given data and the Apache request object, this sub picks a sub from a +# dispatch table based on the list of content-type encodings that the client +# has indicated it will accept, and calls that sub, which will deliver +# a response of appropriately encoded data. +sub output_handler { + my ($r, $data) = @_; + + foreach my $media_range (split /,/, $r->headers_in->{Accept}) { + $media_range =~ s/;.+$//; # keep type, subtype. lose parameters. + + my ($match) = grep { + Text::Glob::match_glob($media_range, $_) + } @_output_handler_types; + + if ($match) { + return $_output_handler_dispatch->{$match}{code}->($r, $data); + } + } + + return Apache2::Const::HTTP_NOT_ACCEPTABLE; +} + +sub handler { + my $r = shift; + my $cgi = new CGI; + + my $editor = new_editor; + my $suggestions = get_suggestions( + $editor, + map { scalar($cgi->param($_)) } qw( + query + search_class + org_unit + css_prefix + highlight_min + highlight_max + short_word_length + normalization + limit + ) + ); + + if (not $suggestions) { + $r->log->error( + "get_suggestions() failed: " . $editor->die_event->{textcode} + ); + return Apache2::Const::HTTP_INTERNAL_SERVER_ERROR; + } + + $editor->disconnect; + + return output_handler($r, $suggestions); +} + +1; diff --git a/Open-ILS/src/perlmods/lib/OpenILS/WWW/EGCatLoader.pm b/Open-ILS/src/perlmods/lib/OpenILS/WWW/EGCatLoader.pm index 0a1fd65a4d..a0413c6eab 100644 --- a/Open-ILS/src/perlmods/lib/OpenILS/WWW/EGCatLoader.pm +++ b/Open-ILS/src/perlmods/lib/OpenILS/WWW/EGCatLoader.pm @@ -254,6 +254,7 @@ sub load_common { $ctx->{search_ou} = $self->_get_search_lib(); $self->staff_saved_searches_set_expansion_state if $ctx->{is_staff}; + $self->load_eg_cache_hash; return Apache2::Const::OK; } diff --git a/Open-ILS/src/perlmods/lib/OpenILS/WWW/EGCatLoader/Util.pm b/Open-ILS/src/perlmods/lib/OpenILS/WWW/EGCatLoader/Util.pm index 5ea922f596..f10ccb796c 100644 --- a/Open-ILS/src/perlmods/lib/OpenILS/WWW/EGCatLoader/Util.pm +++ b/Open-ILS/src/perlmods/lib/OpenILS/WWW/EGCatLoader/Util.pm @@ -1,6 +1,7 @@ package OpenILS::WWW::EGCatLoader; use strict; use warnings; use Apache2::Const -compile => qw(OK DECLINED FORBIDDEN HTTP_INTERNAL_SERVER_ERROR REDIRECT HTTP_BAD_REQUEST); +use File::Spec; use OpenSRF::Utils::Logger qw/$logger/; use OpenILS::Utils::CStoreEditor qw/:funcs/; use OpenILS::Utils::Fieldmapper; @@ -13,7 +14,8 @@ our %cache = ( # cached data map => {aou => {}}, # others added dynamically as needed list => {}, search => {}, - org_settings => {} + org_settings => {}, + eg_cache_hash => undef ); sub init_ro_object_cache { @@ -302,4 +304,40 @@ sub _get_search_lib { return $self->ctx->{aou_tree}->()->id; } +# This is defensively coded since we don't do much manual reading from the +# file system in this module. +sub load_eg_cache_hash { + my ($self) = @_; + + # just a context helper + $self->ctx->{eg_cache_hash} = sub { return $cache{eg_cache_hash}; }; + + # Need to actually load the value? If already done, move on. + return if defined $cache{eg_cache_hash}; + + # In this way even if we fail, we won't slow things down by ever trying + # again within this Apache process' lifetime. + $cache{eg_cache_hash} = 0; + + my $path = File::Spec->catfile( + $self->apache->document_root, "eg_cache_hash" + ); + + if (not open FH, "<$path") { + $self->apache->log->warn("error opening $path : $!"); + return; + } else { + my $buf; + my $rv = read FH, $buf, 64; # defensive + close FH; + + if (not defined $rv) { # error + $self->apache->log->warn("error reading $path : $!"); + } elsif ($rv > 0) { # no error, something read + chomp $buf; + $cache{eg_cache_hash} = $buf; + } + } +} + 1; diff --git a/Open-ILS/src/sql/Pg/000.functions.general.sql b/Open-ILS/src/sql/Pg/000.functions.general.sql index 40ab727269..6a46d1b40c 100644 --- a/Open-ILS/src/sql/Pg/000.functions.general.sql +++ b/Open-ILS/src/sql/Pg/000.functions.general.sql @@ -28,6 +28,11 @@ CREATE OR REPLACE FUNCTION evergreen.xml_escape(str TEXT) RETURNS text AS $$ '>', '>'); $$ LANGUAGE SQL IMMUTABLE; +CREATE OR REPLACE FUNCTION evergreen.regexp_split_to_array(TEXT, TEXT) +RETURNS TEXT[] AS $$ + return encode_array_literal([split $_[1], $_[0]]); +$$ LANGUAGE PLPERLU STRICT IMMUTABLE; + -- Provide a named type for patching functions CREATE TYPE evergreen.patch AS (patch TEXT); diff --git a/Open-ILS/src/sql/Pg/002.schema.config.sql b/Open-ILS/src/sql/Pg/002.schema.config.sql index f596fa28ba..ce262201fd 100644 --- a/Open-ILS/src/sql/Pg/002.schema.config.sql +++ b/Open-ILS/src/sql/Pg/002.schema.config.sql @@ -165,8 +165,10 @@ INSERT INTO config.biblio_fingerprint (name, xpath, format, first_word) ); CREATE TABLE config.metabib_class ( - name TEXT PRIMARY KEY, - label TEXT NOT NULL UNIQUE + name TEXT PRIMARY KEY, + label TEXT NOT NULL UNIQUE, + bouyant BOOL DEFAULT FALSE NOT NULL, + restrict BOOL DEFAULT FALSE NOT NULL ); CREATE TABLE config.metabib_field ( @@ -179,7 +181,10 @@ CREATE TABLE config.metabib_field ( format TEXT NOT NULL REFERENCES config.xml_transform (name) DEFAULT 'mods33', search_field BOOL NOT NULL DEFAULT TRUE, facet_field BOOL NOT NULL DEFAULT FALSE, - facet_xpath TEXT + browse_field BOOL NOT NULL DEFAULT TRUE, + browse_xpath TEXT, + facet_xpath TEXT, + restrict BOOL DEFAULT FALSE NOT NULL ); COMMENT ON TABLE config.metabib_field IS $$ XPath used for record indexing ingest @@ -802,12 +807,14 @@ BEGIN END LOOP; END IF; - IF REGEXP_REPLACE(VERSION(),E'^.+?(\\d+\\.\\d+).*?$',E'\\1')::FLOAT > 8.2 THEN - NEW.index_vector = to_tsvector((TG_ARGV[0])::regconfig, value); - ELSE - NEW.index_vector = to_tsvector(TG_ARGV[0], value); + IF TG_TABLE_NAME::TEXT ~ 'browse_entry$' THEN + value := ARRAY_TO_STRING( + evergreen.regexp_split_to_array(value, E'\\W+'), ' ' + ); END IF; + NEW.index_vector = to_tsvector((TG_ARGV[0])::regconfig, value); + RETURN NEW; END; $$ LANGUAGE PLPGSQL; diff --git a/Open-ILS/src/sql/Pg/030.schema.metabib.sql b/Open-ILS/src/sql/Pg/030.schema.metabib.sql index 0ea948ea78..4cc72c0266 100644 --- a/Open-ILS/src/sql/Pg/030.schema.metabib.sql +++ b/Open-ILS/src/sql/Pg/030.schema.metabib.sql @@ -135,6 +135,25 @@ CREATE INDEX metabib_facet_entry_field_idx ON metabib.facet_entry (field); CREATE INDEX metabib_facet_entry_value_idx ON metabib.facet_entry (SUBSTRING(value,1,1024)); CREATE INDEX metabib_facet_entry_source_idx ON metabib.facet_entry (source); +CREATE TABLE metabib.browse_entry ( + id BIGSERIAL PRIMARY KEY, + value TEXT unique, + index_vector tsvector +); +CREATE INDEX metabib_browse_entry_index_vector_idx ON metabib.browse_entry USING GIST (index_vector); +CREATE TRIGGER metabib_browse_entry_fti_trigger + BEFORE INSERT OR UPDATE ON metabib.browse_entry + FOR EACH ROW EXECUTE PROCEDURE oils_tsearch2('keyword'); + + +CREATE TABLE metabib.browse_entry_def_map ( + id BIGSERIAL PRIMARY KEY, + entry BIGINT REFERENCES metabib.browse_entry (id), + def INT REFERENCES config.metabib_field (id), + source BIGINT REFERENCES biblio.record_entry (id) +); + + CREATE OR REPLACE FUNCTION metabib.facet_normalize_trigger () RETURNS TRIGGER AS $$ DECLARE normalizer RECORD; @@ -301,10 +320,14 @@ CREATE INDEX metabib_metarecord_source_map_source_record_idx ON metabib.metareco CREATE TYPE metabib.field_entry_template AS ( field_class TEXT, field INT, + facet_field BOOL, + search_field BOOL, + browse_field BOOL, source BIGINT, value TEXT ); + CREATE OR REPLACE FUNCTION biblio.extract_metabib_field_entry ( rid BIGINT, default_joiner TEXT ) RETURNS SETOF metabib.field_entry_template AS $func$ DECLARE bib biblio.record_entry%ROWTYPE; @@ -315,6 +338,7 @@ DECLARE xml_node TEXT; xml_node_list TEXT[]; facet_text TEXT; + browse_text TEXT; raw_text TEXT; curr_text TEXT; joiner TEXT := default_joiner; -- XXX will index defs supply a joiner? @@ -372,6 +396,25 @@ BEGIN raw_text := COALESCE(raw_text,'') || curr_text; + -- autosuggest/metabib.browse_entry + IF idx.browse_field THEN + + IF idx.browse_xpath IS NOT NULL AND idx.browse_xpath <> '' THEN + browse_text := oils_xpath_string( idx.browse_xpath, xml_node, joiner, ARRAY[ARRAY[xfrm.prefix, xfrm.namespace_uri]] ); + ELSE + browse_text := curr_text; + END IF; + + output_row.field_class = idx.field_class; + output_row.field = idx.id; + output_row.source = rid; + output_row.value = BTRIM(REGEXP_REPLACE(browse_text, E'\\s+', ' ', 'g')); + + output_row.browse_field = TRUE; + RETURN NEXT output_row; + output_row.browse_field = FALSE; + END IF; + -- insert raw node text for faceting IF idx.facet_field THEN @@ -386,7 +429,9 @@ BEGIN output_row.source = rid; output_row.value = BTRIM(REGEXP_REPLACE(facet_text, E'\\s+', ' ', 'g')); + output_row.facet_field = TRUE; RETURN NEXT output_row; + output_row.facet_field = FALSE; END IF; END LOOP; @@ -400,12 +445,77 @@ BEGIN output_row.source = rid; output_row.value = BTRIM(REGEXP_REPLACE(raw_text, E'\\s+', ' ', 'g')); + output_row.search_field = TRUE; RETURN NEXT output_row; END IF; END LOOP; END; + +$func$ LANGUAGE PLPGSQL; + +CREATE OR REPLACE FUNCTION metabib.reingest_metabib_field_entries( bib_id BIGINT ) RETURNS VOID AS $func$ +DECLARE + fclass RECORD; + ind_data metabib.field_entry_template%ROWTYPE; + mbe_row metabib.browse_entry%ROWTYPE; + mbe_id BIGINT; +BEGIN + PERFORM * FROM config.internal_flag WHERE name = 'ingest.assume_inserts_only' AND enabled; + IF NOT FOUND THEN + FOR fclass IN SELECT * FROM config.metabib_class LOOP + -- RAISE NOTICE 'Emptying out %', fclass.name; + EXECUTE $$DELETE FROM metabib.$$ || fclass.name || $$_field_entry WHERE source = $$ || bib_id; + END LOOP; + DELETE FROM metabib.facet_entry WHERE source = bib_id; + DELETE FROM metabib.browse_entry_def_map WHERE source = bib_id; + END IF; + + FOR ind_data IN SELECT * FROM biblio.extract_metabib_field_entry( bib_id ) LOOP + IF ind_data.field < 0 THEN + ind_data.field = -1 * ind_data.field; + END IF; + + IF ind_data.facet_field THEN + INSERT INTO metabib.facet_entry (field, source, value) + VALUES (ind_data.field, ind_data.source, ind_data.value); + END IF; + + IF ind_data.browse_field THEN + -- A caveat about this SELECT: this should take care of replacing + -- old mbe rows when data changes, but not if normalization (by + -- which I mean specifically the output of + -- evergreen.oils_tsearch2()) changes. It may or may not be + -- expensive to add a comparison of index_vector to index_vector + -- to the WHERE clause below. + SELECT INTO mbe_row * FROM metabib.browse_entry WHERE value = ind_data.value; + IF FOUND THEN + mbe_id := mbe_row.id; + ELSE + INSERT INTO metabib.browse_entry (value) VALUES + (metabib.browse_normalize(ind_data.value, ind_data.field)); + mbe_id := CURRVAL('metabib.browse_entry_id_seq'::REGCLASS); + END IF; + + INSERT INTO metabib.browse_entry_def_map (entry, def, source) + VALUES (mbe_id, ind_data.field, ind_data.source); + END IF; + + IF ind_data.search_field THEN + EXECUTE $$ + INSERT INTO metabib.$$ || ind_data.field_class || $$_field_entry (field, source, value) + VALUES ($$ || + quote_literal(ind_data.field) || $$, $$ || + quote_literal(ind_data.source) || $$, $$ || + quote_literal(ind_data.value) || + $$);$$; + END IF; + + END LOOP; + + RETURN; +END; $func$ LANGUAGE PLPGSQL; -- default to a space joiner @@ -736,41 +846,6 @@ BEGIN END; $func$ LANGUAGE PLPGSQL; -CREATE OR REPLACE FUNCTION metabib.reingest_metabib_field_entries( bib_id BIGINT ) RETURNS VOID AS $func$ -DECLARE - fclass RECORD; - ind_data metabib.field_entry_template%ROWTYPE; -BEGIN - PERFORM * FROM config.internal_flag WHERE name = 'ingest.assume_inserts_only' AND enabled; - IF NOT FOUND THEN - FOR fclass IN SELECT * FROM config.metabib_class LOOP - -- RAISE NOTICE 'Emptying out %', fclass.name; - EXECUTE $$DELETE FROM metabib.$$ || fclass.name || $$_field_entry WHERE source = $$ || bib_id; - END LOOP; - DELETE FROM metabib.facet_entry WHERE source = bib_id; - END IF; - - FOR ind_data IN SELECT * FROM biblio.extract_metabib_field_entry( bib_id ) LOOP - IF ind_data.field < 0 THEN - ind_data.field = -1 * ind_data.field; - INSERT INTO metabib.facet_entry (field, source, value) - VALUES (ind_data.field, ind_data.source, ind_data.value); - ELSE - EXECUTE $$ - INSERT INTO metabib.$$ || ind_data.field_class || $$_field_entry (field, source, value) - VALUES ($$ || - quote_literal(ind_data.field) || $$, $$ || - quote_literal(ind_data.source) || $$, $$ || - quote_literal(ind_data.value) || - $$);$$; - END IF; - - END LOOP; - - RETURN; -END; -$func$ LANGUAGE PLPGSQL; - CREATE OR REPLACE FUNCTION biblio.extract_located_uris( bib_id BIGINT, marcxml TEXT, editor_id INT ) RETURNS VOID AS $func$ DECLARE uris TEXT[]; @@ -965,6 +1040,7 @@ BEGIN DELETE FROM metabib.record_attr WHERE id = NEW.id; -- Kill the attrs hash, useless on deleted records DELETE FROM authority.bib_linking WHERE bib = NEW.id; -- Avoid updating fields in bibs that are no longer visible DELETE FROM biblio.peer_bib_copy_map WHERE peer_record = NEW.id; -- Separate any multi-homed items + DELETE FROM metabib.browse_entry_def_map WHERE source = NEW.id; -- Don't auto-suggest deleted bibs RETURN NEW; -- and we're done END IF; @@ -1109,4 +1185,227 @@ BEGIN END; $func$ LANGUAGE PLPGSQL; +CREATE OR REPLACE FUNCTION metabib.browse_normalize(facet_text TEXT, mapped_field INT) RETURNS TEXT AS $$ +DECLARE + normalizer RECORD; +BEGIN + + FOR normalizer IN + SELECT n.func AS func, + n.param_count AS param_count, + m.params AS params + FROM config.index_normalizer n + JOIN config.metabib_field_index_norm_map m ON (m.norm = n.id) + WHERE m.field = mapped_field AND m.pos < 0 + ORDER BY m.pos LOOP + + EXECUTE 'SELECT ' || normalizer.func || '(' || + quote_literal( facet_text ) || + CASE + WHEN normalizer.param_count > 0 + THEN ',' || REPLACE(REPLACE(BTRIM(normalizer.params,'[]'),E'\'',E'\\\''),E'"',E'\'') + ELSE '' + END || + ')' INTO facet_text; + + END LOOP; + + RETURN facet_text; +END; + +$$ LANGUAGE PLPGSQL; + + +-- This mimics a specific part of QueryParser, turning the first part of a +-- classed search (search_class) into a set of classes and possibly fields. +-- search_class might look like "author" or "title|proper" or "ti|uniform" +-- or "au" or "au|corporate|personal" or anything like that, where the first +-- element of the list you get by separating on the "|" character is either +-- a registered class (config.metabib_class) or an alias +-- (config.metabib_search_alias), and the rest of any such elements are +-- fields (config.metabib_field). +CREATE OR REPLACE + FUNCTION metabib.search_class_to_registered_components(search_class TEXT) + RETURNS SETOF RECORD AS $func$ +DECLARE + search_parts TEXT[]; + field_name TEXT; + search_part_count INTEGER; + rec RECORD; + registered_class config.metabib_class%ROWTYPE; + registered_alias config.metabib_search_alias%ROWTYPE; + registered_field config.metabib_field%ROWTYPE; +BEGIN + search_parts := REGEXP_SPLIT_TO_ARRAY(search_class, E'\\|'); + + search_part_count := ARRAY_LENGTH(search_parts, 1); + IF search_part_count = 0 THEN + RETURN; + ELSE + SELECT INTO registered_class + * FROM config.metabib_class WHERE name = search_parts[1]; + IF FOUND THEN + IF search_part_count < 2 THEN -- all fields + rec := (registered_class.name, NULL::INTEGER); + RETURN NEXT rec; + RETURN; -- done + END IF; + FOR field_name IN SELECT * + FROM UNNEST(search_parts[2:search_part_count]) LOOP + SELECT INTO registered_field + * FROM config.metabib_field + WHERE name = field_name AND + field_class = registered_class.name; + IF FOUND THEN + rec := (registered_class.name, registered_field.id); + RETURN NEXT rec; + END IF; + END LOOP; + ELSE + -- maybe we have an alias? + SELECT INTO registered_alias + * FROM config.metabib_search_alias WHERE alias=search_parts[1]; + IF NOT FOUND THEN + RETURN; + ELSE + IF search_part_count < 2 THEN -- return w/e the alias says + rec := ( + registered_alias.field_class, registered_alias.field + ); + RETURN NEXT rec; + RETURN; -- done + ELSE + FOR field_name IN SELECT * + FROM UNNEST(search_parts[2:search_part_count]) LOOP + SELECT INTO registered_field + * FROM config.metabib_field + WHERE name = field_name AND + field_class = registered_alias.field_class; + IF FOUND THEN + rec := ( + registered_alias.field_class, + registered_field.id + ); + RETURN NEXT rec; + END IF; + END LOOP; + END IF; + END IF; + END IF; + END IF; +END; +$func$ LANGUAGE PLPGSQL; + + +CREATE OR REPLACE + FUNCTION metabib.suggest_browse_entries( + query_text TEXT, -- 'foo' or 'foo & ba:*',ready for to_tsquery() + search_class TEXT, -- 'alias' or 'class' or 'class|field..', etc + headline_opts TEXT, -- markup options for ts_headline() + visibility_org INTEGER,-- null if you don't want opac visibility test + query_limit INTEGER,-- use in LIMIT clause of interal query + normalization INTEGER -- argument to TS_RANK_CD() + ) RETURNS TABLE ( + value TEXT, -- plain + field INTEGER, + bouyant_and_class_match BOOL, + field_match BOOL, + field_weight INTEGER, + rank REAL, + bouyant BOOL, + match TEXT -- marked up + ) AS $func$ +DECLARE + query TSQUERY; + opac_visibility_join TEXT; + search_class_join TEXT; + r_fields RECORD; +BEGIN + query := TO_TSQUERY('keyword', query_text); + + IF visibility_org IS NOT NULL THEN + opac_visibility_join := ' + JOIN asset.opac_visible_copies aovc ON ( + aovc.record = mbedm.source AND + aovc.circ_lib IN (SELECT id FROM actor.org_unit_descendants($4)) + )'; + ELSE + opac_visibility_join := ''; + END IF; + + -- The following determines whether we only provide suggestsons matching + -- the user's selected search_class, or whether we show other suggestions + -- too. The reason for MIN() is that for search_classes like + -- 'title|proper|uniform' you would otherwise get multiple rows. The + -- implication is that if title as a class doesn't have restrict, + -- nor does the proper field, but the uniform field does, you're going + -- to get 'false' for your overall evaluation of 'should we restrict?' + -- To invert that, change from MIN() to MAX(). + + SELECT + INTO r_fields + MIN(cmc.restrict::INT) AS restrict_class, + MIN(cmf.restrict::INT) AS restrict_field + FROM metabib.search_class_to_registered_components(search_class) + AS _registered (field_class TEXT, field INT) + JOIN + config.metabib_class cmc ON (cmc.name = _registered.field_class) + LEFT JOIN + config.metabib_field cmf ON (cmf.id = _registered.field); + + -- evaluate 'should we restrict?' + IF r_fields.restrict_field::BOOL OR r_fields.restrict_class::BOOL THEN + search_class_join := ' + JOIN + metabib.search_class_to_registered_components($2) + AS _registered (field_class TEXT, field INT) ON ( + (_registered.field IS NULL AND + _registered.field_class = cmf.field_class) OR + (_registered.field = cmf.id) + ) + '; + ELSE + search_class_join := ' + LEFT JOIN + metabib.search_class_to_registered_components($2) + AS _registered (field_class TEXT, field INT) ON ( + _registered.field_class = cmc.name + ) + '; + END IF; + + RETURN QUERY EXECUTE 'SELECT *, TS_HEADLINE(value, $1, $3) FROM (SELECT DISTINCT + mbe.value, + cmf.id, + cmc.bouyant AND _registered.field_class IS NOT NULL, + _registered.field = cmf.id, + cmf.weight, + TS_RANK_CD(mbe.index_vector, $1, $6), + cmc.bouyant + FROM metabib.browse_entry_def_map mbedm + JOIN metabib.browse_entry mbe ON (mbe.id = mbedm.entry) + JOIN config.metabib_field cmf ON (cmf.id = mbedm.def) + JOIN config.metabib_class cmc ON (cmf.field_class = cmc.name) + ' || search_class_join || opac_visibility_join || + ' WHERE $1 @@ mbe.index_vector + ORDER BY 3 DESC, 4 DESC NULLS LAST, 5 DESC, 6 DESC, 7 DESC, 1 ASC + LIMIT $5) x + ORDER BY 3 DESC, 4 DESC NULLS LAST, 5 DESC, 6 DESC, 7 DESC, 1 ASC + ' -- sic, repeat the order by clause in the outer select too + USING + query, search_class, headline_opts, + visibility_org, query_limit, normalization + ; + + -- sort order: + -- bouyant AND chosen class = match class + -- chosen field = match field + -- field weight + -- rank + -- bouyancy + -- value itself + +END; +$func$ LANGUAGE PLPGSQL; + COMMIT; diff --git a/Open-ILS/src/sql/Pg/950.data.seed-values.sql b/Open-ILS/src/sql/Pg/950.data.seed-values.sql index 918dce1a39..79f88d53fd 100644 --- a/Open-ILS/src/sql/Pg/950.data.seed-values.sql +++ b/Open-ILS/src/sql/Pg/950.data.seed-values.sql @@ -141,10 +141,10 @@ INSERT INTO config.metabib_field ( id, field_class, name, label, format, xpath, (14, 'subject', 'topic', oils_i18n_gettext(14, 'Topic Subject', 'cmf', 'label'), 'mods32', $$//mods32:mods/mods32:subject/mods32:topic$$, TRUE ); --INSERT INTO config.metabib_field ( id, field_class, name, format, xpath ) VALUES -- ( id, field_class, name, xpath ) VALUES ( 'subject', 'genre', 'mods32', $$//mods32:mods/mods32:genre$$ ); -INSERT INTO config.metabib_field ( id, field_class, name, label, format, xpath ) VALUES - (15, 'keyword', 'keyword', oils_i18n_gettext(15, 'General Keywords', 'cmf', 'label'), 'mods32', $$//mods32:mods/*[not(local-name()='originInfo')]$$ ); -- /* to fool vim */; -INSERT INTO config.metabib_field ( id, field_class, name, label, format, xpath ) VALUES - (16, 'subject', 'complete', oils_i18n_gettext(16, 'All Subjects', 'cmf', 'label'), 'mods32', $$//mods32:mods/mods32:subject$$ ); +INSERT INTO config.metabib_field ( id, field_class, name, label, format, xpath, browse_field ) VALUES + (15, 'keyword', 'keyword', oils_i18n_gettext(15, 'General Keywords', 'cmf', 'label'), 'mods32', $$//mods32:mods/*[not(local-name()='originInfo')]$$, FALSE ); -- /* to fool vim */; +INSERT INTO config.metabib_field ( id, field_class, name, label, format, xpath, browse_field ) VALUES + (16, 'subject', 'complete', oils_i18n_gettext(16, 'All Subjects', 'cmf', 'label'), 'mods32', $$//mods32:mods/mods32:subject$$, FALSE ); INSERT INTO config.metabib_field ( id, field_class, name, label, format, xpath ) VALUES (17, 'identifier', 'accession', oils_i18n_gettext(17, 'Accession Number', 'cmf', 'label'), 'marcxml', $$//marc:controlfield[@tag='001']$$ ); @@ -8674,6 +8674,20 @@ INSERT INTO config.global_flag (name, label, enabled) FALSE ); +INSERT INTO config.global_flag (name, label, value, enabled) + VALUES ( + 'opac.use_autosuggest', + oils_i18n_gettext( + 'opac.use_autosuggest', + 'OPAC: Show auto-completing suggestions dialog under basic search box (put ''opac_visible'' into the value field to limit suggestions to OPAC-visible items, or blank the field for a possible performance improvement)', + 'cgf', + 'label' + ), + 'opac_visible', + TRUE + ); + + INSERT INTO config.usr_setting_type (name,opac_visible,label,description,datatype) VALUES ( 'history.circ.retention_age', diff --git a/Open-ILS/src/sql/Pg/upgrade/YYYY.schema.bib_autosuggest.sql b/Open-ILS/src/sql/Pg/upgrade/YYYY.schema.bib_autosuggest.sql new file mode 100644 index 0000000000..ce652b51a0 --- /dev/null +++ b/Open-ILS/src/sql/Pg/upgrade/YYYY.schema.bib_autosuggest.sql @@ -0,0 +1,711 @@ +BEGIN; + +SELECT evergreen.upgrade_deps_block_check('YYYY', :eg_version); + +INSERT INTO config.global_flag (name, label, enabled, value) VALUES ( + 'opac.use_autosuggest', + 'OPAC: Show auto-completing suggestions dialog under basic search box (put ''opac_visible'' into the value field to limit suggestions to OPAC-visible items, or blank the field for a possible performance improvement)', + TRUE, + 'opac_visible' +); + +CREATE TABLE metabib.browse_entry ( + id BIGSERIAL PRIMARY KEY, + value TEXT unique, + index_vector tsvector +); +CREATE INDEX metabib_browse_entry_index_vector_idx ON metabib.browse_entry USING GIST (index_vector); +CREATE TRIGGER metabib_browse_entry_fti_trigger + BEFORE INSERT OR UPDATE ON metabib.browse_entry + FOR EACH ROW EXECUTE PROCEDURE oils_tsearch2('keyword'); + + +CREATE TABLE metabib.browse_entry_def_map ( + id BIGSERIAL PRIMARY KEY, + entry BIGINT REFERENCES metabib.browse_entry (id), + def INT REFERENCES config.metabib_field (id), + source BIGINT REFERENCES biblio.record_entry (id) +); + +ALTER TABLE config.metabib_field ADD COLUMN browse_field BOOLEAN DEFAULT TRUE NOT NULL; +ALTER TABLE config.metabib_field ADD COLUMN browse_xpath TEXT; + +ALTER TABLE config.metabib_class ADD COLUMN bouyant BOOLEAN DEFAULT FALSE NOT NULL; +ALTER TABLE config.metabib_class ADD COLUMN restrict BOOLEAN DEFAULT FALSE NOT NULL; +ALTER TABLE config.metabib_field ADD COLUMN restrict BOOLEAN DEFAULT FALSE NOT NULL; + +-- one good exception to default true: +UPDATE config.metabib_field + SET browse_field = FALSE + WHERE (field_class = 'keyword' AND name = 'keyword') OR + (field_class = 'subject' AND name = 'complete'); + +-- AFTER UPDATE OR INSERT trigger for biblio.record_entry +-- We're only touching it here to add a DELETE statement to the IF NEW.deleted +-- block. + +CREATE OR REPLACE FUNCTION biblio.indexing_ingest_or_delete () RETURNS TRIGGER AS $func$ +DECLARE + transformed_xml TEXT; + prev_xfrm TEXT; + normalizer RECORD; + xfrm config.xml_transform%ROWTYPE; + attr_value TEXT; + new_attrs HSTORE := ''::HSTORE; + attr_def config.record_attr_definition%ROWTYPE; +BEGIN + + IF NEW.deleted IS TRUE THEN -- If this bib is deleted + DELETE FROM metabib.metarecord_source_map WHERE source = NEW.id; -- Rid ourselves of the search-estimate-killing linkage + DELETE FROM metabib.record_attr WHERE id = NEW.id; -- Kill the attrs hash, useless on deleted records + DELETE FROM authority.bib_linking WHERE bib = NEW.id; -- Avoid updating fields in bibs that are no longer visible + DELETE FROM biblio.peer_bib_copy_map WHERE peer_record = NEW.id; -- Separate any multi-homed items + DELETE FROM metabib.browse_entry_def_map WHERE source = NEW.id; -- Don't auto-suggest deleted bibs + RETURN NEW; -- and we're done + END IF; + + IF TG_OP = 'UPDATE' THEN -- re-ingest? + PERFORM * FROM config.internal_flag WHERE name = 'ingest.reingest.force_on_same_marc' AND enabled; + + IF NOT FOUND AND OLD.marc = NEW.marc THEN -- don't do anything if the MARC didn't change + RETURN NEW; + END IF; + END IF; + + -- Record authority linking + PERFORM * FROM config.internal_flag WHERE name = 'ingest.disable_authority_linking' AND enabled; + IF NOT FOUND THEN + PERFORM biblio.map_authority_linking( NEW.id, NEW.marc ); + END IF; + + -- Flatten and insert the mfr data + PERFORM * FROM config.internal_flag WHERE name = 'ingest.disable_metabib_full_rec' AND enabled; + IF NOT FOUND THEN + PERFORM metabib.reingest_metabib_full_rec(NEW.id); + + -- Now we pull out attribute data, which is dependent on the mfr for all but XPath-based fields + PERFORM * FROM config.internal_flag WHERE name = 'ingest.disable_metabib_rec_descriptor' AND enabled; + IF NOT FOUND THEN + FOR attr_def IN SELECT * FROM config.record_attr_definition ORDER BY format LOOP + + IF attr_def.tag IS NOT NULL THEN -- tag (and optional subfield list) selection + SELECT ARRAY_TO_STRING(ARRAY_ACCUM(value), COALESCE(attr_def.joiner,' ')) INTO attr_value + FROM (SELECT * FROM metabib.full_rec ORDER BY tag, subfield) AS x + WHERE record = NEW.id + AND tag LIKE attr_def.tag + AND CASE + WHEN attr_def.sf_list IS NOT NULL + THEN POSITION(subfield IN attr_def.sf_list) > 0 + ELSE TRUE + END + GROUP BY tag + ORDER BY tag + LIMIT 1; + + ELSIF attr_def.fixed_field IS NOT NULL THEN -- a named fixed field, see config.marc21_ff_pos_map.fixed_field + attr_value := biblio.marc21_extract_fixed_field(NEW.id, attr_def.fixed_field); + + ELSIF attr_def.xpath IS NOT NULL THEN -- and xpath expression + + SELECT INTO xfrm * FROM config.xml_transform WHERE name = attr_def.format; + + -- See if we can skip the XSLT ... it's expensive + IF prev_xfrm IS NULL OR prev_xfrm <> xfrm.name THEN + -- Can't skip the transform + IF xfrm.xslt <> '---' THEN + transformed_xml := oils_xslt_process(NEW.marc,xfrm.xslt); + ELSE + transformed_xml := NEW.marc; + END IF; + + prev_xfrm := xfrm.name; + END IF; + + IF xfrm.name IS NULL THEN + -- just grab the marcxml (empty) transform + SELECT INTO xfrm * FROM config.xml_transform WHERE xslt = '---' LIMIT 1; + prev_xfrm := xfrm.name; + END IF; + + attr_value := oils_xpath_string(attr_def.xpath, transformed_xml, COALESCE(attr_def.joiner,' '), ARRAY[ARRAY[xfrm.prefix, xfrm.namespace_uri]]); + + ELSIF attr_def.phys_char_sf IS NOT NULL THEN -- a named Physical Characteristic, see config.marc21_physical_characteristic_*_map + SELECT m.value INTO attr_value + FROM biblio.marc21_physical_characteristics(NEW.id) v + JOIN config.marc21_physical_characteristic_value_map m ON (m.id = v.value) + WHERE v.subfield = attr_def.phys_char_sf + LIMIT 1; -- Just in case ... + + END IF; + + -- apply index normalizers to attr_value + FOR normalizer IN + SELECT n.func AS func, + n.param_count AS param_count, + m.params AS params + FROM config.index_normalizer n + JOIN config.record_attr_index_norm_map m ON (m.norm = n.id) + WHERE attr = attr_def.name + ORDER BY m.pos LOOP + EXECUTE 'SELECT ' || normalizer.func || '(' || + COALESCE( quote_literal( attr_value ), 'NULL' ) || + CASE + WHEN normalizer.param_count > 0 + THEN ',' || REPLACE(REPLACE(BTRIM(normalizer.params,'[]'),E'\'',E'\\\''),E'"',E'\'') + ELSE '' + END || + ')' INTO attr_value; + + END LOOP; + + -- Add the new value to the hstore + new_attrs := new_attrs || hstore( attr_def.name, attr_value ); + + END LOOP; + + IF TG_OP = 'INSERT' OR OLD.deleted THEN -- initial insert OR revivication + INSERT INTO metabib.record_attr (id, attrs) VALUES (NEW.id, new_attrs); + ELSE + UPDATE metabib.record_attr SET attrs = new_attrs WHERE id = NEW.id; + END IF; + + END IF; + END IF; + + -- Gather and insert the field entry data + PERFORM metabib.reingest_metabib_field_entries(NEW.id); + + -- Located URI magic + IF TG_OP = 'INSERT' THEN + PERFORM * FROM config.internal_flag WHERE name = 'ingest.disable_located_uri' AND enabled; + IF NOT FOUND THEN + PERFORM biblio.extract_located_uris( NEW.id, NEW.marc, NEW.editor ); + END IF; + ELSE + PERFORM * FROM config.internal_flag WHERE name = 'ingest.disable_located_uri' AND enabled; + IF NOT FOUND THEN + PERFORM biblio.extract_located_uris( NEW.id, NEW.marc, NEW.editor ); + END IF; + END IF; + + -- (re)map metarecord-bib linking + IF TG_OP = 'INSERT' THEN -- if not deleted and performing an insert, check for the flag + PERFORM * FROM config.internal_flag WHERE name = 'ingest.metarecord_mapping.skip_on_insert' AND enabled; + IF NOT FOUND THEN + PERFORM metabib.remap_metarecord_for_bib( NEW.id, NEW.fingerprint ); + END IF; + ELSE -- we're doing an update, and we're not deleted, remap + PERFORM * FROM config.internal_flag WHERE name = 'ingest.metarecord_mapping.skip_on_update' AND enabled; + IF NOT FOUND THEN + PERFORM metabib.remap_metarecord_for_bib( NEW.id, NEW.fingerprint ); + END IF; + END IF; + + RETURN NEW; +END; +$func$ LANGUAGE PLPGSQL; + +CREATE OR REPLACE FUNCTION metabib.browse_normalize(facet_text TEXT, mapped_field INT) RETURNS TEXT AS $$ +DECLARE + normalizer RECORD; +BEGIN + + FOR normalizer IN + SELECT n.func AS func, + n.param_count AS param_count, + m.params AS params + FROM config.index_normalizer n + JOIN config.metabib_field_index_norm_map m ON (m.norm = n.id) + WHERE m.field = mapped_field AND m.pos < 0 + ORDER BY m.pos LOOP + + EXECUTE 'SELECT ' || normalizer.func || '(' || + quote_literal( facet_text ) || + CASE + WHEN normalizer.param_count > 0 + THEN ',' || REPLACE(REPLACE(BTRIM(normalizer.params,'[]'),E'\'',E'\\\''),E'"',E'\'') + ELSE '' + END || + ')' INTO facet_text; + + END LOOP; + + RETURN facet_text; +END; + +$$ LANGUAGE PLPGSQL; + +DROP FUNCTION biblio.extract_metabib_field_entry(bigint, text); +DROP FUNCTION biblio.extract_metabib_field_entry(bigint); + +DROP TYPE metabib.field_entry_template; +CREATE TYPE metabib.field_entry_template AS ( + field_class TEXT, + field INT, + facet_field BOOL, + search_field BOOL, + browse_field BOOL, + source BIGINT, + value TEXT +); + + +CREATE OR REPLACE FUNCTION biblio.extract_metabib_field_entry ( rid BIGINT, default_joiner TEXT ) RETURNS SETOF metabib.field_entry_template AS $func$ +DECLARE + bib biblio.record_entry%ROWTYPE; + idx config.metabib_field%ROWTYPE; + xfrm config.xml_transform%ROWTYPE; + prev_xfrm TEXT; + transformed_xml TEXT; + xml_node TEXT; + xml_node_list TEXT[]; + facet_text TEXT; + browse_text TEXT; + raw_text TEXT; + curr_text TEXT; + joiner TEXT := default_joiner; -- XXX will index defs supply a joiner? + output_row metabib.field_entry_template%ROWTYPE; +BEGIN + + -- Get the record + SELECT INTO bib * FROM biblio.record_entry WHERE id = rid; + + -- Loop over the indexing entries + FOR idx IN SELECT * FROM config.metabib_field ORDER BY format LOOP + + SELECT INTO xfrm * from config.xml_transform WHERE name = idx.format; + + -- See if we can skip the XSLT ... it's expensive + IF prev_xfrm IS NULL OR prev_xfrm <> xfrm.name THEN + -- Can't skip the transform + IF xfrm.xslt <> '---' THEN + transformed_xml := oils_xslt_process(bib.marc,xfrm.xslt); + ELSE + transformed_xml := bib.marc; + END IF; + + prev_xfrm := xfrm.name; + END IF; + + xml_node_list := oils_xpath( idx.xpath, transformed_xml, ARRAY[ARRAY[xfrm.prefix, xfrm.namespace_uri]] ); + + raw_text := NULL; + FOR xml_node IN SELECT x FROM unnest(xml_node_list) AS x LOOP + CONTINUE WHEN xml_node !~ E'^\\s*<'; + + curr_text := ARRAY_TO_STRING( + oils_xpath( '//text()', + REGEXP_REPLACE( -- This escapes all &s not followed by "amp;". Data ise returned from oils_xpath (above) in UTF-8, not entity encoded + REGEXP_REPLACE( -- This escapes embeded [^<]+)(<)([^>]+<)$re$, + E'\\1<\\3', + 'g' + ), + '&(?!amp;)', + '&', + 'g' + ) + ), + ' ' + ); + + CONTINUE WHEN curr_text IS NULL OR curr_text = ''; + + IF raw_text IS NOT NULL THEN + raw_text := raw_text || joiner; + END IF; + + raw_text := COALESCE(raw_text,'') || curr_text; + + -- autosuggest/metabib.browse_entry + IF idx.browse_field THEN + + IF idx.browse_xpath IS NOT NULL AND idx.browse_xpath <> '' THEN + browse_text := oils_xpath_string( idx.browse_xpath, xml_node, joiner, ARRAY[ARRAY[xfrm.prefix, xfrm.namespace_uri]] ); + ELSE + browse_text := curr_text; + END IF; + + output_row.field_class = idx.field_class; + output_row.field = idx.id; + output_row.source = rid; + output_row.value = BTRIM(REGEXP_REPLACE(browse_text, E'\\s+', ' ', 'g')); + + output_row.browse_field = TRUE; + RETURN NEXT output_row; + output_row.browse_field = FALSE; + END IF; + + -- insert raw node text for faceting + IF idx.facet_field THEN + + IF idx.facet_xpath IS NOT NULL AND idx.facet_xpath <> '' THEN + facet_text := oils_xpath_string( idx.facet_xpath, xml_node, joiner, ARRAY[ARRAY[xfrm.prefix, xfrm.namespace_uri]] ); + ELSE + facet_text := curr_text; + END IF; + + output_row.field_class = idx.field_class; + output_row.field = -1 * idx.id; + output_row.source = rid; + output_row.value = BTRIM(REGEXP_REPLACE(facet_text, E'\\s+', ' ', 'g')); + + output_row.facet_field = TRUE; + RETURN NEXT output_row; + output_row.facet_field = FALSE; + END IF; + + END LOOP; + + CONTINUE WHEN raw_text IS NULL OR raw_text = ''; + + -- insert combined node text for searching + IF idx.search_field THEN + output_row.field_class = idx.field_class; + output_row.field = idx.id; + output_row.source = rid; + output_row.value = BTRIM(REGEXP_REPLACE(raw_text, E'\\s+', ' ', 'g')); + + output_row.search_field = TRUE; + RETURN NEXT output_row; + END IF; + + END LOOP; + +END; +$func$ LANGUAGE PLPGSQL; + +-- default to a space joiner +CREATE OR REPLACE FUNCTION biblio.extract_metabib_field_entry ( BIGINT ) RETURNS SETOF metabib.field_entry_template AS $func$ + SELECT * FROM biblio.extract_metabib_field_entry($1, ' '); + $func$ LANGUAGE SQL; + + +CREATE OR REPLACE FUNCTION metabib.reingest_metabib_field_entries( bib_id BIGINT ) RETURNS VOID AS $func$ +DECLARE + fclass RECORD; + ind_data metabib.field_entry_template%ROWTYPE; + mbe_row metabib.browse_entry%ROWTYPE; + mbe_id BIGINT; +BEGIN + PERFORM * FROM config.internal_flag WHERE name = 'ingest.assume_inserts_only' AND enabled; + IF NOT FOUND THEN + FOR fclass IN SELECT * FROM config.metabib_class LOOP + -- RAISE NOTICE 'Emptying out %', fclass.name; + EXECUTE $$DELETE FROM metabib.$$ || fclass.name || $$_field_entry WHERE source = $$ || bib_id; + END LOOP; + DELETE FROM metabib.facet_entry WHERE source = bib_id; + DELETE FROM metabib.browse_entry_def_map WHERE source = bib_id; + END IF; + + FOR ind_data IN SELECT * FROM biblio.extract_metabib_field_entry( bib_id ) LOOP + IF ind_data.field < 0 THEN + ind_data.field = -1 * ind_data.field; + END IF; + + IF ind_data.facet_field THEN + INSERT INTO metabib.facet_entry (field, source, value) + VALUES (ind_data.field, ind_data.source, ind_data.value); + END IF; + + IF ind_data.browse_field THEN + SELECT INTO mbe_row * FROM metabib.browse_entry WHERE value = ind_data.value; + IF FOUND THEN + mbe_id := mbe_row.id; + ELSE + INSERT INTO metabib.browse_entry (value) VALUES + (metabib.browse_normalize(ind_data.value, ind_data.field)); + mbe_id := CURRVAL('metabib.browse_entry_id_seq'::REGCLASS); + END IF; + + INSERT INTO metabib.browse_entry_def_map (entry, def, source) + VALUES (mbe_id, ind_data.field, ind_data.source); + END IF; + + IF ind_data.search_field THEN + EXECUTE $$ + INSERT INTO metabib.$$ || ind_data.field_class || $$_field_entry (field, source, value) + VALUES ($$ || + quote_literal(ind_data.field) || $$, $$ || + quote_literal(ind_data.source) || $$, $$ || + quote_literal(ind_data.value) || + $$);$$; + END IF; + + END LOOP; + + RETURN; +END; +$func$ LANGUAGE PLPGSQL; + +-- This mimics a specific part of QueryParser, turning the first part of a +-- classed search (search_class) into a set of classes and possibly fields. +-- search_class might look like "author" or "title|proper" or "ti|uniform" +-- or "au" or "au|corporate|personal" or anything like that, where the first +-- element of the list you get by separating on the "|" character is either +-- a registered class (config.metabib_class) or an alias +-- (config.metabib_search_alias), and the rest of any such elements are +-- fields (config.metabib_field). +CREATE OR REPLACE + FUNCTION metabib.search_class_to_registered_components(search_class TEXT) + RETURNS SETOF RECORD AS $func$ +DECLARE + search_parts TEXT[]; + field_name TEXT; + search_part_count INTEGER; + rec RECORD; + registered_class config.metabib_class%ROWTYPE; + registered_alias config.metabib_search_alias%ROWTYPE; + registered_field config.metabib_field%ROWTYPE; +BEGIN + search_parts := REGEXP_SPLIT_TO_ARRAY(search_class, E'\\|'); + + search_part_count := ARRAY_LENGTH(search_parts, 1); + IF search_part_count = 0 THEN + RETURN; + ELSE + SELECT INTO registered_class + * FROM config.metabib_class WHERE name = search_parts[1]; + IF FOUND THEN + IF search_part_count < 2 THEN -- all fields + rec := (registered_class.name, NULL::INTEGER); + RETURN NEXT rec; + RETURN; -- done + END IF; + FOR field_name IN SELECT * + FROM UNNEST(search_parts[2:search_part_count]) LOOP + SELECT INTO registered_field + * FROM config.metabib_field + WHERE name = field_name AND + field_class = registered_class.name; + IF FOUND THEN + rec := (registered_class.name, registered_field.id); + RETURN NEXT rec; + END IF; + END LOOP; + ELSE + -- maybe we have an alias? + SELECT INTO registered_alias + * FROM config.metabib_search_alias WHERE alias=search_parts[1]; + IF NOT FOUND THEN + RETURN; + ELSE + IF search_part_count < 2 THEN -- return w/e the alias says + rec := ( + registered_alias.field_class, registered_alias.field + ); + RETURN NEXT rec; + RETURN; -- done + ELSE + FOR field_name IN SELECT * + FROM UNNEST(search_parts[2:search_part_count]) LOOP + SELECT INTO registered_field + * FROM config.metabib_field + WHERE name = field_name AND + field_class = registered_alias.field_class; + IF FOUND THEN + rec := ( + registered_alias.field_class, + registered_field.id + ); + RETURN NEXT rec; + END IF; + END LOOP; + END IF; + END IF; + END IF; + END IF; +END; +$func$ LANGUAGE PLPGSQL; + + +CREATE OR REPLACE + FUNCTION metabib.suggest_browse_entries( + query_text TEXT, -- 'foo' or 'foo & ba:*',ready for to_tsquery() + search_class TEXT, -- 'alias' or 'class' or 'class|field..', etc + headline_opts TEXT, -- markup options for ts_headline() + visibility_org INTEGER,-- null if you don't want opac visibility test + query_limit INTEGER,-- use in LIMIT clause of interal query + normalization INTEGER -- argument to TS_RANK_CD() + ) RETURNS TABLE ( + value TEXT, -- plain + field INTEGER, + bouyant_and_class_match BOOL, + field_match BOOL, + field_weight INTEGER, + rank REAL, + bouyant BOOL, + match TEXT -- marked up + ) AS $func$ +DECLARE + query TSQUERY; + opac_visibility_join TEXT; + search_class_join TEXT; + r_fields RECORD; +BEGIN + query := TO_TSQUERY('keyword', query_text); + + IF visibility_org IS NOT NULL THEN + opac_visibility_join := ' + JOIN asset.opac_visible_copies aovc ON ( + aovc.record = mbedm.source AND + aovc.circ_lib IN (SELECT id FROM actor.org_unit_descendants($4)) + )'; + ELSE + opac_visibility_join := ''; + END IF; + + -- The following determines whether we only provide suggestsons matching + -- the user's selected search_class, or whether we show other suggestions + -- too. The reason for MIN() is that for search_classes like + -- 'title|proper|uniform' you would otherwise get multiple rows. The + -- implication is that if title as a class doesn't have restrict, + -- nor does the proper field, but the uniform field does, you're going + -- to get 'false' for your overall evaluation of 'should we restrict?' + -- To invert that, change from MIN() to MAX(). + + SELECT + INTO r_fields + MIN(cmc.restrict::INT) AS restrict_class, + MIN(cmf.restrict::INT) AS restrict_field + FROM metabib.search_class_to_registered_components(search_class) + AS _registered (field_class TEXT, field INT) + JOIN + config.metabib_class cmc ON (cmc.name = _registered.field_class) + LEFT JOIN + config.metabib_field cmf ON (cmf.id = _registered.field); + + -- evaluate 'should we restrict?' + IF r_fields.restrict_field::BOOL OR r_fields.restrict_class::BOOL THEN + search_class_join := ' + JOIN + metabib.search_class_to_registered_components($2) + AS _registered (field_class TEXT, field INT) ON ( + (_registered.field IS NULL AND + _registered.field_class = cmf.field_class) OR + (_registered.field = cmf.id) + ) + '; + ELSE + search_class_join := ' + LEFT JOIN + metabib.search_class_to_registered_components($2) + AS _registered (field_class TEXT, field INT) ON ( + _registered.field_class = cmc.name + ) + '; + END IF; + + RETURN QUERY EXECUTE 'SELECT *, TS_HEADLINE(value, $1, $3) FROM (SELECT DISTINCT + mbe.value, + cmf.id, + cmc.bouyant AND _registered.field_class IS NOT NULL, + _registered.field = cmf.id, + cmf.weight, + TS_RANK_CD(mbe.index_vector, $1, $6), + cmc.bouyant + FROM metabib.browse_entry_def_map mbedm + JOIN metabib.browse_entry mbe ON (mbe.id = mbedm.entry) + JOIN config.metabib_field cmf ON (cmf.id = mbedm.def) + JOIN config.metabib_class cmc ON (cmf.field_class = cmc.name) + ' || search_class_join || opac_visibility_join || + ' WHERE $1 @@ mbe.index_vector + ORDER BY 3 DESC, 4 DESC NULLS LAST, 5 DESC, 6 DESC, 7 DESC, 1 ASC + LIMIT $5) x + ORDER BY 3 DESC, 4 DESC NULLS LAST, 5 DESC, 6 DESC, 7 DESC, 1 ASC + ' -- sic, repeat the order by clause in the outer select too + USING + query, search_class, headline_opts, + visibility_org, query_limit, normalization + ; + + -- sort order: + -- bouyant AND chosen class = match class + -- chosen field = match field + -- field weight + -- rank + -- bouyancy + -- value itself + +END; +$func$ LANGUAGE PLPGSQL; + +-- The advantage of this over the stock regexp_split_to_array() is that it +-- won't degrade unicode strings. +CREATE OR REPLACE FUNCTION evergreen.regexp_split_to_array(TEXT, TEXT) +RETURNS TEXT[] AS $$ + return encode_array_literal([split $_[1], $_[0]]); +$$ LANGUAGE PLPERLU STRICT IMMUTABLE; + + +-- Adds some logic for browse_entry to split on non-word chars for index_vector, post-normalize +CREATE OR REPLACE FUNCTION oils_tsearch2 () RETURNS TRIGGER AS $$ +DECLARE + normalizer RECORD; + value TEXT := ''; +BEGIN + + value := NEW.value; + + IF TG_TABLE_NAME::TEXT ~ 'field_entry$' THEN + FOR normalizer IN + SELECT n.func AS func, + n.param_count AS param_count, + m.params AS params + FROM config.index_normalizer n + JOIN config.metabib_field_index_norm_map m ON (m.norm = n.id) + WHERE field = NEW.field AND m.pos < 0 + ORDER BY m.pos LOOP + EXECUTE 'SELECT ' || normalizer.func || '(' || + quote_literal( value ) || + CASE + WHEN normalizer.param_count > 0 + THEN ',' || REPLACE(REPLACE(BTRIM(normalizer.params,'[]'),E'\'',E'\\\''),E'"',E'\'') + ELSE '' + END || + ')' INTO value; + + END LOOP; + + NEW.value := value; + END IF; + + IF NEW.index_vector = ''::tsvector THEN + RETURN NEW; + END IF; + + IF TG_TABLE_NAME::TEXT ~ 'field_entry$' THEN + FOR normalizer IN + SELECT n.func AS func, + n.param_count AS param_count, + m.params AS params + FROM config.index_normalizer n + JOIN config.metabib_field_index_norm_map m ON (m.norm = n.id) + WHERE field = NEW.field AND m.pos >= 0 + ORDER BY m.pos LOOP + EXECUTE 'SELECT ' || normalizer.func || '(' || + quote_literal( value ) || + CASE + WHEN normalizer.param_count > 0 + THEN ',' || REPLACE(REPLACE(BTRIM(normalizer.params,'[]'),E'\'',E'\\\''),E'"',E'\'') + ELSE '' + END || + ')' INTO value; + + END LOOP; + END IF; + + IF TG_TABLE_NAME::TEXT ~ 'browse_entry$' THEN + value := ARRAY_TO_STRING( + evergreen.regexp_split_to_array(value, E'\\W+'), ' ' + ); + END IF; + + NEW.index_vector = to_tsvector((TG_ARGV[0])::regconfig, value); + + RETURN NEW; +END; +$$ LANGUAGE PLPGSQL; + +COMMIT; diff --git a/Open-ILS/src/templates/opac/parts/base.tt2 b/Open-ILS/src/templates/opac/parts/base.tt2 index 4af9a87084..239ab30bea 100644 --- a/Open-ILS/src/templates/opac/parts/base.tt2 +++ b/Open-ILS/src/templates/opac/parts/base.tt2 @@ -9,9 +9,15 @@ [% l('Catalog - [_1]', ctx.page_title) %] + [% IF want_dojo %] + + [% END %] [% INCLUDE 'opac/parts/goog_analytics.tt2' %] - + [% content %] [% INCLUDE 'opac/parts/footer.tt2' %] [% INCLUDE 'opac/parts/js.tt2' %] diff --git a/Open-ILS/src/templates/opac/parts/header.tt2 b/Open-ILS/src/templates/opac/parts/header.tt2 index ce519006b9..4efaa30e5c 100644 --- a/Open-ILS/src/templates/opac/parts/header.tt2 +++ b/Open-ILS/src/templates/opac/parts/header.tt2 @@ -69,4 +69,15 @@ cgi.url("-path" => 1, "-query" => 1); END; END; + + # Whether we want Dojo or not may one day be a wholly distinct + # concern from whether we want autosuggest, so let's get used to + # defining that separately. + want_dojo = 0; + + use_autosuggest = ctx.get_cgf("opac.use_autosuggest"); + + IF use_autosuggest.enabled == "t"; + want_dojo = 1; + END; %] diff --git a/Open-ILS/src/templates/opac/parts/js.tt2 b/Open-ILS/src/templates/opac/parts/js.tt2 index 35ff2bb590..91efa6adc3 100644 --- a/Open-ILS/src/templates/opac/parts/js.tt2 +++ b/Open-ILS/src/templates/opac/parts/js.tt2 @@ -40,3 +40,30 @@ IF CGI.https; url = url.replace('^http:', 'https:'); END; %] [%- END %] +[%- IF want_dojo; -%] + + + + +[%- # So the following works in Mozilla and Chrome, but not in IE8. + # Seems like it /should/ work anywhere, though, and obviate the + # next three script tags: %] + + + + + + +[% IF use_autosuggest.enabled == "t"; %] + +[% END; # use_autosuggest %] + +[%- END; # want_dojo -%] diff --git a/Open-ILS/src/templates/opac/parts/qtype_selector.tt2 b/Open-ILS/src/templates/opac/parts/qtype_selector.tt2 index 50896063fc..98567d4ce6 100644 --- a/Open-ILS/src/templates/opac/parts/qtype_selector.tt2 +++ b/Open-ILS/src/templates/opac/parts/qtype_selector.tt2 @@ -6,7 +6,7 @@ {value => "series", label => l("Series")}, {value => "id|bibcn", label => l("Bib Call Number")} ] %] - [% query_type = query_type || CGI.param('qtype'); FOR qt IN query_types -%]