LP#1998355: reduce growth of DYM dictionary
This patch reduces the number of updates to search.symspell_dictionary
rows that would not change the contents of those rows, thereby
reducing the potential for certain record maintenance operations to
significantly bloat that table.
In particular, it adjusts the upsert to update the row for an existing
prefix only if there would be a net change in at least one of the *_count
columns or the list of suggestions. (Note that if a row is the target of
an UPDATE statement, PostgreSQL will _always_ create a new row version,
even if there is no change to the contents of the row.)
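A minimal sketch of the idea, not the patch itself: demo_dictionary and
its columns are a hypothetical stand-in, not the real schema of
search.symspell_dictionary, and the incoming row is assumed to carry the
desired final values rather than deltas.

```sql
-- Hypothetical stand-in table; NOT the real search.symspell_dictionary schema.
CREATE TEMP TABLE demo_dictionary (
    prefix_key  TEXT PRIMARY KEY,
    title_count INT    NOT NULL DEFAULT 0,
    suggestions TEXT[] NOT NULL DEFAULT '{}'
);

-- Upsert that skips the UPDATE when nothing would change. When the WHERE
-- clause is false for the conflicting row, PostgreSQL leaves that row alone
-- and no new row version is written.
INSERT INTO demo_dictionary AS d (prefix_key, title_count, suggestions)
VALUES ('harr', 2, ARRAY['harry'])
ON CONFLICT (prefix_key) DO UPDATE
    SET title_count = EXCLUDED.title_count,
        suggestions = EXCLUDED.suggestions
    WHERE (d.title_count, d.suggestions)
          IS DISTINCT FROM (EXCLUDED.title_count, EXCLUDED.suggestions);

-- ctid is the physical location of the current row version.
SELECT ctid, prefix_key, title_count, suggestions FROM demo_dictionary;
```

Running the INSERT a second time with the same values should leave ctid
unchanged; dropping the WHERE clause should make ctid move on every run,
which is the churn this patch avoids.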
It should be noted that while this patch is useful in and of itself, there
is a longer-term fix that would have additional benefits: adjust the
overall reingest logic so that it minimizes changes to all large tables
derived from the bib record when a bib gets reingested. A row that never
gets touched because it doesn't have to be can never become bloat.
To test
-------
[1] In a Concerto database, ensure that idempotent updates of the MARC
in biblio.record_entry will nonetheless force a reingest by running:
update config.internal_flag set enabled = true where name = 'ingest.reingest.force_on_same_marc';
[2] Note the size of search.symspell_dictionary by running:
select pg_size_pretty(pg_total_relation_size('search.symspell_dictionary'));
[3] Run a few rounds of the following update that forces a reingest of the bibs:
update biblio.record_entry set id = id;
[4] For the sake of fairness, run a vacuum on the table:
VACUUM ANALYZE search.symspell_dictionary;
[5] Run the size measurement again and note that it's significantly larger.
[6] Run the following to reset the table size:
VACUUM FULL search.symspell_dictionary;
[7] Note the size, apply the patch, and repeat step 3.
[8] This time, the table size should be the same (or close to the same) as it
was at the beginning of step 7.
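As an optional extra check that is not part of the test plan above, the
statistics views can show the churn more directly. This is a sketch
assuming the standard pg_stat_user_tables view; its counters are
approximate and updated asynchronously, so allow a moment after each step.

```sql
-- Live vs. dead row versions alongside the total size of the table.
SELECT n_live_tup,
       n_dead_tup,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
  FROM pg_stat_user_tables
 WHERE schemaname = 'search'
   AND relname = 'symspell_dictionary';
```

If run between steps 3 and 4, n_dead_tup would be expected to be large
without the patch and small with it.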
Signed-off-by: Galen Charlton <gmc@equinoxOLI.org>
Signed-off-by: Jason Boyer <JBoyer@equinoxOLI.org>