From f956df518e07abed2f153e3c33621000401a6ac5 Mon Sep 17 00:00:00 2001
From: Dan Scott
Date: Mon, 7 Jul 2014 09:58:01 -0400
Subject: [PATCH] LP# 1330784: Release notes for sitemap builder

More like documentation than release notes, but more is probably better than
less.

Signed-off-by: Dan Scott
Signed-off-by: Ben Shum
---
 docs/RELEASE_NOTES_NEXT/OPAC/sitemap_builder.txt | 51 ++++++++++++++++++++++++
 1 file changed, 51 insertions(+)
 create mode 100644 docs/RELEASE_NOTES_NEXT/OPAC/sitemap_builder.txt

diff --git a/docs/RELEASE_NOTES_NEXT/OPAC/sitemap_builder.txt b/docs/RELEASE_NOTES_NEXT/OPAC/sitemap_builder.txt
new file mode 100644
index 0000000000..527a5ed364
--- /dev/null
+++ b/docs/RELEASE_NOTES_NEXT/OPAC/sitemap_builder.txt
@@ -0,0 +1,51 @@
+Sitemap generator
+^^^^^^^^^^^^^^^^^
+A http://www.sitemaps.org[sitemap] directs search engines to the pages of
+interest on a website so that they can crawl the site efficiently. In the
+case of Evergreen, the primary pages of interest are the bibliographic record
+detail pages.
+
+The sitemap generator script creates sitemaps that adhere to the
+http://sitemaps.org specification. In particular, it:
+
+* limits each sitemap file to no more than 50,000 URLs;
+* records the date that each bibliographic record was last edited, so that
+  once a search engine has crawled all of your site's record detail pages, it
+  only has to reindex the pages that are new or have changed since the last
+  crawl;
+* generates a sitemap index file that points to each of the sitemap files.
+
+Running the sitemap generator
++++++++++++++++++++++++++++++
+The `sitemap_generator` script must be invoked with the following argument:
+
+* `--lib-hostname`: specifies the hostname for the catalog (for example,
+  `--lib-hostname https://catalog.example.com`); all generated URLs begin
+  with this hostname.
+
+The following optional arguments are useful for generating multiple sitemaps
+per Evergreen instance:
+
+* `--lib-shortname`: limits the list of record URLs to those that have copies
+  owned by the designated library or any of its children;
+* `--prefix`: provides a prefix for the sitemap index file names.
+
+Other options enable you to override the OpenSRF configuration file and the
+database connection credentials, but the default settings are generally fine.
+
+On very large Evergreen instances, sitemaps can consume hundreds of megabytes
+of disk space, so ensure that the server has enough free space before running
+the script.
+
+Scheduling
+++++++++++
+To enable search engines to maintain a fresh index of your bibliographic
+records, you may want to run the script as a cron job on a nightly or weekly
+basis.
+
+Sitemap files are generated in the directory from which the script is
+invoked, so a cron entry that writes them to `/openils/var/web` looks like:
+
+------------------------------------------------------------------------
+12 2 * * * cd /openils/var/web && /openils/bin/sitemap_generator --lib-hostname https://catalog.example.com
+------------------------------------------------------------------------
-- 
2.11.0
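The protocol behavior listed in these release notes (at most 50,000 URLs per sitemap file, a per-record last-edited date, and a sitemap index that points at every file) can be illustrated with a short Python sketch. This is not the `sitemap_generator` script's own code: the `/eg/opac/record/<id>` path, the `sitemap_N.xml` and `sitemapindex.xml` file names, and the helper functions are hypothetical. The sitemaps.org protocol also caps each file at 50 MB uncompressed, which this sketch does not check.

```python
"""Sketch of the sitemaps.org file layout (not Evergreen's implementation).

Hypothetical: the /eg/opac/record/<id> path and the sitemap_N.xml /
sitemapindex.xml file names are illustrative placeholders.
"""
from datetime import date
from xml.etree import ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit from the sitemaps.org protocol


def chunk(items, size=MAX_URLS):
    """Split a list into consecutive slices of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_urlset(base, records):
    """Turn (record_id, last_edit_date) pairs into one <urlset> element."""
    urlset = ET.Element("urlset", xmlns=NS)
    for record_id, edited in records:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = f"{base}/eg/opac/record/{record_id}"
        # <lastmod> lets a crawler skip records unchanged since its last visit.
        ET.SubElement(url, "lastmod").text = edited.isoformat()
    return urlset


def build_sitemaps(base, records):
    """Return {file_name: xml_text}: one file per chunk plus a sitemap index."""
    files = {}
    index = ET.Element("sitemapindex", xmlns=NS)
    for n, part in enumerate(chunk(records), start=1):
        name = f"sitemap_{n}.xml"
        files[name] = ET.tostring(build_urlset(base, part), encoding="unicode")
        # Each sitemap file gets one <sitemap><loc> entry in the index.
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = f"{base}/{name}"
    files["sitemapindex.xml"] = ET.tostring(index, encoding="unicode")
    return files
```

With 50,001 records, for example, the sketch emits two sitemap files (50,000 URLs and 1 URL) and an index listing both, which is why a search engine only needs to be told about the index file.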
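To build several library-specific sitemaps from one scheduled job, a small wrapper could run the script once per library with `--lib-shortname` and `--prefix`, from the directory where the files should be written. This is a minimal Python sketch under stated assumptions: only the `--lib-hostname`, `--lib-shortname`, and `--prefix` options and the `/openils` paths come from the release notes; the library shortnames, the `<shortname>_` prefix convention, and the use of one catalog hostname for every library are hypothetical.

```python
"""Hypothetical wrapper: run sitemap_generator once per library.

Assumed (not from the release notes): the shortnames passed in, the
"<shortname>_" prefix convention, and a single shared catalog hostname.
"""
import subprocess

GENERATOR = "/openils/bin/sitemap_generator"
WEB_DIR = "/openils/var/web"  # sitemaps are written to the working directory


def build_commands(lib_hostname, shortnames):
    """Return one argv list per library; nothing is executed here."""
    commands = []
    for shortname in shortnames:
        commands.append([
            GENERATOR,
            "--lib-hostname", lib_hostname,
            # Restrict to records with copies owned by this library or its children.
            "--lib-shortname", shortname,
            # Hypothetical naming scheme to keep each library's index file distinct.
            "--prefix", f"{shortname}_",
        ])
    return commands


def run_all(lib_hostname, shortnames):
    """Run each command from WEB_DIR and stop on the first failure."""
    for argv in build_commands(lib_hostname, shortnames):
        subprocess.run(argv, cwd=WEB_DIR, check=True)
```

For example, `run_all("https://catalog.example.com", ["BR1", "BR2"])` would invoke the generator once for each of the two placeholder libraries; keeping command construction separate from execution makes the argument lists easy to inspect before scheduling the wrapper from cron.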