From 9ee6e6ced0fc5e39ce6c7c925ffc269c6bae3522 Mon Sep 17 00:00:00 2001 From: Jane Sandberg Date: Sun, 17 Sep 2017 13:11:13 -0700 Subject: [PATCH] Docs: adding info about TPAC microdata + linked data Signed-off-by: Jane Sandberg --- docs/admin/sitemap_admin.adoc | 42 +++++++ .../designing_your_catalog.adoc | 71 ------------ docs/admin_initial_setup/troubleshooting_tpac.adoc | 19 ++++ docs/opac/sitemap.adoc | 18 +++ docs/opac/visibility_on_the_web.adoc | 123 +++++++++++++++++++++ docs/root_command_line_admin.adoc | 6 + docs/root_opac.adoc | 10 +- 7 files changed, 217 insertions(+), 72 deletions(-) create mode 100644 docs/admin/sitemap_admin.adoc create mode 100644 docs/admin_initial_setup/troubleshooting_tpac.adoc create mode 100644 docs/opac/sitemap.adoc create mode 100644 docs/opac/visibility_on_the_web.adoc diff --git a/docs/admin/sitemap_admin.adoc b/docs/admin/sitemap_admin.adoc new file mode 100644 index 0000000000..50bcd82186 --- /dev/null +++ b/docs/admin/sitemap_admin.adoc @@ -0,0 +1,42 @@ +Running the sitemap generator +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +The `sitemap_generator` script must be invoked with the following argument: + +* `--lib-hostname`: specifies the hostname for the catalog (for example, + `--lib-hostname https://catalog.example.com`); all generated URLs will be + appended to this hostname + +Therefore, the following arguments are useful for generating multiple sitemaps +per Evergreen instance: + +* `--lib-shortname`: limits the list of record URLs to those which have copies + owned by the designated library or any of its children; +* `--prefix`: provides a prefix for the sitemap index file names + +Other options enable you to override the OpenSRF configuration file and the +database connection credentials, but the default settings are generally fine. + +Note that on very large Evergreen instances, sitemaps can consume hundreds of +megabytes of disk space, so ensure that your Evergreen instance has enough room +before running the script.
+ +Sitemap details +~~~~~~~~~~~~~~~ + +The sitemap generator script includes located URIs as well as copies + listed in the `asset.opac_visible_copies` materialized view, and checks + the children or ancestors of the requested libraries for holdings as well. + +Scheduling +~~~~~~~~~~ +To enable search engines to maintain a fresh index of your bibliographic +records, you may want to include the script in your cron jobs on a nightly or +weekly basis. + +Sitemap files are generated in the same directory from which the script is +invoked, so a cron entry will look something like: + +------------------------------------------------------------------------ +12 2 * * * cd /openils/var/web && /openils/bin/sitemap_generator --lib-hostname https://catalog.example.com +------------------------------------------------------------------------ + diff --git a/docs/admin_initial_setup/designing_your_catalog.adoc b/docs/admin_initial_setup/designing_your_catalog.adoc index 9a5b9d4dc4..22df88e261 100644 --- a/docs/admin_initial_setup/designing_your_catalog.adoc +++ b/docs/admin_initial_setup/designing_your_catalog.adoc @@ -800,74 +800,3 @@ The system doesn't need the file extension to know what kind of file it is. Reload the bib record summary in the web catalog and your new image will display. -Sitemap generator ------------------ -A http://www.sitemaps.org[sitemap] directs search engines to the pages of -interest in a web site so that the search engines can intelligently crawl -your site. In the case of Evergreen, the primary pages of interest are the -bibliographic record detail pages.
- -The sitemap generator script creates sitemaps that adhere to the -http://sitemaps.org specification, including: - -* limiting the number of URLs per sitemap file to no more than 50,000 URLs; -* providing the date that the bibliographic record was last edited, so - that once a search engine has crawled all of your sites' record detail pages, - it only has to reindex those pages that are new or have changed since the last - crawl; -* generating a sitemap index file that points to each of the sitemap files. - -Running the sitemap generator -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -The `sitemap_generator` script must be invoked with the following argument: - -* `--lib-hostname`: specifies the hostname for the catalog (for example, - `--lib-hostname https://catalog.example.com`); all URLs will be generated - appended to this hostname - -Therefore, the following arguments are useful for generating multiple sitemaps -per Evergreen instance: - -* `--lib-shortname`: limit the list of record URLs to those which have copies - owned by the designated library or any of its children; -* `--prefix`: provides a prefix for the sitemap index file names - -Other options enable you to override the OpenSRF configuration file and the -database connection credentials, but the default settings are generally fine. - -Note that on very large Evergreen instances, sitemaps can consume hundreds of -megabytes of disk space, so ensure that your Evergreen instance has enough room -before running the script. - -Scheduling -~~~~~~~~~~ -To enable search engines to maintain a fresh index of your bibliographic -records, you may want to include the script in your cron jobs on a nightly or -weekly basis. 
- -Sitemap files are generated in the same directory from which the script is -invoked, so a cron entry will look something like: - ------------------------------------------------------------------------- -12 2 * * * cd /openils/var/web && /openils/bin/sitemap_generator ------------------------------------------------------------------------- - -Troubleshooting TPAC errors ---------------------------- - -If there is a problem such as a TT syntax error, it generally shows up as an -ugly server failure page. If you check the Apache error logs, you will probably -find some solid clues about the reason for the failure. For example, in the -following example, the error message identifies the file in which the problem -occurred as well as the relevant line numbers. - -Example error message in Apache error logs: - ----- -bash# grep "template error" /var/log/apache2/error_log -[Tue Dec 06 02:12:09 2011] [warn] [client 127.0.0.1] egweb: template error: - file error - parse error - opac/parts/record/summary.tt2 line 112-121: - unexpected token (!=)\n [% last_cn = 0;\n FOR copy_info IN - ctx.copies;\n callnum = copy_info.call_number_label;\n ----- - diff --git a/docs/admin_initial_setup/troubleshooting_tpac.adoc b/docs/admin_initial_setup/troubleshooting_tpac.adoc new file mode 100644 index 0000000000..583517144f --- /dev/null +++ b/docs/admin_initial_setup/troubleshooting_tpac.adoc @@ -0,0 +1,19 @@ +Troubleshooting TPAC errors +--------------------------- + +If there is a problem such as a TT syntax error, it generally shows up as an +ugly server failure page. If you check the Apache error logs, you will probably +find some solid clues about the reason for the failure. In the +following example, the error message identifies the file in which the problem +occurred as well as the relevant line numbers.
+ +Example error message in Apache error logs: + +---- +bash# grep "template error" /var/log/apache2/error_log +[Tue Dec 06 02:12:09 2011] [warn] [client 127.0.0.1] egweb: template error: + file error - parse error - opac/parts/record/summary.tt2 line 112-121: + unexpected token (!=)\n [% last_cn = 0;\n FOR copy_info IN + ctx.copies;\n callnum = copy_info.call_number_label;\n +---- + diff --git a/docs/opac/sitemap.adoc b/docs/opac/sitemap.adoc new file mode 100644 index 0000000000..e65663d8df --- /dev/null +++ b/docs/opac/sitemap.adoc @@ -0,0 +1,18 @@ +Sitemap generator +----------------- + +A http://www.sitemaps.org[sitemap] directs search engines to the pages of +interest in a web site so that the search engines can intelligently crawl +your site. In the case of Evergreen, the primary pages of interest are the +bibliographic record detail pages. + +The sitemap generator script creates sitemaps that adhere to the +http://sitemaps.org specification, including: + +* limiting each sitemap file to no more than 50,000 URLs; +* providing the date that the bibliographic record was last edited, so + that once a search engine has crawled all of your site's record detail pages, + it only has to reindex those pages that are new or have changed since the last + crawl; +* generating a sitemap index file that points to each of the sitemap files. + diff --git a/docs/opac/visibility_on_the_web.adoc b/docs/opac/visibility_on_the_web.adoc new file mode 100644 index 0000000000..0ed5c530c5 --- /dev/null +++ b/docs/opac/visibility_on_the_web.adoc @@ -0,0 +1,123 @@ +Library visibility on the Web +----------------------------- + +Introduction +~~~~~~~~~~~~ + +Evergreen follows a number of best practices to +make library data integrate with the rest of the +Web. Evergreen's public catalog pages are +designed so that search engines can easily extract +meaningful information about your library and +collections.
Evergreen is also preparing for an +eventual shift toward linked open bibliographic +data. + +Catalog data in search engines +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Each record in the catalog is displayed to search +engines using http://schema.org[schema.org] microdata. + +[IMPORTANT] +Make sure your system administrator has not added +a restrictive robots.txt file to your server. +These files restrict search engines, up to the +point of not allowing search engines to index your +site at all. + +Details of the schema.org mapping +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + * Each item is listed as a + http://schema.org/Offer[schema:Offer], which is + the same category that an online bookseller might + use to describe an item for sale. These Offers + are always listed with a price of $0.00. + * Subject headings are exposed as + http://schema.org/about[schema:about] + properties. + * Electronic resources are assigned a + http://schema.org/url[schema:url] + property, and any notes or link text + are assigned a + http://schema.org/description[schema:description] + property. + * Given a Library of Congress relator code for + 1xx and 7xx fields, Evergreen surfaces the URL + for that relator code along with the + http://schema.org/contributor[schema:contributor] + property to give machines a better chance + of understanding how the person or organization + actually contributed to this work. + * Linking out to related records: + ** Given an LCCN (010 field), Evergreen links to + the corresponding Library of Congress record + using http://schema.org/sameAs[schema:sameAs]. + ** Given an OCLC number (035 field, subfield `a` + beginning with `(OCoLC)`), Evergreen links to + the corresponding WorldCat record using + http://schema.org/sameAs[schema:sameAs]. + ** Given a URI (024 field, subfield 2 = `'uri'`), + Evergreen links to the corresponding OCLC + Work Entity record using + http://schema.org/exampleOfWork[schema:exampleOfWork]. 
+ + +Viewing microdata +^^^^^^^^^^^^^^^^^ +You can learn more about how Evergreen publicizes +these data by viewing them directly. The +http://linter.structured-data.org[structured data linter] +is a helpful tool for viewing microdata. + +. Using your favorite Web browser, navigate to a + record in your public catalog. +. Copy the URL that displays in your browser's + address bar. +. Go to http://linter.structured-data.org +. Under the _Lint by URL_ tab, paste your URL + into the text box. +. Click _Submit_. + +Other helpful features for search engines +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + * Titles of catalog pages follow a + "Page title - Library name" pattern to provide + specific titles in search engine results pages, + browser bookmarks, and browser tabs. + * Links that robots should not crawl, such as search + result links, are marked with the + https://support.google.com/webmasters/answer/96569?hl=en[@rel="nofollow"] + property. + * Catalog pages for record details and for library + descriptions express a + https://support.google.com/webmasters/answer/139066?hl=en[@rel="canonical"] + link to reduce the number of variations of page + URLs that could otherwise have been derived from + different search parameters. + * Catalog pages that do not exist return a proper + 404 "HTTP_NOT_FOUND" HTTP status code, and record + detail pages for records that have been deleted + now return a proper 410 "HTTP_GONE" HTTP status code. + * Record detail and library pages include + http://ogp.me/[Open Graph Protocol] markup. + * Each library has its own page at + _http://localhost/eg/opac/library/LIBRARY_SHORTNAME_ + that provides machine-readable hours and contact + information. + +SKOS support +~~~~~~~~~~~~ + +Some vocabularies used (or which could be used) for +stock record attributes and coded value maps in Evergreen +are published on the web using SKOS. The record +attributes system can now associate Linked Data URIs +with specific attribute values.
In particular, seed data +supplying URIs for the RDA Content Type, Media Type, and +Carrier Type has been added. + +This is an experimental, "under-the-hood" feature that +will be built upon in subsequent releases. + diff --git a/docs/root_command_line_admin.adoc b/docs/root_command_line_admin.adoc index cb046fdc02..6fea46b490 100644 --- a/docs/root_command_line_admin.adoc +++ b/docs/root_command_line_admin.adoc @@ -104,6 +104,12 @@ include::admin/template_toolkit.adoc[] include::admin_initial_setup/designing_your_catalog.adoc[] +include::opac/sitemap.adoc[] + +include::admin/sitemap_admin.adoc[] + +include::admin_initial_setup/troubleshooting_tpac.adoc[] + :leveloffset: 0 include::admin/audio_alerts.adoc[] diff --git a/docs/root_opac.adoc b/docs/root_opac.adoc index d495726203..863688f9e3 100644 --- a/docs/root_opac.adoc +++ b/docs/root_opac.adoc @@ -16,7 +16,8 @@ workers in public services roles. It is organized into Parts, Chapters, and Sections addressing key aspects of the software. -Copies of this guide can be accessed in PDF and HTML formats from http://docs.evergreen-ils.org/. +Copies of this guide can be accessed in PDF and HTML formats from +http://docs.evergreen-ils.org/. @@ -44,8 +45,15 @@ include::opac/kids_opac.adoc[] include::opac/opensearch.adoc[] +include::opac/visibility_on_the_web.adoc[] + :leveloffset: 0 +include::opac/sitemap.adoc[] + +See the Command Line System Administration Manual for details about +running this script. + include::shared/attributions.adoc[] include::shared/end_matter.adoc[] -- 2.11.0