From 875febe7127a12f03e53cc90269087ac1b916c97 Mon Sep 17 00:00:00 2001 From: Robert Soulliere Date: Wed, 2 Feb 2011 10:20:07 -0500 Subject: [PATCH] Add development to 2.0 from 1.6 --- 2.0/development/OpenSRF_intro.xml | 1738 +++++++++++++++++++++++ 2.0/development/customize_opac.xml | 332 +++++ 2.0/development/customizingstaffclient.xml | 54 + 2.0/development/datamodelsandaccess.xml | 789 +++++++++++ 2.0/development/directoriesandFiles.xml | 140 ++ 2.0/development/installchecklist.xml | 84 ++ 2.0/development/introduction_to_sql.xml | 2126 ++++++++++++++++++++++++++++ 2.0/development/json.xml | 1863 ++++++++++++++++++++++++ 2.0/development/supercat.xml | 163 +++ 2.0/root.xml | 9 + 10 files changed, 7298 insertions(+) create mode 100644 2.0/development/OpenSRF_intro.xml create mode 100644 2.0/development/customize_opac.xml create mode 100644 2.0/development/customizingstaffclient.xml create mode 100644 2.0/development/datamodelsandaccess.xml create mode 100644 2.0/development/directoriesandFiles.xml create mode 100644 2.0/development/installchecklist.xml create mode 100644 2.0/development/introduction_to_sql.xml create mode 100644 2.0/development/json.xml create mode 100644 2.0/development/supercat.xml diff --git a/2.0/development/OpenSRF_intro.xml b/2.0/development/OpenSRF_intro.xml new file mode 100644 index 0000000..f345997 --- /dev/null +++ b/2.0/development/OpenSRF_intro.xml @@ -0,0 +1,1738 @@ + + + + OpenSRF + + + One of the claimed advantages of + Evergreen over alternative integrated library systems is the underlying Open + Service Request Framework (OpenSRF, pronounced "open surf") architecture. This + article introduces OpenSRF, demonstrates how to build OpenSRF services through + simple code examples, and explains the technical foundations on which OpenSRF + is built. This chapter was taken from Dan Scott's Easing gently into OpenSRF article, June, 2010. + +
+ Introducing OpenSRF + OpenSRF + OpenSRF is a message routing network that offers scalability and failover + support for individual services and entire servers with minimal development and + deployment overhead. You can use OpenSRF to build loosely-coupled applications + that can be deployed on a single server or on clusters of geographically + distributed servers using the same code and minimal configuration changes. + Although copyright statements on some of the OpenSRF code date back to Mike + Rylander’s original explorations in 2000, Evergreen was the first major + application to be developed with, and to take full advantage of, the OpenSRF + architecture starting in 2004. The first official release of OpenSRF was 0.1 in + February 2005 (http://evergreen-ils.org/blog/?p=21), but OpenSRF’s development + continues a steady pace of enhancement and refinement, with the release of + 1.0.0 in October 2008 and the most recent release of 1.2.2 in February 2010. + OpenSRF is a distinct break from the architectural approach used by previous + library systems and has more in common with modern Web applications. The + traditional "scale-up" approach to serve more transactions is to purchase a + server with more CPUs and more RAM, possibly splitting the load between a Web + server, a database server, and a business logic server. Evergreen, however, is + built on the Open Service Request Framework (OpenSRF) architecture, which + firmly embraces the "scale-out" approach of spreading transaction load over + cheap commodity servers. The initial GPLS + PINES hardware cluster, while certainly impressive, may have offered the + misleading impression that Evergreen requires a lot of hardware to run. + However, Evergreen and OpenSRF easily scale down to a single server; many + Evergreen libraries run their entire library system on a single server, and + most OpenSRF and Evergreen development occurs on a virtual machine running on a + single laptop or desktop image. + Another common concern is that the flexibility of OpenSRF’s distributed + architecture makes it complex to configure and to write new applications. This + article demonstrates that OpenSRF itself is an extremely simple architecture on + which one can easily build applications of many kinds – not just library + applications – and that you can use a number of different languages to call and + implement OpenSRF methods with a minimal learning curve. With an application + built on OpenSRF, when you identify a bottleneck in your application’s business + logic layer, you can adjust the number of the processes serving that particular + bottleneck on each of your servers; or if the problem is that your service is + resource-hungry, you could add an inexpensive server to your cluster and + dedicate it to running that resource-hungry service. + + Programming language support + If you need to develop an entirely new OpenSRF service, you can choose from a + number of different languages in which to implement that service. OpenSRF + client language bindings have been written for C, Java, JavaScript, Perl, and + Python, and service language bindings have been written for C, Perl, and Python. + This article uses Perl examples as a lowest common denominator programming + language. Writing an OpenSRF binding for another language is a relatively small + task if that language offers libraries that support the core technologies on + which OpenSRF depends: + + + + Extensible Messaging and Presence + Protocol (XMPP, sometimes referred to as Jabber) - provides the base messaging + infrastructure between OpenSRF clients and services + + XMPP + + + + JavaScript Object Notation (JSON) - serializes the content + of each XMPP message in a standardized and concise format + + + + + memcached - provides the caching service + + memcached + + + + syslog - the standard UNIX logging + service + + syslog + + + Unfortunately, the + OpenSRF + reference documentation, although augmented by the + OpenSRF + glossary, blog posts like the description + of OpenSRF and Jabber, and even this article, is not a sufficient substitute + for a complete specification on which one could implement a language binding. + The recommended option for would-be developers of another language binding is + to use the Python implementation as the cleanest basis for a port to another + language. + Python + +
+
+ Writing an OpenSRF Service + Imagine an application architecture in which 10 lines of Perl or Python, using + the data types native to each language, are enough to implement a method that + can then be deployed and invoked seamlessly across hundreds of servers. You + have just imagined developing with OpenSRF – it is truly that simple. Under the + covers, of course, the OpenSRF language bindings do an incredible amount of + work on behalf of the developer. An OpenSRF application consists of one or more + OpenSRF services that expose methods: for example, the opensrf.simple-text + demonstration + service exposes the opensrf.simple-text.split() and + opensrf.simple-text.reverse() methods. Each method accepts zero or more + arguments and returns zero or one results. The data types supported by OpenSRF + arguments and results are typical core language data types: strings, numbers, + booleans, arrays, and hashes. + To implement a new OpenSRF service, perform the following steps: + + + + Include the base OpenSRF support libraries + + + + + Write the code for each of your OpenSRF methods as separate procedures + + + + + Register each method + + + + + Add the service definition to the OpenSRF configuration files + + + + For example, the following code implements an OpenSRF service. The service + includes one method named opensrf.simple-text.reverse() that accepts one + string as input and returns the reversed version of that string: + +#!/usr/bin/perl + +package OpenSRF::Application::Demo::SimpleText; + +use strict; + +use OpenSRF::Application; +use parent qw/OpenSRF::Application/; + +sub text_reverse { + my ($self , $conn, $text) = @_; + my $reversed_text = scalar reverse($text); + return $reversed_text; +} + +__PACKAGE__->register_method( + method => 'text_reverse', + api_name => 'opensrf.simple-text.reverse' +); + + Ten lines of code, and we have a complete OpenSRF service that exposes a single + method and could be deployed quickly on a cluster of servers to meet your + application’s ravenous demand for reversed strings! If you’re unfamiliar with + Perl, the use OpenSRF::Application; use parent qw/OpenSRF::Application/; + lines tell this package to inherit methods and properties from the + OpenSRF::Application module. For example, the call to + __PACKAGE__->register_method() is defined in OpenSRF::Application but due to + inheritance is available in this package (named by the special Perl symbol + __PACKAGE__ that contains the current package name). The register_method() + procedure is how we introduce a method to the rest of the OpenSRF world. + + Registering a service with the OpenSRF configuration files + Two files control most of the configuration for OpenSRF: + + + + opensrf.xml contains the configuration for the service itself, as well as + a list of which application servers in your OpenSRF cluster should start + the service. + + + + + opensrf_core.xml (often referred to as the "bootstrap configuration" + file) contains the OpenSRF networking information, including the XMPP server + connection credentials for the public and private routers. You only need to touch + this for a new service if the new service needs to be accessible via the + public router. + + configuration filesopensrf_core.xml + + + Begin by defining the service itself in opensrf.xml. To register the + opensrf.simple-text service, add the following section to the <apps> + element (corresponding to the XPath /opensrf/default/apps/): + configuration filesopensrf.xml + +<apps> + <opensrf.simple-text> + <keepalive>3</keepalive> + <stateless>1</stateless> + <language>perl</language> + <implementation>OpenSRF::Application::Demo::SimpleText</implementation> + <max_requests>100</max_requests> + <unix_config> + <max_requests>1000</max_requests> + <unix_log>opensrf.simple-text_unix.log</unix_log> + <unix_sock>opensrf.simple-text_unix.sock</unix_sock> + <unix_pid>opensrf.simple-text_unix.pid</unix_pid> + <min_children>5</min_children> + <max_children>15</max_children> + <min_spare_children>2</min_spare_children> + <max_spare_children>5</max_spare_children> + </unix_config> + </opensrf.simple-text> + + <!-- other OpenSRF services registered here... --> +</apps> + + + + + The element name is the name that the OpenSRF control scripts use to refer + to the service. + + + + + The <keepalive> element specifies the interval (in seconds) between + checks to determine if the service is still running. + + + + + The <stateless> element specifies whether OpenSRF clients can call + methods from this service without first having to create a connection to a + specific service backend process for that service. If the value is 1, then + the client can simply issue a request and the router will forward the request + to an available service and the result will be returned directly to the client. + + + + + The <language> element specifies the programming language in which the + service is implemented. + + + + + The <implementation> element pecifies the name of the library or module + in which the service is implemented. + + + + + (C implementations only): The <max_requests> element, as a direct child + of the service element name, specifies the maximum number of requests a process + serves before it is killed and replaced by a new process. + + + + + (Perl implementations only): The <max_requests> element, as a direct + child of the <unix_config> element, specifies the maximum number of requests + a process serves before it is killed and replaced by a new process. + + + + + The <unix_log> element specifies the name of the log file for + language-specific log messages such as syntax warnings. + + + + + The <unix_sock> element specifies the name of the UNIX socket used for + inter-process communications. + + + + + The <unix_pid> element specifies the name of the PID file for the + master process for the service. + + + + + The <min_children> element specifies the minimum number of child + processes that should be running at any given time. + + + + + The <max_children> element specifies the maximum number of child + processes that should be running at any given time. + + + + + The <min_spare_children> element specifies the minimum number of idle + child processes that should be available to handle incoming requests. If there + are fewer than this number of spare child processes, new processes will be + spawned. + + + + + The`<max_spare_children>` element specifies the maximum number of idle + child processes that should be available to handle incoming requests. If there + are more than this number of spare child processes, the extra processes will be + killed. + + + + To make the service accessible via the public router, you must also + edit the opensrf_core.xml configuration file to add the service to the list + of publicly accessible services: + Making a service publicly accessible in <literal>opensrf_core.xml</literal> + +<router> + <!-- This is the public router. On this router, we only register applications + which should be accessible to everyone on the opensrf network --> + <name>router</name> + <domain>public.localhost</domain> + <services> + <service>opensrf.math</service> + <service>opensrf.simple-text</service> + </services> +</router> + + + + + + This section of the opensrf_core.xml file is located at XPath + /config/opensrf/routers/. + + + + + public.localhost is the canonical public router domain in the OpenSRF + installation instructions. + + + + + Each <service> element contained in the <services> element + offers their services via the public router as well as the private router. + + + + Once you have defined the new service, you must restart the OpenSRF Router + to retrieve the new configuration and start or restart the service itself. + + + Calling an OpenSRF method + srfsh + OpenSRF clients in any supported language can invoke OpenSRF services in any + supported language. So let’s see a few examples of how we can call our fancy + new opensrf.simple-text.reverse() method: + + Calling OpenSRF methods from the srfsh client + srfsh is a command-line tool installed with OpenSRF that you can use to call + OpenSRF methods. To call an OpenSRF method, issue the request command and + pass the OpenSRF service and method name as the first two arguments; then pass + one or more JSON objects delimited by commas as the arguments to the method + being invoked. + The following example calls the opensrf.simple-text.reverse method of the + opensrf.simple-text OpenSRF service, passing the string "foobar" as the + only method argument: + +$ srfsh +srfsh # request opensrf.simple-text opensrf.simple-text.reverse "foobar" + +Received Data: "raboof" + +=------------------------------------ +Request Completed Successfully +Request Time in seconds: 0.016718 +=------------------------------------ + + + + Getting documentation for OpenSRF methods from the srfsh client + The srfsh client also gives you command-line access to retrieving metadata + about OpenSRF services and methods. For a given OpenSRF method, for example, + you can retrieve information such as the minimum number of required arguments, + the data type and a description of each argument, the package or library in + which the method is implemented, and a description of the method. To retrieve + the documentation for an opensrf method from srfsh, issue the introspect + command, followed by the name of the OpenSRF service and (optionally) the + name of the OpenSRF method. If you do not pass a method name to the introspect + command, srfsh lists all of the methods offered by the service. If you pass + a partial method name, srfsh lists all of the methods that match that portion + of the method name. + The quality and availability of the descriptive information for each + method depends on the developer to register the method with complete and + accurate information. The quality varies across the set of OpenSRF and + Evergreen APIs, although some effort is being put towards improving the + state of the internal documentation. + +srfsh# introspect opensrf.simple-text "opensrf.simple-text.reverse" +--> opensrf.simple-text + +Received Data: { + "__c":"opensrf.simple-text", + "__p":{ + "api_level":1, + "stream":0, + "object_hint":"OpenSRF_Application_Demo_SimpleText", + "remote":0, + "package":"OpenSRF::Application::Demo::SimpleText", + "api_name":"opensrf.simple-text.reverse", + "server_class":"opensrf.simple-text", + "signature":{ + "params":[ + { + "desc":"The string to reverse", + "name":"text", + "type":"string" + } + ], + "desc":"Returns the input string in reverse order\n", + "return":{ + "desc":"Returns the input string in reverse order", + "type":"string" + } + }, + "method":"text_reverse", + "argc":1 + } +} + + + + + stream denotes whether the method supports streaming responses or not. + + + + + package identifies which package or library implements the method. + + + + + api_name identifies the name of the OpenSRF method. + + + + + signature is a hash that describes the parameters for the method. + + + + + params is an array of hashes describing each parameter in the method; + each parameter has a description (desc), name (name), and type (type). + + + + + desc is a string that describes the method itself. + + + + + return is a hash that describes the return value for the method; it + contains a description of the return value (desc) and the type of the + returned value (type). + + + + + method identifies the name of the function or method in the source + implementation. + + + + + argc is an integer describing the minimum number of arguments that + must be passed to this method. + + + + + + Calling OpenSRF methods from Perl applications + To call an OpenSRF method from Perl, you must connect to the OpenSRF service, + issue the request to the method, and then retrieve the results. + +#/usr/bin/perl +use strict; +use OpenSRF::AppSession; +use OpenSRF::System; + +OpenSRF::System->bootstrap_client(config_file => '/openils/conf/opensrf_core.xml'); + +my $session = OpenSRF::AppSession->create("opensrf.simple-text"); + +print "substring: Accepts a string and a number as input, returns a string\n"; +my $result = $session->request("opensrf.simple-text.substring", "foobar", 3); +my $request = $result->gather(); +print "Substring: $request\n\n"; + +print "split: Accepts two strings as input, returns an array of strings\n"; +$request = $session->request("opensrf.simple-text.split", "This is a test", " "); +my $output = "Split: ["; +my $element; +while ($element = $request->recv()) { + $output .= $element->content . ", "; +} +$output =~ s/, $/]/; +print $output . "\n\n"; + +print "statistics: Accepts an array of strings as input, returns a hash\n"; +my @many_strings = [ + "First I think I'll have breakfast", + "Then I think that lunch would be nice", + "And then seventy desserts to finish off the day" +]; + +$result = $session->request("opensrf.simple-text.statistics", @many_strings); +$request = $result->gather(); +print "Length: " . $result->{'length'} . "\n"; +print "Word count: " . $result->{'word_count'} . "\n"; + +$session->disconnect(); + + + + + The OpenSRF::System->bootstrap_client() method reads the OpenSRF + configuration information from the indicated file and creates an XMPP client + connection based on that information. + + + + + The OpenSRF::AppSession->create() method accepts one argument - the name + of the OpenSRF service to which you want to want to make one or more requests - + and returns an object prepared to use the client connection to make those + requests. + + + + + The OpenSRF::AppSession->request() method accepts a minimum of one + argument - the name of the OpenSRF method to which you want to make a request - + followed by zero or more arguments to pass to the OpenSRF method as input + values. This example passes a string and an integer to the + opensrf.simple-text.substring method defined by the opensrf.simple-text + OpenSRF service. + + + + + The gather() method, called on the result object returned by the + request() method, iterates over all of the possible results from the result + object and returns a single variable. + + + + + This request() call passes two strings to the opensrf.simple-text.split + method defined by the opensrf.simple-text OpenSRF service and returns (via + gather()) a reference to an array of results. + + + + + The opensrf.simple-text.split() method is a streaming method that + returns an array of results with one element per recv() call on the + result object. We could use the gather() method to retrieve all of the + results in a single array reference, but instead we simply iterate over + the result variable until there are no more results to retrieve. + + + + + While the gather() convenience method returns only the content of the + complete set of results for a given request, the recv() method returns an + OpenSRF result object with status, statusCode, and content fields as + we saw in the HTTP results example. + + + + + This request() call passes an array to the + opensrf.simple-text.statistics method defined by the opensrf.simple-text + OpenSRF service. + + + + + The result object returns a hash reference via gather(). The hash + contains the length and word_count keys we defined in the method. + + + + + The OpenSRF::AppSession->disconnect() method closes the XMPP client + connection and cleans up resources associated with the session. + + + + + + + Accepting and returning more interesting data types + Of course, the example of accepting a single string and returning a single + string is not very interesting. In real life, our applications tend to pass + around multiple arguments, including arrays and hashes. Fortunately, OpenSRF + makes that easy to deal with; in Perl, for example, returning a reference to + the data type does the right thing. In the following example of a method that + returns a list, we accept two arguments of type string: the string to be split, + and the delimiter that should be used to split the string. + Basic text splitting method + +sub text_split { + my $self = shift; + my $conn = shift; + my $text = shift; + my $delimiter = shift || ' '; + + my @split_text = split $delimiter, $text; + return \@split_text; +} + +__PACKAGE__->register_method( + method => 'text_split', + api_name => 'opensrf.simple-text.split' +); + + + We simply return a reference to the list, and OpenSRF does the rest of the work + for us to convert the data into the language-independent format that is then + returned to the caller. As a caller of a given method, you must rely on the + documentation used to register to determine the data structures - if the developer has + added the appropriate documentation. + + + Accepting and returning Evergreen objects + OpenSRF is agnostic about objects; its role is to pass JSON back and forth + between OpenSRF clients and services, and it allows the specific clients and + services to define their own semantics for the JSON structures. On top of that + infrastructure, Evergreen offers the fieldmapper: an object-relational mapper + that provides a complete definition of all objects, their properties, their + relationships to other objects, the permissions required to create, read, + update, or delete objects of that type, and the database table or view on which + they are based. + Fieldmapper + The Evergreen fieldmapper offers a great deal of convenience for working with + complex system objects beyond the basic mapping of classes to database + schemas. Although the result is passed over the wire as a JSON object + containing the indicated fields, fieldmapper-aware clients then turn those + JSON objects into native objects with setter / getter methods for each field. + All of this metadata about Evergreen objects is defined in the + fieldmapper configuration file (/openils/conf/fm_IDL.xml), and access to + these classes is provided by the open-ils.cstore, open-ils.pcrud, and + open-ils.reporter-store OpenSRF services which parse the fieldmapper + configuration file and dynamically register OpenSRF methods for creating, + reading, updating, and deleting all of the defined classes. + Example fieldmapper class definition for "Open User Summary" + +<class id="mous" controller="open-ils.cstore open-ils.pcrud" + oils_obj:fieldmapper="money::open_user_summary" + oils_persist:tablename="money.open_usr_summary" + reporter:label="Open User Summary"> + <fields oils_persist:primary="usr" oils_persist:sequence=""> + <field name="balance_owed" reporter:datatype="money" /> + <field name="total_owed" reporter:datatype="money" /> + <field name="total_paid" reporter:datatype="money" /> + <field name="usr" reporter:datatype="link"/> + </fields> + <links> + <link field="usr" reltype="has_a" key="id" map="" class="au"/> + </links> + <permacrud xmlns="http://open-ils.org/spec/opensrf/IDL/permacrud/v1"> + <actions> + <retrieve permission="VIEW_USER"> + <context link="usr" field="home_ou"/> + </retrieve> + </actions> + </permacrud> +</class> + + + + + + The <class> element defines the class: + + + + + The id attribute defines the class hint that identifies the class both + elsewhere in the fieldmapper configuration file, such as in the value of the + field attribute of the <link> element, and in the JSON object itself when + it is instantiated. For example, an "Open User Summary" JSON object would have + the top level property of "__c":"mous". + + + + + The controller attribute identifies the services that have direct access + to this class. If open-ils.pcrud is not listed, for example, then there is + no means to directly access members of this class through a public service. + + + + + The oils_obj:fieldmapper attribute defines the name of the Perl + fieldmapper class that will be dynamically generated to provide setter and + getter methods for instances of the class. + + + + + The oils_persist:tablename attribute identifies the schema name and table + name of the database table that stores the data that represents the instances + of this class. In this case, the schema is money and the table is + open_usr_summary. + + + + + The reporter:label attribute defines a human-readable name for the class + used in the reporting interface to identify the class. These names are defined + in English in the fieldmapper configuration file; however, they are extracted + so that they can be translated and served in the user’s language of choice. + + + + + + + The <fields> element lists all of the fields that belong to the object. + + + + + The oils_persist:primary attribute identifies the field that acts as the + primary key for the object; in this case, the field with the name usr. + + + + + The oils_persist:sequence attribute identifies the sequence object + (if any) in this database provides values for new instances of this class. In + this case, the primary key is defined by a field that is linked to a different + table, so no sequence is used to populate these instances. + + + + + + + Each <field> element defines a single field with the following attributes: + + + + + The name attribute identifies the column name of the field in the + underlying database table as well as providing a name for the setter / getter + method that can be invoked in the JSON or native version of the object. + + + + + The reporter:datatype attribute defines how the reporter should treat + the contents of the field for the purposes of querying and display. + + + + + The reporter:label attribute can be used to provide a human-readable name + for each field; without it, the reporter falls back to the value of the name + attribute. + + + + + + + The <links> element contains a set of zero or more <link> elements, + each of which defines a relationship between the class being described and + another class. + + + + + The field attribute identifies the field named in this class that links + to the external class. + + + + + The reltype attribute identifies the kind of relationship between the + classes; in the case of has_a, each value in the usr field is guaranteed + to have a corresponding value in the external class. + + + + + The key attribute identifies the name of the field in the external + class to which this field links. + + + + + The rarely-used map attribute identifies a second class to which + the external class links; it enables this field to define a direct + relationship to an external class with one degree of separation, to + avoid having to retrieve all of the linked members of an intermediate + class just to retrieve the instances from the actual desired target class. + + + + + The class attribute identifies the external class to which this field + links. + + + + + + + The <permacrud> element defines the permissions that must have been + granted to a user to operate on instances of this class. + + + + + The <retrieve> element is one of four possible children of the + <actions> element that define the permissions required for each action: + create, retrieve, update, and delete. + + + + + The permission attribute identifies the name of the permission that must + have been granted to the user to perform the action. + + + + + The contextfield attribute, if it exists, defines the field in this class + that identifies the library within the system for which the user must have + prvileges to work. If a user has been granted a given permission, but has not been + granted privileges to work at a given library, they can not perform the action + at that library. + + + + + + + The rarely-used <context> element identifies a linked field (link + attribute) in this class which links to an external class that holds the field + (field attribute) that identifies the library within the system for which the + user must have privileges to work. + + + + When you retrieve an instance of a class, you can ask for the result to + flesh some or all of the linked fields of that class, so that the linked + instances are returned embedded directly in your requested instance. In that + same request you can ask for the fleshed instances to in turn have their linked + fields fleshed. By bundling all of this into a single request and result + sequence, you can avoid the network overhead of requiring the client to request + the base object, then request each linked object in turn. + You can also iterate over a collection of instances and set the automatically + generated isdeleted, isupdated, or isnew properties to indicate that + the given instance has been deleted, updated, or created respectively. + Evergreen can then act in batch mode over the collection to perform the + requested actions on any of the instances that have been flagged for action. + + + Returning streaming results + In the previous implementation of the opensrf.simple-text.split method, we + returned a reference to the complete array of results. For small values being + delivered over the network, this is perfectly acceptable, but for large sets of + values this can pose a number of problems for the requesting client. Consider a + service that returns a set of bibliographic records in response to a query like + "all records edited in the past month"; if the underlying database is + relatively active, that could result in thousands of records being returned as + a single network request. The client would be forced to block until all of the + results are returned, likely resulting in a significant delay, and depending on + the implementation, correspondingly large amounts of memory might be consumed + as all of the results are read from the network in a single block. + OpenSRF offers a solution to this problem. If the method returns results that + can be divided into separate meaningful units, you can register the OpenSRF + method as a streaming method and enable the client to loop over the results one + unit at a time until the method returns no further results. In addition to + registering the method with the provided name, OpenSRF also registers an additional + method with .atomic appended to the method name. The .atomic variant gathers + all of the results into a single block to return to the client, giving the caller + the ability to choose either streaming or atomic results from a single method + definition. + In the following example, the text splitting method has been reimplemented to + support streaming; very few changes are required: + Text splitting method - streaming mode + +sub text_split { + my $self = shift; + my $conn = shift; + my $text = shift; + my $delimiter = shift || ' '; + + my @split_text = split $delimiter, $text; + foreach my $string (@split_text) { + $conn->respond($string); + } + return undef; +} + +__PACKAGE__->register_method( + method => 'text_split', + api_name => 'opensrf.simple-text.split', + stream => 1 +); + + + + + + Rather than returning a reference to the array, a streaming method loops + over the contents of the array and invokes the respond() method of the + connection object on each element of the array. + + + + + Registering the method as a streaming method instructs OpenSRF to also + register an atomic variant (opensrf.simple-text.split.atomic). + + + + + + Error! Warning! Info! Debug! + As hard as it may be to believe, it is true: applications sometimes do not + behave in the expected manner, particularly when they are still under + development. The service language bindings for OpenSRF include integrated + support for logging messages at the levels of ERROR, WARNING, INFO, DEBUG, and + the extremely verbose INTERNAL to either a local file or to a syslogger + service. The destination of the log files, and the level of verbosity to be + logged, is set in the opensrf_core.xml configuration file. To add logging to + our Perl example, we just have to add the OpenSRF::Utils::Logger package to our + list of used Perl modules, then invoke the logger at the desired logging level. + You can include many calls to the OpenSRF logger; only those that are higher + than your configured logging level will actually hit the log. The following + example exercises all of the available logging levels in OpenSRF: + +use OpenSRF::Utils::Logger; +my $logger = OpenSRF::Utils::Logger; +# some code in some function +{ + $logger->error("Hmm, something bad DEFINITELY happened!"); + $logger->warn("Hmm, something bad might have happened."); + $logger->info("Something happened."); + $logger->debug("Something happened; here are some more details."); + $logger->internal("Something happened; here are all the gory details.") +} + + If you call the mythical OpenSRF method containing the preceding OpenSRF logger + statements on a system running at the default logging level of INFO, you will + only see the INFO, WARN, and ERR messages, as follows: + Results of logging calls at the default level of INFO + +[2010-03-17 22:27:30] opensrf.simple-text [ERR :5681:SimpleText.pm:277:] +[2010-03-17 22:27:30] opensrf.simple-text [WARN:5681:SimpleText.pm:278:] +[2010-03-17 22:27:30] opensrf.simple-text [INFO:5681:SimpleText.pm:279:] + + + If you then increase the the logging level to INTERNAL (5), the logs will + contain much more information, as follows: + Results of logging calls at the default level of INTERNAL + +[2010-03-17 22:48:11] opensrf.simple-text [ERR :5934:SimpleText.pm:277:] +[2010-03-17 22:48:11] opensrf.simple-text [WARN:5934:SimpleText.pm:278:] +[2010-03-17 22:48:11] opensrf.simple-text [INFO:5934:SimpleText.pm:279:] +[2010-03-17 22:48:11] opensrf.simple-text [DEBG:5934:SimpleText.pm:280:] +[2010-03-17 22:48:11] opensrf.simple-text [INTL:5934:SimpleText.pm:281:] +[2010-03-17 22:48:11] opensrf.simple-text [ERR :5934:SimpleText.pm:283:] +[2010-03-17 22:48:21] opensrf.simple-text [INTL:5934:Cache.pm:125:] +[2010-03-17 22:48:21] opensrf.simple-text [DEBG:5934:Application.pm:579:] +[2010-03-17 22:48:21] opensrf.simple-text [DEBG:5934:Application.pm:586:] +[2010-03-17 22:48:21] opensrf.simple-text [DEBG:5934:Application.pm:190:] +[2010-03-17 22:48:21] opensrf.simple-text [INTL:5934:AppSession.pm:780:] Calling queue_wait(0) +[2010-03-17 22:48:21] opensrf.simple-text [INTL:5934:AppSession.pm:769:] Resending...0 +[2010-03-17 22:48:21] opensrf.simple-text [INTL:5934:AppSession.pm:450:] In send +[2010-03-17 22:48:21] opensrf.simple-text [DEBG:5934:AppSession.pm:506:] +[2010-03-17 22:48:21] opensrf.simple-text [DEBG:5934:AppSession.pm:506:] +... + + + To see everything that is happening in OpenSRF, try leaving your logging level + set to INTERNAL for a few minutes - just ensure that you have a lot of free disk + space available if you have a moderately busy system! + + + Caching results: one secret of scalability + search resultscaching + If you have ever used an application that depends on a remote Web service + outside of your control — say, if you need to retrieve results from a + microblogging service — you know the pain of latency and dependability (or the + lack thereof). To improve the response time for OpenSRF services, you can take + advantage of the support offered by the OpenSRF::Utils::Cache module for + communicating with a local instance or cluster of memcache daemons to store + and retrieve persistent values. The following example demonstrates caching + by sleeping for 10 seconds the first time it receives a given cache key and + cannot retrieve a corresponding value from the cache: + Simple caching OpenSRF service + +use OpenSRF::Utils::Cache; +sub test_cache { + my $self = shift; + my $conn = shift; + my $test_key = shift; + my $cache = OpenSRF::Utils::Cache->new('global'); + my $cache_key = "opensrf.simple-text.test_cache.$test_key"; + my $result = $cache->get_cache($cache_key) || undef; + if ($result) { + $logger->info("Resolver found a cache hit"); + return $result; + } + sleep 10; + my $cache_timeout = 300; + $cache->put_cache($cache_key, "here", $cache_timeout); + return "There was no cache hit."; +} + + + + + + The OpenSRF::Utils::Cache module provides access to the built-in caching + support in OpenSRF. + + + + + The constructor for the cache object accepts a single argument to define + the cache type for the object. Each cache type can use a separate memcache + server to keep the caches separated. Most Evergreen services use the global + cache, while the anon cache is used for Web sessions. + + + + + The cache key is simply a string that uniquely identifies the value you + want to store or retrieve. This line creates a cache key based on the OpenSRF + method name and request input value. + + + + + The get_cache() method checks to see if the cache key already exists. If + a matching key is found, the service immediately returns the stored value. + + + + + If the cache key does not exist, the code sleeps for 10 seconds to + simulate a call to a slow remote Web service or an intensive process. + + + + + The $cache_timeout variable represents a value for the lifetime of the + cache key in seconds. + + + + + After the code retrieves its value (or, in the case of this example, + finishes sleeping), it creates the cache entry by calling the put_cache() + method. The method accepts three arguments: the cache key, the value to be + stored ("here"), and the timeout value in seconds to ensure that we do not + return stale data on subsequent calls. + + + + + + Initializing the service and its children: child labour + When an OpenSRF service is started, it looks for a procedure called + initialize() to set up any global variables shared by all of the children of + the service. The initialize() procedure is typically used to retrieve + configuration settings from the opensrf.xml file. + An OpenSRF service spawns one or more children to actually do the work + requested by callers of the service. For every child process an OpenSRF service + spawns, the child process clones the parent environment and then each child + process runs the child_init() process (if any) defined in the OpenSRF service + to initialize any child-specific settings. + When the OpenSRF service kills a child process, it invokes the child_exit() + procedure (if any) to clean up any resources associated with the child process. + Similarly, when the OpenSRF service is stopped, it calls the DESTROY() + procedure to clean up any remaining resources. + + + Retrieving configuration settings + The settings for OpenSRF services are maintained in the opensrf.xml XML + configuration file. The structure of the XML document consists of a root + element <opensrf> containing two child elements: + + + + The <default> element contains an <apps> element describing all + OpenSRF services running on this system — see --, as + well as any other arbitrary XML descriptions required for global configuration + purposes. For example, Evergreen uses this section for email notification and + inter-library patron privacy settings. + + + + + The <hosts> element contains one element per host that participates in + this OpenSRF system. Each host element must include an <activeapps> element + that lists all of the services to start on this host when the system starts + up. Each host element can optionally override any of the default settings. + + + + OpenSRF includes a service named opensrf.settings to provide distributed + cached access to the configuration settings with a simple API: + + + + opensrf.settings.default_config.get accepts zero arguments and returns + the complete set of default settings as a JSON document. + + + + + opensrf.settings.host_config.get accepts one argument (hostname) and + returns the complete set of settings, as customized for that hostname, as a + JSON document. + + + + + opensrf.settings.xpath.get accepts one argument (an + XPath expression) and returns the portion of + the configuration file that matches the expression as a JSON document. + + + + For example, to determine whether an Evergreen system uses the opt-in + support for sharing patron information between libraries, you could either + invoke the opensrf.settings.default_config.get method and parse the + JSON document to determine the value, or invoke the opensrf.settings.xpath.get + method with the XPath /opensrf/default/share/user/opt_in argument to + retrieve the value directly. + In practice, OpenSRF includes convenience libraries in all of its client + language bindings to simplify access to configuration values. C offers + osrfConfig.c, Perl offers OpenSRF::Utils::SettingsClient, Java offers + org.opensrf.util.SettingsClient, and Python offers osrf.set. These + libraries locally cache the configuration file to avoid network roundtrips for + every request and enable the developer to request specific values without + having to manually construct XPath expressions. + +
+
+ OpenSRF Communication Flows + OpenSRFCommunication Flows + Now that you have seen that it truly is easy to create an OpenSRF service, we + can take a look at what is going on under the covers to make all of this work + for you. + + Get on the messaging bus - safely + One of the core innovations of OpenSRF was to use the Extensible Messaging and + Presence Protocol (XMPP, more colloquially known as Jabber) as the messaging + bus that ties OpenSRF services together across servers. XMPP is an "XML + protocol for near-real-time messaging, presence, and request-response services" + (http://www.ietf.org/rfc/rfc3920.txt) that OpenSRF relies on to handle most of + the complexity of networked communications. OpenSRF requres an XMPP server + that supports multiple domains such as ejabberd. + Multiple domain support means that a single server can support XMPP virtual + hosts with separate sets of users and access privileges per domain. By + routing communications through separate public and private XMPP domains, + OpenSRF services gain an additional layer of security. + The OpenSRF + installation documentation instructs you to create two separate hostnames + (private.localhost and public.localhost) to use as XMPP domains. OpenSRF + can control access to its services based on the domain of the client and + whether a given service allows access from clients on the public domain. When + you start OpenSRF, the first XMPP clients that connect to the XMPP server are + the OpenSRF public and private routers. OpenSRF routers maintain a list of + available services and connect clients to available services. When an OpenSRF + service starts, it establishes a connection to the XMPP server and registers + itself with the private router. The OpenSRF configuration contains a list of + public OpenSRF services, each of which must also register with the public + router. + + + OpenSRF communication flows over XMPP + XMPP + In a minimal OpenSRF deployment, two XMPP users named "router" connect to the + XMPP server, with one connected to the private XMPP domain and one connected to + the public XMPP domain. Similarly, two XMPP users named "opensrf" connect to + the XMPP server via the private and public XMPP domains. When an OpenSRF + service is started, it uses the "opensrf" XMPP user to advertise its + availability with the corresponding router on that XMPP domain; the XMPP server + automatically assigns a Jabber ID (JID) based on the client hostname to each + service’s listener process and each connected drone process waiting to carry + out requests. When an OpenSRF router receives a request to invoke a method on a + given service, it connects the requester to the next available listener in the + list of registered listeners for that service. + Services and clients connect to the XMPP server using a single set of XMPP + client credentials (for example, opensrf@private.localhost), but use XMPP + resource identifiers to differentiate themselves in the JID for each + connection. For example, the JID for a copy of the opensrf.simple-text + service with process ID 6285 that has connected to the private.localhost + domain using the opensrf XMPP client credentials could be + opensrf@private.localhost/opensrf.simple-text_drone_at_localhost_6285. By + convention, the user name for OpenSRF clients is opensrf, and the user name + for OpenSRF routers is router, so the XMPP server for OpenSRF will have four + separate users registered: + * opensrf@private.localhost is an OpenSRF client that connects with these + credentials and which can access any OpenSRF service. + * opensrf@public.localhost is an OpenSRF client that connects with these + credentials and which can only access OpenSRF services that have registered + with the public router. + * router@private.localhost is the private OpenSRF router with which all + services register. + * router@public.localhost is the public OpenSRF router with which only + services that must be publicly accessible register. + All OpenSRF services automatically register themselves with the private XMPP + domain, but only those services that register themselves with the public XMPP + domain can be invoked from public OpenSRF clients. The OpenSRF client and + router user names, passwords, and domain names, along with the list of services + that should be public, are contained in the opensrf_core.xml configuration + file. + + + OpenSRF communication flows over HTTP + HTTPtranslator + In some contexts, access to a full XMPP client is not a practical option. For + example, while XMPP clients have been implemented in JavaScript, you might + be concerned about browser compatibility and processing overhead - or you might + want to issue OpenSRF requests from the command line with curl. Fortunately, + any OpenSRF service registered with the public router is accessible via the + OpenSRF HTTP Translator. The OpenSRF HTTP Translator implements the + OpenSRF-over-HTTP + proposed specification as an Apache module that translates HTTP requests into + OpenSRF requests and returns OpenSRF results as HTTP results to the initiating + HTTP client. + Issuing an HTTP POST request to an OpenSRF method via the OpenSRF HTTP Translator + +# curl request broken up over multiple lines for legibility +curl -H "X-OpenSRF-service: opensrf.simple-text" + --data 'osrf-msg=[ \ + {"__c":"osrfMessage","__p":{"threadTrace":0,"locale":"en-CA", + "type":"REQUEST","payload": {"__c":"osrfMethod","__p": + {"method":"opensrf.simple-text.reverse","params":["foobar"]} + }} + }]' +http://localhost/osrf-http-translator + + + + + + The X-OpenSRF-service header identifies the OpenSRF service of interest. + + + + + The POST request consists of a single parameter, the osrf-msg value, + which contains a JSON array. + + + + + The first object is an OpenSRF message ("__c":"osrfMessage") with a set of + parameters ("__p":{}). + + + + + The identifier for the request ("threadTrace":0); this value is echoed + back in the result. + + + + + The message type ("type":"REQUEST"). + + + + + The locale for the message; if the OpenSRF method is locale-sensitive, it + can check the locale for each OpenSRF request and return different information + depending on the locale. + + + + + The payload of the message ("payload":{}) containing the OpenSRF method + request ("__c":"osrfMethod") and its parameters ("__p:"{}). + + + + + The method name for the request ("method":"opensrf.simple-text.reverse"). + + + + + A set of JSON parameters to pass to the method ("params":["foobar"]); in + this case, a single string "foobar". + + + + + + + + + The URL on which the OpenSRF HTTP translator is listening, + /osrf-http-translator is the default location in the Apache example + configuration files shipped with the OpenSRF source, but this is configurable. + + + + Results from an HTTP POST request to an OpenSRF method via the OpenSRF HTTP Translator + +# HTTP response broken up over multiple lines for legibility +[{"__c":"osrfMessage","__p": + {"threadTrace":0, "payload": + {"__c":"osrfResult","__p": + {"status":"OK","content":"raboof","statusCode":200} + },"type":"RESULT","locale":"en-CA" + } +}, +{"__c":"osrfMessage","__p": + {"threadTrace":0,"payload": + {"__c":"osrfConnectStatus","__p": + {"status":"Request Complete","statusCode":205} + },"type":"STATUS","locale":"en-CA" + } +}] + + + + + + The OpenSRF HTTP Translator returns an array of JSON objects in its + response. Each object in the response is an OpenSRF message + ("__c":"osrfMessage") with a collection of response parameters ("__p":). + + + + + The OpenSRF message identifier ("threadTrace":0) confirms that this + message is in response to the request matching the same identifier. + + + + + The message includes a payload JSON object ("payload":) with an OpenSRF + result for the request ("__c":"osrfResult"). + + + + + The result includes a status indicator string ("status":"OK"), the content + of the result response - in this case, a single string "raboof" + ("content":"raboof") - and an integer status code for the request + ("statusCode":200). + + + + + The message also includes the message type ("type":"RESULT") and the + message locale ("locale":"en-CA"). + + + + + The second message in the set of results from the response. + + + + + Again, the message identifier confirms that this message is in response to + a particular request. + + + + + The payload of the message denotes that this message is an + OpenSRF connection status message ("__c":"osrfConnectStatus"), with some + information about the particular OpenSRF connection that was used for this + request. + + + + + The response parameters for an OpenSRF connection status message include a + verbose status ("status":"Request Complete") and an integer status code for + the connection status (`"statusCode":205). + + + + + The message also includes the message type ("type":"RESULT") and the + message locale ("locale":"en-CA"). + + + + Before adding a new public OpenSRF service, ensure that it does + not introduce privilege escalation or unchecked access to data. For example, + the Evergreen open-ils.cstore private service is an object-relational mapper + that provides read and write access to the entire Evergreen database, so it + would be catastrophic to expose that service publicly. In comparison, the + Evergreen open-ils.pcrud public service offers the same functionality as + open-ils.cstore to any connected HTTP client or OpenSRF client, but the + additional authentication and authorization layer in open-ils.pcrud prevents + unchecked access to Evergreen’s data. + + + Stateless and stateful connections + OpenSRF supports both stateless and stateful connections. When an OpenSRF + client issues a REQUEST message in a stateless connection, the router + forwards the request to the next available service and the service returns the + result directly to the client. + + When an OpenSRF client issues a CONNECT message to create a stateful conection, the + router returns the Jabber ID of the next available service to the client so + that the client can issue one or more REQUEST message directly to that + particular service and the service will return corresponding RESULT messages + directly to the client. Until the client issues a DISCONNECT message, that + particular service is only available to the requesting client. Stateful connections + are useful for clients that need to make many requests from a particular service, + as it avoids the intermediary step of contacting the router for each request, as + well as for operations that require a controlled sequence of commands, such as a + set of database INSERT, UPDATE, and DELETE statements within a transaction. + + + + Message body format + OpenSRF was an early adopter of JavaScript Object Notation (JSON). While XMPP + is an XML protocol, the Evergreen developers recognized that the compactness of + the JSON format offered a significant reduction in bandwidth for the volume of + messages that would be generated in an application of that size. In addition, + the ability of languages such as JavaScript, Perl, and Python to generate + native objects with minimal parsing offered an attractive advantage over + invoking an XML parser for every message. Instead, the body of the XMPP message + is a simple JSON structure. For a simple request, like the following example + that simply reverses a string, it looks like a significant overhead: but we get + the advantages of locale support and tracing the request from the requester + through the listener and responder (drone). + A request for opensrf.simple-text.reverse("foobar"): + +<message from='router@private.localhost/opensrf.simple-text' + to='opensrf@private.localhost/opensrf.simple-text_listener_at_localhost_6275' + router_from='opensrf@private.localhost/_karmic_126678.3719_6288' + router_to='' router_class='' router_command='' osrf_xid='' +> + <thread>1266781414.366573.12667814146288</thread> + <body> +[ + {"__c":"osrfMessage","__p": + {"threadTrace":"1","locale":"en-US","type":"REQUEST","payload": + {"__c":"osrfMethod","__p": + {"method":"opensrf.simple-text.reverse","params":["foobar"]} + } + } + } +] + </body> +</message> + + + A response from opensrf.simple-text.reverse("foobar") + +<message from='opensrf@private.localhost/opensrf.simple-text_drone_at_localhost_6285' + to='opensrf@private.localhost/_karmic_126678.3719_6288' + router_command='' router_class='' osrf_xid='' +> + <thread>1266781414.366573.12667814146288</thread> + <body> +[ + {"__c":"osrfMessage","__p": + {"threadTrace":"1","payload": + {"__c":"osrfResult","__p": + {"status":"OK","content":"raboof","statusCode":200} + } ,"type":"RESULT","locale":"en-US"} + }, + {"__c":"osrfMessage","__p": + {"threadTrace":"1","payload": + {"__c":"osrfConnectStatus","__p": + {"status":"Request Complete","statusCode":205} + },"type":"STATUS","locale":"en-US"} + } +] + </body> +</message> + + + The content of the <body> element of the OpenSRF request and result should + look familiar; they match the structure of the OpenSRF over HTTP examples that we previously dissected. + + + Registering OpenSRF methods in depth + Let’s explore the call to __PACKAGE__->register_method(); most of the members + of the hash are optional, and for the sake of brevity we omitted them in the + previous example. As we have seen in the results of the introspection call, a + verbose registration method call is recommended to better enable the internal + documentation. Here is the complete set of members that you should pass to + __PACKAGE__->register_method(): + + + + The method member specifies the name of the procedure in this module that is being registered as an OpenSRF method. + + + + + The api_name member specifies the invocable name of the OpenSRF method; by convention, the OpenSRF service name is used as the prefix. + + + + + The optional api_level member can be used for versioning the methods to allow the use of a deprecated API, but in practical use is always 1. + + + + + The optional argc member specifies the minimal number of arguments that the method expects. + + + + + The optional stream member, if set to any value, specifies that the method supports returning multiple values from a single call to + subsequent requests. OpenSRF automatically creates a corresponding method with ".atomic" appended to its name that returns the complete set of results in a + single request. Streaming methods are useful if you are returning hundreds of records and want to act on the results as they return. + + + + + The optional signature member is a hash that describes the method’s purpose, arguments, and return value. + + + + + The desc member of the signature hash describes the method’s purpose. + + + + + The params member of the signature hash is an array of hashes in which each array element describes the corresponding method + argument in order. + + + + + The name member of the argument hash specifies the name of the argument. + + + + + The desc member of the argument hash describes the argument’s purpose. + + + + + The type member of the argument hash specifies the data type of the argument: for example, string, integer, boolean, number, array, or hash. + + + + + + + The return member of the signature hash is a hash that describes the return value of the method. + + + + + The desc member of the return hash describes the return value. + + + + + The type member of the return hash specifies the data type of the return value: for example, string, integer, boolean, number, + array, or hash. + + + + + + + + +
+
+ Evergreen-specific OpenSRF services + Evergreen is currently the primary showcase for the use of OpenSRF as an + application architecture. Evergreen 1.6.1 includes the following + set of OpenSRF services: + + + + The open-ils.actor service supports common tasks for working with user + accounts and libraries. + + + + + The open-ils.auth service supports authentication of Evergreen users. + + + + + The open-ils.booking service supports the management of reservations + for bookable items. + + + + + The open-ils.cat service supports common cataloging tasks, such as + creating, modifying, and merging bibliographic and authority records. + + + + + The open-ils.circ service supports circulation tasks such as checking + out items and calculating due dates. + + + + + The open-ils.collections service supports tasks that assist collections + agencies in contacting users with outstanding fines above a certain + threshold. + + + + + The open-ils.cstore private service supports unrestricted access to + Evergreen fieldmapper objects. + + + + + The open-ils.ingest private service supports tasks for importing + data such as bibliographic and authority records. + + + + + The open-ils.pcrud service supports permission-based access to Evergreen + fieldmapper objects. + + + + + The open-ils.penalty penalty service supports the calculation of + penalties for users, such as being blocked from further borrowing, for + conditions such as having too many items checked out or too many unpaid + fines. + + + + + The open-ils.reporter service supports the creation and scheduling of + reports. + + + + + The open-ils.reporter-store private service supports access to Evergreen + fieldmapper objects for the reporting service. + + + + + The open-ils.search service supports searching across bibliographic + records, authority records, serial records, Z39.50 sources, and ZIP codes. + + + + + The open-ils.storage private service supports a deprecated method of + providing access to Evergreen fieldmapper objects. Implemented in Perl, + this service has largely been replaced by the much faster C-based + open-ils.cstore service. + + + + + The open-ils.supercat service supports transforms of MARC records into + other formats, such as MODS, as well as providing Atom and RSS feeds and + SRU access. + + + + + The open-ils.trigger private service supports event-based triggers for + actions such as overdue and holds available notification emails. + + + + + The open-ils.vandelay service supports the import and export of batches of + bibliographic and authority records. + + + + Of some interest is that the open-ils.reporter-store and open-ils.cstore + services have identical implementations. Surfacing them as separate services + enables a deployer of Evergreen to ensure that the reporting service does not + interfere with the performance-critical open-ils.cstore service. One can also + direct the reporting service to a read-only database replica to, again, avoid + interference with open-ils.cstore which must write to the master database. + There are only a few significant services that are not built on OpenSRF in + Evergreen 1.6.0, such as the SIP and Z39.50 servers. These services implement + different protocols and build on existing daemon architectures (Simple2ZOOM + for Z39.50), but still rely on the other OpenSRF services to provide access + to the Evergreen data. The non-OpenSRF services are reasonably self-contained + and can be deployed on different servers to deliver the same sort of deployment + flexibility as OpenSRF services, but have the disadvantage of not being + integrated into the same configuration and control infrastructure as the + OpenSRF services. +
+
diff --git a/2.0/development/customize_opac.xml b/2.0/development/customize_opac.xml new file mode 100644 index 0000000..bd3563a --- /dev/null +++ b/2.0/development/customize_opac.xml @@ -0,0 +1,332 @@ + + + + Customizing the OPAC + + While Evergreen is ready to go out of the box, libraries will want to customize Evergreen with their own color scheme, logos and layout. This chapter will explain how to + customize Evergreen to meet the needs of your users. For these task some knowledge of html and css is required. Many of these + instructions assume an installation of Evergreen using the default file locations. + + Be sure to save a backup copy of all files you edit in a location other than /openils/var/web/opac/ as files here could be + overwritten when you upgrade your copy of Evergreen. + +
+ Change the Color Scheme + OPACcustomizingchanging the color scheme + To change the color scheme of the default Evergreen skin, edit /openils/var/web/opac/theme/default/css/colors.css. From this one file you can + change the 4 base color scheme as well as colors of specific elements. + + You can also create alternate themes for your users. + + + Copy the css folder and its contents from the example alternate theme /openils/var/web/opac/theme/reddish/ + to a new folder /openils/var/web/opac/theme/[your new theme]/. + + + Edit /openils/var/web/opac/theme/[your new theme]/css/colors.css to use the colors you want. + + + Link to your new style sheet by adding the following to /openils/var/web/opac/skin/default/xml/common/css_common.xml. + +<link type='text/css' +rel="alternate stylesheet" +title='&opac.style.yourtheme;' +href="<!--#echo var='OILS_THEME_BASE'-->/yourtheme/css/colors.css" +name='Default' csstype='color'/> + + + + Give your new theme a name users can select by adding the following to /openils/var/web/opac/locale/ + [your locale]/opac.dtd. + <!ENTITY opac.style.yourtheme "YourTheme"> + + +
+
+ customizing Opac Text and Labels + OPACcustomizingtext and labels + To change text and links used throughout the OPAC, edit the following files: + + /openils/var/web/opac/locale/[your locale]/lang.dtd + /openils/var/web/opac/locale/[your locale]/opac.dtd + + + A better way to customize OPAC text is to create custom dtd files for your lang and opac customizations and then add a include + statement above the default dtd files. + + <!DOCTYPE html PUBLIC + "-//W3C//DTD XHTML 1.0 Transitional//EN" + "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" [ + <!--#include virtual="/opac/locale/${locale}/custom_opac.dtd"--> + <!--#include virtual="/opac/locale/${locale}/opac.dtd"--> + ]> + + position is important here. The first/top included dtd files will take precedence over the subsequent dtd includes. + + While it is possible to add text to the xml files itself, it is a good practice to use the DTD file to control the text and refer to the DTD elements in the xml/html code. + For example, the footer.xml file has this code to generate a copyright statement: + +<div id='copyright_text'> +<span>&footer.copyright;</span> + + The included opac.dtd file in the en-US locale directory has this setting for &footer.copyright text: + <!ENTITY footer.copyright "Copyright © 2006-2010 Georgia Public Library Service, and others"> +
+
+ Logo Images + To change the logos used by default to your own logos, replace the following files with images of your own, appropriately sized. + + Large main logo:/openils/var/web/opac/images/main_logo.jpg + Small logo:/openils/var/web/opac/images/small_logo.jpg + +
+
+ Added Content + OPACadded content + By default Evergreen includes customizable Added Content features to enhance the OPAC experience for your user. These features include Amazon book covers + and Google books searching. These features can be turned off or customized. + + Book Covers + The default install of Evergreen includes Amazon book covers. The settings for this are controlled by the <added_content> section of + /openils/conf/opensrf.xml. Here are the key elements of this configuration: + <module>OpenILS::WWW::AddedContent::Amazon</module> + This calls the Amazon perl module. If you wish to link to a different book cover service other than Amazon, you must create a new perl module and refer to it here. + You will also need to change other settings accordingly. There are some available book cover perl modules available in + trunk + <base_url>http://images.amazon.com/images/P/</base_url> + Base URL for Amazon added content fetching. This URL may need to be shortened when new (read: non-image) content fetching + capabilities are added. + <timeout>1</timeout> + Max number of seconds to wait for an added content request to return data. Data not returned within the timeout is considered a failure. + <retry_timeout>600</retry_timeout> + After added content lookups have been disabled due to too many lookup failures, this is the amount of time to wait before we try again. + <max_errors>15</max_errors> + Maximum number of consecutive lookup errors a given process can live before added content lookups are disabled for everyone. + <userid>MY_USER_ID</userid> + If a userid is required to access the added content. + + + Google Books Link + OPACadded contentGoogle Books + The results page will display a Browse in Google Books Search link for items in the results page which have corresponding entries + in Google Books. + This will link to Google Books content including table of contents and complete versions of the work if it exists in Google Books. Items not in Google Books will not + display a link. This feature can be turned off by changing the googleBooksLink variable setting to false in the file + /openils/var/web/opac/skin/default/js/result_common.js. By default, this feature is activated. + +
+
+ Customizing the Results Page + The results page is extremely customizable and allows some built in features to be activated with some simple edits or more advanced customizations can be done by more + experienced web developers. + There are several critical files to edit if you wish to customize the results page: + + /openils/var/web/opac/skin/default/js/result_common.js - This file controls the JavaScript for the top level elements on the results + page and should only be edited by experienced web developers except for the Google books link setting mentioned previously. + /openils/var/web/opac/skin/default/js/rresult.js - Has some good controls of results page settings at the top of this file but + requires web development skills for editing this file. + /openils/var/web/opac/skin/default/xml/result/rresult_table.xml - This controls the layout of the items table on results page. + +
+
+ Customizing the Details Page + OPACcustomizingdetails page + There are many options when customizing the details page in Evergreen. The default settings are effective for most libraries, but it is important to understand the full potential + of Evergreen when displaying the details of items. + Some quick features can be turned on and off by changing variable values in the file /openils/var/web/opac/skin/default/js/rdedail.js. + You will notice the section at the top of this file called Per-skin configuration settings. Changing setting in this section can control several features including + limiting results to local only or showing copy location or displaying serial holdings. Form this section you can also enable refworks and set the Refworks host URL. + Some copy level details settings can be turned on and off from /openils/var/web/opac/skin/default/js/copy_details.js including displaying certain fields + such as due date in the OPAC. + An important file is the /openils/var/web/opac/skin/default/xml/rdetail/rdetail_summary.xml file. This file allows you to control which field to display in + the details summary of the record. The new BibTemplate feature makes this file even more powerful by allowing you to display any marc fields + with a variety of formatting options. + The /openils/var/web/opac/skin/default/xml/rdetail/rdetail_copyinfo.xml file allows you to format the display of the copy information. +
+
+ BibTemplateBibTemplate + BibTemplate is an Evergreen-custom Dojo module which can be used to retrieve and format XML data served by the Evergreen unAPI service. unAPI is a protocol for requesting known objects in specific formats, and Evergreen uses this to supply data – bibliographic records, + metarecords, monograph holdings information, Located URIs, and more to come – in many different formats from MARCXML to MODS to custom XML applications. + unAPI MARCXMLMODS + Managing the display of information from raw XML can be difficult, and the purpose of BibTemplate is to make this simpler, as well as move the display closer to the + client and away from the source data. This is good from a separation-of-responsibilities perspective, and also makes it easier to contain and control local customization. + BibTemplate supports the following Evergreen meta data formats: + + MARCXML - datatype='marcxml-full' (default) + MODS 3.3: datatype='mods33' + Dublin Core: datatype='rdf_dc'Dublin Core + FGDC: datatype='fgdc'FGDC + + + HTML API + BibTemplate follows the Dojo convention of adding attributes to existing (X)HTML in order to progressively change its behavior. The 1.6.0 HTML API consists of a + set of attributes that are added to existing OPAC markup, and fall into two classes:Dojo toolkit + + The slot marker – Elements that denote the location of bibliographic data to insert. + The slot formatter – Elements that specify how the named data should be formatted for display. + + + + Slot Markerslot marker + A slot marker is any displayable HTML element that has a type attribute with a value starting with opac/slot-data. This element will become the container + for the formatted data. A slot marker is required in order to retrieve, format and display data using BibTemplate. A slot marker must also have an + attribute called query containing a CSS3 selector. This selector is applied to the XML returned by the unAPI service in order to gather the specific XML + Nodes that should be considered for formatting.CSS3 + The slot marker can also specify the format of the data to be returned from the unAPI service. This can be specified by adding +{format} to the type + attribute, as in opac/slot-data+mods33-full. The default data format is marcxml-uri, which is an augmented MARCXML record containing Located URI information + and unAPI links. + Example of a slot marker: + <p type='opac/slot-data' query='datafield[tag=245]'></p> + Most useful attribute match operators include: + + datafield[tag=245] - exact match + datafield[tag^=65] - match start of value + + Selectors always narrow, so select broadly and iterate through the NodeList + + + Slot Formatterslot formatter + A slot formatter is any invisible HTML element which has a type attribute with the value of opac/slot-format. (NOTE: before 1.6.0.4, only <script> + elements were supported, though this restriction is now removed to support Internet Explorer.) Only one slot formatter element is allowed in each slot. The text contents + of this element are wrapped in a JavaScriptJavaScript function and run for each node returned by the query CSS3 selector + specified on the slot marker. This function is passed + one argument, called item, which an XML Node captured by the selector. This function should return HTML text. The output for all runs of the slot formatter is + concatenated into a single string and used to replace the contents of the slot marker. + The slot formatter is optional, and if not supplied BibTemplate will create a simple function which extracts and returns the text content of the XML Nodes + specified in the CSS3 selector. + Example of a slot formatter: + + <td class='rdetail_item' id='rdetail_online' type='opac/slot-data' + query='volumes volume uris uri' join=", "> + <script type='opac/slot-format'><![CDATA[ + var link = '<a href="' + item.getAttribute('href') + '">' + item.getAttribute('label') + '</a>'; + if (item.getAttribute('use_restriction')) + link += ' (Use restriction: ' + item.getAttribute('use_restriction') + ')'; + return link; + ]]></script> + </td> + + + + JavaScript APIJavaScript + In order for BibTemplate to find the slot markers and invoke the slot formatters JavaScript renderer must be instantiated and called. This must be done + for each record that is to contribute to a pages display. The API for this is simple and straight-forward: + The slot formatter is optional, and if not supplied BibTemplate will create a simple function which extracts and returns the text content of the XML Nodes + specified in the CSS3 selector.CSS3 + Example of a slot formatter: + + dojo.require('openils.BibTemplate'); // Tell Dojo to load BibTemplate, if it is not already loaded + + // Create a renderer supplying the record id and the short name of the org unit, if known, + // and call the render() method + new openils.BibTemplate({ record : new CGI().param('r'), org_unit : here.shortname() }).render(); + + The argument hash supplied to the new openils.BibTemplate() constructor can have the following properties: + + record – The bibliographic record ID. + org_unit – The relevant Organizational Unit, used to restrict holdings scope as on a search result or record detail page. + root – The root element within the web page that BibTemplate should search for slot markers + + + + BibTemplate Examples + This is all that we had to add to display the contents of an arbitrary MARC field: + +<tr> + <td>Bibliography note</td> + <td type='opac/slot-data' query='datafield[tag=504]'></td> +</tr> + + If multiple fields match, they are displayed on consecutive lines within the same left-hand cell. + To display a specific MARC subfield, add that subfield to the query attribute.MARC + For example, subfield $a is the only user-oriented subfield in field 586 (Awards Note) + +<tr> + <td>Awards note</td> + <td type='opac/slot-data' query='datafield[tag=586] subfield[code=a]'></td> +</tr> + + Hide empty rows by default, and display them only if they have content: + + <tr class='hide_me' id='tag504'> + <td>Bibliographic note</td> + <td type='opac/slot-data' query='datafield[tag=504]'> + <script type='opac/slot-format'><![CDATA[ + dojo.query('#tag504').removeClass('hide_me'); + return '<span>' + dojox.data.dom.textContent(item) + + '</span><br/>'; + ]]></script> + </td></tr> + + + <![CDATA[ ... ]]> tells Evergreen Web server to treat the contents as literal character data - + avoids hilarity of entity substitution + <script type='opac/slot-format'>...</script>, contained within an opac/slot-data element, receives a variable named item + containing the results of the query (a NodeList) + + Suppressing a subfield: + +<tr class='hide_me' id='tag700'> + <td>Additional authors</td> + <td type='opac/slot-data' query='datafield[tag=700]'> + <script type='opac/slot-format'><![CDATA[ + dojo.query('#tag700').removeClass('hide_me'); + var text = ''; + var list = dojo.query('subfield:not([code=4])', item); + for (var i =0; i < list.length; i++) { + text += dojox.data.dom.textContent(list[i]) + ' '; + } + return '<span>' + text + '</span><br/>'; + ]]></script> + </td></tr> + + +
+
+ Customizing the Slimpac + The Slimpac is the an alternative OPAC display for browsers or devices without JavaScript or which may have screen size limitations. There is both a simple and advanced search + option for the Slimpac. + The html files for customizing the Slimpac search display are located in the folder /openils/var/web/opac/extras/slimpac. + start.html is the basic search display and advanced.html is the display for the advanced search option. + By default, the Slimpac files include the same locale dtd as the regular OPAC (opac.dtd). However, the slimpac files do not use the same CSS files as the + regular OPAC which means that if you change the OPAC color scheme, you must also edit the Slimpac files. + + Customizing the Slimpac Results Display + Two files control the display results for the slimpac. Edit the XSL stylesheet (/openils/var/xsl/ATOM2XHTML.xsl) to edit the elements of the + record as pulled from the XML output. + You may also change the style of the page by editing the CSS stylesheet for the results display (/openils/var/web/opac/extras/os.css). + + + Customizing the Slimpac Details/Holdings Display + It is also possible to customize the details page when viewing specific items from the results list. To edit the holdings display which contains the details of the specific + record linked from the results display, edit the CSS stylesheet for the holdings/details page + (/openils/var/web/opac/extras/htmlcard.css). You may also control the content of the record by editing MARC21slim2HTMLCard.xsl. + Holdings data may also be controlled by editing MARC21slim2HTMLCard-holdings.xsl. + +
+
+ Integrating a Evergreen Search Form on a Web Page + It is possible to embed a simple search form into an html page which will allow users to search for materials in your Evergreen catalog. Here is code which can be embedded + anywhere in the body of your web page: + +
+ Quick Catalog Search:
+ + + + + + + +
+ +]]> +
+ Replace [domain name] with the domain name of your Evergreen server and replace [locale] with the desired locale of + your Evergreen instance (e.g. en-US). This does a basic keyword search. Different types of searches and more advanced search forms can be developed. For further information on the url parameters used by Evergreen, see for more details. +
+
+ diff --git a/2.0/development/customizingstaffclient.xml b/2.0/development/customizingstaffclient.xml new file mode 100644 index 0000000..dbe1fa4 --- /dev/null +++ b/2.0/development/customizingstaffclient.xml @@ -0,0 +1,54 @@ + + + Customizing the Staff Client + staff clientcustomizing + This chapter will give you some guidance on customizing the staff client. + The files related to the staff client are located in the directory /openils/var/web/xul/[staff client version]/server/ +
+ Changing Colors and Images + To change or adjust the image on the main screen edit /openils/var/web/xul/index.xhtml. By default, the image on this page is + main_logo.jpg which is the same main logo used in the OPAC. + To adjust colors on various staff client pages edit the corresponding cascading style sheets located in + /openils/var/web/xul/[staff client version]/server/skin/. Other display aspects can also be adjusted using these cascading style sheets. +
+
+ Changing Labels and Messages + staff clientcustomizinglabels and messages + You can customize labels in the staff client by editing the corresponding DTD files. The staff client uses the same lang.dtd used by the OPAC. This file is located in /openils/var/web/opac/locale/[your locale]. Other labels are controlled by the staff client specific lang.dtd file in /openils/var/web/xul/client version]/server/locale/[your locale]/. +
+
+ Changing the Search Skin + There are a few ways to change the custom skin for OPAC searching in staff client. + + Changing the Search Skin on Server - Overriding Local Settings + To change the opac search skins used by the staff client create a file named custom.js and place it in the + /openils/var/web/xul/[staff client version]/server/skin/ directory. This will effect all staff clients since these settings will + override local settings. + For example, the following text in custom.js would set the staff client opac, details page, results page and browse function to the craftsman + skin: + +urls['opac'] = '/opac/' + LOCALE + '/skin/craftsman/xml/advanced.xml?nps=1'; +urls['opac_rdetail'] = '/opac/' + LOCALE + '/skin/craftsman/xml/rdetail.xml'; +urls['opac_rresult'] = '/opac/' + LOCALE + '/skin/craftsman/xml/rresult.xml'; +urls['browser'] = '/opac/' + LOCALE + '/skin/craftsman/xml/advanced.xml?nps=1'; + + Restart the staff client to see the changes. + + + Changing the Search Skin on an individual Machine + To change the search skin on an individual machine for personal preferences or needs, edit the file + /[Evergreen staff client path]/build/chrome/content/main/constants.js. + Find the lines which point to the urls for the OPAC and edit accordingly. For example, here is an example to set the opac, details page, results page and browse + function to the craftsman skin: + + 'opac' : '/opac/' + LOCALE + '/skin/craftsman/xml/advanced.xml?nps=1', +'opac_rdetail' : '/opac/' + LOCALE + '/skin/craftsman/xml/rdetail.xml', +'opac_rresult' : '/opac/' + LOCALE + '/skin/craftsman/xml/rresult.xml', +... +'browser' : '/opac/' + LOCALE + '/skin/craftsman/xml/advanced.xml?nps=1', + + After editing this file, save it and restart the staff client for the changes to take effect. + +
+
diff --git a/2.0/development/datamodelsandaccess.xml b/2.0/development/datamodelsandaccess.xml new file mode 100644 index 0000000..93ec246 --- /dev/null +++ b/2.0/development/datamodelsandaccess.xml @@ -0,0 +1,789 @@ + + + + Evergreen Data Models and Access + + + This chapter was taken from Dan Scott's Developer Workshop, February 2010. + +
+ Exploring the Database Schema + The database schema is tied pretty tightly to PostgreSQL. Although PostgreSQLdatabasesPostgreSQL + adheres closely to ANSI SQL standards, the use of schemas, SQL functionsANSI + implemented in both plpgsql and plperl, and PostgreSQL’s native full-text + search would make it… challenging… to port to other database platforms. + A few common PostgreSQL interfaces for poking around the schema and + manipulating data are: + + + + psql (the command line client)databasesPostgreSQLpsql + + + + + pgadminIII (a GUI client).databasesPostgreSQLpgadminIII + + + + Or you can read through the source files in Open-ILS/src/sql/Pg. + Let’s take a quick tour through the schemas, pointing out some highlights + and some key interdependencies: + + + + actor.org_unit → asset.copy_location + + + + + actor.usr → actor.card + + + + + biblio.record_entry → asset.call_number → asset.copy + + + + + config.metabib_field → metabib.*_field_entry + + + + This documentation also contains an Appendix for the Evergreen . +
+
+ Database access methods + You could use direct access to the database via Perl DBI, JDBC, etc, + but Evergreen offers several database CRUD services for + creating / retrieving / updating / deleting data. These avoid tying + you too tightly to the current database schema and they funnel database + access through the same mechanism, rather than tying up connections + with other interfaces. +
+
+ Evergreen Interface Definition Language (IDL) + Evergreen Interface Definition Language (IDL) + Defines properties and required permissions for Evergreen classes. + To reduce network overhead, a given object is identified via a + class-hint and serialized as a JSON array of properties (no named properties). + As of 1.6, fields will be serialized in the order in which they appear + in the IDL definition file, and the is_new / is_changed / is_deleted + properties are automatically added. This has greatly reduced the size of + the fm_IDL.xml file and makes DRY people happier :) + + + + … oils_persist:readonly tells us, if true, that the data lives in the database, but is pulled from the SELECT statement defined in the <oils_persist:source_definition> + child element + + + + + IDL basic example (config.language_map) + +<class id="clm" controller="open-ils.cstore open-ils.pcrud" + oils_obj:fieldmapper="config::language_map" + oils_persist:tablename="config.language_map" + reporter:label="Language Map" oils_persist:field_safe="true"> + <fields oils_persist:primary="code" oils_persist:sequence=""> + <field reporter:label="Language Code" name="code" + reporter:selector="value" reporter:datatype="text"/> + <field reporter:label="Language" name="value" + reporter:datatype="text" oils_persist:i18n="true"/> + </fields> + <links/> + <permacrud xmlns="http://open-ils.org/spec/opensrf/IDL/permacrud/v1"> + <actions> + <create global_required="true" permission="CREATE_MARC_CODE"> + <retrieve global_required="true" + permission="CREATE_MARC_CODE UPDATE_MARC_CODE DELETE_MARC_CODE"> + <update global_required="true" permission="UPDATE_MARC_CODE"> + <delete global_required="true" permission="DELETE_MARC_CODE"> + </actions> + </permacrud> +</class> + + + + + The class element defines the attributes and permissions for classes, + and relationships between classes. + + Evergreen Interface Definition Language (IDL)class element + + + + The id attribute on the class element defines the class hint that is + used everywhere in Evergreen. + + + + + The controller attribute defines the OpenSRF + services that provide access to the data for the class objects. + + + + + + + The oils_obj::fieldmapper attribute defines the name of the class that + is generated by OpenILS::Utils::Fieldmapper. + + + + + The oils_persist:tablename attribute defines the name of the table + that contains the data for the class objects. + + + + + The reporter interface uses reporter:label attribute values in + the source list to provide meaningful class and attribute names. The + open-ils.fielder service generates a set of methods that provide direct + access to the classes for which oils_persist:field_safe is true. For + example, + + + +srfsh# request open-ils.fielder open-ils.fielder.clm.atomic \ +{"query":{"code":{"=":"eng"}}} + +Received Data: [ + { + "value":"English", + "code":"eng" + } +] + + + + + + The fields element defines the list of fields for the class. + + Evergreen Interface Definition Language (IDL)fields element + + + + The oils_persist:primary attribute defines the column that acts as + the primary key for the table. + + + + + The oils_persist:sequence attribute holds the name of the database + sequence. + + + + + + + Each field element defines one property of the class. + + Evergreen Interface Definition Language (IDL)field element + + + + The name attribute defines the getter/setter method name for the field. + + + + + The reporter:label attribute defines the attribute name as used in + the reporter interface. + + + + + The reporter:selector attribute defines the field used in the reporter + filter interface to provide a selectable list. This gives the user a more + meaningful access point than the raw numeric ID or abstract code. + + + + + The reporter:datatype attribute defines the type of data held by + this property for the purposes of the reporter. + + + + + + + The oils_persist:i18n attribute, when true, means that + translated values for the field’s contents may be accessible in + different locales. + + + + + Evergreen Interface Definition Language (IDL)permacrud element + The permacrud element defines the permissions (if any) required + to create, retrieve, update, + and delete data for this + class. open-ils.permacrud must be defined as a controller for the class + for the permissions to be applied. + + + + + + Each action requires one or more permission values that the + user must possess to perform the action. + + + + + If the global_required attribute is true, then the user must + have been granted that permission globally (depth = 0) to perform + the action. + + + + + The context_field attribute denotes the <field> that identifies + the org_unit at which the user must have the pertinent permission. + + + + Evergreen Interface Definition Language (IDL)action element + + An action element may contain a <context_field> element that + defines the linked class (identified by the link attribute) and + the field in the linked class that identifies the org_unit where + the permission must be held. + + + + Evergreen Interface Definition Language (IDL)context_field element + + If the <context_field> element contains a jump attribute, + then it defines a link to a link to a class with a field identifying + the org_unit where the permission must be held. + + + + + + + + + + Reporter data types and their possible values + + + + bool: Boolean true or false + + + + + id: ID of the row in the database + + + + + int: integer value + + + + + interval: PostgreSQL time interval + + + + + link: link to another class, as defined in the <links> + element of the class definition + + + + + money: currency amount + + + + + org_unit: list of org_units + + + + + text: text value + + + + + timestamp: PostgreSQL timestamp + + + + + + IDL example with linked fields (actor.workstation) + Just as tables often include columns with foreign keys that point + to values stored in the column of a different table, IDL classes + can contain fields that link to fields in other classes. The <links> + element defines which fields link to fields in other classes, and + the nature of the relationship: + +<class id="aws" controller="open-ils.cstore" + oils_obj:fieldmapper="actor::workstation" + oils_persist:tablename="actor.workstation" + reporter:label="Workstation"> + <fields oils_persist:primary="id" + oils_persist:sequence="actor.workstation_id_seq"> + <field reporter:label="Workstation ID" name="id" + reporter:datatype="id"/> + <field reporter:label="Workstation Name" name="name" + reporter:datatype="text"/> + <field reporter:label="Owning Library" name="owning_lib" + reporter:datatype="org_unit"/> + <field reporter:label="Circulations" name="circulations" + oils_persist:virtual="true" reporter:datatype="link"/> + </fields> + <links> + <link field="owning_lib" reltype="has_a" key="id" + map="" class="aou"/> + <link field="circulations" reltype="has_many" key="workstation" + map="" class="circ"/> + <link field="circulation_checkins" reltype="has_many" + key="checkin_workstation" map="" class="circ"/> + </links> +</class> + + + + + This field includes an oils_persist:virtual attribute with the value of + true, meaning that the linked class circ is a virtual class. + + + + + The <links> element contains 0 or more <link> elements. + + + + + Each <link> element defines the field (field) that links to a different + class (class), the relationship (rel_type) between this field and the target + field (key). If the field in this class links to a virtual class, the (map) + attribute defines the field in the target class that returns a list of matching + objects for each object in this class. + + + + +
+
+ <literal>open-ils.cstore</literal> data access interfaces + cstore + For each class documented in the IDL, the open-ils.cstore service + automatically generates a set of data access methods, based on the + oils_persist:tablename class attribute. + For example, for the class hint clm, cstore generates the following + methods with the config.language_map qualifer: + + + + open-ils.cstore.direct.config.language_map.id_list {"code" { "like": "e%" } } + + Retrieves a list composed only of the IDs that match the query. + + + + open-ils.cstore.direct.config.language_map.retrieve "eng" + + Retrieves the object that matches a specific ID. + + + + open-ils.cstore.direct.config.language_map.search {"code" : "eng"} + + Retrieves a list of objects that match the query. + + + + open-ils.cstore.direct.config.language_map.create <_object_> + + Creates a new object from the passed in object. + + + + open-ils.cstore.direct.config.language_map.update <_object_> + + Updates the object that has been passed in. + + + + open-ils.cstore.direct.config.language_map.delete "eng" + + Deletes the object that matches the query. + + +
+
+ open-ils.pcrud data access interfaces + pcrud + For each class documented in the IDL, the open-ils.pcrud service + automatically generates a set of data access methods, based on the + oils_persist:tablename class attribute. + For example, for the class hint clm, open-ils.pcrud generates the following + methods that parallel the open-ils.cstore interface: + + + + open-ils.pcrud.id_list.clm <_authtoken_>, { "code": { "like": "e%" } } + + + + + open-ils.pcrud.retrieve.clm <_authtoken_>, "eng" + + + + + open-ils.pcrud.search.clm <_authtoken_>, { "code": "eng" } + + + + + open-ils.pcrud.create.clm <_authtoken_>, <_object_> + + + + + open-ils.pcrud.update.clm <_authtoken_>, <_object_> + + + + + open-ils.pcrud.delete.clm <_authtoken_>, "eng" + + + +
+
+ Transaction and savepoint control + Both open-ils.cstore and open-ils.pcrud enable you to control database transactions + to ensure that a set of operations either all succeed, or all fail, + atomically: + + + + open-ils.cstore.transaction.begin + + + + + open-ils.cstore.transaction.commit + + + + + open-ils.cstore.transaction.rollback + + + + + open-ils.pcrud.transaction.begin + + + + + open-ils.pcrud.transaction.commit + + + + + open-ils.pcrud.transaction.rollback + + + + At a more granular level, open-ils.cstore and open-ils.pcrud enable you to set database + savepoints to ensure that a set of operations either all succeed, or all + fail, atomically, within a given transaction: + + + + open-ils.cstore.savepoint.begin + + + + + open-ils.cstore.savepoint.commit + + + + + open-ils.cstore.savepoint.rollback + + + + + open-ils.pcrud.savepoint.begin + + + + + open-ils.pcrud.savepoint.commit + + + + + open-ils.pcrud.savepoint.rollback + + + + Transactions and savepoints must be performed within a stateful + connection to the open-ils.cstore and open-ils.pcrud services. + In srfsh, you can open a stateful connection using the open + command, and then close the stateful connection using the close + command - for example: + srfsh# open open-ils.cstore + ... perform various transaction-related work + srfsh# close open-ils.cstore + + JSON Queries + JSON + Beyond simply retrieving objects by their ID using the \*.retrieve + methods, you can issue queries against the \*.delete and \*.search + methods using JSON to filter results with simple or complex search + conditions. + For example, to generate a list of barcodes that are held in a + copy location that allows holds and is visible in the OPAC: + +srfsh# request open-ils.cstore open-ils.cstore.json_query + {"select": {"acp":["barcode"], "acpl":["name"]}, + "from": {"acp":"acpl"}, + "where": [ + {"+acpl": "holdable"}, + {"+acpl": "opac_visible"} + ]} + +Received Data: { + "barcode":"BARCODE1", + "name":"Stacks" +} + +Received Data: { + "barcode":"BARCODE2", + "name":"Stacks" +} + + + + + Invoke the json_query service. + + + + + Select the barcode field from the acp class and the name + field from the acpl class. + + + + + Join the acp class to the acpl class based on the linked field + defined in the IDL. + + + + + Add a where clause to filter the results. We have more than one + condition beginning with the same key, so we wrap the conditions inside + an array. + + + + + The first condition tests whether the boolean value of the holdable + field on the acpl class is true. + + + + + The second condition tests whether the boolean value of the + opac_visible field on the acpl class is true. + + + + For thorough coverage of the breadth of support offered by JSON + query syntax, see JSON Queries: A Tutorial. + + + Fleshing linked objects + A simplistic approach to retrieving a set of objects that are linked to + an object that you are retrieving - for example, a set of call numbers + linked to the barcodes that a given user has borrowed - would be to: + 1. Retrieve the list of circulation objects (circ class) + for a given user (usr class). + 2. For each circulation object, look up the target copy (target_copy + field, linked to the acp class). + 3. For each copy, look up the call number for that copy (call_number + field, linked to the acn class). + However, this would result in potentially hundreds of round-trip + queries from the client to the server. Even with low-latency connections, + the network overhead would be considerable. So, built into the open-ils.cstore and + open-ils.pcrud access methods is the ability to flesh linked fields - + that is, rather than return an identifier to a given linked field, + the method can return the entire object as part of the initial response. + Most of the interfaces that return class instances from the IDL offer the + ability to flesh returned fields. For example, the + open-ils.cstore.direct.\*.retrieve methods allow you to specify a + JSON structure defining the fields you wish to flesh in the returned object. + Fleshing fields in objects returned by <literal>open-ils.cstore</literal> + +srfsh# request open-ils.cstore open-ils.cstore.direct.asset.copy.retrieve 1, \ + { + "flesh": 1, + "flesh_fields": { + "acp": ["location"] + } + } + + + + + + The flesh argument is the depth at which objects should be fleshed. + For example, to flesh out a field that links to another object that includes + a field that links to another object, you would specify a depth of 2. + + + + + The flesh_fields argument contains a list of objects with the fields + to flesh for each object. + + + + Let’s flesh things a little deeper. In addition to the copy location, + let’s also flesh the call number attached to the copy, and then flesh + the bibliographic record attached to the call number. + Fleshing fields in fields of objects returned by <literal>open-ils.cstore</literal> + +request open-ils.cstore open-ils.cstore.direct.asset.copy.retrieve 1, \ + { + "flesh": 2, + "flesh_fields": { + "acp": ["location", "call_number"], + "acn": ["record"] + } + } + + + +
+
+ Adding an IDL entry for ResolverResolver + Most OpenSRF methods in Evergreen define their object interface in the + IDL. Without an entry in the IDL, the prospective caller of a given + method is forced to either call the method and inspect the returned + contents, or read the source to work out the structure of the JSON + payload. At this stage of the tutorial, we have not defined an entry + in the IDL to represent the object returned by the + open-ils.resolver.resolve_holdings method. It is time to complete + that task. + The open-ils.resolver service is unlike many of the other classes + defined in the IDL because its data is not stored in the Evergreen + database. Instead, the data is requested from an external Web service + and only temporarily cached in memcached. Fortunately, the IDL + enables us to represent this kind of class by setting the + oils_persist:virtual class attribute to true. + So, let’s add an entry to the IDL for the open-ils.resolver.resolve_holdings + service: + + And let’s make ResolverResolver.pm return an array composed of our new + rhr classes rather than raw JSON objects: + + Once we add the new entry to the IDL and copy the revised ResolverResolver.pm + Perl module to /openils/lib/perl5/OpenILS/Application/, we need to: + + + + Copy the updated IDL to both the /openils/conf/ and + /openils/var/web/reports/ directories. The Dojo approach to + parsing the IDL uses the IDL stored in the reports directory. + + + + + Restart the Perl services to make the new IDL visible to the services + and refresh the open-ils.resolver implementation + + + + + Rerun /openils/bin/autogen.sh to regenerate the JavaScript versionsautogen + of the IDL required by the HTTP translator and gateway. + + + + We also need to adjust our JavaScript client to use the nifty newJavaScript + objects that open-ils.resolver.resolve_holdings now returns. + The best approach is to use the support in Evergreen’s Dojo extensionsDojo toolkit + to generate the JavaScript classes directly from the IDL XML file. + Accessing classes defined in the IDL via Fieldmapper + + + + + + Load the Dojo core. + + + + + fieldmapper.AutoIDL reads /openils/var/reports/fm_IDL.xml to + generate a list of class properties. + + + + + fieldmapper.dojoData seems to provide a store for Evergreen data + accessed via Dojo. + + + + + fieldmapper.Fieldmapper converts the list of class properties into + actual classes. + + + + + fieldmapper.standardRequest invokes an OpenSRF method and returns + an array of objects. + + + + + The first argument to fieldmapper.standardRequest is an array + containing the OpenSRF service name and method name. + + + + + The second argument to fieldmapper.standardRequest is an array + containing the arguments to pass to the OpenSRF method. + + + + + As Fieldmapper has instantiated the returned objects based on their + class hints, we can invoke getter/setter methods on the objects. + + + +
+ +
diff --git a/2.0/development/directoriesandFiles.xml b/2.0/development/directoriesandFiles.xml new file mode 100644 index 0000000..eae71b7 --- /dev/null +++ b/2.0/development/directoriesandFiles.xml @@ -0,0 +1,140 @@ + + + + Evergreen File Structure and Configuration Files + This section will describe the basic file structure and cover key configuration files. Understanding the directory and file structure of Evergreen will allow you + to be able to customize your Evergreen software and take full advantage of many features. + + +
+ Evergreen Directory Structure + This is the top level directory structure of Evergreen located in the default installation directory /openils: + + Evergreen Directory Structure + + + + + + Directory + Description + + + + + bin + Contains many critical Perl and shell scripts such as autogen.sh and + oils.ctl. + + + conf + Contains the configuration scripts including the two most important base configuration files opensrf_core.xml and opensrf.xml. + + + include + Contains the header files used by the scripts written in C. + + + lib + Contains the core code of Evergreen including the C code and perl modules. In particular, the perl modules in the + subdirectoryperl5/OpenILS + are of particular interest to developers. + + + var + Largest directory and includes the web directories (web), lock pid fies + (run), circ setting files (circ) templates + (templates) and log (templates and + data) files. + + + +
+
+ Evergreen Configuration Files + + + Key Evergreen Configuration Files + + + + + + File + Description + + + + + /openils/conf/opensrf_core.xmlconfiguration files + opensrf_core.xml + + Files which controls which Evergreen services are run on the public and private routers. For a service to run, it must be registered in this file. This file also controls the loglevel and points to the log file for the services. An Evergreen restart is required for changes to take effect. + + + /openils/conf/opensrf.xmlconfiguration filesopensrf.xml + Use this file to set directory locations, the default locale, default notice settings and settings for all Evergreen services. + It is critical for any administrator to understand the settings in this file. An Evergreen restart is required for changes to take effect. + + + /openils/conf/fm_IDL.xml configuration filesfm_IDL.xml + + Used for linking the OpenSRF/Evergreen services to the Evergreen database tables. An Evergreen restart is required for changes to take + effect. Running autogen.sh is also required. + + + /etc/apache2/eg_vhost.confconfiguration filesApache + Controls the Evergreen virtual site. Allows to configure the skin for the OPAC or configure various directories within the Apache web server. + An Apache restart is required for changes to this file to take effect. + + + +
+ + Useful Evergreen Scripts + + + + + + File + Description + + + + + /openils/bin/autogen.shautogen + Used to update changes to org units and the fm_IDL.xml file. Will generate web and staff client pages based on contents of files and + Evergreen database entries. + + + /openils/bin/clark-kent.plreportsstarting + + Perl script for starting the reporter. + + + /openils/bin/action_trigger_runner.plaction triggers + runner + Perl script used to trigger the actions set up in the action trigger tool in the staff client. + + + /openils/bin/osrf_ctl.sh + The start up script for OpenSRF and Evergreen. + + + /openils/bin/reshelving_ complete.srfsh + Change status from reshelving to available for items which have been in reshelving for a certain amount of time. + + + /openils/bin/srfshsrfsh + + Used to start the OpenSRF shell. + + + + +
+
+
+
diff --git a/2.0/development/installchecklist.xml b/2.0/development/installchecklist.xml new file mode 100644 index 0000000..f4af922 --- /dev/null +++ b/2.0/development/installchecklist.xml @@ -0,0 +1,84 @@ + + + + Evergreen Installation Checklist + + This appendix is a checklist of things to do to istall and configure Evergreen. It will refer to the necessary chapter with the specific instructions for each item. + + + + + Install OpenSRFOpenSRF + + + Install Evergreen server software + + + Install Evergreen staff clientstaff client + + + Establish a back up strategy for Evergreen data and files + + + Configure PostgreSQL for better performancePostgreSQL + + + Configure Evergreen error logginglogs + + + Set up organizational unit typesorganizational unit types + + + Set up organizational unitsorganizational units + + + Customize localization and languages (optional)localization and languages + + + Add circ modifierscirculation modifiers + + + Configure copy statusescopy status + + + Add cataloguing templatescataloguing templates + + + Add user groups and assign permissionspermissions + + + Adjust various Local Administration Settings + + + Adjust circulation policies and penalty threshholds for groupspenalty threshholds + + + Add staff users + + + Customize OPAC as neededOPACcustomizing + + + Import datamigratingimporting data + + + Start the reporter service and set up reportsreportsstarting + + + Set up email notifications for holds and overdue itemsnotifications + + + Set up action triggersaction triggers + + + Set up Z39.50 server (optional)Z39.50 + + + Adjust search relevancy settings if required (optional)search relevancy + + + Install SIP server (optional) - for communications with automated devices such as self check stations, autmated sorters and other devices using SIPSIP + + + diff --git a/2.0/development/introduction_to_sql.xml b/2.0/development/introduction_to_sql.xml new file mode 100644 index 0000000..a6080d3 --- /dev/null +++ b/2.0/development/introduction_to_sql.xml @@ -0,0 +1,2126 @@ + + + + Introduction to SQL for Evergreen Administrators + + + This chapter was taken from Dan Scott's Introduction to SQL for Evergreen Administrators, February 2010. + +
+ Introduction to SQL Databases + sql + + Introduction + Over time, the SQL database has become the standard method of storing, + retrieving, and processing raw data for applications. Ranging from embedded + databases such as SQLite and Apache Derby, to enterprise databases such as + Oracle and IBM DB2, any SQL database offers basic advantages to application + developers such as standard interfaces (Structured Query Language (SQL), Java + Database Connectivity (JDBC), Open Database Connectivity (ODBC), Perl Database + Independent Interface (DBI)), a standard conceptual model of data (tables, + fields, relationships, constraints, etc), performance in storing and retrieving + data, concurrent access, etc. + Evergreen is built on PostgreSQL, an open source SQL database that began as + POSTGRES at the University of California at Berkeley in 1986 as a research + project led by Professor Michael Stonebraker. A SQL interface was added to a + fork of the original POSTGRES Berkelely code in 1994, and in 1996 the project + was renamed PostgreSQL. + + + Tables + sqltables + The table is the cornerstone of a SQL database. Conceptually, a database table + is similar to a single sheet in a spreadsheet: every table has one or more + columns, with each row in the table containing values for each column. Each + column in a table defines an attribute corresponding to a particular data type. + We’ll insert a row into a table, then display the resulting contents. Don’t + worry if the INSERT statement is completely unfamiliar, we’ll talk more about + the syntax of the insert statement later. + <literal>actor.usr_note</literal> database table + +evergreen=# INSERT INTO actor.usr_note (usr, creator, pub, title, value) + VALUES (1, 1, TRUE, 'Who is this guy?', 'He''s the administrator!'); + +evergreen=# select id, usr, creator, pub, title, value from actor.usr_note; + id | usr | creator | pub | title | value +----+-----+---------+-----+------------------+------------------------- + 1 | 1 | 1 | t | Who is this guy? | He's the administrator! +(1 rows) + + + PostgreSQL supports table inheritance, which lets you define tables that + inherit the column definitions of a given parent table. A search of the data in + the parent table includes the data in the child tables. Evergreen uses table + inheritance: for example, the action.circulation table is a child of the + money.billable_xact table, and the money.*_payment tables all inherit from + the money.payment parent table. + + + Schemas + PostgreSQL, like most SQL databases, supports the use of schema names to group + collections of tables and other database objects together. You might think of + schemas as namespaces if you’re a programmer; or you might think of the schema + / table / column relationship like the area code / exchange / local number + structure of a telephone number. + + Examples: database object names + + + + + + + + + + Full name + Schema name + Table name + Field name + + + + + actor.usr_note.title + actor + usr_note + title + + + biblio.record_entry.marc + biblio + record_entry + marc + + + +
+ The default schema name in PostgreSQL is public, so if you do not specify a + schema name when creating or accessing a database object, PostgreSQL will use + the public schema. As a result, you might not find the object that you’re + looking for if you don’t use the appropriate schema. + Example: Creating a table without a specific schema + +evergreen=# CREATE TABLE foobar (foo TEXT, bar TEXT); +CREATE TABLE +evergreen=# \d foobar + Table "public.foobar" + Column | Type | Modifiers +--------+------+----------- + foo | text | + bar | text | + + + Example: Trying to access a unqualified table outside of the public schema + evergreen=# SELECT * FROM usr_note; + ERROR: relation "usr_note" does not exist + LINE 1: SELECT * FROM usr_note; + ^ + + Evergreen uses schemas to organize all of its tables with mostly intuitive, + if short, schema names. Here’s the current (as of 2010-01-03) list of schemas + used by Evergreen: + + Evergreen schema names + + + + + + + + Schema name + Description + + + + + acq + Acquisitions + + + action + Circulation actions + + + action_trigger + Event mechanisms + + + actor + Evergreen users and organization units + + + asset + Call numbers and copies + + + auditor + Track history of changes to selected tables + + + authority + Authority records + + + biblio + Bibliographic records + + + booking + Resource bookings + + + config + Evergreen configurable options + + + container + Buckets for records, call numbers, copies, and users + + + extend_reporter + Extra views for report definitions + + + metabib + Metadata about bibliographic records + + + money + Fines and bills + + + offline + Offline transactions + + + permission + User permissions + + + query + Stored SQL statements + + + reporter + Report definitions + + + search + Search functions + + + serial + Serial MFHD records + + + stats + Convenient views of circulation and asset statistics + + + vandelay + MARC batch importer and exporter + + + +
+ The term schema has two meanings in the world of SQL databases. We have + discussed the schema as a conceptual grouping of tables and other database + objects within a given namespace; for example, "the actor schema contains the + tables and functions related to users and organizational units". Another common + usage of schema is to refer to the entire data model for a given database; + for example, "the Evergreen database schema". +
+ + Columns + Each column definition consists of: + + + + a data type + + + + + (optionally) a default value to be used whenever a row is inserted that + does not contain a specific value + + + + + (optionally) one or more constraints on the values beyond data type + + + + Although PostgreSQL supports dozens of data types, Evergreen makes our life + easier by only using a handful. + + PostgreSQL data types used by Evergreen + + + + + + + + + Type name + Description + Limits + + + + + INTEGER + Medium integer + -2147483648 to +2147483647 + + + BIGINT + Large integer + -9223372036854775808 to 9223372036854775807 + + + SERIAL + Sequential integer + 1 to 2147483647 + + + BIGSERIAL + Large sequential integer + 1 to 9223372036854775807 + + + TEXT + Variable length character data + Unlimited length + + + BOOL + Boolean + TRUE or FALSE + + + TIMESTAMP WITH TIME ZONE + Timestamp + 4713 BC to 294276 AD + + + TIME + Time + Expressed in HH:MM:SS + + + NUMERIC(precision, scale) + Decimal + Up to 1000 digits of precision. In Evergreen mostly used for money + values, with a precision of 6 and a scale of 2 (####.##). + + + +
+ Full details about these data types are available from the + data types section of + the PostgreSQL manual. +
+ + Constraints + + Prevent NULL values + A column definition may include the constraint NOT NULL to prevent NULL + values. In PostgreSQL, a NULL value is not the equivalent of zero or false or + an empty string; it is an explicit non-value with special properties. We’ll + talk more about how to work with NULL values when we get to queries. + + + Primary key + Every table can have at most one primary key. A primary key consists of one or + more columns which together uniquely identify each row in a table. If you + attempt to insert a row into a table that would create a duplicate or NULL + primary key entry, the database rejects the row and returns an error. + Natural primary keys are drawn from the intrinsic properties of the data being + modelled. For example, some potential natural primary keys for a table that + contains people would be: + + Example: Some potential natural primary keys for a table of people + + + + + + + + + Natural key + Pros + Cons + + + + + First name, last name, address + No two people with the same name would ever live at the same address, right? + Lots of columns force data duplication in referencing tables + + + SSN or driver’s license + These are guaranteed to be unique + Lots of people don’t have an SSN or a driver’s license + + + +
+ To avoid problems with natural keys, many applications instead define surrogate + primary keys. A surrogate primary keys is a column with an autoincrementing + integer value added to a table definition that ensures uniqueness. + Evergreen uses surrogate keys (a column named id with a SERIAL data type) + for most of its tables. +
+ + Foreign keys + Every table can contain zero or more foreign keys: one or more columns that + refer to the primary key of another table. + For example, let’s consider Evergreen’s modelling of the basic relationship + between copies, call numbers, and bibliographic records. Bibliographic records + contained in the biblio.record_entry table can have call numbers attached to + them. Call numbers are contained in the asset.call_number table, and they can + have copies attached to them. Copies are contained in the asset.copy table. + + Example: Evergreen’s copy / call number / bibliographic record relationships + + + + + + + + + + Table + Primary key + Column with a foreign key + Points to + + + + + asset.copy + asset.copy.id + asset.copy.call_number + asset.call_number.id + + + asset.call_number + asset.call_number.id + asset.call_number.record + biblio.record_entry.id + + + biblio.record_entry + biblio.record_entry.id + + + + + +
+
+ + Check constraints + PostgreSQL enables you to define rules to ensure that the value to be inserted + or updated meets certain conditions. For example, you can ensure that an + incoming integer value is within a specific range, or that a ZIP code matches a + particular pattern. + +
+ + Deconstructing a table definition statement + The actor.org_address table is a simple table in the Evergreen schema that + we can use as a concrete example of many of the properties of databases that + we have discussed so far. + +CREATE TABLE actor.org_address ( + id SERIAL PRIMARY KEY, + valid BOOL NOT NULL DEFAULT TRUE, + address_type TEXT NOT NULL DEFAULT 'MAILING', + org_unit INT NOT NULL REFERENCES actor.org_unit (id) + DEFERRABLE INITIALLY DEFERRED, + street1 TEXT NOT NULL, + street2 TEXT, + city TEXT NOT NULL, + county TEXT, + state TEXT NOT NULL, + country TEXT NOT NULL, + post_code TEXT NOT NULL +); + + + + + The column named id is defined with a special data type of SERIAL; if + given no value when a row is inserted into a table, the database automatically + generates the next sequential integer value for the column. SERIAL is a + popular data type for a primary key because it is guaranteed to be unique - and + indeed, the constraint for this column identifies it as the PRIMARY KEY. + + + + + The data type BOOL defines a boolean value: TRUE or FALSE are the only + acceptable values for the column. The constraint NOT NULL instructs the + database to prevent the column from ever containing a NULL value. The column + property DEFAULT TRUE instructs the database to automatically set the value + of the column to TRUE if no value is provided. + + + + + The data type TEXT defines a text column of practically unlimited length. + As with the previous column, there is a NOT NULL constraint, and a default + value of 'MAILING' will result if no other value is supplied. + + + + + The REFERENCES actor.org_unit (id) clause indicates that this column has a + foreign key relationship to the actor.org_unit table, and that the value of + this column in every row in this table must have a corresponding value in the + id column in the referenced table (actor.org_unit). + + + + + The column named street2 demonstrates that not all columns have constraints + beyond data type. In this case, the column is allowed to be NULL or to contain a + TEXT value. + + + + + + Displaying a table definition using <literal>psql</literal> + The psql command-line interface is the preferred method for accessing + PostgreSQL databases. It offers features like tab-completion, readline support + for recalling previous commands, flexible input and output formats, and + is accessible via a standard SSH session. + If you press the Tab key once after typing one or more characters of the + database object name, psql automatically completes the name if there are no + other matches. If there are other matches for your current input, nothing + happens until you press the Tab key a second time, at which point psql + displays all of the matches for your current input. + To display the definition of a database object such as a table, issue the + command \d _object-name_. For example, to display the definition of the + actor.usr_note table: + +$ psql evergreen +psql (8.4.1) +Type "help" for help. + +evergreen=# \d actor.usr_note + Table "actor.usr_note" + Column | Type | Modifiers +-------------+--------------------------+------------------------------------------------------------- + id | bigint | not null default nextval('actor.usr_note_id_seq'::regclass) + usr | bigint | not null + creator | bigint | not null + create_date | timestamp with time zone | default now() + pub | boolean | not null default false + title | text | not null + value | text | not null +Indexes: + "usr_note_pkey" PRIMARY KEY, btree (id) + "actor_usr_note_creator_idx" btree (creator) + "actor_usr_note_usr_idx" btree (usr) +Foreign-key constraints: + "usr_note_creator_fkey" FOREIGN KEY (creator) REFERENCES actor.usr(id) ON ... + "usr_note_usr_fkey" FOREIGN KEY (usr) REFERENCES actor.usr(id) ON DELETE .... + +evergreen=# \q +$ + + + + + This is the most basic connection to a PostgreSQL database. You can use a + number of other flags to specify user name, hostname, port, and other options. + + + + + The \d command displays the definition of a database object. + + + + + The \q command quits the psql session and returns you to the shell prompt. + + + + +
+
+ Basic SQL queries + + The SELECT statement + The SELECT statement is the basic tool for retrieving information from a + database. The syntax for most SELECT statements is: +
+ SELECT [columns(s)] + FROM [table(s)] + [WHERE condition(s)] + [GROUP BY columns(s)] + [HAVING grouping-condition(s)] + [ORDER BY column(s)] + [LIMIT maximum-results] + [OFFSET start-at-result-#] + ; +
+ For example, to select all of the columns for each row in the + actor.usr_address table, issue the following query: + SELECT * + FROM actor.usr_address + ; +
+ + Selecting particular columns from a table + SELECT * returns all columns from all of the tables included in your query. + However, quite often you will want to return only a subset of the possible + columns. You can retrieve specific columns by listing the names of the columns + you want after the SELECT keyword. Separate each column name with a comma. + For example, to select just the city, county, and state from the + actor.usr_address table, issue the following query: + SELECT city, county, state + FROM actor.usr_address + ; + + + Sorting results with the ORDER BY clause + By default, a SELECT statement returns rows matching your query with no + guarantee of any particular order in which they are returned. To force + the rows to be returned in a particular order, use the ORDER BY clause + to specify one or more columns to determine the sorting priority of the + rows. + For example, to sort the rows returned from your actor.usr_address query by + city, with county and then zip code as the tie breakers, issue the + following query: + +SELECT city, county, state + FROM actor.usr_address + ORDER BY city, county, post_code +; + + + + Filtering results with the WHERE clause + Thus far, your results have been returning all of the rows in the table. + Normally, however, you would want to restrict the rows that are returned to the + subset of rows that match one or more conditions of your search. The WHERE + clause enables you to specify a set of conditions that filter your query + results. Each condition in the WHERE clause is an SQL expression that returns + a boolean (true or false) value. + For example, to restrict the results returned from your actor.usr_address + query to only those rows containing a state value of Connecticut, issue the + following query: + +SELECT city, county, state + FROM actor.usr_address + WHERE state = 'Connecticut' + ORDER BY city, county, post_code +; + + You can include more conditions in the WHERE clause with the OR and AND + operators. For example, to further restrict the results returned from your + actor.usr_address query to only those rows where the state column contains a + value of Connecticut and the city column contains a value of Hartford, + issue the following query: + +SELECT city, county, state + FROM actor.usr_address + WHERE state = 'Connecticut' + AND city = 'Hartford' + ORDER BY city, county, post_code +; + + To return rows where the state is Connecticut and the city is Hartford or + New Haven, you must use parentheses to explicitly group the city value + conditions together, or else the database will evaluate the OR city = 'New + Haven' clause entirely on its own and match all rows where the city column is + New Haven, even though the state might not be Connecticut. + Trouble with OR + +SELECT city, county, state + FROM actor.usr_address + WHERE state = 'Connecticut' + AND city = 'Hartford' OR city = 'New Haven' + ORDER BY city, county, post_code +; + +-- Can return unwanted rows because the OR is not grouped! + + + Grouped OR’ed conditions + +SELECT city, county, state + FROM actor.usr_address + WHERE state = 'Connecticut' + AND (city = 'Hartford' OR city = 'New Haven') + ORDER BY city, county, post_code +; + +-- The parentheses ensure that the OR is applied to the cities, and the +-- state in either case must be 'Connecticut' + + + + Comparison operators + Here is a partial list of comparison operators that are commonly used in + WHERE clauses: + + Comparing two scalar values + + + + x = y (equal to) + + + + + x != y (not equal to) + + + + + x < y (less than) + + + + + x > y (greater than) + + + + + x LIKE y (TEXT value x matches a subset of TEXT y, where y is a string that + can contain % as a wildcard for 0 or more characters, and _ as a wildcard + for a single character. For example, WHERE 'all you can eat fish and chips + and a big stick' LIKE '%fish%stick' would return TRUE) + + + + + x ILIKE y (like LIKE, but the comparison ignores upper-case / lower-case) + + + + + x IN y (x is in the list of values y, where y can be a list or a SELECT + statement that returns a list) + + + + + + + + NULL values + SQL databases have a special way of representing the value of a column that has + no value: NULL. A NULL value is not equal to zero, and is not an empty + string; it is equal to nothing, not even another NULL, because it has no value + that can be compared. + To return rows from a table where a given column is not NULL, use the + IS NOT NULL comparison operator. + Retrieving rows where a column is not <literal>NULL</literal> + +SELECT id, first_given_name, family_name + FROM actor.usr + WHERE second_given_name IS NOT NULL +; + + + Similarly, to return rows from a table where a given column is NULL, use + the IS NULL comparison operator. + Retrieving rows where a column is <literal>NULL</literal> + +SELECT id, first_given_name, second_given_name, family_name + FROM actor.usr + WHERE second_given_name IS NULL +; + + id | first_given_name | second_given_name | family_name +----+------------------+-------------------+---------------- + 1 | Administrator | | System Account +(1 row) + + + Notice that the NULL value in the output is displayed as empty space, + indistinguishable from an empty string; this is the default display method in + psql. You can change the behaviour of psql using the pset command: + Changing the way <literal>NULL</literal> values are displayed in <literal>psql</literal> + +evergreen=# \pset null '(null)' +Null display is '(null)'. + +SELECT id, first_given_name, second_given_name, family_name + FROM actor.usr + WHERE second_given_name IS NULL +; + + id | first_given_name | second_given_name | family_name +----+------------------+-------------------+---------------- + 1 | Administrator | (null) | System Account +(1 row) + + + Database queries within programming languages such as Perl and C have + special methods of checking for NULL values in returned results. + + + Text delimiter: ' + You might have noticed that we have been using the ' character to delimit + TEXT values and values such as dates and times that are TEXT values. Sometimes, + however, your TEXT value itself contains a ' character, such as the word + you’re. To prevent the database from prematurely ending the TEXT value at the + first ' character and returning a syntax error, use another ' character to + escape the following ' character. + For example, to change the last name of a user in the actor.usr table to + L’estat, issue the following SQL: + Escaping <literal>'</literal> in TEXT values + +UPDATE actor.usr + SET family_name = 'L''estat' + WHERE profile IN ( + SELECT id + FROM permission.grp_tree + WHERE name = 'Vampire' + ) + ; + + When you retrieve the row from the database, the value is displayed with just + a single ' character: + +SELECT id, family_name + FROM actor.usr + WHERE family_name = 'L''estat' +; + + id | family_name +----+------------- + 1 | L'estat +(1 row) + + + + Grouping and eliminating results with the GROUP BY and HAVING clauses + The GROUP BY clause returns a unique set of results for the desired columns. + This is most often used in conjunction with an aggregate function to present + results for a range of values in a single query, rather than requiring you to + issue one query per target value. + Returning unique results of a single column with <literal>GROUP BY</literal> + +SELECT grp + FROM permission.grp_perm_map + GROUP BY grp + ORDER BY grp; + + grp +-----+ + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 10 +(8 rows) + + + While GROUP BY can be useful for a single column, it is more often used + to return the distinct results across multiple columns. For example, the + following query shows us which groups have permissions at each depth in + the library hierarchy: + Returning unique results of multiple columns with <literal>GROUP BY</literal> + +SELECT grp, depth + FROM permission.grp_perm_map + GROUP BY grp, depth + ORDER BY depth, grp; + + grp | depth +-----+------- + 1 | 0 + 2 | 0 + 3 | 0 + 4 | 0 + 5 | 0 + 10 | 0 + 3 | 1 + 4 | 1 + 5 | 1 + 6 | 1 + 7 | 1 + 10 | 1 + 3 | 2 + 4 | 2 + 10 | 2 +(15 rows) + + + Extending this further, you can use the COUNT() aggregate function to + also return the number of times each unique combination of grp and depth + appears in the table. Yes, this is a sneak peek at the use of aggregate + functions! Keeners. + Counting unique column combinations with <literal>GROUP BY</literal> + +SELECT grp, depth, COUNT(grp) + FROM permission.grp_perm_map + GROUP BY grp, depth + ORDER BY depth, grp; + + grp | depth | count +-----+-------+------- + 1 | 0 | 6 + 2 | 0 | 2 + 3 | 0 | 45 + 4 | 0 | 3 + 5 | 0 | 5 + 10 | 0 | 1 + 3 | 1 | 3 + 4 | 1 | 4 + 5 | 1 | 1 + 6 | 1 | 9 + 7 | 1 | 5 + 10 | 1 | 10 + 3 | 2 | 24 + 4 | 2 | 8 + 10 | 2 | 7 +(15 rows) + + + You can use the WHERE clause to restrict the returned results before grouping + is applied to the results. The following query restricts the results to those + rows that have a depth of 0. + Using the <literal>WHERE</literal> clause with <literal>GROUP BY</literal> + +SELECT grp, COUNT(grp) + FROM permission.grp_perm_map + WHERE depth = 0 + GROUP BY grp + ORDER BY 2 DESC +; + + grp | count +-----+------- + 3 | 45 + 1 | 6 + 5 | 5 + 4 | 3 + 2 | 2 + 10 | 1 +(6 rows) + + + To restrict results after grouping has been applied to the rows, use the + HAVING clause; this is typically used to restrict results based on + a comparison to the value returned by an aggregate function. For example, + the following query restricts the returned rows to those that have more than + 5 occurrences of the same value for grp in the table. + <literal>GROUP BY</literal> restricted by a <literal>HAVING</literal> clause + +SELECT grp, COUNT(grp) + FROM permission.grp_perm_map + GROUP BY grp + HAVING COUNT(grp) > 5 +; + + grp | count +-----+------- + 6 | 9 + 4 | 15 + 5 | 6 + 1 | 6 + 3 | 72 + 10 | 18 +(6 rows) + + + + + Eliminating duplicate results with the DISTINCT keyword + GROUP BY is one way of eliminating duplicate results from the rows returned + by your query. The purpose of the DISTINCT keyword is to remove duplicate + rows from the results of your query. However, it works, and it is easy - so if + you just want a quick list of the unique set of values for a column or set of + columns, the DISTINCT keyword might be appropriate. + On the other hand, if you are getting duplicate rows back when you don’t expect + them, then applying the DISTINCT keyword might be a sign that you are + papering over a real problem. + Returning unique results of multiple columns with <literal>DISTINCT</literal> + +SELECT DISTINCT grp, depth + FROM permission.grp_perm_map + ORDER BY depth, grp +; + + grp | depth +-----+------- + 1 | 0 + 2 | 0 + 3 | 0 + 4 | 0 + 5 | 0 + 10 | 0 + 3 | 1 + 4 | 1 + 5 | 1 + 6 | 1 + 7 | 1 + 10 | 1 + 3 | 2 + 4 | 2 + 10 | 2 +(15 rows) + + + + + Paging through results with the LIMIT and OFFSET clauses + The LIMIT clause restricts the total number of rows returned from your query + and is useful if you just want to list a subset of a large number of rows. For + example, in the following query we list the five most frequently used + circulation modifiers: + Using the <literal>LIMIT</literal> clause to restrict results + +SELECT circ_modifier, COUNT(circ_modifier) + FROM asset.copy + GROUP BY circ_modifier + ORDER BY 2 DESC + LIMIT 5 +; + + circ_modifier | count +---------------+-------- + CIRC | 741995 + BOOK | 636199 + SER | 265906 + DOC | 191598 + LAW MONO | 126627 +(5 rows) + + + When you use the LIMIT clause to restrict the total number of rows returned + by your query, you can also use the OFFSET clause to determine which subset + of the rows will be returned. The use of the OFFSET clause assumes that + you’ve used the ORDER BY clause to impose order on the results. + In the following example, we use the OFFSET clause to get results 6 through + 10 from the same query that we prevously executed. + Using the <literal>OFFSET</literal> clause to return a specific subset of rows + +SELECT circ_modifier, COUNT(circ_modifier) + FROM asset.copy + GROUP BY circ_modifier + ORDER BY 2 DESC + LIMIT 5 + OFFSET 5 +; + + circ_modifier | count +---------------+-------- + LAW SERIAL | 102758 + DOCUMENTS | 86215 + BOOK_WEB | 63786 + MFORM SER | 39917 + REF | 34380 +(5 rows) + + + +
+
+ Advanced SQL queries + + Transforming column values with functions + PostgreSQL includes many built-in functions for manipulating column data. + You can also create your own functions (and Evergreen does make use of + many custom functions). There are two types of functions used in + databases: scalar functions and aggregate functions. + + Scalar functions + Scalar functions transform each value of the target column. If your query + would return 50 values for a column in a given query, and you modify your + query to apply a scalar function to the values returned for that column, + it will still return 50 values. For example, the UPPER() function, + used to convert text values to upper-case, modifies the results in the + following set of queries: + Using the UPPER() scalar function to convert text values to upper-case + +-- First, without the UPPER() function for comparison +SELECT shortname, name + FROM actor.org_unit + WHERE id < 4 +; + + shortname | name +-----------+----------------------- + CONS | Example Consortium + SYS1 | Example System 1 + SYS2 | Example System 2 +(3 rows) + +-- Now apply the UPPER() function to the name column +SELECT shortname, UPPER(name) + FROM actor.org_unit + WHERE id < 4 +; + + shortname | upper +-----------+-------------------- + CONS | EXAMPLE CONSORTIUM + SYS1 | EXAMPLE SYSTEM 1 + SYS2 | EXAMPLE SYSTEM 2 +(3 rows) + + + There are so many scalar functions in PostgreSQL that we cannot cover them + all here, but we can list some of the most commonly used functions: + + + + || - concatenates two text values together + + + + + COALESCE() - returns the first non-NULL value from the list of arguments + + + + + LOWER() - returns a text value converted to lower-case + + + + + REPLACE() - returns a text value after replacing all occurrences of a given text value with a different text value + + + + + REGEXP_REPLACE() - returns a text value after being transformed by a regular expression + + + + + UPPER() - returns a text value converted to upper-case + + + + For a complete list of scalar functions, see + the PostgreSQL function documentation. + + + Aggregate functions + Aggregate functions return a single value computed from the the complete set of + values returned for the specified column. + + + + AVG() + + + + + COUNT() + + + + + MAX() + + + + + MIN() + + + + + SUM() + + + + + + + Sub-selects + A sub-select is the technique of using the results of one query to feed + into another query. You can, for example, return a set of values from + one column in a SELECT statement to be used to satisfy the IN() condition + of another SELECT statement; or you could return the MAX() value of a + column in a SELECT statement to match the = condition of another SELECT + statement. + For example, in the following query we use a sub-select to restrict the copies + returned by the main SELECT statement to only those locations that have an + opac_visible value of TRUE: + Sub-select example + +SELECT call_number + FROM asset.copy + WHERE deleted IS FALSE + AND location IN ( + SELECT id + FROM asset.copy_location + WHERE opac_visible IS TRUE + ) +; + + + Sub-selects can be an approachable way to breaking down a problem that + requires matching values between different tables, and often result in + a clearly expressed solution to a problem. However, if you start writing + sub-selects within sub-selects, you should consider tackling the problem + with joins instead. + + + Joins + Joins enable you to access the values from multiple tables in your query + results and comparison operators. For example, joins are what enable you to + relate a bibliographic record to a barcoded copy via the biblio.record_entry, + asset.call_number, and asset.copy tables. In this section, we discuss the + most common kind of join—the inner join—as well as the less common outer join + and some set operations which can compare and contrast the values returned by + separate queries. + When we talk about joins, we are going to talk about the left-hand table and + the right-hand table that participate in the join. Every join brings together + just two tables - but you can use an unlimited (for our purposes) number + of joins in a single SQL statement. Each time you use a join, you effectively + create a new table, so when you add a second join clause to a statement, + table 1 and table 2 (which were the left-hand table and the right-hand table + for the first join) now act as a merged left-hand table and the new table + in the second join clause is the right-hand table. + Clear as mud? Okay, let’s look at some examples. + + Inner joins + An inner join returns all of the columns from the left-hand table in the join + with all of the columns from the right-hand table in the joins that match a + condition in the ON clause. Typically, you use the = operator to match the + foreign key of the left-hand table with the primary key of the right-hand + table to follow the natural relationship between the tables. + In the following example, we return all of columns from the actor.usr and + actor.org_unit tables, joined on the relationship between the user’s home + library and the library’s ID. Notice in the results that some columns, like + id and mailing_address, appear twice; this is because both the actor.usr + and actor.org_unit tables include columns with these names. This is also why + we have to fully qualify the column names in our queries with the schema and + table names. + A simple inner join + +SELECT * + FROM actor.usr + INNER JOIN actor.org_unit ON actor.usr.home_ou = actor.org_unit.id + WHERE actor.org_unit.shortname = 'CONS' +; + +-[ RECORD 1 ]------------------+--------------------------------- +id | 1 +card | 1 +profile | 1 +usrname | admin +email | +... +mailing_address | +billing_address | +home_ou | 1 +... +claims_never_checked_out_count | 0 +id | 1 +parent_ou | +ou_type | 1 +ill_address | 1 +holds_address | 1 +mailing_address | 1 +billing_address | 1 +shortname | CONS +name | Example Consortium +email | +phone | +opac_visible | t +fiscal_calendar | 1 + + + Of course, you do not have to return every column from the joined tables; + you can (and should) continue to specify only the columns that you want to + return. In the following example, we count the number of borrowers for + every user profile in a given library by joining the permission.grp_tree + table where profiles are defined against the actor.usr table, and then + joining the actor.org_unit table to give us access to the user’s home + library: + Borrower Count by Profile (Adult, Child, etc)/Library + +SELECT permission.grp_tree.name, actor.org_unit.name, COUNT(permission.grp_tree.name) + FROM actor.usr + INNER JOIN permission.grp_tree + ON actor.usr.profile = permission.grp_tree.id + INNER JOIN actor.org_unit + ON actor.org_unit.id = actor.usr.home_ou + WHERE actor.usr.deleted IS FALSE + GROUP BY permission.grp_tree.name, actor.org_unit.name + ORDER BY actor.org_unit.name, permission.grp_tree.name +; + + name | name | count +-------+--------------------+------- + Users | Example Consortium | 1 +(1 row) + + + + + Aliases + So far we have been fully-qualifying all of our table names and column names to + prevent any confusion. This quickly gets tiring with lengthy qualified + table names like permission.grp_tree, so the SQL syntax enables us to assign + aliases to table names and column names. When you define an alias for a table + name, you can access its column throughout the rest of the statement by simply + appending the column name to the alias with a period; for example, if you assign + the alias au to the actor.usr table, you can access the actor.usr.id + column through the alias as au.id. + The formal syntax for declaring an alias for a column is to follow the column + name in the result columns clause with AS alias. To declare an alias for a table name, + follow the table name in the FROM clause (including any JOIN statements) with + AS alias. However, the AS keyword is optional for tables (and columns as + of PostgreSQL 8.4), and in practice most SQL statements leave it out. For + example, we can write the previous INNER JOIN statement example using aliases + instead of fully-qualified identifiers: + Borrower Count by Profile (using aliases) + +SELECT pgt.name AS "Profile", aou.name AS "Library", COUNT(pgt.name) AS "Count" + FROM actor.usr au + INNER JOIN permission.grp_tree pgt + ON au.profile = pgt.id + INNER JOIN actor.org_unit aou + ON aou.id = au.home_ou + WHERE au.deleted IS FALSE + GROUP BY pgt.name, aou.name + ORDER BY aou.name, pgt.name +; + + Profile | Library | Count +---------+--------------------+------- + Users | Example Consortium | 1 +(1 row) + + + A nice side effect of declaring an alias for your columns is that the alias + is used as the column header in the results table. The previous version of + the query, which didn’t use aliased column names, had two columns named + name; this version of the query with aliases results in a clearer + categorization. + + + Outer joins + An outer join returns all of the rows from one or both of the tables + participating in the join. + + + + For a LEFT OUTER JOIN, the join returns all of the rows from the left-hand + table and the rows matching the join condition from the right-hand table, with + NULL values for the rows with no match in the right-hand table. + + + + + A RIGHT OUTER JOIN behaves in the same way as a LEFT OUTER JOIN, with the + exception that all rows are returned from the right-hand table participating in + the join. + + + + + For a FULL OUTER JOIN, the join returns all the rows from both the left-hand + and right-hand tables, with NULL values for the rows with no match in either + the left-hand or right-hand table. + + + + Base tables for the OUTER JOIN examples + +SELECT * FROM aaa; + + id | stuff +----+------- + 1 | one + 2 | two + 3 | three + 4 | four + 5 | five +(5 rows) + +SELECT * FROM bbb; + + id | stuff | foo +----+-------+---------- + 1 | one | oneone + 2 | two | twotwo + 5 | five | fivefive + 6 | six | sixsix +(4 rows) + + + Example of a LEFT OUTER JOIN + +SELECT * FROM aaa + LEFT OUTER JOIN bbb ON aaa.id = bbb.id +; + id | stuff | id | stuff | foo +----+-------+----+-------+---------- + 1 | one | 1 | one | oneone + 2 | two | 2 | two | twotwo + 3 | three | | | + 4 | four | | | + 5 | five | 5 | five | fivefive +(5 rows) + + + Example of a RIGHT OUTER JOIN + +SELECT * FROM aaa + RIGHT OUTER JOIN bbb ON aaa.id = bbb.id +; + id | stuff | id | stuff | foo +----+-------+----+-------+---------- + 1 | one | 1 | one | oneone + 2 | two | 2 | two | twotwo + 5 | five | 5 | five | fivefive + | | 6 | six | sixsix +(4 rows) + + + Example of a FULL OUTER JOIN + +SELECT * FROM aaa + FULL OUTER JOIN bbb ON aaa.id = bbb.id +; + id | stuff | id | stuff | foo +----+-------+----+-------+---------- + 1 | one | 1 | one | oneone + 2 | two | 2 | two | twotwo + 3 | three | | | + 4 | four | | | + 5 | five | 5 | five | fivefive + | | 6 | six | sixsix +(6 rows) + + + + + Self joins + It is possible to join a table to itself. You can, in fact you must, use + aliases to disambiguate the references to the table. + + + + Set operations + Relational databases are effectively just an efficient mechanism for + manipulating sets of values; they are implementations of set theory. There are + three operators for sets (tables) in which each set must have the same number + of columns with compatible data types: the union, intersection, and difference + operators. + Base tables for the set operation examples + +SELECT * FROM aaa; + + id | stuff + ----+------- + 1 | one + 2 | two + 3 | three + 4 | four + 5 | five + (5 rows) + +SELECT * FROM bbb; + + id | stuff | foo + ----+-------+---------- + 1 | one | oneone + 2 | two | twotwo + 5 | five | fivefive + 6 | six | sixsix +(4 rows) + + + + Union + The UNION operator returns the distinct set of rows that are members of + either or both of the left-hand and right-hand tables. The UNION operator + does not return any duplicate rows. To return duplicate rows, use the + UNION ALL operator. + Example of a UNION set operation + +-- The parentheses are not required, but are intended to help +-- illustrate the sets participating in the set operation +( + SELECT id, stuff + FROM aaa +) +UNION +( + SELECT id, stuff + FROM bbb +) +ORDER BY 1 +; + + id | stuff +----+------- + 1 | one + 2 | two + 3 | three + 4 | four + 5 | five + 6 | six +(6 rows) + + + + + Intersection + The INTERSECT operator returns the distinct set of rows that are common to + both the left-hand and right-hand tables. To return duplicate rows, use the + INTERSECT ALL operator. + Example of an INTERSECT set operation + +( + SELECT id, stuff + FROM aaa +) +INTERSECT +( + SELECT id, stuff + FROM bbb +) +ORDER BY 1 +; + + id | stuff +----+------- + 1 | one + 2 | two + 5 | five +(3 rows) + + + + + Difference + The EXCEPT operator returns the rows in the left-hand table that do not + exist in the right-hand table. You are effectively subtracting the common + rows from the left-hand table. + Example of an EXCEPT set operation + +( + SELECT id, stuff + FROM aaa +) +EXCEPT +( + SELECT id, stuff + FROM bbb +) +ORDER BY 1 +; + + id | stuff +----+------- + 3 | three + 4 | four +(2 rows) + +-- Order matters: switch the left-hand and right-hand tables +-- and you get a different result +( + SELECT id, stuff + FROM bbb +) +EXCEPT +( + SELECT id, stuff + FROM aaa +) +ORDER BY 1 +; + + id | stuff +----+------- + 6 | six +(1 row) + + + + + + Views + A view is a persistent SELECT statement that acts like a read-only table. + To create a view, issue the CREATE VIEW statement, giving the view a name + and a SELECT statement on which the view is built. + The following example creates a view based on our borrower profile count: + Creating a view + +CREATE VIEW actor.borrower_profile_count AS + SELECT pgt.name AS "Profile", aou.name AS "Library", COUNT(pgt.name) AS "Count" + FROM actor.usr au + INNER JOIN permission.grp_tree pgt + ON au.profile = pgt.id + INNER JOIN actor.org_unit aou + ON aou.id = au.home_ou + WHERE au.deleted IS FALSE + GROUP BY pgt.name, aou.name + ORDER BY aou.name, pgt.name +; + + + When you subsequently select results from the view, you can apply additional + WHERE clauses to filter the results, or ORDER BY clauses to change the + order of the returned rows. In the following examples, we issue a simple + SELECT * statement to show that the default results are returned in the + same order from the view as the equivalent SELECT statement would be returned. + Then we issue a SELECT statement with a WHERE clause to further filter the + results. + Selecting results from a view + +SELECT * FROM actor.borrower_profile_count; + + Profile | Library | Count +----------------------------+----------------------------+------- + Faculty | University Library | 208 + Graduate | University Library | 16 + Patrons | University Library | 62 +... + +-- You can still filter your results with WHERE clauses +SELECT * + FROM actor.borrower_profile_count + WHERE "Profile" = 'Faculty'; + + Profile | Library | Count +---------+----------------------------+------- + Faculty | University Library | 208 + Faculty | College Library | 64 + Faculty | College Library 2 | 102 + Faculty | University Library 2 | 776 +(4 rows) + + + + + Inheritance + PostgreSQL supports table inheritance: that is, a child table inherits its + base definition from a parent table, but can add additional columns to its + own definition. The data from any child tables is visible in queries against + the parent table. + Evergreen uses table inheritance in several areas: + * In the Vandelay MARC batch importer / exporter, Evergreen defines base + tables for generic queues and queued records for which authority record and + bibliographic record child tables + * Billable transactions are based on the money.billable_xact table; + child tables include action.circulation for circulation transactions + and money.grocery for general bills. + * Payments are based on the money.payment table; its child table is + money.bnm_payment (for brick-and-mortar payments), which in turn has child + tables of money.forgive_payment, money.work_payment, money.credit_payment, + money.goods_payment, and money.bnm_desk_payment. The + money.bnm_desk_payment table in turn has child tables of money.cash_payment, + money.check_payment, and money.credit_card_payment. + * Transits are based on the action.transit_copy table, which has a child + table of action.hold_transit_copy for transits initiated by holds. + * Generic acquisition line items are defined by the + acq.lineitem_attr_definition table, which in turn has a number of child + tables to define MARC attributes, generated attributes, user attributes, and + provider attributes. + +
+
+ Understanding query performance with EXPLAIN + Some queries run for a long, long time. This can be the result of a poorly + written query—a query with a join condition that joins every + row in the biblio.record_entry table with every row in the metabib.full_rec + view would consume a massive amount of memory and disk space and CPU time—or + a symptom of a schema that needs some additional indexes. PostgreSQL provides + the EXPLAIN tool to estimate how long it will take to run a given query and + show you the query plan (how it plans to retrieve the results from the + database). + To generate the query plan without actually running the statement, simply + prepend the EXPLAIN keyword to your query. In the following example, we + generate the query plan for the poorly written query that would join every + row in the biblio.record_entry table with every row in the metabib.full_rec + view: + Query plan for a terrible query + +EXPLAIN SELECT * + FROM biblio.record_entry + FULL OUTER JOIN metabib.full_rec ON 1=1 +; + + QUERY PLAN +-------------------------------------------------------------------------------// + Merge Full Join (cost=0.00..4959156437783.60 rows=132415734100864 width=1379) + -> Seq Scan on record_entry (cost=0.00..400634.16 rows=2013416 width=1292) + -> Seq Scan on real_full_rec (cost=0.00..1640972.04 rows=65766704 width=87) +(3 rows) + + + This query plan shows that the query would return 132415734100864 rows, and it + plans to accomplish what you asked for by sequentially scanning (Seq Scan) + every row in each of the tables participating in the join. + In the following example, we have realized our mistake in joining every row of + the left-hand table with every row in the right-hand table and take the saner + approach of using an INNER JOIN where the join condition is on the record ID. + Query plan for a less terrible query + +EXPLAIN SELECT * + FROM biblio.record_entry bre + INNER JOIN metabib.full_rec mfr ON mfr.record = bre.id; + QUERY PLAN +----------------------------------------------------------------------------------------// + Hash Join (cost=750229.86..5829273.98 rows=65766704 width=1379) + Hash Cond: (real_full_rec.record = bre.id) + -> Seq Scan on real_full_rec (cost=0.00..1640972.04 rows=65766704 width=87) + -> Hash (cost=400634.16..400634.16 rows=2013416 width=1292) + -> Seq Scan on record_entry bre (cost=0.00..400634.16 rows=2013416 width=1292) +(5 rows) + + + This time, we will return 65766704 rows - still way too many rows. We forgot + to include a WHERE clause to limit the results to something meaningful. In + the following example, we will limit the results to deleted records that were + modified in the last month. + Query plan for a realistic query + +EXPLAIN SELECT * + FROM biblio.record_entry bre + INNER JOIN metabib.full_rec mfr ON mfr.record = bre.id + WHERE bre.deleted IS TRUE + AND DATE_TRUNC('MONTH', bre.edit_date) > + DATE_TRUNC ('MONTH', NOW() - '1 MONTH'::INTERVAL) +; + + QUERY PLAN +----------------------------------------------------------------------------------------// + Hash Join (cost=5058.86..2306218.81 rows=201669 width=1379) + Hash Cond: (real_full_rec.record = bre.id) + -> Seq Scan on real_full_rec (cost=0.00..1640972.04 rows=65766704 width=87) + -> Hash (cost=4981.69..4981.69 rows=6174 width=1292) + -> Index Scan using biblio_record_entry_deleted on record_entry bre + (cost=0.00..4981.69 rows=6174 width=1292) + Index Cond: (deleted = true) + Filter: ((deleted IS TRUE) AND (date_trunc('MONTH'::text, edit_date) + > date_trunc('MONTH'::text, (now() - '1 mon'::interval)))) +(7 rows) + + + We can see that the number of rows returned is now only 201669; that’s + something we can work with. Also, the overall cost of the query is 2306218, + compared to 4959156437783 in the original query. The Index Scan tells us + that the query planner will use the index that was defined on the deleted + column to avoid having to check every row in the biblio.record_entry table. + However, we are still running a sequential scan over the + metabib.real_full_rec table (the table on which the metabib.full_rec + view is based). Given that linking from the bibliographic records to the + flattened MARC subfields is a fairly common operation, we could create a + new index and see if that speeds up our query plan. + Query plan with optimized access via a new index + +-- This index will take a long time to create on a large database +-- of bibliographic records +CREATE INDEX bib_record_idx ON metabib.real_full_rec (record); + +EXPLAIN SELECT * + FROM biblio.record_entry bre + INNER JOIN metabib.full_rec mfr ON mfr.record = bre.id + WHERE bre.deleted IS TRUE + AND DATE_TRUNC('MONTH', bre.edit_date) > + DATE_TRUNC ('MONTH', NOW() - '1 MONTH'::INTERVAL) +; + + QUERY PLAN +----------------------------------------------------------------------------------------// + Nested Loop (cost=0.00..1558330.46 rows=201669 width=1379) + -> Index Scan using biblio_record_entry_deleted on record_entry bre + (cost=0.00..4981.69 rows=6174 width=1292) + Index Cond: (deleted = true) + Filter: ((deleted IS TRUE) AND (date_trunc('MONTH'::text, edit_date) > + date_trunc('MONTH'::text, (now() - '1 mon'::interval)))) + -> Index Scan using bib_record_idx on real_full_rec + (cost=0.00..240.89 rows=850 width=87) + Index Cond: (real_full_rec.record = bre.id) +(6 rows) + + + We can see that the resulting number of rows is still the same (201669), but + the execution estimate has dropped to 1558330 because the query planner can + use the new index (bib_record_idx) rather than scanning the entire table. + Success! + While indexes can significantly speed up read access to tables for common + filtering conditions, every time a row is created or updated the corresponding + indexes also need to be maintained - which can decrease the performance of + writes to the database. Be careful to keep the balance of read performance + versus write performance in mind if you plan to create custom indexes in your + Evergreen database. +
+
+ Inserting, updating, and deleting data + + Inserting data + To insert one or more rows into a table, use the INSERT statement to identify + the target table and list the columns in the table for which you are going to + provide values for each row. If you do not list one or more columns contained + in the table, the database will automatically supply a NULL value for those + columns. The values for each row follow the VALUES clause and are grouped in + parentheses and delimited by commas. Each row, in turn, is delimited by commas + (this multiple row syntax requires PostgreSQL 8.2 or higher). + For example, to insert two rows into the permission.usr_grp_map table: + Inserting rows into the <literal>permission.usr_grp_map</literal> table + INSERT INTO permission.usr_grp_map (usr, grp) + VALUES (2, 10), (2, 4) + ; + + Of course, as with the rest of SQL, you can replace individual column values + with one or more use sub-selects: + Inserting rows using sub-selects instead of integers + +INSERT INTO permission.usr_grp_map (usr, grp) + VALUES ( + (SELECT id FROM actor.usr + WHERE family_name = 'Scott' AND first_given_name = 'Daniel'), + (SELECT id FROM permission.grp_tree + WHERE name = 'Local System Administrator') + ), ( + (SELECT id FROM actor.usr + WHERE family_name = 'Scott' AND first_given_name = 'Daniel'), + (SELECT id FROM permission.grp_tree + WHERE name = 'Circulator') + ) +; + + + + + Inserting data using a SELECT statement + Sometimes you want to insert a bulk set of data into a new table based on + a query result. Rather than a VALUES clause, you can use a SELECT + statement to insert one or more rows matching the column definitions. This + is a good time to point out that you can include explicit values, instead + of just column identifiers, in the return columns of the SELECT statement. + The explicit values are returned in every row of the result set. + In the following example, we insert 6 rows into the permission.usr_grp_map + table; each row will have a usr column value of 1, with varying values for + the grp column value based on the id column values returned from + permission.grp_tree: + Inserting rows via a <literal>SELECT</literal> statement + +INSERT INTO permission.usr_grp_map (usr, grp) + SELECT 1, id + FROM permission.grp_tree + WHERE id > 2 +; + +INSERT 0 6 + + + + + Deleting rows + Deleting data from a table is normally fairly easy. To delete rows from a table, + issue a DELETE statement identifying the table from which you want to delete + rows and a WHERE clause identifying the row or rows that should be deleted. + In the following example, we delete all of the rows from the + permission.grp_perm_map table where the permission maps to + UPDATE_ORG_UNIT_CLOSING and the group is anything other than administrators: + Deleting rows from a table + +DELETE FROM permission.grp_perm_map + WHERE grp IN ( + SELECT id + FROM permission.grp_tree + WHERE name != 'Local System Administrator' + ) AND perm = ( + SELECT id + FROM permission.perm_list + WHERE code = 'UPDATE_ORG_UNIT_CLOSING' + ) +; + + + There are two main reasons that a DELETE statement may not actually + delete rows from a table, even when the rows meet the conditional clause. + + + + If the row contains a value that is the target of a relational constraint, + for example, if another table has a foreign key pointing at your target + table, you will be prevented from deleting a row with a value corresponding + to a row in the dependent table. + + + + + If the table has a rule that substitutes a different action for a DELETE + statement, the deletion will not take place. In Evergreen it is common for a + table to have a rule that substitutes the action of setting a deleted column + to TRUE. For example, if a book is discarded, deleting the row representing + the copy from the asset.copy table would severely affect circulation statistics, + bills, borrowing histories, and their corresponding tables in the database that + have foreign keys pointing at the asset.copy table (action.circulation and + money.billing and its children respectively). Instead, the deleted column + value is set to TRUE and Evergreen’s application logic skips over these rows + in most cases. + + + + + + Updating rows + To update rows in a table, issue an UPDATE statement identifying the table + you want to update, the column or columns that you want to set with their + respective new values, and (optionally) a WHERE clause identifying the row or + rows that should be updated. + Following is the syntax for the UPDATE statement: +
+ UPDATE [table-name] + SET [column] TO [new-value] + WHERE [condition] + ; +
+
+
+
+ Query requests + The following queries were requested by Bibliomation, but might be reusable + by other libraries. + + Monthly circulation stats by collection code / library + Monthly Circulation Stats by Collection Code/Library + +SELECT COUNT(acirc.id) AS "COUNT", aou.name AS "Library", acl.name AS "Copy Location" + FROM asset.copy ac + INNER JOIN asset.copy_location acl ON ac.location = acl.id + INNER JOIN action.circulation acirc ON acirc.target_copy = ac.id + INNER JOIN actor.org_unit aou ON acirc.circ_lib = aou.id + WHERE DATE_TRUNC('MONTH', acirc.create_time) = DATE_TRUNC('MONTH', NOW() - INTERVAL '3 month') + AND acirc.desk_renewal IS FALSE + AND acirc.opac_renewal IS FALSE + AND acirc.phone_renewal IS FALSE + GROUP BY aou.name, acl.name + ORDER BY aou.name, acl.name, 1 +; + + + + + Monthly circulation stats by borrower stat / library + Monthly Circulation Stats by Borrower Stat/Library + +SELECT COUNT(acirc.id) AS "COUNT", aou.name AS "Library", asceum.stat_cat_entry AS "Borrower Stat" + FROM action.circulation acirc + INNER JOIN actor.org_unit aou ON acirc.circ_lib = aou.id + INNER JOIN actor.stat_cat_entry_usr_map asceum ON asceum.target_usr = acirc.usr + INNER JOIN actor.stat_cat astat ON asceum.stat_cat = astat.id + WHERE DATE_TRUNC('MONTH', acirc.create_time) = DATE_TRUNC('MONTH', NOW() - INTERVAL '3 month') + AND astat.name = 'Preferred language' + AND acirc.desk_renewal IS FALSE + AND acirc.opac_renewal IS FALSE + AND acirc.phone_renewal IS FALSE + GROUP BY aou.name, asceum.stat_cat_entry + ORDER BY aou.name, asceum.stat_cat_entry, 1 +; + + + + + Monthly intralibrary loan stats by library + Monthly Intralibrary Loan Stats by Library + +SELECT aou.name AS "Library", COUNT(acirc.id) + FROM action.circulation acirc + INNER JOIN actor.org_unit aou ON acirc.circ_lib = aou.id + INNER JOIN asset.copy ac ON acirc.target_copy = ac.id + INNER JOIN asset.call_number acn ON ac.call_number = acn.id + WHERE acirc.circ_lib != acn.owning_lib + AND DATE_TRUNC('MONTH', acirc.create_time) = DATE_TRUNC('MONTH', NOW() - INTERVAL '3 month') + AND acirc.desk_renewal IS FALSE + AND acirc.opac_renewal IS FALSE + AND acirc.phone_renewal IS FALSE + GROUP by aou.name + ORDER BY aou.name, 2 +; + + + + + Monthly borrowers added by profile (adult, child, etc) / library + Monthly Borrowers Added by Profile (Adult, Child, etc)/Library + +SELECT pgt.name AS "Profile", aou.name AS "Library", COUNT(pgt.name) AS "Count" + FROM actor.usr au + INNER JOIN permission.grp_tree pgt + ON au.profile = pgt.id + INNER JOIN actor.org_unit aou + ON aou.id = au.home_ou + WHERE au.deleted IS FALSE + AND DATE_TRUNC('MONTH', au.create_date) = DATE_TRUNC('MONTH', NOW() - '3 months'::interval) + GROUP BY pgt.name, aou.name + ORDER BY aou.name, pgt.name +; + + + + + Borrower count by profile (adult, child, etc) / library + Borrower Count by Profile (Adult, Child, etc)/Library + +SELECT pgt.name AS "Profile", aou.name AS "Library", COUNT(pgt.name) AS "Count" + FROM actor.usr au + INNER JOIN permission.grp_tree pgt + ON au.profile = pgt.id + INNER JOIN actor.org_unit aou + ON aou.id = au.home_ou + WHERE au.deleted IS FALSE + GROUP BY pgt.name, aou.name + ORDER BY aou.name, pgt.name +; + + + + + Monthly items added by collection / library + We define a collection as a shelving location in Evergreen. + Monthly Items Added by Collection/Library + +SELECT aou.name AS "Library", acl.name, COUNT(ac.barcode) + FROM actor.org_unit aou + INNER JOIN asset.call_number acn ON acn.owning_lib = aou.id + INNER JOIN asset.copy ac ON ac.call_number = acn.id + INNER JOIN asset.copy_location acl ON ac.location = acl.id + WHERE ac.deleted IS FALSE + AND acn.deleted IS FALSE + AND DATE_TRUNC('MONTH', ac.create_date) = DATE_TRUNC('MONTH', NOW() - '1 month'::interval) + GROUP BY aou.name, acl.name + ORDER BY aou.name, acl.name +; + + + + + Hold purchase alert by library + in the following set of queries, we bring together the active title, volume, + and copy holds and display those that have more than a certain number of holds + per title. The goal is to UNION ALL the three queries, then group by the + bibliographic record ID and display the title / author information for those + records that have more than a given threshold of holds. + Hold Purchase Alert by Library + +-- Title holds +SELECT all_holds.bib_id, aou.name, rmsr.title, rmsr.author, COUNT(all_holds.bib_id) + FROM + ( + ( + SELECT target, request_lib + FROM action.hold_request + WHERE hold_type = 'T' + AND fulfillment_time IS NULL + AND cancel_time IS NULL + ) + UNION ALL + -- Volume holds + ( + SELECT bre.id, request_lib + FROM action.hold_request ahr + INNER JOIN asset.call_number acn ON ahr.target = acn.id + INNER JOIN biblio.record_entry bre ON acn.record = bre.id + WHERE ahr.hold_type = 'V' + AND ahr.fulfillment_time IS NULL + AND ahr.cancel_time IS NULL + ) + UNION ALL + -- Copy holds + ( + SELECT bre.id, request_lib + FROM action.hold_request ahr + INNER JOIN asset.copy ac ON ahr.target = ac.id + INNER JOIN asset.call_number acn ON ac.call_number = acn.id + INNER JOIN biblio.record_entry bre ON acn.record = bre.id + WHERE ahr.hold_type = 'C' + AND ahr.fulfillment_time IS NULL + AND ahr.cancel_time IS NULL + ) + ) AS all_holds(bib_id, request_lib) + INNER JOIN reporter.materialized_simple_record rmsr + INNER JOIN actor.org_unit aou ON aou.id = all_holds.request_lib + ON rmsr.id = all_holds.bib_id + GROUP BY all_holds.bib_id, aou.name, rmsr.id, rmsr.title, rmsr.author + HAVING COUNT(all_holds.bib_id) > 2 + ORDER BY aou.name +; + + + + + Update borrower records with a different home library + In this example, the library has opened a new branch in a growing area, + and wants to reassign the home library for the patrons in the vicinity of + the new branch to the new branch. To accomplish this, we create a staging table + that holds a set of city names and the corresponding branch shortname for the home + library for each city. + Then we issue an UPDATE statement to set the home library for patrons with a + physical address with a city that matches the city names in our staging table. + Update borrower records with a different home library + +CREATE SCHEMA staging; +CREATE TABLE staging.city_home_ou_map (city TEXT, ou_shortname TEXT, + FOREIGN KEY (ou_shortname) REFERENCES actor.org_unit (shortname)); +INSERT INTO staging.city_home_ou_map (city, ou_shortname) + VALUES ('Southbury', 'BR1'), ('Middlebury', 'BR2'), ('Hartford', 'BR3'); +BEGIN; + +UPDATE actor.usr au SET home_ou = COALESCE( + ( + SELECT aou.id + FROM actor.org_unit aou + INNER JOIN staging.city_home_ou_map schom ON schom.ou_shortname = aou.shortname + INNER JOIN actor.usr_address aua ON aua.city = schom.city + WHERE au.id = aua.usr + GROUP BY aou.id + ), home_ou) +WHERE ( + SELECT aou.id + FROM actor.org_unit aou + INNER JOIN staging.city_home_ou_map schom ON schom.ou_shortname = aou.shortname + INNER JOIN actor.usr_address aua ON aua.city = schom.city + WHERE au.id = aua.usr + GROUP BY aou.id +) IS NOT NULL; + + + +
+ +
diff --git a/2.0/development/json.xml b/2.0/development/json.xml new file mode 100644 index 0000000..3036c1e --- /dev/null +++ b/2.0/development/json.xml @@ -0,0 +1,1863 @@ + + + + JSON Queries + JSON + + The json_query facility provides a way for client applications to query the database over the network. Instead of constructing its own SQL, the application encodes a query in the + form of a JSON string and passes it to the json_query service. Then the json_query service parses the JSON, constructs and executes the corresponding SQL, and returns the results to + the client application. + This arrangement enables the json_query service to act as a gatekeeper, protecting the database from potentially damaging SQL commands. In particular, the generated SQL is + confined to SELECT statements, which will not change the contents of the database. + + In addition, the json_query service sometimes uses its knowledge of the database structure to supply column names and join conditions so that the client application doesn't + have to. + + Nevertheless, the need to encode a query in a JSON string adds complications, because the client needs to know how to build the right JSON. JSON queries are also somewhat + limiting -- they can't do all of the things that you can do with raw SQL. + + The IDL + + A JSON query does not refer to tables and columns. Instead, it refers to classes and fields, which the IDL maps to the corresponding database entities. + + The IDL (Interface Definition Language) is an XML file, typically /openils/conf/fm_IDL.xml. It maps each class to a table, view, or subquery, and + each field to a column. It also includes information about foreign key relationships. + + (The IDL also defines virtual classes and virtual fields, which don't correspond to database entities. We won't discuss them here, because json_query ignores them.) + + When it first starts up, json_query loads a relevant subset of the IDL into memory. Thereafter, it consults its copy of the IDL whenever it needs to know about the database + structure. It uses the IDL to validate the JSON queries, and to translate classes and fields to the corresponding tables and columns. In some cases it uses the IDL to supply information + that the queries don't provide. + Definitions + + You should also be familiar with JSON. However it is worth defining a couple of terms that have other meanings in other contexts: + + + An "object" is a JSON object, i.e. a comma-separated list of name:value pairs, enclosed in curly braces, like this: + { "a":"frobozz", "b":24, "c":null } + + An "array" is a JSON array, i.e. a comma-separated list of values, enclosed in square brackets, like this: + [ "Goober", 629, null, false, "glub" ] + + + + + The Examples + The test_json_query utility generated the SQL for all of the sample queries in this tutorial. Newlines and indentation were then inserted manually for readability. + All examples involve the actor.org_unit table, sometimes in combination with a few related tables. The queries themselves are designed to illustrate the syntax, not + to do anything useful at the application level. For example, it's not meaningful to take the square root of an org_unit id, except to illustrate how to code a function call. + The examples are like department store mannequins -- they have no brains, they're only for display. + The simplest kind of query defines nothing but a FROM clause. For example: + + { + "from":"aou" + } + + In this minimal example we select from only one table. Later we will see how to join multiple tables. + Since we don't supply a WHERE clause, json_query constructs a default WHERE clause for us, including all the available columns. The resulting SQL looks like this: + +SELECT + "aou".billing_address AS "billing_address", + "aou".holds_address AS "holds_address", + "aou".id AS "id", + "aou".ill_address AS "ill_address", + "aou".mailing_address AS "mailing_address", + "aou".name AS "name", + "aou".ou_type AS "ou_type", + "aou".parent_ou AS "parent_ou", + "aou".shortname AS "shortname", + "aou".email AS "email", + "aou".phone AS "phone", + "aou".opac_visible AS "opac_visible" +FROM + actor.org_unit AS "aou" ; + + + + Default SELECT Clauses + JSONSELECT clauses + The default SELECT clause includes every column that the IDL defines it as a non-virtual field for the class in question. If a column is present in the database but + not defined in the IDL, json_query doesn't know about it. In the case of the example shown above, all the columns are defined in the IDL, so they all show up in the default + SELECT clause. + If the FROM clause joins two or more tables, the default SELECT clause includes columns only from the core table, not from any of the joined tables. + The default SELECT clause has almost the same effect as "SELECT *", but not exactly. If you were to "SELECT * from actor.org_unit_type in psql, the output would + include all the same columns as in the example above, but not in the same order. A default SELECT clause includes the columns in the order in which the IDL defines them, + which may be different from the order in which the database defines them. + In practice, the sequencing of columns in the SELECT clause is not significant. The result set is returned to the client program in the form of a data structure, which + the client program can navigate however it chooses. + + + Other Lessons + There are other ways to get a default SELECT clause. However, default SELECT clauses are a distraction at this point, because most of the time you'll specify your + own SELECT clause explicitly, as we will discuss later. + Let's consider some more important aspects of this simple example -- more important because they apply to more complex queries as well. + + + The entire JSON query is an object. In this simple case the object includes only one entry, for the FROM clause. Typically you'll also have entries + for the SELECT clause and the WHERE clause, and possibly for HAVING, ORDER BY, LIMIT, or OFFSET clauses. There is no separate entry for a GROUP BY clause, which you + can specify by other means. + + + Although all the other entries are optional, you must include an entry for the FROM clause. You cannot, for example, do a SELECT USER the way + you can in psql. + + + Every column is qualified by an alias for the table. This alias is always the class name for the table, as defined in the IDL. + + + Every column is aliased with the column name. There is a way to choose a different column alias (not shown here). + + + + + The SELECT Clause + The following variation also produces a default SELECT clause: + +{ + "from":"aou", + "select": { + "aou":"*" + } +} + + ...and so does this one: + +{ + "select": { + "aou":null + }, + "from":"aou" +} + + While this syntax may not be terribly useful, it does illustrate the minimal structure of a SELECT clause in a JSON query: an entry in the outermost JSON object, + with a key of select. The value associated with this key is another JSON object, whose keys are class names. + (These two examples also illustrate another point: unlike SQL, a JSON query doesn't care whether the FROM clause or the SELECT clause comes first.) + Usually you don't want the default SELECT clause. Here's how to select only some of the columns: + +{ + "from":"aou", + "select": { + "aou":[ "id", "name" ] + } +} + + The value associated with the class name is an array of column names. If you select columns from multiple tables (not shown here), you'll need a separate entry for each table, + and a separate column list for each entry. + The previous example results in the following SQL: + +SELECT + "aou".id AS "id", + "aou".name AS "name" +FROM + actor.org_unit AS "aou" ; + + + + Fancier SELECT Clauses + The previous example featured an array of column names. More generally, it featured an array of field specifications, and one kind of field specification is a column name. + The other kind is a JSON object, with some combination of the following keys: + + + column -- the column name (required). + + + alias -- used to define a column alias, which otherwise defaults to the column name. + + + aggregate -- takes a value of true or false. Don't worry about this one yet. It concerns the use of GROUP BY clauses, which we will examine + later. + + + transform -- the name of an SQL function to be called. + + + result_field -- used with "transform"; specifies an output column of a function that returns multiple columns at a time. + + + params -- used with "transform"; provides a list of parameters for the function. They may be strings, numbers, or nulls. + + + This example assigns a different column alias: + +{ + "from":"aou", + "select": { + "aou": [ + "id", + { "column":"name", "alias":"org_name" } + ] + } +} + +SELECT + "aou".id AS "id", + "aou".name AS "org_name" +FROM + actor.org_unit AS "aou" ; + + In this case, changing the column alias doesn't accomplish much. But if we were joining to the actor.org_unit_type table, which also has a "name" column, we could + use different aliases to distinguish them. + The following example uses a function to raise a column to upper case: + +{ + "from":"aou", + "select": { + "aou": [ + "id", + { "column":"name", "transform":"upper" } + ] + } +} + +SELECT + "aou".id AS "id", + upper("aou".name ) AS "name" +FROM + actor.org_unit AS "aou" ; + + Here we take a substring of the name, using the params element to pass parameters: + + { + "from":"aou", + "select": { + "aou": [ + "id", { + "column":"name", + "transform":"substr", + "params":[ 3, 5 ] + } + ] + } + } + + SELECT + "aou".id AS "id", + substr("aou".name,'3','5' ) AS "name" + FROM + actor.org_unit AS "aou" ; + + The parameters specified with params are inserted after the applicable column (name in this case), + which is always the first parameter. They are always passed as strings, i.e. enclosed in quotes, even if the JSON expresses them as numbers. PostgreSQL will ordinarily + coerce them to the right type. However if the function name is overloaded to accept different types, PostgreSQL may invoke a function other than the one intended. + Finally we call a fictitious function "frobozz" that returns multiple columns, where we want only one of them: + +{ + "from":"aou", + "select": { + "aou": [ + "id", { + "column":"name", + "transform":"frobozz", + "result_field":"zamzam" + } + ] + } +} + +SELECT + "aou".id AS "id", + (frobozz("aou".name ))."zamzam" AS "name" +FROM + actor.org_unit AS "aou" ; + + The frobozz function doesn't actually exist, but json_query doesn't know that. The query won't fail until json_query tries to execute it in + the database. + + + Things You Can't Do + You can do some things in a SELECT clause with raw SQL (with psql, for example) that you can't do with a JSON query. Some of them matter and some of them don't. + When you do a JOIN, you can't arrange the selected columns in any arbitrary sequence, because all of the columns from a given table must be grouped together. + This limitation doesn't matter. The results are returned in the form of a data structure, which the client program can navigate however it likes. + You can't select an arbitrary expression, such as "percentage / 100" or "last_name || ', ' || first_name". Most of the time this limitation doesn't matter either, because + the client program can do these kinds of manipulations for itself. However, function calls may be a problem. You can't nest them, and you can't pass more than one column value + to them (and it has to be the first parameter). + You can't use a CASE expression. Instead, the client application can do the equivalent branching for itself. + You can't select a subquery. In raw SQL you can do something like the following: + +SELECT + id, + name, + ( + SELECT name + FROM actor.org_unit_type AS aout + WHERE aout.id = aou.ou_type + ) AS type_name +FROM + actor.org_unit AS aou; + + This contrived example is not very realistic. Normally you would use a JOIN in this case, and that's what you should do in a JSON query. Other cases may not be so + easy to solve. + + + The WHERE Clause + Most queries need a WHERE clause, as in this simple example: + +{ + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "parent_ou":"3" + } +} + + Like the SELECT clause, the WHERE clause gets its own entry in the top-level object of a JSON query. The key is where, and the associated value is either + an object (as shown here) or an array (to be discussed a bit later). Each entry in the object is a separate condition. + In this case, we use a special shortcut for expressing an equality condition. The column name is on the left of the colon, and the value to which we are equating it is on + the right. + Here's the resulting SQL: + +SELECT + "aou".id AS "id", + "aou".name AS "name" +FROM + actor.org_unit AS "aou" +WHERE + "aou".parent_ou = 3; + + Like the SELECT clause, the generated WHERE clause qualifies each column name with the alias of the relevant table. + If you want to compare a column to NULL, put null (without quotation marks) to the right of the colon instead of a literal value. The + resulting SQL will include IS NULL instead of an equals sign. + + + Other Kinds of Comparisons + Here's the same query (which generates the same SQL) without the special shortcut: + +{ + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "parent_ou":{ "=":3 } + } +} + + We still have an entry whose key is the column name, but this time the associated value is another JSON object. It must contain exactly one entry, + with the comparison operator on the left of the colon, and the value to be compared on the right. + The same syntax works for other kinds of comparison operators. For example: + +{ + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "parent_ou":{ ">":3 } + } +} + + ...turns into: + +SELECT + "aou".id AS "id", + "aou".name AS "name" +FROM + actor.org_unit AS "aou" +WHERE + "aou".parent_ou > 3 ; + + The condition '=:null' turns into IS NULL. Any other operator used with null turns into IS NOT NULL. + You can use most of the comparison operators recognized by PostgreSQL: + + = <> != + < > <= >= + ~ ~* !~ !~* + like ilike + similar to + + The only ones you can't use are is distinct from and is not distinct from. + + + Custom Comparisons + Here's a dirty little secret: json_query doesn't really pay much attention to the operator you supply. It merely checks to make sure that the operator doesn't contain + any semicolons or white space, in order to prevent certain kinds of SQL injection. It also allows "similar to" as a special exception. + As a result, you can slip an operator of your own devising into the SQL, so long as it doesn't contain any semicolons or white space, and doesn't create invalid syntax. + Here's a contrived and rather silly example: + +{ + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "parent_ou":{ "<2+":3 } + } +} + + ...which results in the following SQL: + +SELECT + "aou".id AS "id", + "aou".name AS "name" +FROM + actor.org_unit AS "aou" +WHERE + "aou".parent_ou <2+ 3; + + It's hard to come up with a realistic case where this hack would be useful, but it could happen. + + + Comparing One Column to Another + Here's how to put another column on the right hand side of a comparison: + +{ + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "id": { ">": { "+aou":"parent_ou" } } + } +}; + + This syntax is similar to the previous examples, except that instead of comparing to a literal value, we compare to an object. This object has only a single entry, + whose key is a table alias preceded by a leading plus sign. The associated value is the name of the column. + Here's the resulting SQL: + +SELECT + "aou".id AS "id", + "aou".name AS "name" +FROM + actor.org_unit AS "aou" +WHERE +( + "aou".id > ( "aou".parent_ou ) +); + + The table alias must correspond to the appropriate table. Since json_query doesn't validate the choice of alias, it won't detect an invalid alias until it tries to + execute the query. In this simple example there's only one table to choose from. The choice of alias is more important in a subquery or join. + The leading plus sign, combined with a table alias, can be used in other situations to designate the table to which a column belongs. We shall defer a discussion of + this usage to the section on joins. + + + Testing Boolean Columns + In SQL, there are several ways to test a boolean column such as actor.org_unit.opac_visible. The most obvious way is to compare it to true or false: + +SELECT + id +FROM + actor.org_unit +WHERE + opac_visible = true; + + In a JSON query this approach doesn't work. If you try it, the "= true" test will turn into IS NULL. Don't do that. Instead, use a leading plus sign, as described in + the preceding section, to treat the boolean column as a stand-alone condition: + +{ + "from":"aou", + "select": { "aou":[ "id" ] }, + "where": { + "+aou":"opac_visible" + } +} + + Result: + +SELECT + "aou".id AS "id" +FROM + actor.org_unit AS "aou" +WHERE + "aou".opac_visible ; + + If you need to test for falsity, then write a test for truth and negate it with the "-not" operator. We will discuss the "-not" operator later, but here's a preview: + +{ + "from":"aou", + "select": { "aou":[ "id" ] }, + "where": { + "-not": { + "+aou":"opac_visible" + } + } +} + +SELECT + "aou".id AS "id" +FROM + actor.org_unit AS "aou" +WHERE + NOT ( "aou".opac_visible ); + + You can also compare a boolean column directly to a more complex condition: + +{ + "from":"aou", + "select": { "aou":[ "id" ] }, + "where": { + "opac_visible": { + "=": { "parent_ou":{ ">":3 } } + } + } +} + + Here we compare a boolean column, not to a literal value, but to a boolean expression. The resulting SQL looks a little goofy, but it works: + +SELECT + "aou".id AS "id" +FROM + actor.org_unit AS "aou" +WHERE + ( + "aou".opac_visible = ( "aou".parent_ou > 3 ) + ); + + In this case we compare the boolean column to a single simple condition. However you can include additional complications -- multiple conditions, IN lists, + BETWEEN clauses, and other features as described below. + + + Multiple Conditions + If you need multiple conditions, just add them to the "where" object, separated by commas: + +{ + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "parent_ou":{ ">":3 }, + "id":{ "<>":7 } + } +} + + The generated SQL connects the conditions with AND: + +SELECT + "aou".id AS "id", + "aou".name AS "name" +FROM + actor.org_unit AS "aou" +WHERE + "aou".parent_ou g 3 + AND "aou".id <> 7; + + Later we will see how to use OR instead of AND. + + + Using Arrays + Here's a puzzler. Suppose you need two conditions for the same column. How do you code them in the same WHERE clause? For example, suppose you want something like this: + +SELECT + id, + name +FROM + actor.org_unit +WHERE + parent_ou > 3 + AND parent_ou <> 7; + + You might try a WHERE clause like this: + +"where": { + "parent_ou":{ ">":3 }, + "parent_ou":{ "<>":7 } + } + + Nope. Won't work. According to JSON rules, two entries in the same object can't have the same key. + After slapping yourself in the forehead, you try something a little smarter: + +"where": { + "parent_ou": { + ">":3, + "<>":7 + } +} + + Nice try, but that doesn't work either. Maybe it ought to work -- at least it's legal JSON -- but, no. + Here's what works: + +{ + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": [ + { "parent_ou":{ ">":3 } }, + { "parent_ou":{ "<>":7 } } + ] +} + + We wrapped the two conditions into two separate JSON objects, and then wrapped those objects together into a JSON array. The resulting SQL looks like this: + +SELECT + "aou".id AS "id", + "aou".name AS "name +FROM + actor.org_unit AS "aou" +WHERE + ( "aou".parent_ou > 3 ) +AND + ( "aou".parent_ou <> 7 ); + + That's not quite what we were hoping for, because the extra parentheses are so ugly. But they're harmless. This will do. + If you're in the mood, you can use arrays to as many parentheses as you like, even if there is only one condition inside: + +{ + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": + [[[[[[ + { + "parent_ou":{ ">":3 } + }, + ]]]]]] +} + + ...yields: + +SELECT + "aou".id AS "id", + "aou".name AS "name" +FROM + actor.org_unit AS "aou" +WHERE + ( ( ( ( ( ( "aou".parent_ou > 3 ) ) ) ) ) ); + + + + How to OR + By default, json_query combines conditions with AND. When you need OR, here's how to do it: + +{ + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "-or": { + "id":2, + "parent_ou":3 + } + } +} + + We use -or as the key, with the conditions to be ORed in an associated object. The leading minus sign is there to make sure that the operator isn't confused with a + column name. Later we'll see some other operators with leading minus signs. In a couple of spots we even use plus signs. + Here are the results from the above example: + +SELECT + "aou".id AS "id", + "aou".name AS "name" +FROM + actor.org_unit AS "aou" +WHERE + ( + "aou".id = 2 + OR "aou".parent_ou = 3 + ); + + The conditions paired with -or are linked by OR and enclosed in parentheses. + Here's how to do the same thing using an array, except that it produces an extra layer of parentheses: + +{ + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "-or": [ + { "id":2 }, + { "parent_ou":3 } + ] + } +} +SELECT + "aou".id AS "id", + "aou".name AS "name" +FROM + actor.org_unit AS "aou" +WHERE + ( + ( "aou".id = 2 ) + OR ( "aou".parent_ou = 3 ) + ); + + It's possible, though not very useful, to have only a single condition subject to the -or operator. In that case, the condition appears by itself, since there's nothing + to OR it to. This trick is another way to add an extraneous layer of parentheses. + + + Another way to AND + You can also use the -and operator. It works just like -or, except that it combines conditions with AND instead of OR. Since AND is the default, we don't usually + need a separate operator for it, but it's available. + In rare cases, nothing else will do -- you can't include two conditions in the same list because of the duplicate key problem, but you can't combine them with + arrays either. In particular, you might need to combine them within an expression that you're comparing to a boolean column (see the subsection above on Testing Boolean Columns). + + + Negation with NOT + The -not operator negates a condition or set of conditions. For example: + +{ + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "-not": { + "id":{ ">":2 }, + "parent_ou":3 + } + } +} + +SELECT + "aou".id AS "id", + "aou".name AS "name" +FROM + actor.org_unit AS "aou" +WHERE + NOT + ( + "aou".id > 2 + AND "aou".parent_ou = 3 + ); + + In this example we merely negate a combination of two comparisons. However the condition to be negated may be as complicated as it needs to be. Anything that can be + subject to where can be subject to -not. + In most cases you can achieve the same result by other means. However the -not operator is the only way to represent NOT BETWEEN + (to be discussed later). + + + EXISTS with Subqueries + Two other operators carry a leading minus sign: -exists and its negation -not-exists. These operators apply to subqueries, which have the + same format as a full query. For example: + +{ + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "-exists": { + "from":"asv", + "select":{ "asv":[ "id" ] }, + "where": { + "owner":7 + } + } + } +} + +SELECT + "aou".id AS "id", + "aou".name AS "name" +FROM + actor.org_unit AS "aou" +WHERE +EXISTS + ( + SELECT "asv".id AS "id" + FROM action.survey AS "asv" + WHERE "asv".owner = 7 + ); + + This kind of subquery is of limited use, because its WHERE clause doesn't have anything to do with the main query. It just shuts down the main query altogether + if it isn't satisfied. + More typical is a correlated subquery, whose WHERE clause refers to a row from the main query. For example: + +{ + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "-exists": { + "from":"asv", + "select":{ "asv":[ "id" ] }, + "where": { + "owner":{ "=":{ "+aou":"id" }} + } + } + } +} + + Note the use of +aou to qualify the id column in the inner WHERE clause. + +SELECT + "aou".id AS "id", + "aou".name AS "name" +FROM + actor.org_unit AS "aou" +WHERE + EXISTS + ( + SELECT "asv".id AS "id" + FROM action.survey AS "asv" + WHERE ("asv".owner = ( "aou".id )) + ); + + This latter example illustrates the syntax, but in practice, it would probably be more natural to use an IN clause with a subquery (to be discussed later). + + + BETWEEN Clauses + Here's how to express a BETWEEN clause: + +{ + "from":"aou", + "select": { "aou":[ "id" ] }, + "where": { + "parent_ou": { "between":[ 3, 7 ] } + } +} + + The value associated with the column name is an object with a single entry, whose key is "between". The corresponding value is an array with exactly two values, defining the + range to be tested. + The range bounds must be either numbers or string literals. Although SQL allows them to be null, a null doesn't make sense in this context, because a null never matches + anything. Consequently json_query doesn't allow them. + The resulting SQL is just what you would expect: + +SELECT + "aou".id AS "id" +FROM + actor.org_unit AS "aou" +WHERE + parent_ou BETWEEN '3' AND '7'; + + + + IN and NOT IN Lists + There are two ways to code an IN list. One way is simply to include the list of values in an array: + +{ + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "parent_ou": [ 3, 5, 7 ] + } +} + + As with a BETWEEN clause, the values in the array must be numbers or string literals. Nulls aren't allowed. Here's the resulting SQL, which again is just what + you would expect: + +SELECT + "aou".id AS "id", + "aou".name AS "name" +FROM + actor.org_unit AS "aou" +WHERE + "aou".parent_ou IN (3, 5, 7); + + The other way is similar to the syntax shown above for a BETWEEN clause, except that the array may include any non-zero number of values: + +{ + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "parent_ou": { "in": [ 3, 5, 7 ] } + } +} + + This version results in the same SQL as the first one. + For a NOT IN list, you can use the latter format, using the not in operator instead of in. Alternatively, you can use either format together with + the -not operator. + + + IN and NOT IN Clauses with Subqueries + For an IN clause with a subquery, the syntax is similar to the second of the two formats for an IN list (see the previous subsection). The "in" or "not in" operator + is paired, not with an array of values, but with an object representing the subquery. For example: + +{ + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "id": { + "in": { + "from":"asv", + "select":{ "asv":[ "owner" ] }, + "where":{ "name":"Voter Registration" } + } + } + } +} + + The results: + +SELECT + "aou".id AS "id", + "aou".name AS "name" +FROM + actor.org_unit AS "aou" +WHERE + "aou".id IN + ( + SELECT + "asv".owner AS "owner" + FROM + action.survey AS "asv" + WHERE + "asv".name = 'Voter Registration' + ); + + In SQL the subquery may select multiple columns, but in a JSON query it can select only a single column. + For a NOT IN clause with a subquery, use the not in operator instead of in. + + + Comparing to a Function + Here's how to compare a column to a function call: + +{ + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "id":{ ">":[ "sqrt", 16 ] } + } +} + + A comparison operator (> in this case) is paired with an array. The first entry in the array must be a string giving the name of the function. The remaining parameters, + if any, are the parameters. They may be strings, numbers, or nulls. The resulting SQL for this example: + +SELECT + "aou".id AS "id", + "aou".name AS "name" +FROM + actor.org_unit AS "aou" +WHERE + "aou".id > sqrt( '16' ); + + All parameters are passed as quoted strings -- even if, as in this case, they are really numbers. + This syntax is somewhat limited in that the function parameters must be constants (hence the use of a silly example). + + + Putting a Function Call on the Left + In the discussion of the SELECT clause, we saw how you could transform the value of a selected column by passing it to a function. In the WHERE clause, you can + use similar syntax to transform the value of a column before comparing it to something else. + For example: + +{ + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "name": { + "=": { + "transform":"upper", + "value":"CARTER BRANCH" + } + } + } +} + + The "transform" entry gives the name of the function that we will use on the left side of the comparison. The "value" entry designates the value on the right side + of the comparison. + +SELECT + "aou".id AS "id", + "aou".name AS "name" +FROM + actor.org_unit AS "aou" +WHERE + upper("aou".name ) = 'CARTER BRANCH' ; + + As in the SELECT clause, you can pass literal values or nulls to the function as additional parameters by using an array tagged as params: + +{ + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "name": { + "=": { + "transform":"substr", + "params":[ 1, 6 ], + "value":"CARTER" + } + } + } +} + +SELECT + "aou".id AS "id", + "aou".name AS "name" +FROM + actor.org_unit AS "aou" +WHERE + substr("aou".name,'1','6' ) = 'CARTER' ; + + The first parameter is always the column name, qualified by the class name, followed by any additional parameters (which are always enclosed in quotes even if they + are numeric). + As in the SELECT clause: if the function returns multiple columns, you can specify the one you want by using a "result_field" entry (not shown here). + If you leave out the "transform" entry (or misspell it), the column name will appear on the left without any function call. This syntax works, but it's more + complicated than it needs to be. + + + + Putting Function Calls on Both Sides + If you want to compare one function call to another, you can use the same syntax shown in the previous subsection -- except that the value entry carries an + array instead of a literal value. For example: + +{ + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "id": { + ">": { + "transform":"factorial", + "value":[ "sqrt", 1000 ] + } + } + } +} +SELECT + "aou".id AS "id", + "aou".name AS "name" +FROM + actor.org_unit AS "aou" +WHERE + factorial("aou".id ) > sqrt( '1000' ) ; + + The format for the right side function is similar to what we saw earlier, in the subsection Comparing to a Function. Note that there are two different formats + for defining function calls: + + + For a function call to the left of the comparison, the function name is tagged as transform. The first parameter is always the relevant + column name; additional parameters, if any, are in an array tagged as "params". The entry for result_field, if present, specifies a subcolumn. + + + For a function call to the right of the comparison, the function name is the first entry in an array, together with any parameters. + There's no way to specify a subcolumn. + + + + + Comparing a Function to a Condition + So far we have seen two kinds of data for the value tag. A string or number translates to a literal value, and an array translates to a function call. + The third possibility is a JSON object, which translates to a condition. For example: + +{ + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "id": { + "=": { + "value":{ "parent_ou":{ ">":3 } }, + "transform":"is_prime" + } + } + } +} + + The function tagged as transform must return boolean, or else json_query will generate invalid SQL. The function used here, is_prime, + is fictitious. + +SELECT + "aou".id AS "id", + "aou".name AS "name" +FROM + actor.org_unit AS "aou" +WHERE +( + is_prime("aou".id ) = ( "aou".parent_ou > 3 ) +); + + If we left out the transform entry, json_query would compare the column on the left (which would to be boolean) to the condition on the right. The results are similar + to those for a simpler format described earlier (see the subsection Testing Boolean Columns). + In the example above we compared the boolean to a simple condition. However the expression on the right may include multiple conditions, IN lists, subqueries, + and whatever other complications are necessary. + + + Things You Can't Do + The WHERE clause is subject to some of the same limitations as the SELECT clause. However, in the WHERE clause these limitations are more limiting, because + the client program can't compensate by doing some of the work for itself. + You can't use arbitrary expressions in a WHERE condition, such as "WHERE id > parent_ou -- 3". In some cases you may be able to contrive a custom operator in order to + fake such an expression. However this mechanism is neither very general nor very aesthetic. + To the right of a comparison operator, all function parameters must be literals or null. You can't pass a column value, nor can you nest function calls. + Likewise you can't include column values or arbitrary expressions in an IN list or a BETWEEN clause. + You can't include null values in an IN list or a BETWEEN list, not that you should ever want to. + As noted earlier: you can't use the comparison operators is distinct from or is not distinct from. + Also as noted earlier: a subquery in an IN clause cannot select more than one column. + + + JOIN clauses + Until now, our examples have selected from only one table at a time. As a result, the FROM clause has been very simple -- just a single string containing + the class name of the relevant table. + When the FROM clause joins multiple tables, the corresponding JSON naturally gets more complicated. + SQL provides two ways to define a join. One way is to list both tables in the FROM clause, and put the join conditions in the WHERE clause: + +SELECT + aou.id, + aout.name +FROM + actor.org_unit aou, + actor.org_unit_type aout +WHERE + aout.id = aou.ou_type; + + The other way is to use an explicit JOIN clause: + +SELECT + aou.id, + aout.name +FROM + actor.org_unit aou + JOIN actor.org_unit_type aout + ON ( aout.id = aou.ou_type ); + + JSON queries use only the second of these methods. The following example expresses the same query in JSON: + +{ + "select": { "aou":[ "id" ], "aout":[ "name" ] }, + "from": { + "aou":"aout" + } +} + + First, let's review the SELECT clause. Since it selects rows from two different tables, the data for select includes two entries, one for each table. + As for the FROM clause, it's no longer just a string. It's a JSON object, with exactly one entry. The key of this entry is the class name of the core table, i.e. + the table named immediately after the FROM keyword. The data associated with this key contains the rest of the information about the join. In this simple example, + that information consists entirely of a string containing the class name of the other table. + So where is the join condition? + It's in the IDL. Upon reading the IDL, json_query knows that actor.org_unit has a foreign key pointing to actor.org_unit_type, and builds a join condition accordingly: + +SELECT + "aou".id AS "id", + "aout".name AS "name" +FROM + actor.org_unit AS "aou" + INNER JOIN actor.org_unit_type AS "aout" + ON ( "aout".id = "aou".ou_type ) ; + + In this case the core table is the child table, and the joined table is the parent table. We could just as well have written it the other way around: + +{ + "select": { "aou":[ "id" ], "aout":[ "name" ] }, + "from": { + "aout":"aou" + } +} + +SELECT + "aou".id AS "id", + "aout".name AS "name" +FROM + actor.org_unit_type AS "aout" + INNER JOIN actor.org_unit AS "aou" + ON ( "aou".ou_type = "aout".id ) ; + + + + Specifying The Join Columns Explicitly + While it's convenient to let json_query pick the join columns, it doesn't always work. + For example, the actor.org_unit table has four different address ids, for four different kinds of addresses. Each of them is a foreign key to the actor.org_address table. + Json_query can't guess which one you want if you don't tell it. + (Actually it will try to guess. It will pick the first matching link that it finds in the IDL, which may or may not be the one you want.) + Here's how to define exactly which columns you want for the join: + +{ + "select": { "aou":[ "id" ], "aoa":[ "street1" ] }, + "from": { + "aou": { + "aoa": { + "fkey":"holds_address", + "field":"id" + } + } + } +} + + Before, the table we were joining was represented merely by its class name. Now it's represented by an entry in a JSON object. The key of that entry is the + class name, and the associated data is another layer of JSON object containing the attributes of the join. + Later we'll encounter other kinds of join attributes. For now, the only attributes that we're looking at are the ones that identify the join columns: + fkey and field. The hard part is remembering which is which: + + + fkey identifies the join column from the left table; + + + field identifies the join column from the right table. + + + When there are only two tables involved, the core table is on the left, and the non-core table is on the right. In more complex queries neither table may be the + core table. + Here is the result of the preceding JSON: + +SELECT + "aou".id AS "id", + "aoa".street1 AS "street1" +FROM + actor.org_unit AS "aou" + INNER JOIN actor.org_address AS "aoa" + ON ( "aoa".id = "aou".holds_address ) ; + + In this example the child table is on the left and the parent table is on the right. We can swap the tables if we swap the join columns as well: + +{ + "select": { "aou":[ "id" ], "aoa":[ "street1" ] }, + "from": { + "aoa": { + "aou": { + "fkey":"id", + "field":"holds_address" + } + } + } +} + +SELECT + "aou".id AS "id", + "aoa".street1 AS "street1" +FROM + actor.org_address AS "aoa" + INNER JOIN actor.org_unit AS "aou" + ON ( "aou".holds_address = "aoa".id ) ; + + When you specify both of the join columns, json_query assumes that you know what you're doing. It doesn't check the IDL to confirm that the join makes sense. + The burden is on you to avoid absurdities. + + + Specifying Only One Join Column + We just saw how to specify both ends of a join. It turns out that there's a shortcut -- most of the time you only need to specify one end. Consider + the following variation on the previous example: + +{ + "select": { "aou":[ "id" ], "aoa":[ "street1" ] }, + "from": { + "aoa": { + "aou": { + "field":"holds_address" + } + } + } +} + + ..which results in exactly the same SQL as before. + Here we specified the join column from the child table, the column that is a foreign key pointing to another table. As long as that linkage is defined in the IDL, + json_query can look it up and figure out what the corresponding column is in the parent table. + However this shortcut doesn't work if you specify only the column in the parent table, because it would lead to ambiguities. Suppose we had specified the id + column of actor.org_address. As noted earlier, there are four different foreign keys from actor.org_unit to actor.org_address, and json_query would have no way to guess + which one we wanted. + + + Joining to Multiple Tables + So far we have joined only two tables at a time. What if we need to join one table to two different tables? + Here's an example: + +{ + "select": { "aou":[ "id" ], "aout":[ "depth" ], "aoa":[ "street1" ] }, + "from": { + "aou": { + "aout":{}, + "aoa": { + "fkey":"holds_address" + } + } + } +} + + The first join, to actor.org_unit_type, is simple. We could have specified join columns, but we don't have to, because json_query will construct that join on the basis of + what it finds in the IDL. Having no join attributes to specify, we leave that object empty. + For the second join, to actor.org_address, we have to specify at least the join column in the child table, as discussed earlier. We could also have specified the join + column from the parent table, but we don't have to, so we didn't. + Here is the resulting SQL: + +SELECT + "aou".id AS "id", + "aout".depth AS "depth", + "aoa".street1 AS "street1" +FROM + actor.org_unit AS "aou" + INNER JOIN actor.org_unit_type AS "aout" + ON ( "aout".id = "aou".ou_type ) + INNER JOIN actor.org_address AS "aoa" + ON ( "aoa".id = "aou".holds_address ) ; + + Since there can be only one core table, the outermost object in the FROM clause can have only one entry, whose key is the class name of the core table. The next + level has one entry for every table that's joined to the core table. + + + Nested Joins + Let's look at that last query again. It joins three tables, and the core table is the one in the middle. Can we make one of the end tables the core table instead? + Yes, we can: + +{ + "select": { "aou":[ "id" ], "aout":[ "depth" ], "aoa":[ "street1" ] }, + "from": { + "aoa": { + "aou": { + "field":"holds_address", + "join": { + "aout":{ "fkey":"ou_type" } + } + } + } + } +} + + The join attribute introduces another level of join. In this case "aou" is the left table for the nested join, and the right table for the original join. + Here are the results: + +SELECT + "aou".id AS "id", + "aout".depth AS "depth", + "aoa".street1 AS "street1" +FROM + actor.org_address AS "aoa" + INNER JOIN actor.org_unit AS "aou" + ON ( "aou".holds_address = "aoa".id ) + INNER JOIN actor.org_unit_type AS "aout" + ON ( "aout".id = "aou".ou_type ) ; + + + + Outer Joins + By default, json_query constructs an inner join. If you need an outer join, you can add the join type as an attribute of the join: + Yes, we can: + +{ + "select": { "aou":[ "id" ], "aoa":[ "street1" ] }, + "from": { + "aoa": { + "aou": { + "field":"mailing_address", + "type":"left" + } + } + } +} + + Here is the resulting SQL for this example: + +SELECT + "aou".id AS "id", + "aoa".street1 AS "street1" +FROM + actor.org_address AS "aoa" + LEFT JOIN actor.org_unit AS "aou" + ON ( "aou".mailing_address = "aoa".id ) ; + + + + Referring to Joined Tables in the WHERE Clause + In the WHERE clause of the generated SQL, every column name is qualified by a table alias, which is always the corresponding class name. + If a column belongs to the core table, this qualification happens by default. If it belongs to a joined table, the JSON must specify what class name + to use for an alias. For example: + +{ + "select": { "aou":[ "id" ], "aout":[ "name" ] }, + "from": { + "aout":"aou" + }, + "where": { + "+aou":{ "parent_ou":2 } + } +} + + Note the peculiar operator +aou -- a plus sign followed by the relevant class name. This operator tells json_query to apply the specified class to the condition that + follows. The result: + +SELECT + "aou".id AS "id", + "aout".name AS "name" +FROM + actor.org_unit_type AS "aout" + INNER JOIN actor.org_unit AS "aou" + ON ( "aou".ou_type = "aout".id ) +WHERE + ( "aou".parent_ou = 2 ); + + The plus-class operator may apply to multiple conditions: + +{ + "select": { "aou":[ "id" ], "aout":[ "name" ] }, + "from": { + "aout":"aou" + }, + "where": { + "+aou":{ + "parent_ou":2, + "id":{ "<":42 } + } + } +} + +SELECT + "aou".id AS "id", + "aout".name AS "name" +FROM + actor.org_unit_type AS "aout" + INNER JOIN actor.org_unit AS "aou" + ON ( "aou".ou_type = "aout".id ) +WHERE + ( + "aou".parent_ou = 2 + AND "aou".id < 42 + ); + + For these artificial examples, it would have been simpler to swap the tables, so that actor.org_unit is the core table. Then you wouldn't need to go through any + special gyrations to apply the right table alias. In a more realistic case, however, you might need to apply conditions to both tables. Just swapping the tables + wouldn't solve the problem. + You can also use a plus-class operator to compare columns from two different tables: + +{ + "select": { "aou":[ "id" ], "aout":[ "name" ] }, + "from": { + "aout":"aou" + }, + "where": { + "depth": { ">": { "+aou":"parent_ou" } } + } +} + + +SELECT + "aou".id AS "id", + "aout".name AS "name" +FROM + actor.org_unit_type AS "aout" + INNER JOIN actor.org_unit AS "aou" + ON ( "aou".ou_type = "aout".id ) +WHERE + ( + "aout".depth > ( "aou".parent_ou ) + ); + + Please don't expect that query to make any sense. It doesn't. But it illustrates the syntax. + + + Join Filters + While the above approach certainly works, the special syntax needed is goofy and awkward. A somewhat cleaner solution is to include a condition in the JOIN clause: + +{ + "select": { "aou":[ "id" ], "aout":[ "name" ] }, + "from": { + "aout": { + "aou": { + "filter": { + "parent_ou":2 + } + } + } + } +} + +SELECT + "aou".id AS "id", "aout".name AS "name" +FROM + actor.org_unit_type AS "aout" + INNER JOIN actor.org_unit AS "aou" + ON ( "aou".ou_type = "aout".id + AND "aou".parent_ou = 2 ) ; + + By default, json_query uses AND to combine the filter condition with the original join condition. If you need OR, you can use the filter_op attribute to + say so: + +{ + "select": { "aou":[ "id" ], "aout":[ "name" ] }, + "from": { + "aout": { + "aou": { + "filter": { + "parent_ou":2 + }, + "filter_op":"or" + } + } + } +} + +SELECT + "aou".id AS "id", + "aout".name AS "name" +FROM + actor.org_unit_type AS "aout" + INNER JOIN actor.org_unit AS "aou" + ON ( "aou".ou_type = "aout".id + OR "aou".parent_ou = 2 ) ; + + If the data tagged by filter_op is anything but or (in upper, lower, or mixed case), json_query uses AND instead of OR. + The condition tagged by filter may be much more complicated. In fact it accepts all the same syntax as the WHERE clause. + Remember, though, that it all gets combined with the the original join condition with an AND, or with an OR if you so specify. If + you're not careful, the result may be a confusing mixture of AND and OR at the same level. + + + Joining to a Subquery + In SQL you can put a subquery in a FROM clause, and select from it as if it were a table. A JSON query has no way to do that directly. The IDL, however, + can define a class as a subquery instead of as a table. When you SELECT from it, json_query inserts the corresponding subquery into the FROM clause. For example: + +{ + "select":{ "iatc":[ "id", "dest", "copy_status" ] }, + "from": "iatc" +} + + There's nothing special-looking about this JSON, but json_query expands it as follows: + +SELECT + "iatc".id AS "id", + "iatc".dest AS "dest", + "iatc".copy_status AS "copy_status" +FROM + ( + SELECT t.* + FROM + action.transit_copy t + JOIN actor.org_unit AS s + ON (t.source = s.id) + JOIN actor.org_unit AS d + ON (t.dest = d.id) + WHERE + s.parent_ou <> d.parent_ou + ) AS "iatc" ; + + The iatc class is like a view, except that it's defined in the IDL instead of the database. In this case it provides a way to do a join that would otherwise be + impossible through a JSON query, because it joins the same table in two different ways (see the next subsection). + + + Things You Can't Do + In a JOIN, as with other SQL constructs, there are some things that you can't do with a JSON query. + In particular, you can't specify a table alias, because the table alias is always the class name. As a result: + + + You can't join a table to itself. For example, you can't join actor.org_unit to itself in order to select the name of the parent for every org_unit. + + + You can't join to the same table in more than one way. For example, you can't join actor.org_unit to actor.org_address through four different foreign + keys, to get four kinds of addresses in a single query. + + + The only workaround is to perform the join in a view, or in a subquery defined in the IDL as described in the previous subsection. + Some other things, while not impossible, require some ingenuity in the use of join filters. + For example: by default, json_query constructs a join condition using only a single pair of corresponding columns. As long as the database is designed accordingly, + a single pair of columns will normally suffice. If you ever need to join on more than one pair of columns, you can use join filters for the extras. + Likewise, join conditions are normally equalities. In raw SQL it is possible (though rarely useful) to base a join on an inequality, or to use a function call in a join + condition, or to omit any join condition in order to obtain a Cartesian product. If necessary, you can devise such unconventional joins by combining the normal join + conditions with join filters. + For example, here's how to get a Cartesian product: + +{ + "select": { "aou":[ "id" ], "aout":[ "name" ] }, + "from": { + "aout": { + "aou": { + "filter": { + "ou_type":{ "<>": { "+aout":"id" } } + }, + "filter_op":"or" + } + } + } +} + + +SELECT + "aou".id AS "id", + "aout".name AS "name" +FROM + actor.org_unit_type AS "aout" + INNER JOIN actor.org_unit AS "aou" + ON + ( + "aou".ou_type = "aout".id + OR ("aou".ou_type <> ( "aout".id )) + ) ; + + Yes, it's ugly, but at least you're not likely to do it by accident. + + + Selecting from Functions + In SQL, you can put a function call in the FROM clause. The function may return multiple columns and multiple rows. Within the query, the function behaves like a table. + A JSON query can also select from a function: + +{ + "from": [ "actor.org_unit_ancestors", 5 ] +} + + The data associated with from is an array instead of a string or an object. The first element in the array specifies the name of the function. Subsequent elements, + if any, supply the parameters of the function; they must be literal values or nulls. + Here is the resulting query: + +SELECT * +FROM + actor.org_unit_ancestors( '5' ) AS "actor.org_unit_ancestors" ; + + In a JSON query this format is very limited, largely because the IDL knows nothing about the available functions. You can't join the function to a table or to + another function. If you try to supply a SELECT list or a WHERE clause, json_query will ignore it. The generated query will always select every column, via a wild card asterisk, + from every row. + + + The ORDER BY Clause + In most cases you can encode an ORDER BY clause as either an array or an object. Let's take a simple example and try it both ways. First the array: + +{ + "select":{ "aou":[ "name" ] }, + "from": "aou", + "order_by": [ + { "class":"aou", "field":"name" } + ] +} + + Now the object: + +{ + "select":{ "aou":[ "name" ] }, + "from": "aou", + "order_by": { + "aou":{ "name":{} } + } +} + + The results are identical from either version: + +SELECT + "aou".name AS "name" +FROM + actor.org_unit AS "aou" +ORDER BY + "aou".name; + + The array format is more verbose, but as we shall see, it is also more flexible. It can do anything the object format can do, plus some things that the object + format can't do. + + + ORDER BY as an Array + In the array format, each element of the array is an object defining one of the sort fields. Each such object must include at least two tags: + + + The class tag provides the name of the class, which must be either the core class or a joined class. + + + The field tag provides the field name, corresponding to one of the columns of the class. + + + If you want to sort by multiple fields, just include a separate object for each field. + If you want to sort a field in descending order, add a direction tag: + +{ + "select":{ "aou":[ "name" ] }, + "from": "aou", + "order_by": [ + { + "class":"aou", + "field":"name", + "transform":"upper" + } + ] +} + + +SELECT + "aou".name AS "name" +FROM + actor.org_unit AS "aou" +ORDER BY + upper("aou".name ); + + If you need additional parameters for the function, you can use the params tag to pass them: + +{ + "select":{ "aou":[ "name" ] }, + "from": "aou", + "order_by": [ + { + "class":"aou", + "field":"name", + "transform":"substr", + "params":[ 1, 8 ] + } + ] +} + + The additional parameters appear as elements in an array. They may be numbers, strings, or nulls. + +SELECT + "aou".name AS "name" +FROM + actor.org_unit AS "aou" +ORDER BY + substr("aou".name,'1','8' ); + + As we have seen elsewhere, all literal values are passed as quoted strings, even if they are numbers. + If the function returns multiple columns, you can use the result_field tag to indicate which one you want (not shown). + + + + ORDER BY as an Object + When you encode the ORDER BY clause as an object, the keys of the object are class names. Each class must be either the core class or a joined class. The data for + each class can be either an array or another layer of object. Here's an example with one of each: + +{ + "select":{ "aout":"id", "aou":[ "name" ] }, + "from": { "aou":"aout" }, + "order_by": { + "aout":[ "id" ], + "aou":{ "name":{ "direction":"desc" } } + } +} + + For the aout class, the associated array is simply a list of field names (in this case, just one). Naturally, each field must reside in the class with which + it is associated. + However, a list of field names provides no way to specify the direction of sorting, or a transforming function. You can add those details only if the class + name is paired with an object, as in the example for the "aou" class. The keys for such an object are field names, and the associated tags define other details. + In this example, we use the direction" tag to specify that the name field be sorted in descending order. This tag works the same way here as described earlier. + If the associated string starts with "D" or "d", the sort will be descending; otherwise it will be ascending. + Here is the resulting SQL: + +SELECT + "aou".name AS "name" +FROM + actor.org_unit AS "aou" + INNER JOIN actor.org_unit_type AS "aout" + ON ( "aout".id = "aou".ou_type ) +ORDER BY + "aout".id, + "aou".name DESC; + + +{ + "select":{ "aou":[ "name", "id" ] }, + "from": "aou", + "order_by": { + "aou":{ + "name":{ "transform":"substr", "params":[ 1, 8 ] } + } + } +} + +SELECT + "aou".name AS "name", + "aou".id AS "id" +FROM + actor.org_unit AS "aou" +ORDER BY + substr("aou".name,'1','8' ); + + + + Things You Can't Do + If you encode the ORDER BY clause as an object, you may encounter a couple of restrictions. + Because the key of such an object is the class name, all the fields from a given class must be grouped together. You can't sort by a column from one table, followed by + a column from another table, followed by a column from the first table. If you need such a sort, you must encode the ORDER BY clause in the array format, which suffers + from no such restrictions. + For similar reasons, with an ORDER BY clause encoded as an object, you can't reference the same column more than once. Although such a sort may seem perverse, + there are situations where it can be useful, provided that the column is passed to a transforming function. + For example, you might want a case-insensitive sort, except that for any given letter you want lower case to sort first. For example, you want diBona to sort + before Dibona. Here's a way to do that, coding the ORDER BY clause as an array: + +{ + "select":{ "au":[ "family_name", "id" ] }, + "from": "au", + "order_by": [ + { "class":"au", "field":"family_name", "transform":"upper" }, + { "class":"au", "field":"family_name" } + ] +} +SELECT + "au".family_name AS "family_name", + "au".id AS "id" +FROM + actor.usr AS "au" +ORDER BY + upper("au".family_name ), + "au".family_name; + + Such a sort is not possible where the ORDER BY clause is coded as an object. + + + The GROUP BY Clause + A JSON query has no separate construct to define a GROUP BY clause. Instead, the necessary information is distributed across the SELECT clause. However, + the way it works is a bit backwards from what you might expect, so pay attention. + Here's an example: + +{ + "select": { + "aou": [ + { "column":"parent_ou" }, + { "column":"name", "transform":"max", "aggregate":true } + ] + }, + "from": "aou" +} + + The transform tag is there just to give us an excuse to do a GROUP BY. What's important to notice is the aggregate tag. + Here's the resulting SQL: + +SELECT + "aou".parent_ou AS "parent_ou", + max("aou".name ) AS "name" +FROM + actor.org_unit AS "aou" +GROUP BY + 1; + + The GROUP BY clause references fields from the SELECT clause by numerical reference, instead of by repeating them. Notice that the field it references, + parent_ou, is the one that doesn't carry the aggregate tag in the JSON. + Let's state that more generally. The GROUP BY clause includes only the fields that do not carry the aggregate tag (or that carry it with a value of false). + However, that logic applies only when some field somewhere does carry the aggregate tag, with a value of true. If there is no aggregate tag, or + it appears only with a value of false, then there is no GROUP BY clause. + If you really want to include every field in the GROUP BY clause, don't use aggregate. Use the distinct tag, as described in the next section. + + + The DISTINCT Clause + JSON queries don't generate DISTINCT clauses. However, they can generate GROUP BY clauses that include every item from the SELECT clause. The effect is the same as + applying DISTINCT to the entire SELECT clause. + For example: + +{ + "select": { + "aou": [ + "parent_ou", + "ou_type" + ] + }, + "from":"aou", + "distinct":"true" +} + + Note the distinct entry at the top level of the query object, with a value of true. + +SELECT + "aou".parent_ou AS "parent_ou", + "aou".ou_type AS "ou_type" +FROM + actor.org_unit AS "aou" +GROUP BY + 1, 2; + + The generated GROUP BY clause references every column in the SELECT clause by number. + + + The HAVING Clause + For a HAVING clause, add a having entry at the top level of the query object. For the associated data, you can use all the same syntax + that you can use for a WHERE clause. + Here's a simple example: + +{ + "select": { + "aou": [ + "parent_ou", { + "column":"id", + "transform":"count", + "alias":"id_count", + "aggregate":"true" + } + ] + }, + "from":"aou", + "having": { + "id": { + ">" : { + "transform":"count", + "value":6 + } + } + } +} + + We use the aggregate tag in the SELECT clause to give us a GROUP BY to go with the HAVING. Results: + +SELECT + "aou".parent_ou AS "parent_ou", + count("aou".id ) AS "id_count" +FROM + actor.org_unit AS "aou" +GROUP BY + 1 +HAVING + count("aou".id ) > 6 ; + + In raw SQL we could have referred to count( 1 ). But since JSON queries cannot encode arbitrary expressions, we applied the count function to a column that + cannot be null. + + + The LIMIT and OFFSET Clauses + To add an LIMIT or OFFSET clause, add an entry to the top level of a query object. For example: + +{ + "select": { + "aou": [ "id", "name" ] + }, + "from":"aou", + "order_by": { "aou":[ "id" ] }, + "offset": 7, + "limit": 42 +} + + The data associated with offset and limit may be either a number or a string, but if it's a string, it should have a number inside. + Result: + +SELECT + "aou".id AS "id", + "aou".name AS "name" +FROM + actor.org_unit AS "aou" +ORDER BY + "aou".id +LIMIT 42 +OFFSET 7; + + + diff --git a/2.0/development/supercat.xml b/2.0/development/supercat.xml new file mode 100644 index 0000000..484651b --- /dev/null +++ b/2.0/development/supercat.xml @@ -0,0 +1,163 @@ + + + + SuperCat + +
> + Using SuperCat + SuperCat + SuperCat allows Evergreen record and information retrieval from a web browser using a based on a number of open web standards and formats. The following record types are + supported: + + isbn + metarecord + record + + + Return a list of ISBNs for related records + SuperCatISBNs + Similar to the OCLC xISBN service, Evergreen can return a list of related records based on its oISBN algorithm: + http://<hostname>/opac/extras/osibn/<ISBN> + For example, http://dev.gapines.org/opac/extras/oisbn/0439136350 returns: + +<idlist metarecord="302670"> +<isbn record="250060">0790783525</isbn> +<isbn record="20717">0736691316</isbn> +<isbn record="250045">0790783517</isbn> +<isbn record="199060">9500421151</isbn> +<isbn record="250061">0790783495</isbn> +<isbn record="154477">0807286028</isbn> +<isbn record="227297">1594130027</isbn> +<isbn record="26682">0786222743</isbn> +<isbn record="17179">0807282316</isbn> +<isbn record="34885">0807282316</isbn> +<isbn record="118019">8478885196</isbn> +<isbn record="1231">0738301477</isbn> +</idlist> + + + + Return records + SuperCatrecords + SuperCat can return records and metarecords in many different formats (see + http://<hostname>/opac/extras/supercat/retrieve/<format>/<record-type>/<bib-ID> + For example, http://dev.gapines.org/opac/extras/supercat/retrieve/mods/record/555 returns: + +<mods:modsCollection version="3.0"> + <mods:mods xsi:schemaLocation="http://www.loc.gov/mods/ http://www.loc.gov/standards/mods/mods.xsd"> + <titleInfo> + <title>More Brer Rabbit stories /</title> + </titleInfo> + <typeOfResource>text</typeOfResource> + <originInfo> + <place> + <code authority="marc">xx</c0de> + </place> + <publisher>Award Publications</publisher> + <dateIssued>c1982, 1983</dateIssued> + <dateIssued encoding="marc" point="start">1983</dateIssued> + <dateIssued encoding="marc" point="end">1982</dateIssued> + <issuance>monographic</issuance> + </originInfo> + <language authority="iso639-2b">eng</language> + <physicalDescription> + <form authority="marcform">print</form> + <extent>unp. : col. ill.</extent> + </physicalDescription> + <note type="statement of responsibility">ill. by Rene Cloke.</note> + <subject authority="lcsh"> + <topic>Animals</topic> + <topic>Fiction</topic> + </subject> + <subject authority="lcsh"> + <topic>Fables</topic> + </subject> + <recordInfo> + <recordContentSource>(BRO)</recordContentSource> + <recordCreationDate encoding="marc">930903</recordCreationDate> + <recordChangeDate encoding="iso8601">19990703024637.0</recordChangeDate> + <recordIdentifier>PIN60000007 </recordIdentifier> + </recordInfo> + </mods:mods> +</mods:modsCollection> + + + + Return a feed of recently edited or created records + SuperCatrecent records + SuperCat can return feeds of recently edited or created authority and bibliographic records: + http://<hostname>/opac/extras/feed/freshmeat/<feed-type>/[authority|biblio]/[import|edit]/<limit>/<date> + The limit records imported or edited following the supplied date will be returned. If you do not supply a date, then the most recent limit records will be returned. + If you do not supply a limit, then up to 10 records will be returned. + Feed-type can be one of atom, html, htmlholdings, marcxml, mods, mods3, or rss2. + For example, http://dev.gapines.org/opac/extras/feed/freshmeat/atom/biblio/import/10/2008-01-01 + + + Browse records + SuperCat can browse records in HTML and XML formats: + http://<hostname>/opac/extras/supercat/browse/<format>/call_number/<org_unit>/<call_number> + For example, http://dev.gapines.org/opac/extras/browse/xml/call_number/-/GV returns: + +<hold:volumes xmlns:hold='http://open-ils.org/spec/holdings/v1'> + <hold:volume id="tag:open-ils.org,2008:asset-call_number/130607" lib="FRRLS-FA" label="GUTCHEON BETH"> + <act:owning_lib id="tag:open-ils.org,2008:actor-org_unit/111" name="Fayette County Public Library"/> + <record xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/ + standards/marcxml/schema/MARC21slim.xsd" + id="tag:open-ils.org,2008:biblio-record_entry/21669/FRRLS-FA"> + <leader>09319pam a2200961 a 4500</leader> + <controlfield tag="001"/> + <controlfield tag="005">20000302124754.0</controlfield> + <controlfield tag="008">990817s2000 nyu 000 1 eng </controlfield> + <datafield tag="010" ind1=" " ind2=" "> + <subfield code="a"> 99045936</subfield> + </datafield> + .. + </record> + <record> + .. + </record> + </hold:volume> +</hold:volumes> + + + + Supported formats + SuperCatformats + SuperCat maintains a list of supported formats for records and metarecords: + http://<hostname>/opac/extras/supercat/formats/<record-type> + For example, http://dev.gapines.org/opac/extras/supercat/formats/record returns: + +<formats> + <format> + <name>opac</name> + <type>text/html</type> + </format> + <format> + <name>htmlholdings</name> + <type>text/html</type> + </format> +... + + +
+
+ Adding new SuperCat Formats + SuperCatformatsadding + Adding SuperCat formats requires experience editing XSL files and familiarity with XML and Perl. + SuperCat web services are based on the OpenSRF service, >open-ils.supercat. + Developers are able to add new formats by adding the xsl stylesheet for the format. By default, the location of the stylesheets is /openils/var/xsl/. You must also add the feed to the perl + modules openils/lib/perl5/OpenILS/WWW/SuperCat/feed.pm and openils/lib/perl5/OpenILS/WWW/SuperCat.pm. An Evergreen restart is + required for the feed to be activated. + Use an existing xsl stylesheet and Perl module entry as a template for your new format. +
+
+ Customizing SuperCat Formats + SuperCatformatscustomizing + Editing SuperCat formats requires experience editing XSL files and familiarity with XML.. + It is possible to customize existing supercat formats using XSL stylesheets. You are able to change the content to be displayed and the design of the pages. + In order to change the display of a specific format, edit the corresponding XSL file(s) for the particular format. The default location for the XSL stylesheets is + /openils/var/xsl/. +
+
+ diff --git a/2.0/root.xml b/2.0/root.xml index 7ddee42..fac0ddd 100755 --- a/2.0/root.xml +++ b/2.0/root.xml @@ -76,6 +76,15 @@ Development + + + + + + + + + -- 2.11.0