From: dbs Date: Mon, 26 Oct 2009 14:28:57 +0000 (+0000) Subject: Resurrect JSON grammar and tutorial that were blown away by r13990. X-Git-Url: https://old-git.evergreen-ils.org/?a=commitdiff_plain;h=a74e01901bbde68f23f1cd9c76fd7e85d4ce582b;p=evergreen%2Fmasslnc.git Resurrect JSON grammar and tutorial that were blown away by r13990. git-svn-id: svn://svn.open-ils.org/ILS/trunk@14602 dcc99617-32d9-48b4-a31d-7c20da2025e4 --- diff --git a/docs/TechRef/JSONGrammar.xml b/docs/TechRef/JSONGrammar.xml new file mode 100644 index 0000000000..4812128c22 --- /dev/null +++ b/docs/TechRef/JSONGrammar.xml @@ -0,0 +1,1223 @@ + + + + + Grammar of JSON Queries + + + + + Scott + McKellar + + + Equinox Software, Inc. + + + + + + + Introduction + The format of this grammar approximates Extended Backus-Naur notation. However it is + intended as input to human beings, not to parser generators such as Lex or Yacc. Do not + expect formal rigor. Sometimes narrative text will explain things that are clumsy to + express in formal notation. More often, the text will restate or summarize the formal + productions. + Conventions: + + + The grammar is a series of productions. + + + A production consists of a name, followed by "::=", followed by a definition + for the name. The name identifies a grammatical construct that can appear on the + right side of another production. + + + Literals (including punctuation) are enclosed in 'single quotes', or in + "double quotes" if case is not significant. + + + A single quotation mark within a literal is escaped with a preceding + backslash: 'dog\'s tail'. + + + If a construct can be defined more than one way, then the alternatives may + appear in separate productions; or, they may appear in the same production, + separated by pipe symbols. The choice between these representations is of only + cosmetic significance. + + + A construct enclosed within square brackets is optional. + + + A construct enclosed within curly braces may be repeated zero or more + times. + + + JSON allows arbitrary white space between tokens. To avoid ugly clutter, this + grammar ignores the optional white space. + + + In many cases a production defines a JSON object, i.e. a list of name-value + pairs, separated by commas. Since the order of these name/value pairs is not + significant, the grammar will not try to show all the possible sequences. In + general it will present the required pairs first, if any, followed by any + optional elements. + + + + Since both EBNF and JSON use curly braces and square brackets, pay close attention to + whether these characters are in single quotes. If they're in single quotes, they are + literal elements of the JSON notation. Otherwise they are elements of the EBNF notation. + + + + + Primitives + We'll start by defining some primitives, to get them out of the way. They're mostly + just what you would expect. + + + + string + '"' chars '"' + + + + chars + any valid sequence of UTF-8 characters, with certain special characters + escaped according to JSON rules + + + + integer_literal + [ sign ] digit { digit } + + + + sign + '+' | '-' + + + + digit + digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' + + + + integer_string + '"' integer_literal '"' + + + + integer + integer_literal | integer_string + + + + number + any valid character sequence that is numeric according to JSON rules + + + + + When json_query requires an integral value, it will usually accept a quoted string + and convert it to an integer by brute force – to zero if necessary. Likewise it may + truncate a floating point number to an integral value. Scientific notation will be + accepted but may not give the intended results. + + + + + boolean + 'true' | 'false' | string | number + + + + + The preferred way to encode a boolean is with the JSON reserved word true or false, + in lower case without quotation marks. The string true, in upper, + lower, or mixed case, is another way to encode true. Any other string evaluates to + false. + As an accommodation to perl, numbers may be used as booleans. A numeric value of 1 + means true, and any other numeric value means false. + Any other valid JSON value, such as an array, will be accepted as a boolean but + interpreted as false. + The last couple of primitives aren't really very primitive, but we introduce them + here for convenience: + + + + + class_name + string + + + + + A class_name is a special case of a string: the name of a class as defined by the + IDL. The class may refer either to a database table or to a source_definition, which is + a subquery. + + + + + field_name + string + + + + + A field_name is another special case of a string: the name of a non-virtual field as + defined by the IDL. A field_name is also a column name for the table corresponding to + the relevant class. + + + + + Query + + The following production applies not only to the main query but also to most + subqueries. + + + + + query + '{' '"from"' ':' from_list [ ',' '"select"' ':' select_list + ] [ ',' '"where"' ':' where_condition ] [ ',' '"having"' ':' + where_condition ] [ ',' '"order_by"' ':' order_by_list ] [ ',' + '"limit"' ':' integer ] [ ',' '"offset"' ':' integer ] [ ',' + '"distinct"' ':' boolean ] [ ',' '"no_i18n"' ':' boolean ] '}' + + + + + + Except for the "distinct" and no_i18n entries, + each name/value pair represents a major clause of the SELECT statement. The name/value + pairs may appear in any order. + There is no name/value pair for the GROUP BY clause, because json_query generates it + automatically according to information encoded elsewhere. + The "distinct" entry, if present and true, tells json_query that + it may have to create a GROUP BY clause. If not present, it defaults to false. + The "no_i18n" entry, if present and true, tells json_query to + suppress internationalization. If not present, it defaults to false. (Note that + "no_i18n" contains the digit one, not the letter ell.) + The values for limit and offset provide the + arguments of the LIMIT and OFFSET clauses, respectively, of the SQL statement. Each + value should be non-negative, if present, or else the SQL won't work. + + + + FROM Clause + + The object identified by “from” encodes the FROM clause of + the SQL. The associated value may be a string, an array, or a JSON object. + + + + + + from_list + class_name + + + + + + If from_list is a class_name, the + json_query inserts the corresponding table name or subquery into the FROM + clause, using the class_name as an alias for the table + or subquery. The class must be defined as non-virtual in the IDL. + + + + + + from_list + '[' string { ',' parameter } ']' + + + + parameter + string | number | 'null' + + + + + + If from_list is a JSON array, then it represents a table-like function from + which the SQL statement will select rows, using a SELECT clause consisting + of “SELECT *” (regardless of the select_list supplied by the method parameter). + + + The first entry in the array is the name of the function. It must be a string + naming a stored function. Each subsequent entry is a function parameter. If + it is a string or a number, json_query will insert it into a comma-separated + parameter list, enclosed in quotes, with any special characters escaped as needed. + If it is the JSON reserved word null, json_query will insert + it into the parameter list as a null value. + + + If from_list is a JSON object, it must contain exactly one entry. + The key of this entry must be the name of a non-virtual class defined in the IDL. + This class will be the top-level class of the FROM clause, the only one named + outside of a JOIN clause. + + + + + + from_list + '{' class_name ':' join_list '}' + + + + join_list + class_name + + + + join_list + '{' join_def { ',' join_def } '}' + + + + + + If the associated data is a class_name, json_query will + construct an INNER JOIN clause joining the class to the top-level clause, + using the columns specified by the IDL for such a join. + + + Otherwise, the associated data must be a JSON object with one or more entries, + each entry defining a join: + + + + + + join_def + + class_name ':' + '{' + [ '”type”' ':' string ] + [ '”field”' ':' field_name ] + [ '”fkey”' ':' field_name ] + [ '”filter”' ':' where_condition ] + [ '”filter_op”' ':' string ] + [ '”join”' ':' join_list ] + '}' + + + + + + + The data portion of the “join_type” entry tells json_query + whether to use a left join, right join, full join, or inner join. The values + “left”, “right”, and “full”, + in upper, lower, or mixed case, have the obvious meanings. If the + “join_type” entry has any other value, or is not present, + json_query constructs an inner join. + + + The “field” and “fkey” attributes specify the + columns to be equated in the join condition. The “field” + attribute refers to the column in the joined table, i.e. the one named by the + join_def. The “fkey” attribute refers to the + corresponding column in the other table, i.e. the one named outside the + join_def – either the top-level table or a table named by some + other join_def. + + + It may be tempting to suppose that “fkey” stands for “foreign key”, + and therefore refers to a column in the child table that points to the key of a + parent table. Resist the temptation; the labels are arbitrary. The json_query + method doesn't care which table is the parent and which is the child. + + + These relationships are best explained with an example. The following + from_list: + + + + + { + "aou": { + "asv": { + "type" : "left", + "fkey" : "id", + "field" : "owner" + } + } + } + + + + + ...turns into the following FROM clause: + + + + + FROM + actor.org_unit AS "aou" + LEFT JOIN action.survey AS "asv" + ON ( "asv".owner = "aou".id ) + + + + + Note in this example that “fkey” refers to a column of the + class “aou”, and “field” refers to a + column of the class “asv”. + + + If you specify only one of the two columns, json_query will try to identify the + other one from the IDL. However, if you specify only the column from the parent + table, this attempt will probably fail. + + + If you specify both columns, json_query will use the column names you specify, + without verifying them with a lookup in the IDL. By this means you can perform + a join using a linkage that the IDL doesn't define. Of course, if the columns + don't exist in the database, the query will fail when json_query tries to execute it. + + + Using the columns specified, either explicitly or implicitly, the json_query + method constructs a join condition. With raw SQL it is possible (though + rarely useful) to join two tables by an inequality. However the json_query + method always uses a simple equality condition. + + + Using a “filter” entry in the join_def, you can apply one + or more additional conditions to the JOIN clause, typically to restrict the + join to certain rows of the joined table. The data associated with the + “filter” key is the same sort of + where_condition that you use for a WHERE clause + (discussed below). + + + If the string associated with the “filter_op” entry is + “OR” in upper, lower, or mixed case, then the json_query + method uses OR to connect the standard join condition to any additional + conditions supplied by a “filter” entry. + + + (Note that if the where_condition supplies multiple + conditions, they will be connected by AND. You will probably want to move + them down a layer – enclose them in parentheses, in effect – to avoid a + confusing mixture of ANDs and ORs.) + + + If the “filter_op” entry carries any other value, or if + it is absent, then the json_query method uses AND. In the absence of a + “filter” entry, “filter_op” has no effect. + + + A “join” entry in a join_def specifies + another layer of join. The class named in the subjoin is joined to the class + named by the join_def to which it is subordinate. By this + means you can encode multiple joins in a hierarchy. + + + + SELECT Clause + + If a query does not contain an entry for “select”, json_query + will construct a default SELECT clause. The default includes every non-virtual + field from the top-level class of the FROM clause, as defined by the IDL. The + result is similar to SELECT *, except: + + + + + The default includes only the fields defined in the IDL. + + + The columns will appear in the same order in which they appear in the IDL, + regardless of the order in which the database defines them. + + + + + There are other ways to specify a default SELECT list, as shown below. + + + If a "select" entry is present, the associated value must + be a JSON object, keyed on class names: + + + + + + select_list + '{' class_name ':' field_list { ',' class_name ':' field_list } '}' + + + + + + The class_name must identify either the top-level class or + a class belonging to one of the joins. Otherwise json_query will silently + ignore the select_list. + + + + + + field_list + 'null' | '”*”' + + + + + + If a field_list is either the JSON reserved word null + (in lower case) or an asterisk in double quotes, json_query constructs a + default SELECT list – provided that the class is the top-level class of the + query. If the class belongs to a join somewhere, json_query ignores the + field_list. + + + More commonly, the field_list is a JSON array of zero or + more field specifications: + + + + + + field_list + '[' [ field_spec { ',' field_spec } ] ']' + + + + + + If the array is empty, json_query will construct a default SELECT list for + the class – again, provided that the class is the top-level class in the query. + + + In the simplest case, a field specification may name a non-virtual field + defined in the IDL: + + + + + + field_spec + field_name + + + + + + In some cases json_query constructs a call to the + oils_i18n_xlate function to internationalize the value of the + selected column. Specifically, it does so if all the following are true: + + + + + the settings file defines a locale; + + + in the field definition for the field in the IDL, the tag + “il8n” is present and true; + + + the query does not include the + "no_il8n" tag (or includes it with a value of false). + + + + + A field specification may be a JSON object: + + + + + + field_spec + + '{' + '”column”' ':' + [ ',' '”alias”' ':' string ] + [ ',' '”aggregate”' ':' boolean ] + [ ',' transform_spec ] + '}' + + + + + + + The “column” entry provides the column name, which must + be defined as non-virtual in the IDL. + + + The “alias” entry provides a column alias. If no alias + is specified, json_query uses the column name as its own alias. + + + The “aggregate” entry has no effect on the SELECT clause + itself. Rather, it affects the construction of a GROUP BY class. If there + is an “aggregate” entry for any field, then json_query builds + a GROUP BY clause listing every column that is not tagged + for aggregation (or that carries an “aggregate” entry with + a value of false). If all columns are tagged for + aggregation, then json_query omits the GROUP BY clause. + + + + + + transform_spec + + '”transform”' ':' string ] + [ ',' '”result_field” ':' string ] + [ ',' '”params” ':' param_list ] + + + + + + + When a transform_spec is present, json_query selects the + return value of a function instead of selecting the column directly. The entry + for “transform” provides the name of the function, and the + column name (as specified by the “column” tag), qualified by + the class name, is the argument to the function. For example, you might use such + a function to format a date or time, or otherwise transform a column value. + You might also use an aggregate function such as SUM, COUNT, or MAX (possibly + together with the “aggregate” tag). + + + The “result_field” entry, when present, specifies a subcolumn + of the function's return value. The resulting SQL encloses the function call + in parentheses, and follows it with a period and the subcolumn name. + + + The “params” entry, if present, provides a possibly empty + array of additional parameter values, either strings, numbers, or nulls: + + + + + + param_list + '[' [ parameter { ',' parameter } ] ']' + + + + + + Such parameter values are enclosed in single quotes, with any special characters + escaped as needed, and inserted after the column name as additional parameters + to the function. You might, for example, use an additional parameter to provide + a format string for a reformatting function. + + + + WHERE Clause + There are two types of where_condition: objects and arrays. + Of these, the object type is the more fundamental, and occurs at some level in every + where_condition. The array type is mainly a way of circumventing + a limitation of the object type. + The object type of where_condition is a comma-separated list + of one or more conditions: + + + + + where_condition + '{' condition { ',' condition } '}' + + + + + The generated SQL will include a code fragment for each condition, + joined by AND (or in some cases by OR, as described below). + As usual for entries in a JSON object, each condition consists + of a unique string to serve as a key, a colon, and an associated value. + The key string may be the name of a column belonging to the relevant table, or + it may be an operator string. In order to distinguish it from any possible column + name, an operator string always begins with a plus sign or minus sign. + JSON requires that every key string be unique within an object. This requirement + imposes some awkward limitations on a JSON query. For example, you might want to + express two conditions for the same column: id > 10 and id != 25. Since each of + those conditions would have the same key string, namely “id”, you can't put them + into the same JSON object. + The solution is to put such conflicting conditions in separate JSON objects, and + put the objects into an array: + + + + + where_condition + '[' where_condition { ',' where_condition } ']' + + + + + The resulting SQL encloses each subordinate set of conditions + in parentheses, and connects the sets with AND (or in some cases OR, as described + below). It's possible to put only a single where_condition in + the array; the result is to add a layer of parentheses around the condition. + There are two kinds of condition where the operator begins + with a plus sign. In the simpler case, the associated data is simply a column name: + + + + + + condition + plus_class ':' field_name + + + + + A plus_class is a string that begins with a plus sign. + The rest of the string, after the plus sign, must be the class name for the table + to which the column belongs. + If the column is a boolean, then the resulting SQL uses it (as qualified by the + class name) as a stand-alone condition. + Otherwise, this kind of syntax provides a way to place a column on the right side + of a comparison operator. For example: + + + + { + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "id": { + ">": { "+aou":"parent_ou" } + } + } + } + + + + The resulting SQL: + + + + SELECT + "aou".id AS "id", + "aou".name AS "name" + FROM + actor.org_unit AS "aou" + WHERE + ( + "aou".id > ( "aou".parent_ou ) + ); + + + + The other type of condition that uses a + plus_class applies a specified class name to a + where_condition: + + + + + condition + plus_class ':' where_condition + + + + + The resulting SQL is enclosed in parentheses, and qualifies the columns with + the specified class name. This syntax provides a mechanism to shift the class + context – i.e. to refer to one class in a context that would otherwise refer to + a different class. + Ordinarily the class name must be a valid non-virtual class defined in the IDL, + and applicable to the associated where_condition. There is at + least one peculiar exception. The JSON fragment: + + + + "+abc": { "+xyz":"frobozz" } + + + + ...is rendered as: + + + + ( "xyz".frobozz ) + + + + ...even though neither “abc”, nor “xyz”, + nor “frobozz” is defined in the IDL. The class name + “abc” isn't used at all because the “+xyz” + operator overrides it. Such a query won't fail until json_query tries + to execute it in the database. + The other operators that may occur at this level all begin with a minus sign, + and they all represent familiar SQL operators. For example, the + “-or” operator joins the conditions within a + where_condition by OR (instead of the default AND), and + encloses them all in parentheses: + + + + + condition + '”-or”' ':' where_condition + + + + + In fact the “-or” operator is the only way to get OR into + the WHERE clause. + The “-and” operator is similar, except that it uses AND: + + + + + condition + '”-and”' ':' where_condition + + + + + Arguably the “-and” operator is redundant, because you can + get the same effect by wrapping the subordinate where_condition + in a JSON array. Either technique merely adds a layer of parentheses, since AND + connects successive conditions by default. + The “-not” operator expands the subordinate + where_condition within parentheses, and prefaces the result + with NOT: + + + + + condition + '”-not”' ':' where_condition + + + + + The “-exists” or “-not-exists” operator + constructs a subquery within an EXISTS or NOT EXISTS clause, respectively: + + + + + condition + '”-exists”' ':' query + + + + condition + '”-not-exists”' ':' query + + + + + The remaining kinds of condition all have a + field_name on the left and some kind of predicate + on the right. A predicate places a constraint on the value of + the column – or, in some cases, on the value of the column as transformed by some + function call: + + + + + condition + field_name ':' predicate + + + + + The simplest such constraint is to require that the column have a specific value, + or be null: + + + + + predicate + lit_value | 'null' + + + + lit_value + string | number + + + + + You can also compare a column to a literal value using some kind of inequality. + However it's a bit more complicated because you have to specify what kind of comparison + to make: + + + + + predicate + '{' compare_op ':' lit_value '}' + + + + compare_op + string + + + + + A compare_op is a string that defines a comparison operator. + Valid values include the following: + + + = <> != + < > <= >= + ~ ~* !~ !~* + like ilike + similar to + + + Strictly speaking, json_query accepts any compare_op + that doesn't contain semicolons or white space (or + “similar to” as a special exception). As a result, it + is possible – and potentially useful – to use a custom operator like + “>100*” in order to insert an expression that would + otherwise be difficult or impossible to create through a JSON query. The ban + on semicolons and white space prevents certain kinds of SQL injection. + Note that json_query does not accept two operators that + PostgreSQL does accept: “is distinct from” + and “is not distinct from”. + You can also compare a column to a null value: + + + + + predicate + '{' compare_op ':' 'null' '}' + + + + + The equality operator “=” turns into IS NULL. Any other + operator turns into IS NOT NULL. + When a compare_op is paired with an array, it defines a + function call: + + + + + predicate + '{' compare_op ':' '[' string { ',' parameter } ']' '}' + + + + + The first entry in the array is the function's name. Subsequent entries in + the array, if any, represent the parameters of the function call. They may be + strings, numbers, or nulls. In the generated SQL, the function call appears on + the right of the comparison. + The “between” operator creates a BETWEEN clause: + + + + + predicate + '{' “between” ':' '[' lit_value ',' lit_value ']' '}' + + + + + Although PostgreSQL allows a null value in a BETWEEN clause, json_query + requires literal non-null values. It isn't sensible to use null values in a + BETWEEN clause. A few experiments show that the results of the comparison are + peculiar and erratic. + There are two ways to create an IN list of allowed values. The simplest is + to put literal values into a JSON array: + + + + + predicate + '[' lit_value { ',' lit_value } ']' + + + + + As with BETWEEN clauses, json_query does not accept null values in an IN list, + even though PostgreSQL does allow them. Nulls are not sensible in this context + because they never match anything. + + + + Having Clause + For the HAVING clause, json_query accepts exactly the same syntax as it accepts for + the WHERE clause. + The other way to create an IN list is to use an explicit + “in” operator with an array of literal values. This format + also works for the “not in” operator: + + + + + predicate + '{' in_operator ';' '[' lit_value [ ',' lit_value ] ']' '}' + + + + in_operator + “in” | “not in” + + + + + Another kind of IN or NOT IN clause uses a subquery instead of a list of + values: + + + + + predicate + '{' in_operator ':' query '}' + + + + + The remaining types of predicate can put a function call on + the left of the comparison, by using a transform_spec together + with a “value” tag. The transform_spec is + optional, and if you don't need it, the same SQL would in many cases be easier to + express by other means. + The transform_spec construct was described earlier in + connection with the SELECT clause, but here it is again: + + + + + transform_spec + + '”transform”' ':' string ] + [ ',' '”result_field” ':' string ] + [ ',' '”params” ':' param_list ] + + + + + + As in the SELECT clause, the “transform” string names the + function. The first parameter is always the column identified by the field_name. + Additional parameters, if any, appear in the param_list. The + “result_field” string, if present, identifies one column of a + multicolumn return value. + Here's a second way to compare a value to a literal value (but not to a null + value): + + + + + predicate + '{' compare_op ':' '{' '”value”' ':' lit_value + [ transform_spec ] '}' '}' + + + + + ...and a way to compare a value to a boolean expression: + + + + + predicate + '{' compare_op ':' '{' '”value”' ':' '{' + condition { ',' condition } [ transform_spec ] '}' '}' + + + + + The final predicate is another way to put a function call on the right side + of the comparison: + + + + + predicate + '{' compare_op ':' '{' '”value”' ':' '[' + string { ',' parameter } ']' [ transform_spec ] '}' '}' + + + + + This format is available for the sake of consistency, but offers no advantage + over the simpler version. + + + ORDER BY Clause + There are two ways to encode an ORDER BY clause: as an array, or as a list. + Either may be empty, in which case the generated SQL will not include an ORDER BY + clause: + + + + + order_by_list + '[' ']' | '{' '}' + + + + + If not empty, the array contains one or more objects, each defining a sort + field: + + + + + order_by_list + '{' sort_field_def { ',' sort_field_def } '}' + + + + sort_field_def + '{' + '”class”' ':' class_name + ',' '”field”' ':' field_name + [ ',' '”direction”' ':' lit_value ] + [ ',' transform_spec ] + '}' + + + + + The “class” and “field” entries are + required, and of course the field must belong to the class. Furthermore, at + least one field from the class must appear in the SELECT clause. + The “direction” entry, if present, specifies whether the + sort will be ascending or descending for the column in question. If the associated + value begins with “D” or “d”, the sort will be descending; otherwise the sort will + be ascending. If the value is a number, it will be treated as a string that does not + start with “D” or “d”, resulting in an ascending sort. + In the absence of a “direction” entry, the sort will be + ascending. + The transform_spec works here the same way it works in the + SELECT clause and the WHERE clause, enabling you to pass the column through a + transforming function before the sort: + + + + + transform_spec + + '”transform”' ':' string ] + [ ',' '”result_field” ':' string ] + [ ',' '”params” ':' param_list ] + + + + + + When the order_by_list is an object instead of an array, + the syntax is less verbose, but also less flexible. The keys for the object are + class names: + + + + + order_by_list + '{' class_name ':' sort_class_def + { ',' class_name ':' sort_class_def } '}' + + + + + Each class must be referenced in the SELECT clause. + As in the SELECT clause, all the fields for a given class must be grouped + together. You can't sort by a column from one table, then a column from a second + table, then another column from the first table. If you need this kind of sort, + you must encode the ORDER BY clause as an array instead of an object. + The data associated with a class_name may be either an array + or an object. If an array, it's simply a list of field names, and each field must + belong to the class: + + + + + sort_class_def + '[' field_name { ',' field_name } ']' + + + + + With this syntax, the direction of sorting will always be ascending. + If the data is an object, the keys are field names, and as usual the fields + must belong to the class: + + + + + sort_class_def + '{' field_name ':' sort_class_subdef + { ',' field_name ':' sort_class_subdef } '}' + + + + + Since the field_name is the key for the object, it cannot + appear more than once. As a result, some kinds of sorts are not possible with this + syntax. For example, one might want to sort by UPPER( family_name ), and then by + family_name with case unchanged, to make sure that “diBona” comes before “Dibona”. + For situations like this, you must encode the ORDER BY clause as an array rather + than an object. + The data associated with each field_name may take either of + two forms. In the simplest case, it's a literal value to specify the direction + of sorting: + + + + + sort_class_subdef + lit_value + + + + + If the literal is a string starting with “D” or “d”, json_query sorts the field + in descending order. Otherwise it sorts the field in ascending order. + In other cases, the field_name may be paired with an object + to specify more details: + + + + + sort_class_subdef + '{' + [ '”direction”' ':' lit_value ] + [ transform_spec ] + '}' + + + + + As before, the value tagged as “direction” specifies the + direction of the sort, depending on the first character. If not otherwise + specified, the sort direction defaults to ascending. + Also as before, the transform_spec may specify a function + through which to pass the column. + Since both the “direction” tag and the + transform_spec are optional, the object may be empty: + + + + + sort_class_subdef + '{' '}' + + + + + + diff --git a/docs/TechRef/JSONTutorial.xml b/docs/TechRef/JSONTutorial.xml new file mode 100644 index 0000000000..afc0cf3554 --- /dev/null +++ b/docs/TechRef/JSONTutorial.xml @@ -0,0 +1,2795 @@ + + + + + + JSON Queries: A Tutorial + + + + Scott + McKellar + + + Equinox Software, Inc. + + + + + 2009 + Equinox Software, Inc. + + + + Licensing: Creative Commons Attribution-Share Alike 3.0 United States License. + + + + + Introduction + The json_query facility provides a way for client applications to query the + database over the network. Instead of constructing its own SQL, the application + encodes a query in the form of a JSON string and passes it to the json_query service. + Then the json_query service parses the JSON, constructs and executes the + corresponding SQL, and returns the results to the client application. + This arrangement enables the json_query service to act as a gatekeeper, protecting + the database from potentially damaging SQL commands. In particular, the generated SQL + is confined to SELECT statements, which will not change the contents of the database. + In addition, the json_query service sometimes uses its knowledge of the database + structure to supply column names and join conditions so that the client application + doesn't have to. + Nevertheless, the need to encode a query in a JSON string adds complications, + because the client needs to know how to build the right JSON. JSON queries are also + somewhat limiting – they can't do all of the things that you can do with raw SQL. + This tutorial explains what you can do with a JSON query, and how you can do it. + + + The IDL + A JSON query does not refer to tables and columns. Instead, it refers to classes + and fields, which the IDL maps to the corresponding database entities. + The IDL (Interface Definition Language) is an XML file, typically + /openils/conf/fm_IDL.xml. It maps each class to a table, view, + or subquery, and each field to a column. It also includes information about foreign + key relationships. + (The IDL also defines virtual classes and virtual fields, which don't correspond + to database entities. We won't discuss them here, because json_query ignores them.) + When it first starts up, json_query loads a relevant subset of the IDL into memory. + Thereafter, it consults its copy of the IDL whenever it needs to know about the + database structure. It uses the IDL to validate the JSON queries, and to translate + classes and fields to the corresponding tables and columns. In some cases it uses the + IDL to supply information that the queries don't provide. + + + + Definitions + References to “SQL” refer to the dialect implemented by PostgreSQL. This tutorial + assumes that you are already familiar with SQL. + You should also be familiar with JSON. However it is worth defining a couple of terms + that have other meanings in other contexts: + + + + + An “object” is a JSON object, i.e. a comma-separated list of name:value pairs, + enclosed in curly braces, like this: + + + { “a”:”frobozz”, “b”:24, “c”:null } + + + + + + + An “array” is a JSON array, i.e. a comma-separated list of values, enclosed + in square brackets, like this: + + + [ “Goober”, 629, null, false, “glub” ] + + + + + + + + + + + The Examples + The test_json_query utility generated the SQL for all of the sample queries in this + tutorial. Newlines and indentation were then inserted manually for readability. + All examples involve the actor.org_unit table, sometimes in combination with a + few related tables. The queries themselves are designed to illustrate the syntax, not + to do anything useful at the application level. For example, it's not meaningful to + take the square root of an org_unit id, except to illustrate how to code a function call. + The examples are like department store mannequins – they have no brains, they're only + for display. + + + + + + Hello, World! + + The simplest kind of query defines nothing but a FROM clause. For example: + + + + { + "from":"aou" + } + + + + In this minimal example we select from only one table. Later we will see how to join + multiple tables. + Since we don't supply a WHERE clause, json_query constructs a default WHERE clause for + us, including all the available columns. The resulting SQL looks like this: + + + SELECT + "aou".billing_address AS "billing_address", + "aou".holds_address AS "holds_address", + "aou".id AS "id", + "aou".ill_address AS "ill_address", + "aou".mailing_address AS "mailing_address", + "aou".name AS "name", + "aou".ou_type AS "ou_type", + "aou".parent_ou AS "parent_ou", + "aou".shortname AS "shortname", + "aou".email AS "email", + "aou".phone AS "phone", + "aou".opac_visible AS "opac_visible" + FROM + actor.org_unit AS "aou" ; + + + + + Default SELECT Clauses + The default SELECT clause includes every column that the IDL defines it as a + non-virtual field for the class in question. If a column is present in the database + but not defined in the IDL, json_query doesn't know about it. In the case of the + example shown above, all the columns are defined in the IDL, so they all show up in + the default SELECT clause. + If the FROM clause joins two or more tables, the default SELECT clause includes + columns only from the core table, not from any of the joined tables. + The default SELECT clause has almost the same effect as “SELECT *”, + but not exactly. If you were to “SELECT * from actor.org_unit_type + in psql, the output would include all the same columns as in the example above, but not in + the same order. A default SELECT clause includes the columns in the order in which the IDL + defines them, which may be different from the order in which the database defines them. + In practice, the sequencing of columns in the SELECT clause is not significant. + The result set is returned to the client program in the form of a data structure, which + the client program can navigate however it chooses. + + + + Other Lessons + There are other ways to get a default SELECT clause. However, default SELECT clauses + are a distraction at this point, because most of the time you'll specify your own SELECT + clause explicitly, as we will discuss later. + Let's consider some more important aspects of this simple example – more important + because they apply to more complex queries as well. + + + + + The entire JSON query is an object. In this simple case the object includes + only one entry, for the FROM clause. Typically you'll also have entries for + the SELECT clause and the WHERE clause, and possibly for HAVING, ORDER BY, + LIMIT, or OFFSET clauses. There is no separate entry for a GROUP BY clause, + which you can specify by other means. + + + + Although all the other entries are optional, you must include an entry for + the FROM clause. You cannot, for example, do a SELECT USER the way you can in + psql. + + + + Every column is qualified by an alias for the table. This alias is always the + class name for the table, as defined in the IDL. + + + + Every column is aliased with the column name. There is a way to choose a + different column alias (not shown here). + + + + + + + + + The SELECT Clause + + The following variation also produces a default SELECT clause: + + + + { + "from":"aou", + "select": { + "aou":"*" + } + } + + + + ...and so does this one: + + + + { + "select": { + "aou":null + }, + "from":"aou" + } + + + + While this syntax may not be terribly useful, it does illustrate the minimal structure + of a SELECT clause in a JSON query: an entry in the outermost JSON object, with a key of + “select”. The value associated with this key is another JSON object, + whose keys are class names. + (These two examples also illustrate another point: unlike SQL, a JSON query doesn't care + whether the FROM clause or the SELECT clause comes first.) + Usually you don't want the default SELECT clause. Here's how to select only some of the + columns: + + + + { + "from":"aou", + "select": { + "aou":[ "id", "name" ] + } + } + + + + The value associated with the class name is an array of column names. If you select + columns from multiple tables (not shown here), you'll need a separate entry for each + table, and a separate column list for each entry. + The previous example results in the following SQL: + + + + SELECT + "aou".id AS "id", + "aou".name AS "name" + FROM + actor.org_unit AS "aou" ; + + + + + Fancier SELECT Clauses + The previous example featured an array of column names. More generally, it + featured an array of field specifications, and one kind of field specification + is a column name. The other kind is a JSON object, with some combination of the + following keys: + + + + + “column” -- the column name (required). + + + + “alias” -- used to define a column alias, which + otherwise defaults to the column name. + + + + “aggregate” -- takes a value of + true or false. Don't worry about + this one yet. It concerns the use of GROUP BY clauses, which we will + examine later. + + + + “transform” -- the name of an SQL function to be + called. + + + + “result_field” -- used with + “transform”; specifies an output column of a function that + returns multiple columns at a time. + + + + “params” -- used with “transform”; + provides a list of parameters for the function. They may be strings, numbers, + or nulls. + + + + + This example assigns a different column alias: + + + + { + "from":"aou", + "select": { + "aou": [ + "id", + { "column":"name", "alias":"org_name" } + ] + } + } + + + + + + SELECT + "aou".id AS "id", + "aou".name AS "org_name" + FROM + actor.org_unit AS "aou" ; + + + + In this case, changing the column alias doesn't accomplish much. But if we + were joining to the actor.org_unit_type table, which also has a “name” column, + we could use different aliases to distinguish them. + The following example uses a function to raise a column to upper case: + + + + { + "from":"aou", + "select": { + "aou": [ + "id", + { "column":"name", "transform":"upper" } + ] + } + } + + + + + + SELECT + "aou".id AS "id", + upper("aou".name ) AS "name" + FROM + actor.org_unit AS "aou" ; + + + + Here we take a substring of the name, using the “params” + element to pass parameters: + + + + { + "from":"aou", + "select": { + "aou": [ + "id", { + "column":"name", + "transform":"substr", + "params":[ 3, 5 ] + } + ] + } + } + + + + + + SELECT + "aou".id AS "id", + substr("aou".name,'3','5' ) AS "name" + FROM + actor.org_unit AS "aou" ; + + + + The parameters specified with “params” are inserted + after the applicable column (“name” in this + case), which is always the first parameter. They are always passed as strings, + i.e. enclosed in quotes, even if the JSON expresses them as numbers. PostgreSQL + will ordinarily coerce them to the right type. However if the function name is + overloaded to accept different types, PostgreSQL may invoke a function other than + the one intended. + Finally we call a fictitious function “frobozz” that returns + multiple columns, where we want only one of them: + + + + { + "from":"aou", + "select": { + "aou": [ + "id", { + "column":"name", + "transform":"frobozz", + "result_field":"zamzam" + } + ] + } + } + + + + + + SELECT + "aou".id AS "id", + (frobozz("aou".name ))."zamzam" AS "name" + FROM + actor.org_unit AS "aou" ; + + + + The “frobozz” function doesn't actually exist, but json_query doesn't know + that. The query won't fail until json_query tries to execute it in the database. + + + + + Things You Can't Do + You can do some things in a SELECT clause with raw SQL (with psql, for example) + that you can't do with a JSON query. Some of them matter and some of them don't. + When you do a JOIN, you can't arrange the selected columns in any arbitrary + sequence, because all of the columns from a given table must be grouped together. + This limitation doesn't matter. The results are returned in the form of a data + structure, which the client program can navigate however it likes. + You can't select an arbitrary expression, such as + “percentage / 100” or “last_name || ', ' || first_name”. + Most of the time this limitation doesn't matter either, because the client program + can do these kinds of manipulations for itself. However, function calls may be a problem. + You can't nest them, and you can't pass more than one column value to them (and it has + to be the first parameter). + You can't use a CASE expression. Instead, the client application can do the equivalent + branching for itself. + You can't select a subquery. In raw SQL you can do something like the following: + + + + SELECT + id, + name, + ( + SELECT name + FROM actor.org_unit_type AS aout + WHERE aout.id = aou.ou_type + ) AS type_name + FROM + actor.org_unit AS aou; + + + + This contrived example is not very realistic. Normally you would use a JOIN in this + case, and that's what you should do in a JSON query. Other cases may not be so easy + to solve. + + + + + + The WHERE Clause + Most queries need a WHERE clause, as in this simple example: + + + + { + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "parent_ou":"3" + } + } + + + + Like the SELECT clause, the WHERE clause gets its own entry in the top-level object + of a JSON query. The key is “where”, and the associated value is + either an object (as shown here) or an array (to be discussed a bit later). Each entry + in the object is a separate condition. + In this case, we use a special shortcut for expressing an equality condition. The + column name is on the left of the colon, and the value to which we are equating it is + on the right. + Here's the resulting SQL: + + + + SELECT + "aou".id AS "id", + "aou".name AS "name" + FROM + actor.org_unit AS "aou" + WHERE + "aou".parent_ou = 3; + + + + Like the SELECT clause, the generated WHERE clause qualifies each column name with + the alias of the relevant table. + If you want to compare a column to NULL, put “null” (without + quotation marks) to the right of the colon instead of a literal value. The resulting + SQL will include “IS NULL” instead of an equals sign. + + + Other Kinds of Comparisons + Here's the same query (which generates the same SQL) without the special + shortcut: + + + + { + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "parent_ou":{ "=":3 } + } + } + + + + We still have an entry whose key is the column name, but this time the + associated value is another JSON object. It must contain exactly one entry, + with the comparison operator on the left of the colon, and the value to be + compared on the right. + The same syntax works for other kinds of comparison operators. + For example: + + + + { + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "parent_ou":{ ">":3 } + } + } + + + + ...turns into: + + + + SELECT + "aou".id AS "id", + "aou".name AS "name" + FROM + actor.org_unit AS "aou" + WHERE + "aou".parent_ou > 3 ; + + + + The condition '“=”:null' turns into IS NULL. Any other + operator used with “null” turns into IS NOT NULL. + You can use most of the comparison operators recognized by PostgreSQL: + + + = <> != + < > <= >= + ~ ~* !~ !~* + like ilike + similar to + + + The only ones you can't use are “is distinct from” and + “is not distinct from”. + + + + Custom Comparisons + Here's a dirty little secret: json_query doesn't really pay much attention to the + operator you supply. It merely checks to make sure that the operator doesn't contain + any semicolons or white space, in order to prevent certain kinds of SQL injection. + It also allows “similar to” as a special exception. + As a result, you can slip an operator of your own devising into the SQL, so long as + it doesn't contain any semicolons or white space, and doesn't create invalid syntax. + Here's a contrived and rather silly example: + + + + { + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "parent_ou":{ "<2+":3 } + } + } + + + + ...which results in the following SQL: + + + + SELECT + "aou".id AS "id", + "aou".name AS "name" + FROM + actor.org_unit AS "aou" + WHERE + "aou".parent_ou <2+ 3; + + + + It's hard to come up with a realistic case where this hack would be useful, but it + could happen. + + + + Comparing One Column to Another + Here's how to put another column on the right hand side of a comparison: + + + + { + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "id": { ">": { "+aou":"parent_ou" } } + } + }; + + + + This syntax is similar to the previous examples, except that instead of comparing + to a literal value, we compare to an object. This object has only a single entry, + whose key is a table alias preceded by a leading plus sign. The associated value is + the name of the column. + Here's the resulting SQL: + + + + SELECT + "aou".id AS "id", + "aou".name AS "name" + FROM + actor.org_unit AS "aou" + WHERE + ( + "aou".id > ( "aou".parent_ou ) + ); + + + + The table alias must correspond to the appropriate table. Since json_query doesn't + validate the choice of alias, it won't detect an invalid alias until it tries to + execute the query. In this simple example there's only one table to choose from. The + choice of alias is more important in a subquery or join. + The leading plus sign, combined with a table alias, can be used in other situations + to designate the table to which a column belongs. We shall defer a discussion of this + usage to the section on joins. + + + + Testing Boolean Columns + In SQL, there are several ways to test a boolean column such as + actor.org_unit.opac_visible. The most obvious way is to compare it to true or false: + + + + SELECT + id + FROM + actor.org_unit + WHERE + opac_visible = true; + + + + In a JSON query this approach doesn't work. If you try it, the “= true” test will + turn into IS NULL. Don't do that. Instead, use a leading plus sign, as described in + the preceding section, to treat the boolean column as a stand-alone condition: + + + + { + "from":"aou", + "select": { "aou":[ "id" ] }, + "where": { + "+aou":"opac_visible" + } + } + + + + Result: + + + + SELECT + "aou".id AS "id" + FROM + actor.org_unit AS "aou" + WHERE + "aou".opac_visible ; + + + + If you need to test for falsity, then write a test for truth and negate it with the + “-not” operator. We will discuss the “-not” operator later, but + here's a preview: + + + + { + "from":"aou", + "select": { "aou":[ "id" ] }, + "where": { + "-not": { + "+aou":"opac_visible" + } + } + } + + + + + + SELECT + "aou".id AS "id" + FROM + actor.org_unit AS "aou" + WHERE + NOT ( "aou".opac_visible ); + + + + You can also compare a boolean column directly to a more complex condition: + + + + { + "from":"aou", + "select": { "aou":[ "id" ] }, + "where": { + "opac_visible": { + "=": { "parent_ou":{ ">":3 } } + } + } + } + + + + Here we compare a boolean column, not to a literal value, but to a boolean expression. + The resulting SQL looks a little goofy, but it works: + + + + SELECT + "aou".id AS "id" + FROM + actor.org_unit AS "aou" + WHERE + ( + "aou".opac_visible = ( "aou".parent_ou > 3 ) + ); + + + + In this case we compare the boolean column to a single simple condition. However you + can include additional complications – multiple conditions, IN lists, BETWEEN clauses, + and other features as described below. + + + + Multiple Conditions + If you need multiple conditions, just add them to the “where” + object, separated by commas: + + + + { + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "parent_ou":{ ">":3 }, + "id":{ "<>":7 } + } + } + + + + The generated SQL connects the conditions with AND: + + + + SELECT + "aou".id AS "id", + "aou".name AS "name" + FROM + actor.org_unit AS "aou" + WHERE + "aou".parent_ou > 3 + AND "aou".id <> 7; + + + + Later we will see how to use OR instead of AND. + + + + Using Arrays + Here's a puzzler. Suppose you need two conditions for the same column. How do + you code them in the same WHERE clause? For example, suppose you want something + like this: + + + + SELECT + id, + name + FROM + actor.org_unit + WHERE + parent_ou > 3 + AND parent_ou <> 7; + + + + You might try a WHERE clause like this: + + + + “where”: { + “parent_ou”:{ “>”:3 }, + “parent_ou”:{ “<>”:7 } + } + + + + Nope. Won't work. According to JSON rules, two entries in the same object + can't have the same key. + After slapping yourself in the forehead, you try something a little smarter: + + + + “where”: { + “parent_ou”: { + “>”:3, + “<>”:7 + } + } + + + + Nice try, but that doesn't work either. Maybe it ought to work – at least it's + legal JSON – but, no. + Here's what works: + + + + { + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": [ + { "parent_ou":{ ">":3 } }, + { "parent_ou":{ "<>":7 } } + ] + } + + + + We wrapped the two conditions into two separate JSON objects, and then wrapped + those objects together into a JSON array. The resulting SQL looks like this: + + + + SELECT + "aou".id AS "id", + "aou".name AS "name + FROM + actor.org_unit AS "aou" + WHERE + ( "aou".parent_ou > 3 ) + AND + ( "aou".parent_ou <> 7 ); + + + + That's not quite what we were hoping for, because the extra parentheses are so ugly. + But they're harmless. This will do. + If you're in the mood, you can use arrays to as many parentheses as + you like, even if there is only one condition inside: + + + + { + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": + [[[[[[ + { + "parent_ou":{ ">":3 } + }, + ]]]]]] + } + + + + ...yields: + + + + SELECT + "aou".id AS "id", + "aou".name AS "name" + FROM + actor.org_unit AS "aou" + WHERE + ( ( ( ( ( ( "aou".parent_ou > 3 ) ) ) ) ) ); + + + + + + + How to OR + By default, json_query combines conditions with AND. When you need OR, + here's how to do it: + + + + { + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "-or": { + "id":2, + "parent_ou":3 + } + } + } + + + + We use “-or” as the key, with the conditions to be ORed in an + associated object. The leading minus sign is there to make sure that the operator + isn't confused with a column name. Later we'll see some other operators with leading + minus signs. In a couple of spots we even use plus signs. + Here are the results from the above example: + + + + SELECT + "aou".id AS "id", + "aou".name AS "name" + FROM + actor.org_unit AS "aou" + WHERE + ( + "aou".id = 2 + OR "aou".parent_ou = 3 + ); + + + + The conditions paired with “-or” are linked by OR and enclosed + in parentheses, + Here's how to do the same thing using an array, except that it produces an extra + layer of parentheses: + + + + { + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "-or": [ + { "id":2 }, + { "parent_ou":3 } + ] + } + } + + + + + + SELECT + "aou".id AS "id", + "aou".name AS "name" + FROM + actor.org_unit AS "aou" + WHERE + ( + ( "aou".id = 2 ) + OR ( "aou".parent_ou = 3 ) + ); + + + + It's possible, though not very useful, to have only a single condition subject to + the “-or” operator. In that case, the condition appears by itself, + since there's nothing to OR it to. This trick is another way to add an extraneous + layer of parentheses, + + + + Another way to AND + You can also use the “-and” operator. It works just like + “-or”, except that it combines conditions with AND instead of OR. + Since AND is the default, we don't usually need a separate operator for it, but it's + available. + + + In rare cases, nothing else will do – you can't include two conditions in the same + list because of the duplicate key problem, but you can't combine them with arrays + either. In particular, you might need to combine them within an expression that + you're comparing to a boolean column (see the subsection above on Testing Boolean + Columns). + + + + Negation with NOT + The “-not” operator negates a condition or set of conditions. + For example: + + + + { + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "-not": { + "id":{ ">":2 }, + "parent_ou":3 + } + } + } + + + + + + SELECT + "aou".id AS "id", + "aou".name AS "name" + FROM + actor.org_unit AS "aou" + WHERE + NOT + ( + "aou".id > 2 + AND "aou".parent_ou = 3 + ); + + + + In this example we merely negate a combination of two comparisons. However the + condition to be negated may be as complicated as it needs to be. Anything that can + be subject to “where” can be subject to + “-not”. + In most cases you can achieve the same result by other means. However the + “-not” operator is the only way to represent NOT BETWEEN + (to be discussed later). + + + + EXISTS with Subqueries + Two other operators carry a leading minus sign: “-exists” + and its negation “-not-exists”. These operators apply to + subqueries, which have the same format as a full query. For example: + + + + { + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "-exists": { + "from":"asv", + "select":{ "asv":[ "id" ] }, + "where": { + "owner":7 + } + } + } + } + + + + + + SELECT + "aou".id AS "id", + "aou".name AS "name" + FROM + actor.org_unit AS "aou" + WHERE + EXISTS + ( + SELECT "asv".id AS "id" + FROM action.survey AS "asv" + WHERE "asv".owner = 7 + ); + + + + This kind of subquery is of limited use, because its WHERE clause doesn't + have anything to do with the main query. It just shuts down the main query + altogether if it isn't satisfied. + More typical is a correlated subquery, whose WHERE clause refers to a row + from the main query. For example: + + + + { + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "-exists": { + "from":"asv", + "select":{ "asv":[ "id" ] }, + "where": { + "owner":{ "=":{ "+aou":"id" }} + } + } + } + } + + + + Note the use of “+aou” to qualify the id column in the + inner WHERE clause. + + + + SELECT + "aou".id AS "id", + "aou".name AS "name" + FROM + actor.org_unit AS "aou" + WHERE + EXISTS + ( + SELECT "asv".id AS "id" + FROM action.survey AS "asv" + WHERE ("asv".owner = ( "aou".id )) + ); + + + + This latter example illustrates the syntax, but in practice, it would + probably be more natural to use an IN clause with a subquery (to be discussed + later). + + + + BETWEEN Clauses + Here's how to express a BETWEEN clause: + + + + { + "from":"aou", + "select": { "aou":[ "id" ] }, + "where": { + "parent_ou": { "between":[ 3, 7 ] } + } + } + + + + The value associated with the column name is an object with a single + entry, whose key is “between”. The corresponding + value is an array with exactly two values, defining the range to be + tested. + The range bounds must be either numbers or string literals. Although + SQL allows them to be null, a null doesn't make sense in this context, + because a null never matches anything. Consequently json_query doesn't + allow them. + The resulting SQL is just what you would expect: + + + + SELECT + "aou".id AS "id" + FROM + actor.org_unit AS "aou" + WHERE + parent_ou BETWEEN '3' AND '7'; + + + + + + + IN and NOT IN Lists + There are two ways to code an IN list. One way is simply to include + the list of values in an array: + + + + { + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "parent_ou": [ 3, 5, 7 ] + } + } + + + + As with a BETWEEN clause, the values in the array must be numbers or + string literals. Nulls aren't allowed. Here's the resulting SQL, which + again is just what you would expect: + + + + SELECT + "aou".id AS "id", + "aou".name AS "name" + FROM + actor.org_unit AS "aou" + WHERE + "aou".parent_ou IN (3, 5, 7); + + + + The other way is similar to the syntax shown above for a BETWEEN clause, + except that the array may include any non-zero number of values: + + + + { + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "parent_ou": { "in": [ 3, 5, 7 ] } + } + } + + + + This version results in the same SQL as the first one. + For a NOT IN list, you can use the latter format, using the + “not in” operator instead of “in”. + Alternatively, you can use either format together with the + “-not” operator. + + + + IN and NOT IN Clauses with Subqueries + For an IN clause with a subquery, the syntax is similar to the second + of the two formats for an IN list (see the previous subsection). The + “in” or “not in” operator is paired, + not with an array of values, but with an object representing the subquery. + For example: + + + + { + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "id": { + "in": { + "from":"asv", + "select":{ "asv":[ "owner" ] }, + "where":{ "name":"Voter Registration" } + } + } + } + } + + + + The results: + + + + SELECT + "aou".id AS "id", + "aou".name AS "name" + FROM + actor.org_unit AS "aou" + WHERE + "aou".id IN + ( + SELECT + "asv".owner AS "owner" + FROM + action.survey AS "asv" + WHERE + "asv".name = 'Voter Registration' + ); + + + + In SQL the subquery may select multiple columns, but in a JSON query it + can select only a single column. + For a NOT IN clause with a subquery, use the “not in” + operator instead of “in”. + + + + Comparing to a Function + Here's how to compare a column to a function call: + + + + { + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "id":{ ">":[ "sqrt", 16 ] } + } + } + + + + A comparison operator (“>” in this case) is paired + with an array. The first entry in the array must be a string giving the name + of the function. The remaining parameters, if any, are the parameters. They + may be strings, numbers, or nulls. The resulting SQL for this example: + + + + SELECT + "aou".id AS "id", + "aou".name AS "name" + FROM + actor.org_unit AS "aou" + WHERE + "aou".id > sqrt( '16' ); + + + + All parameters are passed as quoted strings -- even if, as in this case, + they are really numbers. + This syntax is somewhat limited in that the function parameters must be + constants (hence the use of a silly example). + + + + Putting a Function Call on the Left + In the discussion of the SELECT clause, we saw how you could transform the value + of a selected column by passing it to a function. In the WHERE clause, you can use + similar syntax to transform the value of a column before comparing it to something + else. + For example: + + + + { + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "name": { + "=": { + "transform":"upper", + "value":"CARTER BRANCH" + } + } + } + } + + + + The “transform” entry gives the name of the function that we + will use on the left side of the comparison. The “value” entry + designates the value on the right side of the comparison. + + + + SELECT + "aou".id AS "id", + "aou".name AS "name" + FROM + actor.org_unit AS "aou" + WHERE + upper("aou".name ) = 'CARTER BRANCH' ; + + + + As in the SELECT clause, you can pass literal values or nulls to the function + as additional parameters by using an array tagged as + “params”: + + + + { + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "name": { + "=": { + "transform":"substr", + "params":[ 1, 6 ], + "value":"CARTER" + } + } + } + } + + + + + + SELECT + "aou".id AS "id", + "aou".name AS "name" + FROM + actor.org_unit AS "aou" + WHERE + substr("aou".name,'1','6' ) = 'CARTER' ; + + + + The first parameter is always the column name, qualified by the class name, + followed by any additional parameters (which are always enclosed in quotes even + if they are numeric). + As in the SELECT clause: if the function returns multiple columns, you can specify + the one you want by using a “result_field” entry (not shown + here). + If you leave out the “transform” entry (or misspell it), the + column name will appear on the left without any function call. This syntax works, + but it's more complicated than it needs to be. + + + + Putting Function Calls on Both Sides + If you want to compare one function call to another, you can use the same syntax + shown in the previous subsection – except that the “value” entry + carries an array instead of a literal value. For example: + + + + { + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "id": { + ">": { + "transform":"factorial", + "value":[ "sqrt", 1000 ] + } + } + } + } + + + + + + SELECT + "aou".id AS "id", + "aou".name AS "name" + FROM + actor.org_unit AS "aou" + WHERE + factorial("aou".id ) > sqrt( '1000' ) ; + + + + The format for the right side function is similar to what we saw earlier, in the + subsection Comparing to a Function. Note that there are two different formats for + defining function calls: + + + + + For a function call to the left of the comparison, the function name is + tagged as “transform”. The first parameter is always the + relevant column name; additional parameters, if any, are in an array tagged + as “params”. The entry for + “result_field”, if present, specifies a subcolumn. + + + + For a function call to the right of the comparison, the function name is + the first entry in an array, together with any parameters. There's no way to + specify a subcolumn. + + + + + + + + Comparing a Function to a Condition + So far we have seen two kinds of data for the “value” tag. A + string or number translates to a literal value, and an array translates to a function + call. The third possibility is a JSON object, which translates to a condition. For + example: + + + + { + "from":"aou", + "select": { "aou":[ "id", "name" ] }, + "where": { + "id": { + "=": { + "value":{ "parent_ou":{ ">":3 } }, + "transform":"is_prime" + } + } + } + } + + + + The function tagged as “transform” must return boolean, or else + json_query will generate invalid SQL. The function used here, + “is_prime”, is fictitious. + + + + SELECT + "aou".id AS "id", + "aou".name AS "name" + FROM + actor.org_unit AS "aou" + WHERE + ( + is_prime("aou".id ) = ( "aou".parent_ou > 3 ) + ); + + + + If we left out the “transform” entry, json_query would compare + the column on the left (which would to be boolean) to the condition on the right. + The results are similar to those for a simpler format described earlier (see the + subsection Testing Boolean Columns). + In the example above we compared the boolean to a simple condition. However the + expression on the right may include multiple conditions, IN lists, subqueries, and + whatever other complications are necessary. + + + + Things You Can't Do + The WHERE clause is subject to some of the same limitations as the SELECT clause. + However, in the WHERE clause these limitations are more limiting, because the client + program can't compensate by doing some of the work for itself. + You can't use arbitrary expressions in a WHERE condition, such as + “WHERE id > parent_ou – 3”. In some cases you may be able to + contrive a custom operator in order to fake such an expression. However this mechanism + is neither very general nor very aesthetic. + To the right of a comparison operator, all function parameters must be literals or + null. You can't pass a column value, nor can you nest function calls. + Likewise you can't include column values or arbitrary expressions in an IN list + or a BETWEEN clause. + You can't include null values in an IN list or a BETWEEN list, not that you should + ever want to. + As noted earlier: you can't use the comparison operators + “is distinct from” or “is not distinct from”. + Also as noted earlier: a subquery in an IN clause cannot select more than one + column. + + + + + + JOIN clauses + Until now, our examples have selected from only one table at a time. As a result, + the FROM clause has been very simple – just a single string containing the class name of + the relevant table. + When the FROM clause joins multiple tables, the corresponding JSON naturally gets more + complicated. + SQL provides two ways to define a join. One way is to list both tables in the FROM + clause, and put the join conditions in the WHERE clause: + + + + SELECT + aou.id, + aout.name + FROM + actor.org_unit aou, + actor.org_unit_type aout + WHERE + aout.id = aou.ou_type; + + + + The other way is to use an explicit JOIN clause: + + + + SELECT + aou.id, + aout.name + FROM + actor.org_unit aou + JOIN actor.org_unit_type aout + ON ( aout.id = aou.ou_type ); + + + + JSON queries use only the second of these methods. The following example expresses + the same query in JSON: + + + + { + "select": { "aou":[ "id" ], "aout":[ "name" ] }, + "from": { + "aou":"aout" + } + } + + + + First, let's review the SELECT clause. Since it selects rows from two different tables, + the data for “select” includes two entries, one for each table. + As for the FROM clause, it's no longer just a string. It's a JSON object, with exactly + one entry. The key of this entry is the class name of the core table, i.e. the table + named immediately after the FROM keyword. The data associated with this key contains the + rest of the information about the join. In this simple example, that information consists + entirely of a string containing the class name of the other table. + So where is the join condition? + It's in the IDL. Upon reading the IDL, json_query knows that actor.org_unit has a + foreign key pointing to actor.org_unit_type, and builds a join condition accordingly: + + + + SELECT + "aou".id AS "id", + "aout".name AS "name" + FROM + actor.org_unit AS "aou" + INNER JOIN actor.org_unit_type AS "aout" + ON ( "aout".id = "aou".ou_type ) ; + + + + In this case the core table is the child table, and the joined table is the parent table. + We could just as well have written it the other way around: + + + + { + "select": { "aou":[ "id" ], "aout":[ "name" ] }, + "from": { + "aout":"aou" + } + } + + + + + + SELECT + "aou".id AS "id", + "aout".name AS "name" + FROM + actor.org_unit_type AS "aout" + INNER JOIN actor.org_unit AS "aou" + ON ( "aou".ou_type = "aout".id ) ; + + + + + Specifying The Join Columns Explicitly + While it's convenient to let json_query pick the join columns, it doesn't + always work. + For example, the actor.org_unit table has four different address ids, for + four different kinds of addresses. Each of them is a foreign key to the + actor.org_address table. Json_query can't guess which one you want if you + don't tell it. + (Actually it will try to guess. It will pick the first matching link that + it finds in the IDL, which may or may not be the one you want.) + Here's how to define exactly which columns you want for the join: + + + + { + "select": { "aou":[ "id" ], "aoa":[ "street1" ] }, + "from": { + "aou": { + "aoa": { + "fkey":"holds_address", + "field":"id" + } + } + } + } + + + + Before, the table we were joining was represented merely by its class name. + Now it's represented by an entry in a JSON object. The key of that entry is the + class name, and the associated data is another layer of JSON object containing + the attributes of the join. + Later we'll encounter other kinds of join attributes. For now, the only + attributes that we're looking at are the ones that identify the join columns: + “fkey” and “field”. The hard part is + remembering which is which: + + + + + “fkey” identifies the join column from the + left table; + + + + “field” identifies the join column from the + right table. + + + + + When there are only two tables involved, the core table is on the left, and + the non-core table is on the right. In more complex queries neither table may + be the core table. + Here is the result of the preceding JSON: + + + + SELECT + "aou".id AS "id", + "aoa".street1 AS "street1" + FROM + actor.org_unit AS "aou" + INNER JOIN actor.org_address AS "aoa" + ON ( "aoa".id = "aou".holds_address ) ; + + + + In this example the child table is on the left and the parent table is on the + right. We can swap the tables if we swap the join columns as well: + + + + { + "select": { "aou":[ "id" ], "aoa":[ "street1" ] }, + "from": { + "aoa": { + "aou": { + "fkey":"id", + "field":"holds_address" + } + } + } + } + + + + + + SELECT + "aou".id AS "id", + "aoa".street1 AS "street1" + FROM + actor.org_address AS "aoa" + INNER JOIN actor.org_unit AS "aou" + ON ( "aou".holds_address = "aoa".id ) ; + + + + When you specify both of the join columns, json_query assumes that you know + what you're doing. It doesn't check the IDL to confirm that the join makes sense. + The burden is on you to avoid absurdities. + + + + Specifying Only One Join Column + We just saw how to specify both ends of a join. It turns out that there's a + shortcut – most of the time you only need to specify one end. Consider the + following variation on the previous example: + + + + { + "select": { "aou":[ "id" ], "aoa":[ "street1" ] }, + "from": { + "aoa": { + "aou": { + "field":"holds_address" + } + } + } + } + + + + ..which results in exactly the same SQL as before. + Here we specified the join column from the child table, the column that is a + foreign key pointing to another table. As long as that linkage is defined in the IDL, + json_query can look it up and figure out what the corresponding column is in the + parent table. + However this shortcut doesn't work if you specify only the column in the parent + table, because it would lead to ambiguities. Suppose we had specified the id column + of actor.org_address. As noted earlier, there are four different foreign keys from + actor.org_unit to actor.org_address, and json_query would have no way to guess + which one we wanted. + + + + Joining to Multiple Tables + So far we have joined only two tables at a time. What if we need to join one + table to two different tables? + Here's an example: + + + + { + "select": { "aou":[ "id" ], "aout":[ "depth" ], "aoa":[ "street1" ] }, + "from": { + "aou": { + "aout":{}, + "aoa": { + "fkey":"holds_address" + } + } + } + } + + + + The first join, to actor.org_unit_type, is simple. We could have specified join + columns, but we don't have to, because json_query will construct that join on the + basis of what it finds in the IDL. Having no join attributes to specify, we leave + that object empty. + For the second join, to actor.org_address, we have to specify at least the join + column in the child table, as discussed earlier. We could also have specified the + join column from the parent table, but we don't have to, so we didn't. + Here is the resulting SQL: + + + + SELECT + "aou".id AS "id", + "aout".depth AS "depth", + "aoa".street1 AS "street1" + FROM + actor.org_unit AS "aou" + INNER JOIN actor.org_unit_type AS "aout" + ON ( "aout".id = "aou".ou_type ) + INNER JOIN actor.org_address AS "aoa" + ON ( "aoa".id = "aou".holds_address ) ; + + + + Since there can be only one core table, the outermost object in the FROM clause + can have only one entry, whose key is the class name of the core table. The next + level has one entry for every table that's joined to the core table. + + + + Nested Joins + Let's look at that last query again. It joins three tables, and the core table + is the one in the middle. Can we make one of the end tables the core table instead? + Yes, we can: + + + + { + "select": { "aou":[ "id" ], "aout":[ "depth" ], "aoa":[ "street1" ] }, + "from": { + "aoa": { + "aou": { + "field":"holds_address", + "join": { + "aout":{ "fkey":"ou_type" } + } + } + } + } + } + + + + The “join” attribute introduces another level of join. In this + case “aou” is the left table for the nested join, and the right table + for the original join. Here are the results: + + + + SELECT + "aou".id AS "id", + "aout".depth AS "depth", + "aoa".street1 AS "street1" + FROM + actor.org_address AS "aoa" + INNER JOIN actor.org_unit AS "aou" + ON ( "aou".holds_address = "aoa".id ) + INNER JOIN actor.org_unit_type AS "aout" + ON ( "aout".id = "aou".ou_type ) ; + + + + + + + Outer Joins + By default, json_query constructs an inner join. If you need an outer join, you + can add the join type as an attribute of the join: + + + + { + "select": { "aou":[ "id" ], "aoa":[ "street1" ] }, + "from": { + "aoa": { + "aou": { + "field":"mailing_address", + "type":"left" + } + } + } + } + + + + Here we asked for a left outer join. For a right outer join, code + “type”:”right”. For a full outer join, code + “type”:”full”. Any other value for “type” results in an inner + join, so watch out for typos. A type of “rihgt” will give you + a wrong join instead of a right one. + Here is the resulting SQL for this example: + + + + SELECT + "aou".id AS "id", + "aoa".street1 AS "street1" + FROM + actor.org_address AS "aoa" + LEFT JOIN actor.org_unit AS "aou" + ON ( "aou".mailing_address = "aoa".id ) ; + + + + + + + Referring to Joined Tables in the WHERE Clause + In the WHERE clause of the generated SQL, every column name is qualified by a + table alias, which is always the corresponding class name. + If a column belongs to the core table, this qualification happens by default. + If it belongs to a joined table, the JSON must specify what class name to use for + an alias. For example: + + + + { + "select": { "aou":[ "id" ], "aout":[ "name" ] }, + "from": { + "aout":"aou" + }, + "where": { + "+aou":{ "parent_ou":2 } + } + } + + + + Note the peculiar operator “+aou” -- a plus sign followed + by the relevant class name. This operator tells json_query to apply the specified + class to the condition that follows. The result: + + + + SELECT + "aou".id AS "id", + "aout".name AS "name" + FROM + actor.org_unit_type AS "aout" + INNER JOIN actor.org_unit AS "aou" + ON ( "aou".ou_type = "aout".id ) + WHERE + ( "aou".parent_ou = 2 ); + + + + The plus-class operator may apply to multiple conditions: + + + + { + "select": { "aou":[ "id" ], "aout":[ "name" ] }, + "from": { + "aout":"aou" + }, + "where": { + "+aou":{ + "parent_ou":2, + "id":{ "<":42 } + } + } + } + + + + + + SELECT + "aou".id AS "id", + "aout".name AS "name" + FROM + actor.org_unit_type AS "aout" + INNER JOIN actor.org_unit AS "aou" + ON ( "aou".ou_type = "aout".id ) + WHERE + ( + "aou".parent_ou = 2 + AND "aou".id < 42 + ); + + + + For these artificial examples, it would have been simpler to swap the tables, + so that actor.org_unit is the core table. Then you wouldn't need to go through + any special gyrations to apply the right table alias. In a more realistic case, + however, you might need to apply conditions to both tables. Just swapping the + tables wouldn't solve the problem. + You can also use a plus-class operator to compare columns from two different + tables: + + + + { + "select": { "aou":[ "id" ], "aout":[ "name" ] }, + "from": { + "aout":"aou" + }, + "where": { + "depth": { ">": { "+aou":"parent_ou" } } + } + } + + + + + + SELECT + "aou".id AS "id", + "aout".name AS "name" + FROM + actor.org_unit_type AS "aout" + INNER JOIN actor.org_unit AS "aou" + ON ( "aou".ou_type = "aout".id ) + WHERE + ( + "aout".depth > ( "aou".parent_ou ) + ); + + + + Please don't expect that query to make any sense. It doesn't. But it + illustrates the syntax. + + + + Join Filters + While the above approach certainly works, the special syntax needed is goofy + and awkward. A somewhat cleaner solution is to include a condition in the JOIN + clause: + + + + { + "select": { "aou":[ "id" ], "aout":[ "name" ] }, + "from": { + "aout": { + "aou": { + "filter": { + "parent_ou":2 + } + } + } + } + } + + + + + + SELECT + "aou".id AS "id", "aout".name AS "name" + FROM + actor.org_unit_type AS "aout" + INNER JOIN actor.org_unit AS "aou" + ON ( "aou".ou_type = "aout".id + AND "aou".parent_ou = 2 ) ; + + + + By default, json_query uses AND to combine the “filter” + condition with the original join condition. If you need OR, you can use the + “filter_op” attribute to say so: + + + + { + "select": { "aou":[ "id" ], "aout":[ "name" ] }, + "from": { + "aout": { + "aou": { + "filter": { + "parent_ou":2 + }, + "filter_op":"or" + } + } + } + } + + + + + + SELECT + "aou".id AS "id", + "aout".name AS "name" + FROM + actor.org_unit_type AS "aout" + INNER JOIN actor.org_unit AS "aou" + ON ( "aou".ou_type = "aout".id + OR "aou".parent_ou = 2 ) ; + + + + If the data tagged by “filter_op” is anything but + “or” (in upper, lower, or mixed case), json_query uses AND + instead of OR. + The condition tagged by “filter” may be much more complicated. + In fact it accepts all the same syntax as the WHERE clause. + Remember, though, that it all gets combined with the the original join condition + with an AND, or with an OR if you so specify. If you're not careful, the result + may be a confusing mixture of AND and OR at the same level. + + + + Joining to a Subquery + In SQL you can put a subquery in a FROM clause, and select from it as if it were + a table. A JSON query has no way to do that directly. The IDL, however, can define + a class as a subquery instead of as a table. When you SELECT from it, json_query + inserts the corresponding subquery into the FROM clause. For example: + + + + { + "select":{ "iatc":[ "id", "dest", "copy_status" ] }, + "from": "iatc" + } + + + + There's nothing special-looking about this JSON, but json_query expands it as + follows: + + + + SELECT + "iatc".id AS "id", + "iatc".dest AS "dest", + "iatc".copy_status AS "copy_status" + FROM + ( + SELECT t.* + FROM + action.transit_copy t + JOIN actor.org_unit AS s + ON (t.source = s.id) + JOIN actor.org_unit AS d + ON (t.dest = d.id) + WHERE + s.parent_ou <> d.parent_ou + ) AS "iatc" ; + + + + The “iatc” class is like a view, except that it's defined in the + IDL instead of the database. In this case it provides a way to do a join that would + otherwise be impossible through a JSON query, because it joins the same table in two + different ways (see the next subsection). + + + + Things You Can't Do + In a JOIN, as with other SQL constructs, there are some things that you can't do with + a JSON query. + In particular, you can't specify a table alias, because the table alias is always the + class name. As a result: + + + + + You can't join a table to itself. For example, you can't join actor.org_unit + to itself in order to select the name of the parent for every org_unit. + + + + You can't join to the same table in more than one way. For example, you can't + join actor.org_unit to actor.org_address through four different foreign keys, to + get four kinds of addresses in a single query. + + + + + The only workaround is to perform the join in a view, or in a subquery defined in + the IDL as described in the previous subsection. + Some other things, while not impossible, require some ingenuity in the use of join + filters. + For example: by default, json_query constructs a join condition using only a single + pair of corresponding columns. As long as the database is designed accordingly, a + single pair of columns will normally suffice. If you ever need to join on more than + one pair of columns, you can use join filters for the extras. + Likewise, join conditions are normally equalities. In raw SQL it is possible + (though rarely useful) to base a join on an inequality, or to use a function call in + a join condition, or to omit any join condition in order to obtain a Cartesian product. + If necessary, you can devise such unconventional joins by combining the normal join + conditions with join filters. + For example, here's how to get a Cartesian product: + + + + { + "select": { "aou":[ "id" ], "aout":[ "name" ] }, + "from": { + "aout": { + "aou": { + "filter": { + "ou_type":{ "<>": { "+aout":"id" } } + }, + "filter_op":"or" + } + } + } + } + + + + + + SELECT + "aou".id AS "id", + "aout".name AS "name" + FROM + actor.org_unit_type AS "aout" + INNER JOIN actor.org_unit AS "aou" + ON + ( + "aou".ou_type = "aout".id + OR ("aou".ou_type <> ( "aout".id )) + ) ; + + + + Yes, it's ugly, but at least you're not likely to do it by accident. + + + + + Selecting from Functions + In SQL, you can put a function call in the FROM clause. The function may return + multiple columns and multiple rows. Within the query, the function behaves like a + table. + A JSON query can also select from a function: + + + + { + "from": [ "actor.org_unit_ancestors", 5 ] + } + + + + The data associated with “from” is an array instead of a string + or an object. The first element in the array specifies the name of the function. + Subsequent elements, if any, supply the parameters of the function; they must be + literal values or nulls. + Here is the resulting query: + + + + SELECT * + FROM + actor.org_unit_ancestors( '5' ) AS "actor.org_unit_ancestors" ; + + + + In a JSON query this format is very limited, largely because the IDL knows + nothing about the available functions. You can't join the function to a table or + to another function. If you try to supply a SELECT list or a WHERE clause, + json_query will ignore it. The generated query will always select every column, + via a wild card asterisk, from every row. + + + + The ORDER BY Clause + In most cases you can encode an ORDER BY clause as either an array or an object. + Let's take a simple example and try it both ways. First the array: + + + + { + "select":{ "aou":[ "name" ] }, + "from": "aou", + "order_by": [ + { "class":"aou", "field":"name" } + ] + } + + + + Now the object: + + + + { + "select":{ "aou":[ "name" ] }, + "from": "aou", + "order_by": { + "aou":{ "name":{} } + } + } + + + + The results are identical from either version: + + + + SELECT + "aou".name AS "name" + FROM + actor.org_unit AS "aou" + ORDER BY + "aou".name; + + + + The array format is more verbose, but as we shall see, it is also more flexible. + It can do anything the object format can do, plus some things that the object format + can't do. + + + ORDER BY as an Array + In the array format, each element of the array is an object defining one of the + sort fields. Each such object must include at least two tags: + + + + + The “class” tag provides the name of the class, + which must be either the core class or a joined class. + + + + The “field” tag provides the field name, corresponding + to one of the columns of the class. + + + + + If you want to sort by multiple fields, just include a separate object for each + field. + If you want to sort a field in descending order, add a + “direction” tag: + + + + { + "select":{ "aou":[ "name" ] }, + "from": "aou", + "order_by": [ + { "class":"aou", "field":"name", "direction":"desc" } + ] + } + + + + + + SELECT + "aou".name AS "name" + FROM + actor.org_unit AS "aou" + ORDER BY + "aou".name DESC; + + + + The string tagged as “direction” can be anything – all that + matters is the first character. If the string starts with “D” or “d”, the sort + will be descending. Otherwise it will be ascending. So + “diplodocus” or “Dioscorides” will work as + well as “desc”, but “going down” means that + the sort will go up. + You can also pass a column through some kind of transforming function, much as + you can in the SELECT and WHERE clauses, using the “transform” + tag. For example, for a case-insensitive sort, you could raise to upper case: + + + + { + "select":{ "aou":[ "name" ] }, + "from": "aou", + "order_by": [ + { + "class":"aou", + "field":"name", + "transform":"upper" + } + ] + } + + + + + + SELECT + "aou".name AS "name" + FROM + actor.org_unit AS "aou" + ORDER BY + upper("aou".name ); + + + + If you need additional parameters for the function, you can use the + “params” tag to pass them: + + + + { + "select":{ "aou":[ "name" ] }, + "from": "aou", + "order_by": [ + { + "class":"aou", + "field":"name", + "transform":"substr", + "params":[ 1, 8 ] + } + ] + } + + + + The additional parameters appear as elements in an array. They may be numbers, + strings, or nulls. + + + + SELECT + "aou".name AS "name" + FROM + actor.org_unit AS "aou" + ORDER BY + substr("aou".name,'1','8' ); + + + + As we have seen elsewhere, all literal values are passed as quoted strings, + even if they are numbers. + If the function returns multiple columns, you can use the + “result_field” tag to indicate which one you want (not shown). + + + + ORDER BY as an Object + When you encode the ORDER BY clause as an object, the keys of the object are + class names. Each class must be either the core class or a joined class. The + data for each class can be either an array or another layer of object. Here's an + example with one of each: + + + + { + "select":{ "aout":"id", "aou":[ "name" ] }, + "from": { "aou":"aout" }, + "order_by": { + "aout":[ "id" ], + "aou":{ "name":{ "direction":"desc" } } + } + } + + + + For the “aout” class, the associated array is simply a list + of field names (in this case, just one). Naturally, each field must reside in + the class with which it is associated. + However, a list of field names provides no way to specify the direction of + sorting, or a transforming function. You can add those details only if the + class name is paired with an object, as in the example for the + “aou” class. The keys for such an object are field names, and + the associated tags define other details. + In this example, we use the “direction” tag to specify that + the name field be sorted in descending order. This tag works the same way here as + described earlier. If the associated string starts with “D” or “d”, the sort will + be descending; otherwise it will be ascending. + Here is the resulting SQL: + + + + SELECT + "aou".name AS "name" + FROM + actor.org_unit AS "aou" + INNER JOIN actor.org_unit_type AS "aout" + ON ( "aout".id = "aou".ou_type ) + ORDER BY + "aout".id, + "aou".name DESC; + + + + You can also use the “transform”, “params”, + and “result_field” tags to specify the use of a transforming + function, as described in the previous subsection. For example: + + + + { + "select":{ "aou":[ "name", "id" ] }, + "from": "aou", + "order_by": { + "aou":{ + "name":{ "transform":"substr", "params":[ 1, 8 ] } + } + } + } + + + + + + SELECT + "aou".name AS "name", + "aou".id AS "id" + FROM + actor.org_unit AS "aou" + ORDER BY + substr("aou".name,'1','8' ); + + + + + + + Things You Can't Do + If you encode the ORDER BY clause as an object, you may encounter a couple of + restrictions. + Because the key of such an object is the class name, all the fields from a given + class must be grouped together. You can't sort by a column from one table, followed + by a column from another table, followed by a column from the first table. If you + need such a sort, you must encode the ORDER BY clause in the array format, which + suffers from no such restrictions. + For similar reasons, with an ORDER BY clause encoded as an object, you can't + reference the same column more than once. Although such a sort may seem perverse, + there are situations where it can be useful, provided that the column is passed to + a transforming function. + For example, you might want a case-insensitive sort, except that for any given + letter you want lower case to sort first. For example, you want “diBona” to sort + before “Dibona”. Here's a way to do that, coding the ORDER BY clause as an array: + + + + { + "select":{ "au":[ "family_name", "id" ] }, + "from": "au", + "order_by": [ + { "class":"au", "field":"family_name", "transform":"upper" }, + { "class":"au", "field":"family_name" } + ] + } + + + + + + SELECT + "au".family_name AS "family_name", + "au".id AS "id" + FROM + actor.usr AS "au" + ORDER BY + upper("au".family_name ), + "au".family_name; + + + + Such a sort is not possible where the ORDER BY clause is coded as an object. + + + + + The GROUP BY Clause + A JSON query has no separate construct to define a GROUP BY clause. Instead, the + necessary information is distributed across the SELECT clause. However, the way it works + is a bit backwards from what you might expect, so pay attention. + Here's an example: + + + + { + "select": { + "aou": [ + { "column":"parent_ou" }, + { "column":"name", "transform":"max", "aggregate":true } + ] + }, + "from": "aou" + } + + + + The “transform” tag is there just to give us an excuse to do a GROUP + BY. What's important to notice is the “aggregate” tag. + Here's the resulting SQL: + + + + SELECT + "aou".parent_ou AS "parent_ou", + max("aou".name ) AS "name" + FROM + actor.org_unit AS "aou" + GROUP BY + 1; + + + + The GROUP BY clause references fields from the SELECT clause by numerical reference, + instead of by repeating them. Notice that the field it references, parent_ou, is the + one that doesn't carry the “aggregate” tag in + the JSON. + Let's state that more generally. The GROUP BY clause includes only the fields that + do not carry the “aggregate” tag (or that carry + it with a value of false). + However, that logic applies only when some field somewhere does + carry the “aggregate” tag, with a value of true. If there is no + “aggregate” tag, or it appears only with a value of false, then there + is no GROUP BY clause. + If you really want to include every field in the GROUP BY clause, don't use + “aggregate”. Use the “distinct” tag, as described + in the next section. + + + + The DISTINCT Clause + JSON queries don't generate DISTINCT clauses. However, they can generate GROUP + BY clauses that include every item from the SELECT clause. The effect is the same + as applying DISTINCT to the entire SELECT clause. + For example: + + + + { + "select": { + "aou": [ + "parent_ou", + "ou_type" + ] + }, + "from":"aou", + "distinct":"true" + } + + + + Note the “distinct” entry at the top level of the + query object, with a value of “true”. + + + + SELECT + "aou".parent_ou AS "parent_ou", + "aou".ou_type AS "ou_type" + FROM + actor.org_unit AS "aou" + GROUP BY + 1, 2; + + + + The generated GROUP BY clause references every column in the SELECT clause by + number. + + + + The HAVING Clause + For a HAVING clause, add a “having” entry at the top level + of the query object. For the associated data, you can use all the same syntax + that you can use for a WHERE clause. + Here's a simple example: + + + + { + "select": { + "aou": [ + "parent_ou", { + "column":"id", + "transform":"count", + "alias":"id_count", + "aggregate":"true" + } + ] + }, + "from":"aou", + "having": { + "id": { + ">" : { + "transform":"count", + "value":6 + } + } + } + } + + + + We use the “aggregate” tag in the SELECT clause to give us a GROUP BY to go + with the HAVING. Results: + + + + SELECT + "aou".parent_ou AS "parent_ou", + count("aou".id ) AS "id_count" + FROM + actor.org_unit AS "aou" + GROUP BY + 1 + HAVING + count("aou".id ) > 6 ; + + + + In raw SQL we could have referred to “count( 1 )”. But since JSON queries + cannot encode arbitrary expressions, we applied the count function to a column + that cannot be null. + + + + The LIMIT and OFFSET Clauses + To add an LIMIT or OFFSET clause, add an entry to the top level of a query + object. For example: + + + + { + "select": { + "aou": [ "id", "name" ] + }, + "from":"aou", + "order_by": { "aou":[ "id" ] }, + "offset": 7, + "limit": 42 + } + + + + The data associated with “offset” and “limit” + may be either a number or a string, but if it's a string, it should have a number + inside. + Result: + + + + SELECT + "aou".id AS "id", + "aou".name AS "name" + FROM + actor.org_unit AS "aou" + ORDER BY + "aou".id + LIMIT 42 + OFFSET 7; + + + + + + diff --git a/docs/TechRef/TechRef.xml b/docs/TechRef/TechRef.xml new file mode 100644 index 0000000000..266f01fa29 --- /dev/null +++ b/docs/TechRef/TechRef.xml @@ -0,0 +1,13 @@ + + + + + Evergreen Technical Reference + + + + + +