LP#1234845: Performance improvement to evergreen.ranked_volumes() database function.

For a client, I analyzed about 10 days of logs from a production Pg
instance.

During this period, the single most time-consuming query (total time
summed over all instances of the query, with different parameters) was:

    SELECT * FROM unapi.bre ( ... ) AS "unapi.bre";

I profiled this function and found that, in my test case, nearly all
of the time (2.04s out of 2.06s, ~99%) was spent in the call to the
unapi.holdings_xml() function.

Profiling unapi.holdings_xml() in turn showed that most of its time
(sorry, I don't have the exact figure with me now) was spent in the
call to the evergreen.ranked_volumes() function.

At this point in my research, something changed on the server I was
testing on, and all subsequent timings were about 4-5 times lower, but
the ratios stayed more or less the same.

In any case, a call to evergreen.ranked_volumes() showed a repeatable
time of ~380ms (with warm caches/buffers).

I modified the function by:
1. inlining actor.org_unit_descendants(?, ?)
2. inlining evergreen.rank_ou(?, ?, ?)
3. extracting the depth calculation into a separate call
4. switching to plpgsql (which makes it possible to use variables)
5. removing evergreen.rank_ou() and evergreen.rank_cp_status() from
   the SELECT list; they are still used in the WINDOW definition, but
   their results weren't used anywhere in the SELECT, so there is no
   reason to compute them there
6. in passing, renaming arguments to avoid a name clash (argument
   depth vs. column depth)
7. in passing, replacing positional parameter references ($1, $2, ...)
   with named parameters, for readability

The new function does the same work in ~18ms.

EDIT: Converted to SQL, keeping all of the improvements from depesz.
EDIT2: Added a Signed-off-by line for depesz; see http://markmail.org/message/rv4vaarwixeswqgu

Signed-off-by: Hubert depesz Lubaczewski <depesz@depesz.com>
Signed-off-by: Jason Stephenson <jstephenson@mvlc.org>
Signed-off-by: Mike Rylander <mrylander@gmail.com>
Signed-off-by: Kathy Lussier <klussier@masslnc.org>
Signed-off-by: Ben Shum <bshum@biblio.org>