This patch band-aids around the following listener error that
we've occasionally seen:
server: died with error Can't kill a non-numeric process ID
at .../OpenSRF/Server.pm line 335.
With this patch, OpenSRF::Server->kill_child() will now simply
log a warning rather than passing an undefined or non-numeric
value to kill().
This patch does not address the root cause, which is still unknown,
but could include:
- An actual failure to fork a new child leading to an undef PID
ending up on the drone list. As OpenSRF::Server does
not currently check for undef return from fork(), this cannot
be dismissed, but this doesn't fit our observations.
- Something akin to bug 1953044; in particular, there might be
a race condition between ->reap_children(), ->check_status(),
and ->handle_sighup() that results in an attempt to kill a
child that's already been reaped. If so, the patch for bug
1953044 *might* also ameliorate this crash.
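
As an illustration of the first possibility, here is a minimal sketch of
checking fork()'s return value before recording a child. Perl's fork()
returns undef on failure, 0 in the child, and the child's PID in the
parent. The helper names (classify_fork_result, spawn_child_checked) and
the $drones list are hypothetical and are not taken from OpenSRF::Server;
this is not part of the patch.

```perl
use strict;
use warnings;

# Hypothetical helper: classify the value returned by fork().
sub classify_fork_result {
    my ($pid) = @_;
    return 'failed' unless defined $pid;    # fork() returns undef on failure
    return $pid == 0 ? 'child' : 'parent';  # 0 in the child, PID in the parent
}

# Hypothetical helper: fork, and only record the child if fork() succeeded.
# $drones is an assumed array ref of child records shaped like { pid => N }.
sub spawn_child_checked {
    my ($drones) = @_;
    my $pid  = fork();
    my $role = classify_fork_result($pid);

    if ($role eq 'failed') {
        warn "fork failed: $!\n";
        return;                 # nothing is pushed, so no undef PID is stored
    }
    if ($role eq 'child') {
        exit 0;                 # a real child would do its work here
    }

    push @$drones, { pid => $pid };
    return $pid;
}
```

With a check along these lines, a failed fork would be reported where it
happens instead of surfacing later in kill_child().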
To test
-------
[1] Check logs to see if you've encountered the issue listed above.
[2] Apply the patch and wait to see if the error ever recurs or
if you see one of the two warnings in the log:
refused to kill child with non-numeric PID $pid
refused to kill child with undefined PID
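
To make it easier to tell which PID values lead to which warning, the
checks in this patch can be sketched as a standalone function. The name
pid_kill_decision is hypothetical; it returns the string 'kill' where the
patched code would call kill(), and the warning text otherwise.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the order of checks in the patch:
# undefined first, then the numeric test, then the non-numeric fallback.
sub pid_kill_decision {
    my ($pid) = @_;
    return 'refused to kill child with undefined PID' unless defined $pid;
    return 'kill' if $pid =~ /^-?\d+$/;
    return "refused to kill child with non-numeric PID $pid";
}

# Print the decision for a few sample values.
for my $pid (1234, '-1', '', 'abc', undef) {
    my $shown = defined $pid ? "'$pid'" : 'undef';
    print "$shown => ", pid_kill_decision($pid), "\n";
}
```

Note that an empty-string PID is defined but non-numeric, so it would
produce the first warning with nothing after "PID".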
Signed-off-by: Galen Charlton <gmc@equinoxOLI.org>
     my $self = shift;
     my $child = shift || pop(@{$self->{idle_list}}) or return;
     $chatty and $logger->internal("server: killing child $child");
-    kill('TERM', $child->{pid});
+
+    if (defined($child->{pid})) {
+        if ($child->{pid} =~ /^-?\d+$/) {
+            kill('TERM', $child->{pid});
+        } else {
+            $logger->warn("refused to kill child with non-numeric PID $child->{pid}");
+        }
+    } else {
+        $logger->warn('refused to kill child with undefined PID');
+    }
 }
 # ----------------------------------------------------------------