* [pve-devel] [PATCH common] allow longer timeout for cancelling 'vzdump' jobs
@ 2021-01-14 15:39 Stefan Reiter
2021-01-26 18:23 ` Thomas Lamprecht
0 siblings, 1 reply; 3+ messages in thread
From: Stefan Reiter @ 2021-01-14 15:39 UTC (permalink / raw)
To: pve-devel
This attempts to solve the issue where on slow network storages,
aborting a backup job (which may wait for buffers to flush) could take
longer than 5 seconds, and would thus result in the task being killed by
SIGKILL, not removing the backup lock in the process.
Make the implementation future-proof by using a map from task type to a
timeout value. Default stays at 5, so tasks other than 'vzdump' are not
affected.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
src/PVE/RESTEnvironment.pm | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/src/PVE/RESTEnvironment.pm b/src/PVE/RESTEnvironment.pm
index d5b84d0..8a0cb9a 100644
--- a/src/PVE/RESTEnvironment.pm
+++ b/src/PVE/RESTEnvironment.pm
@@ -365,8 +365,16 @@ sub active_workers {
return $res;
}
+my $timeout_map = {
+ # backup cancellation on slow target storages might take a while, avoid
+ # leaving the VM in locked state
+ "vzdump" => 60,
+};
+
my $kill_process_group = sub {
- my ($pid, $pstart) = @_;
+ my ($pid, $pstart, $timeout) = @_;
+
+ $timeout //= 5;
# send kill to process group (negative pid)
my $kpid = -$pid;
@@ -374,8 +382,7 @@ my $kill_process_group = sub {
# always send signal to all pgrp members
kill(15, $kpid); # send TERM signal
- # give max 5 seconds to shut down
- for (my $i = 0; $i < 5; $i++) {
+ for (my $i = 0; $i < $timeout; $i++) {
return if !PVE::ProcFSTools::check_process_running($pid, $pstart);
sleep (1);
}
@@ -394,7 +401,8 @@ sub check_worker {
return 0 if !$running;
if ($killit) {
- &$kill_process_group($task->{pid});
+ my $type = $task->{type};
+ &$kill_process_group($task->{pid}, undef, $timeout_map->{$type});
return 0;
}
--
2.20.1
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [pve-devel] [PATCH common] allow longer timeout for cancelling 'vzdump' jobs
2021-01-14 15:39 [pve-devel] [PATCH common] allow longer timeout for cancelling 'vzdump' jobs Stefan Reiter
@ 2021-01-26 18:23 ` Thomas Lamprecht
2021-01-27 11:11 ` Stefan Reiter
0 siblings, 1 reply; 3+ messages in thread
From: Thomas Lamprecht @ 2021-01-26 18:23 UTC (permalink / raw)
To: Proxmox VE development discussion, Stefan Reiter
On 14.01.21 16:39, Stefan Reiter wrote:
> This attempts to solve the issue where on slow network storages,
> aborting a backup job (which may wait for buffers to flush) could take
> longer than 5 seconds, and would thus result in the task being killed by
> SIGKILL, not removing the backup lock in the process.
>
> Make the implementation future-proof by using a map from task type to a
> timeout value. Default stays at 5, so tasks other than 'vzdump' are not
> affected.
>
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
> src/PVE/RESTEnvironment.pm | 16 ++++++++++++----
> 1 file changed, 12 insertions(+), 4 deletions(-)
>
Not to sure about that map there in pve-common, that module should stay rather
agnostic of user special treatment.
Did you thought about passing that explicitly on worker creation, or setting it
in the RPCEnv inside a worker?
> diff --git a/src/PVE/RESTEnvironment.pm b/src/PVE/RESTEnvironment.pm
> index d5b84d0..8a0cb9a 100644
> --- a/src/PVE/RESTEnvironment.pm
> +++ b/src/PVE/RESTEnvironment.pm
> @@ -365,8 +365,16 @@ sub active_workers {
> return $res;
> }
>
> +my $timeout_map = {
> + # backup cancellation on slow target storages might take a while, avoid
> + # leaving the VM in locked state
> + "vzdump" => 60,
> +};
> +
> my $kill_process_group = sub {
> - my ($pid, $pstart) = @_;
> + my ($pid, $pstart, $timeout) = @_;
> +
> + $timeout //= 5;
>
> # send kill to process group (negative pid)
> my $kpid = -$pid;
> @@ -374,8 +382,7 @@ my $kill_process_group = sub {
> # always send signal to all pgrp members
> kill(15, $kpid); # send TERM signal
>
> - # give max 5 seconds to shut down
> - for (my $i = 0; $i < 5; $i++) {
> + for (my $i = 0; $i < $timeout; $i++) {
> return if !PVE::ProcFSTools::check_process_running($pid, $pstart);
> sleep (1);
> }
> @@ -394,7 +401,8 @@ sub check_worker {
> return 0 if !$running;
>
> if ($killit) {
> - &$kill_process_group($task->{pid});
> + my $type = $task->{type};
> + &$kill_process_group($task->{pid}, undef, $timeout_map->{$type});
> return 0;
> }
>
>
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [pve-devel] [PATCH common] allow longer timeout for cancelling 'vzdump' jobs
2021-01-26 18:23 ` Thomas Lamprecht
@ 2021-01-27 11:11 ` Stefan Reiter
0 siblings, 0 replies; 3+ messages in thread
From: Stefan Reiter @ 2021-01-27 11:11 UTC (permalink / raw)
To: Thomas Lamprecht, Proxmox VE development discussion
On 26/01/2021 19:23, Thomas Lamprecht wrote:
> On 14.01.21 16:39, Stefan Reiter wrote:
>> This attempts to solve the issue where on slow network storages,
>> aborting a backup job (which may wait for buffers to flush) could take
>> longer than 5 seconds, and would thus result in the task being killed by
>> SIGKILL, not removing the backup lock in the process.
>>
>> Make the implementation future-proof by using a map from task type to a
>> timeout value. Default stays at 5, so tasks other than 'vzdump' are not
>> affected.
>>
>> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
>> ---
>> src/PVE/RESTEnvironment.pm | 16 ++++++++++++----
>> 1 file changed, 12 insertions(+), 4 deletions(-)
>>
>
> Not to sure about that map there in pve-common, that module should stay rather
> agnostic of user special treatment.
>
> Did you thought about passing that explicitly on worker creation, or setting it
> in the RPCEnv inside a worker?
I generally agree that it's a bit misplaced, but I don't see a way to
encode it in the worker - the only info we have in check_worker and
stop_task is the UPID, and I don't think it makes sense to encode a
timeout in that? Or is there a way I'm not seeing to retrieve additional
info about a worker from the UPID alone?
We could at least put the map in pve-manager, but I'm not sure if that's
any better.
>
>> diff --git a/src/PVE/RESTEnvironment.pm b/src/PVE/RESTEnvironment.pm
>> index d5b84d0..8a0cb9a 100644
>> --- a/src/PVE/RESTEnvironment.pm
>> +++ b/src/PVE/RESTEnvironment.pm
>> @@ -365,8 +365,16 @@ sub active_workers {
>> return $res;
>> }
>>
>> +my $timeout_map = {
>> + # backup cancellation on slow target storages might take a while, avoid
>> + # leaving the VM in locked state
>> + "vzdump" => 60,
>> +};
>> +
>> my $kill_process_group = sub {
>> - my ($pid, $pstart) = @_;
>> + my ($pid, $pstart, $timeout) = @_;
>> +
>> + $timeout //= 5;
>>
>> # send kill to process group (negative pid)
>> my $kpid = -$pid;
>> @@ -374,8 +382,7 @@ my $kill_process_group = sub {
>> # always send signal to all pgrp members
>> kill(15, $kpid); # send TERM signal
>>
>> - # give max 5 seconds to shut down
>> - for (my $i = 0; $i < 5; $i++) {
>> + for (my $i = 0; $i < $timeout; $i++) {
>> return if !PVE::ProcFSTools::check_process_running($pid, $pstart);
>> sleep (1);
>> }
>> @@ -394,7 +401,8 @@ sub check_worker {
>> return 0 if !$running;
>>
>> if ($killit) {
>> - &$kill_process_group($task->{pid});
>> + my $type = $task->{type};
>> + &$kill_process_group($task->{pid}, undef, $timeout_map->{$type});
>> return 0;
>> }
>>
>>
>
>
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2021-01-27 11:11 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-14 15:39 [pve-devel] [PATCH common] allow longer timeout for cancelling 'vzdump' jobs Stefan Reiter
2021-01-26 18:23 ` Thomas Lamprecht
2021-01-27 11:11 ` Stefan Reiter
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal