* [pve-devel] [PATCH common] allow longer timeout for cancelling 'vzdump' jobs
@ 2021-01-14 15:39 Stefan Reiter
  2021-01-26 18:23 ` Thomas Lamprecht
  0 siblings, 1 reply; 3+ messages in thread
From: Stefan Reiter @ 2021-01-14 15:39 UTC
  To: pve-devel

This attempts to solve the issue where on slow network storages,
aborting a backup job (which may wait for buffers to flush) could take
longer than 5 seconds, and would thus result in the task being killed by
SIGKILL, not removing the backup lock in the process.

Make the implementation future-proof by using a map from task type to a
timeout value. Default stays at 5, so tasks other than 'vzdump' are not
affected.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
 src/PVE/RESTEnvironment.pm | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/src/PVE/RESTEnvironment.pm b/src/PVE/RESTEnvironment.pm
index d5b84d0..8a0cb9a 100644
--- a/src/PVE/RESTEnvironment.pm
+++ b/src/PVE/RESTEnvironment.pm
@@ -365,8 +365,16 @@ sub active_workers  {
     return $res;
 }
 
+my $timeout_map = {
+    # backup cancellation on slow target storages might take a while, avoid
+    # leaving the VM in locked state
+    "vzdump" => 60,
+};
+
 my $kill_process_group = sub {
-    my ($pid, $pstart) = @_;
+    my ($pid, $pstart, $timeout) = @_;
+
+    $timeout //= 5;
 
     # send kill to process group (negative pid)
     my $kpid = -$pid;
@@ -374,8 +382,7 @@ my $kill_process_group = sub {
     # always send signal to all pgrp members
     kill(15, $kpid); # send TERM signal
 
-    # give max 5 seconds to shut down
-    for (my $i = 0; $i < 5; $i++) {
+    for (my $i = 0; $i < $timeout; $i++) {
 	return if !PVE::ProcFSTools::check_process_running($pid, $pstart);
 	sleep (1);
     }
@@ -394,7 +401,8 @@ sub check_worker {
     return 0 if !$running;
 
     if ($killit) {
-	&$kill_process_group($task->{pid});
+	my $type = $task->{type};
+	&$kill_process_group($task->{pid}, undef, $timeout_map->{$type});
 	return 0;
     }
 
-- 
2.20.1
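
The pattern the patch adjusts (send TERM, poll once per second for a grace period looked up by task type, then fall back to KILL) can be sketched in isolation as below. This is a simplified illustration, not the code from RESTEnvironment.pm: it signals a direct child and polls with waitpid() instead of signalling a process group and calling PVE::ProcFSTools::check_process_running(). The names grace_period and stop_child are hypothetical; only the 'vzdump' => 60 entry and the default of 5 come from the patch.

```perl
use strict;
use warnings;
use POSIX qw(:sys_wait_h);

# Grace periods in seconds, keyed by task type.
# Only the 'vzdump' entry is taken from the patch.
my %timeout_map = (vzdump => 60);
my $default_timeout = 5;

# Hypothetical helper: look up the grace period for a task type,
# falling back to the default for unknown or undefined types.
sub grace_period {
    my ($type) = @_;
    return $default_timeout if !defined($type);
    return $timeout_map{$type} // $default_timeout;
}

# Hypothetical helper: send TERM to a direct child, poll once per second
# for up to $timeout seconds, then send KILL.
# Returns 'term' if the child exited within the grace period, 'kill' otherwise.
sub stop_child {
    my ($pid, $timeout) = @_;

    kill('TERM', $pid);

    for (my $i = 0; $i < $timeout; $i++) {
        return 'term' if waitpid($pid, WNOHANG) == $pid;
        sleep(1);
    }

    kill('KILL', $pid);
    waitpid($pid, 0); # reap the child so it does not linger as a zombie
    return 'kill';
}
```

Guarding against an undefined type before the hash lookup avoids an "uninitialized value" warning that a bare $timeout_map{$type} would emit under "use warnings".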






* Re: [pve-devel] [PATCH common] allow longer timeout for cancelling 'vzdump' jobs
  2021-01-14 15:39 [pve-devel] [PATCH common] allow longer timeout for cancelling 'vzdump' jobs Stefan Reiter
@ 2021-01-26 18:23 ` Thomas Lamprecht
  2021-01-27 11:11   ` Stefan Reiter
  0 siblings, 1 reply; 3+ messages in thread
From: Thomas Lamprecht @ 2021-01-26 18:23 UTC
  To: Proxmox VE development discussion, Stefan Reiter

On 14.01.21 16:39, Stefan Reiter wrote:
> This attempts to solve the issue where on slow network storages,
> aborting a backup job (which may wait for buffers to flush) could take
> longer than 5 seconds, and would thus result in the task being killed by
> SIGKILL, not removing the backup lock in the process.
> 
> Make the implementation future-proof by using a map from task type to a
> timeout value. Default stays at 5, so tasks other than 'vzdump' are not
> affected.
> 
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
>  src/PVE/RESTEnvironment.pm | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 

Not too sure about that map there in pve-common; that module should stay rather
agnostic of special treatment for its users.

Did you think about passing that explicitly on worker creation, or setting it
in the RPCEnv inside a worker?

> diff --git a/src/PVE/RESTEnvironment.pm b/src/PVE/RESTEnvironment.pm
> index d5b84d0..8a0cb9a 100644
> --- a/src/PVE/RESTEnvironment.pm
> +++ b/src/PVE/RESTEnvironment.pm
> @@ -365,8 +365,16 @@ sub active_workers  {
>      return $res;
>  }
>  
> +my $timeout_map = {
> +    # backup cancellation on slow target storages might take a while, avoid
> +    # leaving the VM in locked state
> +    "vzdump" => 60,
> +};
> +
>  my $kill_process_group = sub {
> -    my ($pid, $pstart) = @_;
> +    my ($pid, $pstart, $timeout) = @_;
> +
> +    $timeout //= 5;
>  
>      # send kill to process group (negative pid)
>      my $kpid = -$pid;
> @@ -374,8 +382,7 @@ my $kill_process_group = sub {
>      # always send signal to all pgrp members
>      kill(15, $kpid); # send TERM signal
>  
> -    # give max 5 seconds to shut down
> -    for (my $i = 0; $i < 5; $i++) {
> +    for (my $i = 0; $i < $timeout; $i++) {
>  	return if !PVE::ProcFSTools::check_process_running($pid, $pstart);
>  	sleep (1);
>      }
> @@ -394,7 +401,8 @@ sub check_worker {
>      return 0 if !$running;
>  
>      if ($killit) {
> -	&$kill_process_group($task->{pid});
> +	my $type = $task->{type};
> +	&$kill_process_group($task->{pid}, undef, $timeout_map->{$type});
>  	return 0;
>      }
>  
> 







* Re: [pve-devel] [PATCH common] allow longer timeout for cancelling 'vzdump' jobs
  2021-01-26 18:23 ` Thomas Lamprecht
@ 2021-01-27 11:11   ` Stefan Reiter
  0 siblings, 0 replies; 3+ messages in thread
From: Stefan Reiter @ 2021-01-27 11:11 UTC
  To: Thomas Lamprecht, Proxmox VE development discussion

On 26/01/2021 19:23, Thomas Lamprecht wrote:
> On 14.01.21 16:39, Stefan Reiter wrote:
>> This attempts to solve the issue where on slow network storages,
>> aborting a backup job (which may wait for buffers to flush) could take
>> longer than 5 seconds, and would thus result in the task being killed by
>> SIGKILL, not removing the backup lock in the process.
>>
>> Make the implementation future-proof by using a map from task type to a
>> timeout value. Default stays at 5, so tasks other than 'vzdump' are not
>> affected.
>>
>> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
>> ---
>>   src/PVE/RESTEnvironment.pm | 16 ++++++++++++----
>>   1 file changed, 12 insertions(+), 4 deletions(-)
>>
> 
> Not too sure about that map there in pve-common; that module should stay rather
> agnostic of special treatment for its users.
> 
> Did you think about passing that explicitly on worker creation, or setting it
> in the RPCEnv inside a worker?

I generally agree that it's a bit misplaced, but I don't see a way to 
encode it in the worker - the only info we have in check_worker and 
stop_task is the UPID, and I don't think it makes sense to encode a 
timeout in that? Or is there a way I'm not seeing to retrieve additional 
info about a worker from the UPID alone?

We could at least put the map in pve-manager, but I'm not sure if that's 
any better.
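
To illustrate the constraint: the task type is one of the few things that can be recovered from the UPID string itself, which is why a lookup keyed on type works without extra per-worker state. A minimal sketch follows, assuming the UPID field order UPID:node:pid:pstart:starttime:type:id:user: (an assumption here, not something stated in this thread). The helpers upid_type and cancel_timeout are hypothetical names, and the sample UPID is made up.

```perl
use strict;
use warnings;

# Only the 'vzdump' entry and the default of 5 come from the patch.
my %timeout_map = (vzdump => 60);

# Hypothetical helper: extract the task type from a UPID string.
# Assumed layout: UPID:node:pid:pstart:starttime:type:id:user:
sub upid_type {
    my ($upid) = @_;
    my @fields = split(/:/, $upid);
    return if @fields < 8 || $fields[0] ne 'UPID';
    return $fields[5];
}

# Hypothetical helper: pick the cancellation timeout for a UPID,
# defaulting to 5 seconds for unknown types or unparsable input.
sub cancel_timeout {
    my ($upid) = @_;
    my $type = upid_type($upid);
    return 5 if !defined($type);
    return $timeout_map{$type} // 5;
}

# Made-up example UPID, for illustration only.
my $upid = 'UPID:node1:0000A1B2:00C3D4E5:5FFF0000:vzdump:100:root@pam:';
print cancel_timeout($upid), "\n";
```

A per-worker timeout, as opposed to a per-type one, would need state stored somewhere other than the UPID, which is the open question in this reply.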

> 
>> diff --git a/src/PVE/RESTEnvironment.pm b/src/PVE/RESTEnvironment.pm
>> index d5b84d0..8a0cb9a 100644
>> --- a/src/PVE/RESTEnvironment.pm
>> +++ b/src/PVE/RESTEnvironment.pm
>> @@ -365,8 +365,16 @@ sub active_workers  {
>>       return $res;
>>   }
>>   
>> +my $timeout_map = {
>> +    # backup cancellation on slow target storages might take a while, avoid
>> +    # leaving the VM in locked state
>> +    "vzdump" => 60,
>> +};
>> +
>>   my $kill_process_group = sub {
>> -    my ($pid, $pstart) = @_;
>> +    my ($pid, $pstart, $timeout) = @_;
>> +
>> +    $timeout //= 5;
>>   
>>       # send kill to process group (negative pid)
>>       my $kpid = -$pid;
>> @@ -374,8 +382,7 @@ my $kill_process_group = sub {
>>       # always send signal to all pgrp members
>>       kill(15, $kpid); # send TERM signal
>>   
>> -    # give max 5 seconds to shut down
>> -    for (my $i = 0; $i < 5; $i++) {
>> +    for (my $i = 0; $i < $timeout; $i++) {
>>   	return if !PVE::ProcFSTools::check_process_running($pid, $pstart);
>>   	sleep (1);
>>       }
>> @@ -394,7 +401,8 @@ sub check_worker {
>>       return 0 if !$running;
>>   
>>       if ($killit) {
>> -	&$kill_process_group($task->{pid});
>> +	my $type = $task->{type};
>> +	&$kill_process_group($task->{pid}, undef, $timeout_map->{$type});
>>   	return 0;
>>       }
>>   
>>
> 
> 



