public inbox for pve-devel@lists.proxmox.com
From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Fabian Ebner <f.ebner@proxmox.com>,
	pve-devel@lists.proxmox.com,
	Thomas Lamprecht <t.lamprecht@proxmox.com>
Subject: Re: [pve-devel] [PATCH v3 guest-common 1/1] guest helpers: add run_with_replication_guard
Date: Tue, 22 Feb 2022 11:27:42 +0100	[thread overview]
Message-ID: <1645524681.xz819fggl9.astroid@nora.none> (raw)
In-Reply-To: <df346821-3dc3-d65e-f127-4c5e156424a7@proxmox.com>

On February 22, 2022 10:41 am, Fabian Ebner wrote:
> Am 21.02.22 um 12:58 schrieb Fabian Ebner:
>> Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
>> ---
>> 
>> New in v3.
>> 
>>  src/PVE/GuestHelpers.pm | 15 ++++++++++++++-
>>  1 file changed, 14 insertions(+), 1 deletion(-)
>> 
>> diff --git a/src/PVE/GuestHelpers.pm b/src/PVE/GuestHelpers.pm
>> index 970c460..1183819 100644
>> --- a/src/PVE/GuestHelpers.pm
>> +++ b/src/PVE/GuestHelpers.pm
>> @@ -3,8 +3,9 @@ package PVE::GuestHelpers;
>>  use strict;
>>  use warnings;
>>  
>> -use PVE::Tools;
>> +use PVE::ReplicationConfig;
>>  use PVE::Storage;
>> +use PVE::Tools;
>>  
>>  use POSIX qw(strftime);
>>  use Scalar::Util qw(weaken);
>> @@ -82,6 +83,18 @@ sub guest_migration_lock {
>>      return $res;
>>  }
>>  
>> +sub run_with_replication_guard {
>> +    my ($vmid, $timeout, $log, $func, @param) = @_;
>> +
>> +    my $repl_conf = PVE::ReplicationConfig->new();
>> +    if ($repl_conf->check_for_existing_jobs($vmid, 1)) {
>> +	$log->("checking/waiting for active replication..") if $log;
>> +	guest_migration_lock($vmid, $timeout, $func, @param);
> 
> I wonder if we should unconditionally take the lock? If not, we can race
> with a newly created replication job:
> 1. snapshot deletion starts
> 2. replication job is created
> 3. replication job starts
> 4. snapshot deletion runs into 'dataset is busy' error, because snapshot
> is used by replication

that could also be solved by lock_config on the guest config when 
creating/modifying a replication job, but unconditionally trying to 
obtain the lock is also fine by me.

> IIRC Thomas didn't want the log line to be printed when there is no
> replication configured, and we can still do that if we go for the
> "unconditionally acquire lock" approach (it should get the lock quickly
> except in the above edge case), but it would mean checking the
> replication config just for that. My suggestion would be to get rid of
> this helper, not log anything in the cases with 10s timeout, and log if
> replication is configured in the backup case with 600s timeout.

for the long timeout case, the following would also work? or the 
existing helper could switch to something like this, if we want to 
keep it after all..

my $lock_obtained;
my $do_delete = sub {
  $lock_obtained = 1;
  ...
};

eval {
  # no msg
  lock_with_really_short_timeout($do_delete);
};

if (my $err = $@) {
  # $lock_obtained tells us whether the error is failing to lock
  # or failing to delete
  die $err
    if $lock_obtained;

  msg(..);
  lock_with_long_timeout($do_delete);
}

that way we don't have a delay warranting a log message if no 
replication is running (or even configured), but if one is running, 
we'll quickly fail to obtain the lock, and can retry with a message that 
explains the pause and a longer timeout..
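to make the shape of that concrete, here's a self-contained sketch of the 
escalating-timeout pattern. the lock itself is simulated - `$lock_busy` and 
`lock_with_timeout()` stand in for a held replication lock and 
guest_migration_lock(); all names here are illustrative, not actual 
guest-common code:

```perl
use strict;
use warnings;

# Simulated lock: "busy" unless the caller is willing to wait long enough.
# Stand-in for guest_migration_lock() with a replication job holding the lock.
my $lock_busy = 1;

sub lock_with_timeout {
    my ($timeout, $code) = @_;
    die "can't lock - got timeout\n" if $lock_busy && $timeout < 10;
    return $code->();
}

sub run_with_escalating_lock {
    my ($log, $func) = @_;

    my $lock_obtained;
    my $wrapped = sub {
	$lock_obtained = 1; # past the lock: any error from here on is $func's
	return $func->();
    };

    eval { lock_with_timeout(1, $wrapped) }; # short timeout, no message
    if (my $err = $@) {
	die $err if $lock_obtained; # real failure inside $func, re-raise

	$log->("checking/waiting for active replication..");
	lock_with_timeout(600, $wrapped); # long timeout, now with a message
    }
    return;
}

my (@msgs, @done);
run_with_escalating_lock(
    sub { push @msgs, $_[0] },
    sub { push @done, 'snapshot deleted' },
);
print "$msgs[0]\n";
```

the key point is the $lock_obtained flag: it distinguishes "failed to get the 
lock" (retry with a message and a longer timeout) from "got the lock, but the 
work itself failed" (propagate the error unchanged).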

> 
>> +    } else {
>> +	$func->(@param);
>> +    }
>> +}
>> +
>>  sub check_hookscript {
>>      my ($volid, $storecfg) = @_;
>>  
> 
Thread overview: 8+ messages
2022-02-21 11:58 [pve-devel] [PATCH-SERIES guest-common/container/qemu-server] fix #3424: wait for active replication when deleting a snapshot Fabian Ebner
2022-02-21 11:58 ` [pve-devel] [PATCH v3 guest-common 1/1] guest helpers: add run_with_replication_guard Fabian Ebner
2022-02-22  9:41   ` Fabian Ebner
2022-02-22 10:27     ` Fabian Grünbichler [this message]
2022-02-22 13:44       ` Fabian Ebner
2022-02-21 11:58 ` [pve-devel] [PATCH v3 container 1/2] partially fix #3424: vzdump: cleanup: wait for active replication Fabian Ebner
2022-02-21 11:58 ` [pve-devel] [PATCH v3 container 2/2] fix #3424: api: snapshot delete: " Fabian Ebner
2022-02-21 11:58 ` [pve-devel] [PATCH v3 qemu-server 1/1] " Fabian Ebner
