Date: Tue, 22 Feb 2022 11:27:42 +0100
From: Fabian Grünbichler
To: Fabian Ebner, pve-devel@lists.proxmox.com, Thomas Lamprecht
Subject: Re: [pve-devel] [PATCH v3 guest-common 1/1] guest helpers: add run_with_replication_guard

On February 22, 2022 10:41 am, Fabian Ebner wrote:
> Am 21.02.22 um 12:58 schrieb Fabian Ebner:
>> Signed-off-by: Fabian Ebner
>> ---
>>
>> New in v3.
>>
>>  src/PVE/GuestHelpers.pm | 15 ++++++++++++++-
>>  1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/PVE/GuestHelpers.pm b/src/PVE/GuestHelpers.pm
>> index 970c460..1183819 100644
>> --- a/src/PVE/GuestHelpers.pm
>> +++ b/src/PVE/GuestHelpers.pm
>> @@ -3,8 +3,9 @@ package PVE::GuestHelpers;
>>  use strict;
>>  use warnings;
>>  
>> -use PVE::Tools;
>> +use PVE::ReplicationConfig;
>>  use PVE::Storage;
>> +use PVE::Tools;
>>  
>>  use POSIX qw(strftime);
>>  use Scalar::Util qw(weaken);
>> @@ -82,6 +83,18 @@ sub guest_migration_lock {
>>      return $res;
>>  }
>>  
>> +sub run_with_replication_guard {
>> +    my ($vmid, $timeout, $log, $func, @param) = @_;
>> +
>> +    my $repl_conf = PVE::ReplicationConfig->new();
>> +    if ($repl_conf->check_for_existing_jobs($vmid, 1)) {
>> +        $log->("checking/waiting for active replication..") if $log;
>> +        guest_migration_lock($vmid, $timeout, $func, @param);
>
> I wonder if we should unconditionally take the lock? If not, we can race
> with a newly created replication job:
> 1. snapshot deletion starts
> 2. replication job is created
> 3. replication job starts
> 4. snapshot deletion runs into 'dataset is busy' error, because snapshot
> is used by replication

that could also be solved by lock_config on the guest config when
creating/modifying a replication job, but unconditionally trying to
obtain the lock is also fine by me.

> IIRC Thomas didn't want the log line to be printed when there is no
> replication configured, and we can still do that if we go for the
> "unconditionally acquire lock" approach (it should get the lock quickly
> except in the above edge case), but it would mean checking the
> replication config just for that. My suggestion would be to get rid of
> this helper, not log anything in the cases with 10s timeout, and log if
> replication is configured in the backup case with 600s timeout.

for the long timeout case, the following would also work? or the
existing helper could switch to something like this, if we want to
keep it after all..

    my $lock_obtained;
    my $do_delete = sub {
        $lock_obtained = 1;
        ...
    };

    eval {
        # no msg
        lock_with_really_short_timeout($do_delete);
    };
    if (my $err = $@) {
        # $lock_obtained tells us whether the error is failing to lock
        # or failing to delete
        die $err if $lock_obtained;

        msg(..)
        lock_with_long_timeout($do_delete);
    }

that way we don't have a delay warranting a log message if no
replication is running (or even configured), but if one is running,
we'll quickly fail to obtain the lock, and can retry with a message that
explains the pause and a longer timeout..

>
>> +    } else {
>> +        $func->(@param);
>> +    }
>> +}
>> +
>>  sub check_hookscript {
>>      my ($volid, $storecfg) = @_;
>>  
>
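The quick-then-long locking idea from the reply above can be sketched in self-contained Perl. This is only an illustration under stated assumptions. lock_with_timeout() and run_with_quick_then_long_lock() are hypothetical names. A plain flock() on a local file stands in for guest_migration_lock(). The real PVE helpers and lock files may behave differently.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Fcntl qw(:flock O_CREAT O_RDWR);
use Time::HiRes qw(sleep time);

# Hypothetical stand-in for a lock helper such as guest_migration_lock():
# poll a non-blocking flock() until $timeout seconds have passed, then run
# $code while holding the lock. Dies with "got lock timeout\n" otherwise.
sub lock_with_timeout {
    my ($path, $timeout, $code, @param) = @_;

    sysopen(my $fh, $path, O_CREAT | O_RDWR) or die "can't open '$path': $!\n";
    my $deadline = time() + $timeout;
    until (flock($fh, LOCK_EX | LOCK_NB)) {
        die "got lock timeout\n" if time() >= $deadline;
        sleep(0.1);
    }

    my $res = eval { $code->(@param) };
    my $err = $@;
    close($fh); # closing the handle releases the flock() lock
    die $err if $err;
    return $res;
}

# Hypothetical helper: first try quietly with a short timeout. Only if the
# lock itself could not be obtained, log why we are waiting and retry with
# the long timeout. Errors raised by $code are rethrown without a retry.
sub run_with_quick_then_long_lock {
    my ($path, $short_timeout, $long_timeout, $log, $code, @param) = @_;

    my $lock_obtained = 0;
    my $guarded = sub {
        $lock_obtained = 1; # we only get here while holding the lock
        return $code->(@_);
    };

    my $res = eval { lock_with_timeout($path, $short_timeout, $guarded, @param) };
    if (my $err = $@) {
        # $code ran and failed: not a locking problem, so pass it on
        die $err if $lock_obtained;

        $log->("checking/waiting for active replication..") if $log;
        $res = lock_with_timeout($path, $long_timeout, $guarded, @param);
    }
    return $res;
}
```

With a free lock, the first attempt succeeds and nothing is logged. If another process holds the lock, the short attempt fails quickly, the message is printed once, and the second attempt waits up to the long timeout, for example the 600s mentioned for the backup case.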