Date: Wed, 26 Jan 2022 10:08:06 +0100
From: Fabian Ebner
To: pve-devel@lists.proxmox.com
References: <20220114130849.57616-1-f.ebner@proxmox.com>
In-Reply-To: <20220114130849.57616-1-f.ebner@proxmox.com>
Subject: Re: [pve-devel] [PATCH v2 container] fix #3424: vzdump: cleanup: wait for active replication
List-Id: Proxmox VE development discussion

On 14.01.22 at 14:08, Fabian Ebner wrote:
> As replication and backup can happen at the same time, the vzdump
> snapshot might be actively used by replication when backup tries
> to clean up, resulting in a not (or only partially) removed snapshot
> and a locked (snapshot-delete) container.
>
> Wait up to 10 minutes for any ongoing replication. If replication
> doesn't finish in time, the fact that there is no attempt to remove
> the snapshot means there's no risk of the container ending up in
> a locked state. And the beginning of the next backup will force-remove
> the left-over snapshot, which will very likely succeed even at the
> storage layer, because the replication really should be done by then
> (subsequent replications shouldn't matter, as they don't need to
> re-transfer the vzdump snapshot).
>
> Suggested-by: Fabian Grünbichler
> Signed-off-by: Fabian Ebner
> ---

This might not be the best approach, as it doesn't cover the same edge
case with manual snapshot removal:
https://bugzilla.proxmox.com/show_bug.cgi?id=3424#c1

>
> Changes from v1:
> * Check if replication is configured first.
> * Use "active replication" in the log message.
>
> VM backups are not affected by this, because they don't use
> storage/config snapshots; they use pve-qemu's block layer instead.
>
> Decided to go for this approach rather than having replication wait on
> backup, because "full backup can take much longer than replication
> usually does", and even if we time out, we can just skip the removal
> for now and have the next backup do it.
>
>  src/PVE/VZDump/LXC.pm | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/src/PVE/VZDump/LXC.pm b/src/PVE/VZDump/LXC.pm
> index b7f7463..5bac089 100644
> --- a/src/PVE/VZDump/LXC.pm
> +++ b/src/PVE/VZDump/LXC.pm
> @@ -8,9 +8,11 @@ use File::Path;
>  use POSIX qw(strftime);
>
>  use PVE::Cluster qw(cfs_read_file);
> +use PVE::GuestHelpers;
>  use PVE::INotify;
>  use PVE::LXC::Config;
>  use PVE::LXC;
> +use PVE::ReplicationConfig;
>  use PVE::Storage;
>  use PVE::Tools;
>  use PVE::VZDump;
> @@ -476,8 +478,21 @@ sub cleanup {
>      }
>
>      if ($task->{cleanup}->{remove_snapshot}) {
> -        $self->loginfo("cleanup temporary 'vzdump' snapshot");
> -        PVE::LXC::Config->snapshot_delete($vmid, 'vzdump', 0);
> +        my $do_remove = sub {
> +            $self->loginfo("cleanup temporary 'vzdump' snapshot");
> +            PVE::LXC::Config->snapshot_delete($vmid, 'vzdump', 0);
> +        };
> +
> +        my $repl_conf = PVE::ReplicationConfig->new();
> +        eval {
> +            if ($repl_conf->check_for_existing_jobs($vmid, 1)) {
> +                $self->loginfo("checking/waiting for active replication..");
> +                PVE::GuestHelpers::guest_migration_lock($vmid, 600, $do_remove);
> +            } else {
> +                $do_remove->();
> +            }
> +        };
> +        die "snapshot 'vzdump' was not (fully) removed - $@" if $@;
>      }
>  }
>
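For readers unfamiliar with guest_migration_lock: the core of the patch is "try to take the per-guest exclusive lock for up to a timeout, run the cleanup callback under it, and on timeout fail without touching the snapshot". A minimal sketch of that pattern in Python (not the project's Perl, and not PVE code; the function name and lock-file path are illustrative assumptions):

```python
import errno
import fcntl
import time

def run_with_lock(lockfile, timeout, func):
    """Sketch of the lock-with-timeout pattern: poll for an exclusive
    flock for up to `timeout` seconds, then run `func` while holding it.
    On timeout, raise instead of running the cleanup (mirroring how the
    patch skips snapshot removal rather than risking a locked container)."""
    fh = open(lockfile, "a+")
    deadline = time.monotonic() + timeout
    try:
        while True:
            try:
                # Non-blocking attempt, so we control the overall timeout.
                fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
                break
            except OSError as e:
                if e.errno not in (errno.EAGAIN, errno.EACCES):
                    raise
                if time.monotonic() >= deadline:
                    raise TimeoutError(
                        f"cannot acquire lock '{lockfile}' - got timeout")
                time.sleep(1)
        return func()
    finally:
        # Closing the file descriptor releases the flock.
        fh.close()
```

Holding the same lock as migration/replication means the cleanup callback can never run concurrently with a replication run; the 600-second budget in the patch corresponds to the `timeout` parameter here.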