From mboxrd@z Thu Jan 1 00:00:00 1970
From: Fabian Ebner
To: pve-devel@lists.proxmox.com
Date: Fri, 26 Nov 2021 11:52:30 +0100
Message-Id: <20211126105232.436044-1-f.ebner@proxmox.com>
X-Mailer: git-send-email 2.30.2
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [pve-devel] [PATCH guest-common 1/2] replication: update last_sync before removing old replication snapshots

If pvesr was terminated after finishing with the new sync and after
removing old replication snapshots, but before it could write the new
state, the next replication would fail. It would wrongly interpret the
actual last replication snapshot as stale, remove it, and (if no other
snapshots are present) attempt a full sync, which would fail.

Reported in the community forum [0], this was brought to light by the
new pvescheduler before it learned graceful reload.

It's not possible to simply preserve a last remaining snapshot in
prepare(), because prepare() is also used for valid removals. Instead,
update last_sync early enough. Stale snapshots will still be removed on
the next run if there are any.

[0]: https://forum.proxmox.com/threads/100154

Signed-off-by: Fabian Ebner
---
 src/PVE/Replication.pm      | 3 +++
 src/PVE/ReplicationState.pm | 7 +++++++
 2 files changed, 10 insertions(+)

diff --git a/src/PVE/Replication.pm b/src/PVE/Replication.pm
index 051dfd9..de652f2 100644
--- a/src/PVE/Replication.pm
+++ b/src/PVE/Replication.pm
@@ -372,6 +372,9 @@ sub replicate {
 	die $err;
     }
 
+    # Ensure that new sync is recorded before removing old replication snapshots.
+    PVE::ReplicationState::record_sync_end($jobcfg, $state, $start_time);
+
     # remove old snapshots because they are no longer needed
     $cleanup_local_snapshots->($last_snapshots, $last_sync_snapname);
 
diff --git a/src/PVE/ReplicationState.pm b/src/PVE/ReplicationState.pm
index 81a1b31..8efe0e2 100644
--- a/src/PVE/ReplicationState.pm
+++ b/src/PVE/ReplicationState.pm
@@ -159,6 +159,13 @@ sub delete_guest_states {
     PVE::Tools::lock_file($state_lock, 10, $code);
 }
 
+sub record_sync_end {
+    my ($jobcfg, $state, $start_time) = @_;
+
+    $state->{last_sync} = $start_time;
+    write_job_state($jobcfg, $state);
+}
+
 sub record_job_end {
     my ($jobcfg, $state, $start_time, $duration, $err) = @_;
 
-- 
2.30.2
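
For illustration only, the ordering problem described in the commit message can be modeled with a small standalone sketch. This is not PVE code: run_once(), simulate(), the numeric snapshot names and the in-memory state hash are all hypothetical, and a "full" result merely stands in for the full sync that would fail on a real setup.

```perl
use strict;
use warnings;

# Hypothetical in-memory model of one replication run; none of these
# names exist in PVE. Snapshots are identified by their start time.
# $record_first selects the order of the two final steps, and $crash
# simulates termination between them.
sub run_once {
    my ($snapshots, $state, $start_time, $record_first, $crash) = @_;

    # Stand-in for prepare(): anything not matching the recorded
    # last_sync is treated as stale and removed.
    @$snapshots = grep { $_ == $state->{last_sync} } @$snapshots;

    # Without a common base snapshot, only a full sync is possible.
    my $mode = @$snapshots ? 'incremental' : 'full';
    push @$snapshots, $start_time;    # the new sync has finished

    my $record = sub { $state->{last_sync} = $start_time };
    my $cleanup = sub { @$snapshots = grep { $_ == $start_time } @$snapshots };

    my @steps = $record_first ? ($record, $cleanup) : ($cleanup, $record);
    $steps[0]->();
    return $mode if $crash;    # terminated before the second step
    $steps[1]->();
    return $mode;
}

# Run one interrupted replication, then report what the next run does.
sub simulate {
    my ($record_first) = @_;
    my $snapshots = [100];
    my $state = { last_sync => 100 };
    run_once($snapshots, $state, 200, $record_first, 1);
    return run_once($snapshots, $state, 300, $record_first, 0);
}

print "old order: ", simulate(0), "\n";
print "new order: ", simulate(1), "\n";
```

Under these assumptions the script prints "old order: full" and "new order: incremental". With the old order, the interrupted run leaves only the new snapshot on disk while the state still names the previous one, so the next run discards it as stale. With the new order, the leftover old snapshot is simply cleaned up as stale on the next run.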