From: Fabian Grünbichler <f.gruenbichler@proxmox.com>
To: pbs-devel@lists.proxmox.com
Date: Mon, 27 Feb 2023 10:50:12 +0100
Message-Id: <20230227095012.291373-1-f.gruenbichler@proxmox.com>
Subject: [pbs-devel] [RFC proxmox-backup] drop exclusive lock for verify-after-complete

the backup is finished at that point, so the only lock clashes that are
possible when dropping the exclusive lock and attempting to obtain a shared
one would be:

- the snapshot is pruned/removed
- the backup is running in a pre-upgrade process, and the post-upgrade
  process opens a reader

the first case is OK: if the other invocation wins the race and removes the
snapshot, verification is pointless anyway.

the second case means the snapshot is not verified directly after completion
(this fact would be logged in the backup task log), but it is usable
immediately for pulling/restoring/..

this should decrease the chances of triggering the issues described in #4523

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---

Notes:
    right now our locking helpers don't support a direct downgrade (or an
    upgrade attempt, for that matter). given that we don't have many use
    cases that require either, I am not sure whether it's worth it to
    include that option in the planned revamp by Stefan (Sterz).

    for fully fixing #4523, we'd also need to improve our "snapshot is
    still being created" heuristics as described in the comment there.
    this would entail writing and removing some sort of marker in the
    backup session (and all other code paths that create snapshot dirs,
    like pull/sync, tape restore, ..) and checking for it when listing
    snapshots, similar to the protection status. this is mainly relevant
    for systems that use syncfs for ensuring datastore consistency.

    Tested by doing a backup with a big delta and verify-after-complete set
    while doing pulls in a loop. The window where the snapshot was no longer
    considered in-progress but still exclusively locked still exists, but it
    got much smaller.
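to make the marker idea above a bit more concrete, a minimal, purely
hypothetical sketch (not how any existing PBS code path does it - the marker
file name and helper names are invented for illustration, and the real thing
would have to be wired into the backup session, pull/sync, tape restore and
snapshot listing):

// hypothetical sketch, not part of this patch: a "still being created"
// marker written when a snapshot dir is created and removed once the
// snapshot is complete, so listing code can skip half-created snapshots.
use std::fs::{remove_file, File};
use std::io;
use std::path::{Path, PathBuf};

// invented marker file name, kept inside the snapshot directory
const CREATING_MARKER: &str = ".snapshot-in-progress";

fn marker_path(snapshot_dir: &Path) -> PathBuf {
    snapshot_dir.join(CREATING_MARKER)
}

// called right after the snapshot dir is created (backup session,
// pull/sync, tape restore, ..)
fn mark_snapshot_in_progress(snapshot_dir: &Path) -> io::Result<()> {
    File::create(marker_path(snapshot_dir)).map(|_| ())
}

// called once the manifest is written and the snapshot is complete
fn clear_snapshot_in_progress(snapshot_dir: &Path) -> io::Result<()> {
    match remove_file(marker_path(snapshot_dir)) {
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(()),
        other => other,
    }
}

// snapshot listing would check this, similar to the protection marker
fn snapshot_in_progress(snapshot_dir: &Path) -> bool {
    marker_path(snapshot_dir).exists()
}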
 src/api2/backup/environment.rs | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs
index 4f07f9b4..5291bce8 100644
--- a/src/api2/backup/environment.rs
+++ b/src/api2/backup/environment.rs
@@ -7,7 +7,7 @@ use ::serde::Serialize;
 use serde_json::{json, Value};
 
 use proxmox_router::{RpcEnvironment, RpcEnvironmentType};
-use proxmox_sys::fs::{replace_file, CreateOptions};
+use proxmox_sys::fs::{lock_dir_noblock_shared, replace_file, CreateOptions};
 
 use pbs_api_types::Authid;
 use pbs_datastore::backup_info::{BackupDir, BackupInfo};
@@ -634,7 +634,7 @@ impl BackupEnvironment {
     /// If verify-new is set on the datastore, this will run a new verify task
     /// for the backup. If not, this will return and also drop the passed lock
     /// immediately.
-    pub fn verify_after_complete(&self, snap_lock: Dir) -> Result<(), Error> {
+    pub fn verify_after_complete(&self, excl_snap_lock: Dir) -> Result<(), Error> {
         self.ensure_finished()?;
 
         if !self.datastore.verify_new() {
@@ -642,6 +642,14 @@ impl BackupEnvironment {
             return Ok(());
         }
 
+        // Downgrade to shared lock, the backup itself is finished
+        drop(excl_snap_lock);
+        let snap_lock = lock_dir_noblock_shared(
+            &self.backup_dir.full_path(),
+            "snapshot",
+            "snapshot is already locked by another operation",
+        )?;
+
         let worker_id = format!(
             "{}:{}/{}/{:08X}",
             self.datastore.name(),
-- 
2.30.2
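as a footnote to the "no direct downgrade" remark in the notes: with the
approach in this patch there is a short moment between drop(excl_snap_lock)
and lock_dir_noblock_shared() where no lock is held at all, because the
helper re-opens and re-locks the directory by path. a hypothetical downgrade
helper for the planned revamp (not part of proxmox-sys, name invented here)
could instead re-flock the already open directory fd with LOCK_SH - though
per flock(2) even a lock conversion on the same fd is not guaranteed to be
atomic, so this would shrink the window rather than close it:

// hypothetical sketch, not part of this patch or of proxmox-sys: downgrade
// an exclusive flock held on an already open directory fd to a shared one,
// instead of dropping the guard and re-opening the path.
use anyhow::Error;
use nix::dir::Dir;
use nix::fcntl::{flock, FlockArg};
use std::os::unix::io::AsRawFd;

fn downgrade_to_shared(locked_dir: &Dir) -> Result<(), Error> {
    // replaces the exclusive lock on this fd with a shared one; flock(2)
    // documents that such a conversion may briefly drop the lock, so this
    // only shrinks the race window compared to reopening the directory.
    flock(locked_dir.as_raw_fd(), FlockArg::LockSharedNonblock)?;
    Ok(())
}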