From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by lists.proxmox.com (Postfix) with ESMTPS id 05D016B2FF for ; Fri, 18 Mar 2022 15:07:21 +0100 (CET) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 03D581E839 for ; Fri, 18 Mar 2022 15:07:21 +0100 (CET) Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com [94.136.29.106]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by firstgate.proxmox.com (Proxmox) with ESMTPS id ED34E1E822 for ; Fri, 18 Mar 2022 15:07:19 +0100 (CET) Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1]) by proxmox-new.maurer-it.com (Proxmox) with ESMTP id C49A542E1E for ; Fri, 18 Mar 2022 15:07:19 +0100 (CET) From: Stefan Sterz To: pbs-devel@lists.proxmox.com Date: Fri, 18 Mar 2022 15:06:54 +0100 Message-Id: <20220318140655.170656-5-s.sterz@proxmox.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220318140655.170656-1-s.sterz@proxmox.com> References: <20220318140655.170656-1-s.sterz@proxmox.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-SPAM-LEVEL: Spam detection results: 0 AWL -0.000 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record T_SCC_BODY_TEXT_LINE -0.01 - Subject: [pbs-devel] [RFC PATCH 4/5] fix #3935: datastore: keep manifest locks, avoid similar locking issue X-BeenThere: pbs-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox Backup Server development discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 18 Mar 2022 14:07:21 -0000 just as removing the directories we locked on is an issue, so is removing the locks for the manifest. Signed-off-by: Stefan Sterz --- Similarly to the issue that led to this patch series the following seems plausible to me: A wants to remove a group, locks group, locks manifest (via `remove_group`) B wants to interact with the manifest, begins locking manifest via `lock_manifest`, it opens file handle to the lock file A deletes group, deletes manifest C wants to interact with the manifest, re-creates lock via `lock_manifest`, now holds a lock to the manifest A drops group and manifest lock B acquires lock B and C now hold the lock to the same manifest I am not sure how likely this is or how "bad", but given that `lock_manifest` has a 5 second timeout, it might be more likely than it appears at first. Correct me if I am wrong though. If I am not mistaken, this is also the reason why we can basically _never_ remove a file/directory that acts as a lock. One solution would be to require a higher lock protecting the lower lock, which would need to be acquired by all threads interacting with the given object. However, this basically negates the principle of acquiring locks with the least privileges and would more or less converge towards a global lock. For example: To remove a snapshot lock, we need to lock the group. If its the last snapshot and we want to remove the group, we'd need to lock the data store. Now every thread that wants to access a snapshot, needs to acquire a data store lock. There might be a nicer solution that I am missing. If I am wrong please tell me :) pbs-datastore/src/datastore.rs | 6 ------ 1 file changed, 6 deletions(-) diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs index 11bd578d..60383860 100644 --- a/pbs-datastore/src/datastore.rs +++ b/pbs-datastore/src/datastore.rs @@ -346,12 +346,6 @@ impl DataStore { ) })?; - // the manifest does not exists anymore, we do not need to keep the lock - if let Ok(path) = self.manifest_lock_path(backup_dir) { - // ignore errors - let _ = std::fs::remove_file(path); - } - Ok(()) } -- 2.30.2