From: Fiona Ebner
To: pve-devel@lists.proxmox.com
Date: Tue, 4 Nov 2025 17:58:27 +0100
Message-ID: <20251104165933.81174-1-f.ebner@proxmox.com>
Subject: [pve-devel] [PATCH cluster 1/1] cfs lock: attempt to acquire lock more frequently

The cfs lock helper would only try N times to acquire a lock when a
timeout of N seconds was specified (default 10). That is not very much,
as became apparent while testing patches for properly locking shared LVM
storages during snapshot operations [0]. There, with only two contenders
doing a loop of operations that require taking a cfs lock, one could
already starve. With three instances of a synthetic reproducer [1]
running on three nodes, not even 10 iterations each could be reached in
testing.

Thomas suggested having the frequency of attempts to acquire the lock
depend on how much time is left until hitting the timeout, rather than
just increasing the frequency in general. This way, contenders that have
already waited longer have better chances. Until reaching 10 seconds
remaining, the sleep time for one iteration stays the same, namely 1
second. With less than 10 seconds remaining, the sleep time is the
integer number of seconds remaining divided by ten, e.g. 0.5 seconds
with 5 seconds remaining.

With this approach, three instances of [1] can reliably reach 100
iterations each.
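
To illustrate the schedule described above, here is a minimal sketch in
Python that models the loop from the patch. It is not part of the patch
and rests on simplifying assumptions: every attempt to acquire the lock
fails and takes no time, and each sleep lasts exactly as long as
requested. The function name sleep_schedule is hypothetical.

```python
def sleep_schedule(timeout_sec: int) -> list[int]:
    """Return the sleep durations in microseconds between failed attempts.

    Models the patched loop under the assumption that the lock is never
    acquired and that each sleep lasts exactly as long as requested.
    """
    slept_usec = 0
    sleeps = []
    while True:
        slept_sec = slept_usec // 1_000_000
        # Never let the remaining time drop below one second.
        remaining_sec = max(timeout_sec - slept_sec, 1)
        # An attempt to acquire the lock would happen here (assumed to fail).
        # One tenth of the remaining seconds, capped at one full second.
        sleep_usec = min(remaining_sec, 10) * 100_000
        next_slept_sec = (slept_usec + sleep_usec) // 1_000_000
        if next_slept_sec >= timeout_sec:
            return sleeps  # this is where the timeout error would be raised
        sleeps.append(sleep_usec)
        slept_usec += sleep_usec


if __name__ == "__main__":
    schedule = sleep_schedule(10)
    print(f"attempts: {len(schedule) + 1}")
    print([usec / 1_000_000 for usec in schedule])
```

Under these assumptions, a timeout of 10 seconds results in 28 attempts
rather than the previous 10, with the sleep time shrinking from 1 second
down to 0.1 seconds as the timeout approaches.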

[0]: https://lore.proxmox.com/pve-devel/20251103162330.112603-5-f.ebner@proxmox.com/
[1]:

use v5.36;

use Time::HiRes qw(usleep);

use PVE::Cluster;

my $count = shift or die "specify number of lock acquisitions\n";

for (my $i = 0; $i < $count; $i++) {
    PVE::Cluster::cfs_lock_storage(
        "foo",
        10,
        sub {
            print "got lock $i\n";
            usleep(500_000);
        },
    );
    die $@ if $@;
    usleep(100_000);
}

Suggested-by: Thomas Lamprecht
Signed-off-by: Fiona Ebner
---
 src/PVE/Cluster.pm | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/src/PVE/Cluster.pm b/src/PVE/Cluster.pm
index e829687..bdb465f 100644
--- a/src/PVE/Cluster.pm
+++ b/src/PVE/Cluster.pm
@@ -7,10 +7,12 @@ use Encode;
 use File::stat qw();
 use File::Path qw(make_path);
 use JSON;
+use List::Util;
 use Net::SSLeay;
 use POSIX qw(ENOENT);
 use Socket;
 use Storable qw(dclone);
+use Time::HiRes qw(usleep);
 
 use PVE::Certificate;
 use PVE::INotify;
@@ -622,19 +624,29 @@ my $cfs_lock = sub {
 
         my $timeout_err = sub { die "got lock request timeout\n"; };
         local $SIG{ALRM} = $timeout_err;
+        my $slept_usec = 0;
 
         while (1) {
-            alarm($timeout);
+            my $slept_sec = int($slept_usec / 1_000_000);
+            # Below increases by the actual amount of time slept, so in principle, the value
+            # $timeout - $slept_sec could end up being zero.
+            my $remaining_timeout_sec = List::Util::max($timeout - $slept_sec, 1);
+
+            alarm($remaining_timeout_sec);
             $got_lock = mkdir($filename);
-            $timeout = alarm(0) - 1; # we'll sleep for 1s, see down below
 
             last if $got_lock;
 
-            $timeout_err->() if $timeout <= 0;
+            my $sleep_usec = List::Util::min($remaining_timeout_sec, 10) * 100_000;
+            my $next_slept_sec = int(($slept_usec + $sleep_usec) / 1_000_000);
 
-            print STDERR "trying to acquire cfs lock '$lockid' ...\n";
+            $timeout_err->() if $next_slept_sec >= $timeout;
+
+            if ($next_slept_sec > $slept_sec) { # don't log more often than once per second
+                print STDERR "trying to acquire cfs lock '$lockid' ...\n";
+            }
             utime(0, 0, $filename); # cfs unlock request
-            sleep(1);
+            $slept_usec += usleep($sleep_usec);
         }
 
         # fixed command timeout: cfs locks have a timeout of 120
-- 
2.47.3