[PATCH-SERIES cluster v2 0/2] cfs lock: small improvements

public inbox for pve-devel@lists.proxmox.com

* [PATCH-SERIES cluster v2 0/2] cfs lock: small improvements
@ 2026-02-18 15:44 Fiona Ebner
  2026-02-18 15:44 ` [PATCH cluster v2 1/2] cfs lock: attempt to acquire lock more frequently Fiona Ebner
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Fiona Ebner @ 2026-02-18 15:44 UTC (permalink / raw)
  To: pve-devel

Changes in v2:
* Add patch to improve how signals are handled.

Because of a recent report in the forum [0], I wanted to ping the v1
of the submission [1], which could've helped in this case.

During re-testing, I ran into another issue and noticed that signals
are not nicely handled yet, so there is a second patch now :)

[0]: https://forum.proxmox.com/threads/75902/post-838426
[1]: https://lore.proxmox.com/pve-devel/20251104165933.81174-1-f.ebner@proxmox.com/

pve-cluster:

Fiona Ebner (2):
  cfs lock: attempt to acquire lock more frequently
  cfs lock: unlock when encountering signal

 src/PVE/Cluster.pm | 38 +++++++++++++++++++++++++++++++++-----
 1 file changed, 33 insertions(+), 5 deletions(-)


Summary over all repositories:
  1 files changed, 33 insertions(+), 5 deletions(-)

-- 
Generated by git-murpp 0.5.0





* [PATCH cluster v2 1/2] cfs lock: attempt to acquire lock more frequently
  2026-02-18 15:44 [PATCH-SERIES cluster v2 0/2] cfs lock: small improvements Fiona Ebner
@ 2026-02-18 15:44 ` Fiona Ebner
  2026-02-18 15:44 ` [PATCH cluster v2 2/2] cfs lock: unlock when encountering signal Fiona Ebner
  2026-02-18 18:33 ` partially-applied: [PATCH-SERIES cluster v2 0/2] cfs lock: small improvements Thomas Lamprecht
  2 siblings, 0 replies; 6+ messages in thread
From: Fiona Ebner @ 2026-02-18 15:44 UTC (permalink / raw)
  To: pve-devel

The cfs lock helper would only try N times to acquire a lock when a
timeout of N seconds was specified (default 10). That is not very much
and was noticed while testing patches for properly locking shared LVM
storages during snapshot operations [0]. There, with only two
contenders doing a loop of operations that require taking a cfs lock,
one could already starve. With three instances of a synthetic
reproducer [1] running on three nodes, not even 10 iterations each
could be reached in testing.

Thomas suggested having the frequency of attempts to acquire the lock
depend on how much time is left until hitting the timeout, rather than
just increasing the frequency in general. This way, contenders that
have already waited longer have a better chance. While 10 or more
seconds remain, the sleep time for one iteration stays at 1 second.
With fewer than 10 seconds remaining, the sleep time is the integer
number of seconds remaining divided by ten, i.e. between 0.1 and 0.9
seconds.
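
As a rough illustration of the schedule described above (not the patch
itself), the per-iteration sleep time can be computed as follows. The
helper name sleep_usec_for is hypothetical and only mirrors the
min(remaining, 10) * 100_000 expression from the diff below.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper, for illustration only: returns the sleep time in
# microseconds for a given number of whole seconds left until the timeout.
sub sleep_usec_for {
    my ($remaining_sec) = @_;
    # 10 or more seconds left: sleep a full second (10 * 100_000 usec).
    # Fewer than 10 seconds left: sleep a tenth of the remaining seconds.
    return min($remaining_sec, 10) * 100_000;
}

for my $remaining (30, 10, 5, 1) {
    printf "%2d s remaining -> sleep %.1f s\n",
        $remaining, sleep_usec_for($remaining) / 1_000_000;
}
```

With the default timeout of 10 seconds, the waits therefore shrink from
1 second towards 0.1 seconds as the deadline approaches.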

With this approach, three instances of [1] can reliably reach 100
iterations each.

[0]: https://lore.proxmox.com/pve-devel/20251103162330.112603-5-f.ebner@proxmox.com/
[1]:
use v5.36;
use Time::HiRes qw(usleep);
use PVE::Cluster;
my $count = shift or die "specify number of lock acquisitions\n";
for (my $i = 0; $i < $count; $i++) {
    PVE::Cluster::cfs_lock_storage(
        "foo",
        10,
        sub { print "got lock $i\n"; usleep(500_000); },
    );
    die $@ if $@;
    usleep(100_000);
}

Suggested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 src/PVE/Cluster.pm | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/src/PVE/Cluster.pm b/src/PVE/Cluster.pm
index e829687..bdb465f 100644
--- a/src/PVE/Cluster.pm
+++ b/src/PVE/Cluster.pm
@@ -7,10 +7,12 @@ use Encode;
 use File::stat qw();
 use File::Path qw(make_path);
 use JSON;
+use List::Util;
 use Net::SSLeay;
 use POSIX qw(ENOENT);
 use Socket;
 use Storable qw(dclone);
+use Time::HiRes qw(usleep);
 
 use PVE::Certificate;
 use PVE::INotify;
@@ -622,19 +624,29 @@ my $cfs_lock = sub {
 
         my $timeout_err = sub { die "got lock request timeout\n"; };
         local $SIG{ALRM} = $timeout_err;
+        my $slept_usec = 0;
 
         while (1) {
-            alarm($timeout);
+            my $slept_sec = int($slept_usec / 1_000_000);
+            # Below increases by the actual amount of time slept, so in principle, the value
+            # $timeout - $slept_sec could end up being zero.
+            my $remaining_timeout_sec = List::Util::max($timeout - $slept_sec, 1);
+
+            alarm($remaining_timeout_sec);
             $got_lock = mkdir($filename);
-            $timeout = alarm(0) - 1; # we'll sleep for 1s, see down below
 
             last if $got_lock;
 
-            $timeout_err->() if $timeout <= 0;
+            my $sleep_usec = List::Util::min($remaining_timeout_sec, 10) * 100_000;
+            my $next_slept_sec = int(($slept_usec + $sleep_usec) / 1_000_000);
 
-            print STDERR "trying to acquire cfs lock '$lockid' ...\n";
+            $timeout_err->() if $next_slept_sec >= $timeout;
+
+            if ($next_slept_sec > $slept_sec) { # don't log more often than once per second
+                print STDERR "trying to acquire cfs lock '$lockid' ...\n";
+            }
             utime(0, 0, $filename); # cfs unlock request
-            sleep(1);
+            $slept_usec += usleep($sleep_usec);
         }
 
         # fixed command timeout: cfs locks have a timeout of 120
-- 
2.47.3






* [PATCH cluster v2 2/2] cfs lock: unlock when encountering signal
  2026-02-18 15:44 [PATCH-SERIES cluster v2 0/2] cfs lock: small improvements Fiona Ebner
  2026-02-18 15:44 ` [PATCH cluster v2 1/2] cfs lock: attempt to acquire lock more frequently Fiona Ebner
@ 2026-02-18 15:44 ` Fiona Ebner
  2026-02-18 18:33   ` Thomas Lamprecht
  2026-02-18 18:33 ` partially-applied: [PATCH-SERIES cluster v2 0/2] cfs lock: small improvements Thomas Lamprecht
  2 siblings, 1 reply; 6+ messages in thread
From: Fiona Ebner @ 2026-02-18 15:44 UTC (permalink / raw)
  To: pve-devel

If the lock directory is not removed after failing because of a
signal, it won't be possible to acquire the lock again before the
120 second timeout imposed on the lock by pmxcfs expires. A second,
unrelated task can easily run into this in production, which is quite
surprising. Install a signal handler that releases the lock if it was
already acquired. If an old handler is defined, it is invoked,
otherwise the signal is raised again. Just using 'die' would change
the execution flow compared to before the change.
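
A minimal, self-contained sketch of the pattern described above, under
a few assumptions: a temporary directory stands in for the lock
directory on pmxcfs, the caller's handler is hypothetical, and the
sketch checks ref(...) eq 'CODE' before invoking the old handler, which
is a choice of this sketch rather than what the diff below does.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use POSIX ();

# Stand-in for the lock directory; the real path lives on pmxcfs.
my $lock_path = tempdir(CLEANUP => 1) . "/demo-lock";
my $got_lock = 0;
my $outer_saw;    # set by the handler of the (hypothetical) caller

# A handler the caller installed before taking the lock.
local $SIG{TERM} = sub { $outer_saw = $_[0]; };

my $err;
{
    my $old_handler = $SIG{TERM};
    local $SIG{TERM} = sub {
        my ($signame) = @_;
        rmdir($lock_path) if $got_lock;    # release the lock first
        if (ref($old_handler) eq 'CODE') {
            $old_handler->(@_);            # defer to the caller's handler
        } else {
            $SIG{$signame} = 'DEFAULT';
            POSIX::raise($signame);        # let the default action run
        }
        die "interrupted by signal\n";
    };

    eval {
        $got_lock = mkdir($lock_path);
        kill('TERM', $$);    # simulate a signal while holding the lock
        sleep(1);            # fallback wait in case delivery is delayed
    };
    $err = $@;
}

print "lock released: ", (-d $lock_path ? "no" : "yes"), "\n";
print "caller's handler saw: ", $outer_saw // "nothing", "\n";
print "error: $err";
```

In this sketch the lock directory is removed before the previously
installed handler runs, so the lock is not left behind regardless of
what that handler decides to do.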

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 src/PVE/Cluster.pm | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/src/PVE/Cluster.pm b/src/PVE/Cluster.pm
index bdb465f..7165d1c 100644
--- a/src/PVE/Cluster.pm
+++ b/src/PVE/Cluster.pm
@@ -615,6 +615,22 @@ my $cfs_lock = sub {
 
     my $is_code_err = 0;
     eval {
+        # catch signals to release the lock - further defer to old handler if one was set
+        my $old_sig;
+        $old_sig->{$_} = $SIG{$_} for qw(INT TERM QUIT HUP PIPE);
+
+        local $SIG{INT} = local $SIG{TERM} = local $SIG{QUIT} = local $SIG{HUP} =
+            local $SIG{PIPE} = sub {
+                my $signame = $_[0];
+                rmdir $filename if $got_lock; # if we held the lock always unlock again
+                if ($old_sig->{$signame}) {
+                    $old_sig->{$signame}->(@_);
+                } else {
+                    $SIG{$signame} = 'DEFAULT';
+                    POSIX::raise($signame);
+                }
+                die "interrupted by signal\n";
+            };
 
         mkdir $lockdir;
 
-- 
2.47.3






* Re: [PATCH cluster v2 2/2] cfs lock: unlock when encountering signal
  2026-02-18 15:44 ` [PATCH cluster v2 2/2] cfs lock: unlock when encountering signal Fiona Ebner
@ 2026-02-18 18:33   ` Thomas Lamprecht
  2026-02-19 13:45     ` Fiona Ebner
  0 siblings, 1 reply; 6+ messages in thread
From: Thomas Lamprecht @ 2026-02-18 18:33 UTC (permalink / raw)
  To: Fiona Ebner, pve-devel

On 18.02.26 at 16:45, Fiona Ebner wrote:
> If the lock directory is not removed after failing because of a
> signal, it won't be possible to acquire the lock again before the
> 120 second timeout imposed on the lock by pmxcfs expires. A second,
> unrelated task can easily run into this in production, which is quite
> surprising. Install a signal handler that releases the lock if it was
> already acquired. If an old handler is defined, it is invoked,
> otherwise the signal is raised again. Just using 'die' would change
> the execution flow compared to before the change.
> 
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
>  src/PVE/Cluster.pm | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/src/PVE/Cluster.pm b/src/PVE/Cluster.pm
> index bdb465f..7165d1c 100644
> --- a/src/PVE/Cluster.pm
> +++ b/src/PVE/Cluster.pm
> @@ -615,6 +615,22 @@ my $cfs_lock = sub {
>  
>      my $is_code_err = 0;
>      eval {
> +        # catch signals to release the lock - further defer to old handler if one was set
> +        my $old_sig;
> +        $old_sig->{$_} = $SIG{$_} for qw(INT TERM QUIT HUP PIPE);

Really a non-issue in practice and basically the same thing under the hood, but
this could probably just be a map, something like (untested):

my $old_sig = { map { $_ => $SIG{$_} } qw(INT TERM QUIT HUP PIPE) };

> +
> +        local $SIG{INT} = local $SIG{TERM} = local $SIG{QUIT} = local $SIG{HUP} =
> +            local $SIG{PIPE} = sub {
> +                my $signame = $_[0];
> +                rmdir $filename if $got_lock; # if we held the lock always unlock again

Could be nice to output a warning if the above rmdir fails?
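
A minimal sketch of such a warning, assuming a hypothetical helper
release_lock_dir and a temporary directory in place of the real lock
path. The warning text is illustrative only.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: remove the lock directory and warn if that fails.
sub release_lock_dir {
    my ($path) = @_;
    return 1 if rmdir($path);
    warn "unable to remove lock directory '$path' - $!\n";
    return 0;
}

my $dir = tempdir(CLEANUP => 1);
mkdir("$dir/lock") or die "mkdir failed - $!\n";

# Collect warnings instead of printing them, to make the effect visible.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

release_lock_dir("$dir/lock");    # succeeds silently
release_lock_dir("$dir/lock");    # already gone, emits a warning
print scalar(@warnings), " warning(s): @warnings";
```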

> +                if ($old_sig->{$signame}) {
> +                    $old_sig->{$signame}->(@_);
> +                } else {
> +                    $SIG{$signame} = 'DEFAULT';
> +                    POSIX::raise($signame);

Hmm, this reads alright, but then I'm wondering if it should be added elsewhere?
I found not a single "POSIX::raise" or "raise\(" instance in our Perl code
inside the /usr/share/perl5/{PVE,Proxmox} directories on a recent PVE 9 system, but
we have quite a few signal overrides, and while I did not check those, I believe
I remember that some of those fall back to the handler defined by the calling site.

Describing how exactly the code flow changes would be nice in any case.

> +                }
> +                die "interrupted by signal\n";
> +            };
>  
>          mkdir $lockdir;
>  





* Re: [PATCH cluster v2 2/2] cfs lock: unlock when encountering signal
  2026-02-18 18:33   ` Thomas Lamprecht
@ 2026-02-19 13:45     ` Fiona Ebner
  0 siblings, 0 replies; 6+ messages in thread
From: Fiona Ebner @ 2026-02-19 13:45 UTC (permalink / raw)
  To: Thomas Lamprecht, pve-devel

On 18.02.26 at 7:33 PM, Thomas Lamprecht wrote:
> On 18.02.26 at 16:45, Fiona Ebner wrote:
>> If the lock directory is not removed after failing because of a
>> signal, it won't be possible to acquire the lock again before the
>> 120 second timeout imposed on the lock by pmxcfs expires. A second,
>> unrelated task can easily run into this in production, which is quite
>> surprising. Install a signal handler that releases the lock if it was
>> already acquired. If an old handler is defined, it is invoked,
>> otherwise the signal is raised again. Just using 'die' would change
>> the execution flow compared to before the change.
>>
>> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
>> ---
>>  src/PVE/Cluster.pm | 16 ++++++++++++++++
>>  1 file changed, 16 insertions(+)
>>
>> diff --git a/src/PVE/Cluster.pm b/src/PVE/Cluster.pm
>> index bdb465f..7165d1c 100644
>> --- a/src/PVE/Cluster.pm
>> +++ b/src/PVE/Cluster.pm
>> @@ -615,6 +615,22 @@ my $cfs_lock = sub {
>>  
>>      my $is_code_err = 0;
>>      eval {
>> +        # catch signals to release the lock - further defer to old handler if one was set
>> +        my $old_sig;
>> +        $old_sig->{$_} = $SIG{$_} for qw(INT TERM QUIT HUP PIPE);
> 
> Really a non-issue in practice and basically the same thing under the hood, but
> this could probably just be a map, something like (untested):
> 
> my $old_sig = { map { $_ => $SIG{$_} } qw(INT TERM QUIT HUP PIPE) };

Will do!

>> +
>> +        local $SIG{INT} = local $SIG{TERM} = local $SIG{QUIT} = local $SIG{HUP} =
>> +            local $SIG{PIPE} = sub {
>> +                my $signame = $_[0];
>> +                rmdir $filename if $got_lock; # if we held the lock always unlock again
> 
> Could be nice to output a warning if the above rmdir fails?

Good point! Will also add it to the original line I copied this from.

>> +                if ($old_sig->{$signame}) {
>> +                    $old_sig->{$signame}->(@_);
>> +                } else {
>> +                    $SIG{$signame} = 'DEFAULT';
>> +                    POSIX::raise($signame);
> 
> Hmm, this reads alright, but then I'm wondering if it should be added elsewhere?
> I found not a single "POSIX::raise" or "raise\(" instance in our Perl code
> inside the /usr/share/perl5/{PVE,Proxmox} directories on a recent PVE 9 system, but
> we have quite a few signal overrides, and while I did not check those, I believe
> I remember that some of those fall back to the handler defined by the calling site.

The only ones I found that do invoke the previous handler are in
PVE::Daemon. They also do not use raise, but terminate the server.

For some other ones it's most likely intentional to convert the signal
to a simple die. For example PVE::VZDump::QemuServer, where it makes
sense to just catch the signal and proceed with aborting the backup
rather than raise it again.

Compared to those, cfs_lock() is quite low in the call chains and there
are callers that just warn about an error from cfs_lock(). So while it
is essential to not convert a signal to a simple die in cfs_lock(), it
might not be for other current signal overrides.
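
To illustrate the concern, here is a small sketch with hypothetical
names (locked_section stands in for a low-level helper, not for the
real cfs_lock()). The helper's handler only dies, and the caller only
warns about errors, so the TERM request ends up as a mere warning.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a low-level helper whose signal handler
# merely dies instead of re-raising or deferring to an older handler.
sub locked_section {
    local $SIG{TERM} = sub { die "interrupted by signal\n" };
    kill('TERM', $$);    # simulate a signal arriving inside the helper
    sleep(1);            # fallback wait in case delivery is delayed
    return 1;
}

# Hypothetical caller that only warns about errors from the helper.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

eval { locked_section() };
warn "locked section failed - $@" if $@;

# Execution reaches this point: the TERM request was downgraded to a warning.
print "still running after TERM; warnings: @warnings";
```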

> Describing how exactly the code flow changes would be nice in any case.

Do you mean expanding on the sentence mentioning "code flow" in the
commit message or something else?

>> +                }
>> +                die "interrupted by signal\n";
>> +            };
>>  
>>          mkdir $lockdir;
>>  






* partially-applied: [PATCH-SERIES cluster v2 0/2] cfs lock: small improvements
  2026-02-18 15:44 [PATCH-SERIES cluster v2 0/2] cfs lock: small improvements Fiona Ebner
  2026-02-18 15:44 ` [PATCH cluster v2 1/2] cfs lock: attempt to acquire lock more frequently Fiona Ebner
  2026-02-18 15:44 ` [PATCH cluster v2 2/2] cfs lock: unlock when encountering signal Fiona Ebner
@ 2026-02-18 18:33 ` Thomas Lamprecht
  2 siblings, 0 replies; 6+ messages in thread
From: Thomas Lamprecht @ 2026-02-18 18:33 UTC (permalink / raw)
  To: pve-devel, Fiona Ebner

On Wed, 18 Feb 2026 16:44:28 +0100, Fiona Ebner wrote:
> Changes in v2:
> * Add patch to improve how signals are handled.
> 
> Because of a recent report in the forum [0], I wanted to ping the v1
> of the submission [1], which could've helped in this case.
> 
> During re-testing, I ran into another issue and noticed that signals
> are not nicely handled yet, so there is a second patch now :)
> 
> [...]

Applied the first one for now, thanks!

[1/2] cfs lock: attempt to acquire lock more frequently
      commit: 13a71af58bd3ecad3be9b960ae12e3de6343585c




