all lists on lists.proxmox.com
 help / color / mirror / Atom feed
From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] [PATCH v2 storage] lvm plugin: fix locking for rollback when using CLI
Date: Thu, 20 Nov 2025 14:21:53 +0100	[thread overview]
Message-ID: <1763644497.6m8gy24914.astroid@yuna.none> (raw)
In-Reply-To: <20251120101742.24843-1-f.ebner@proxmox.com>

On November 20, 2025 11:17 am, Fiona Ebner wrote:
> Doing a rollback via CLI on an LVM storage with 'saferemove' and
> 'snapshot-as-volume-chain' would run into a locking issue, because
> the forked zero-out worker would try to acquire the lock while the
> main CLI task is still inside the locked section for
> volume_snapshot_rollback_locked(). The same issue does not happen when
> the rollback is done via UI. The reason for this can be found in the
> note regarding fork_worker():
> 
>> we simulate running in foreground if ($self->{type} eq 'cli')
> 
> So the worker will be awaited synchronously in CLI context, resulting
> in the deadlock, while via API/UI, the main task would move on and
> release the lock allowing the zero-out worker to acquire it.
> 
> Avoid doing fork_cleanup_worker() inside the locked section to avoid
> the issue.
> 
> Fixes: 8eabcc7 ("lvm plugin: snapshot-as-volume-chain: use locking for snapshot operations")
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
> 
> Changes in v2:
> * Better explanation in commit message.
> * Also spawn cleanup worker in error scenario like before the patch.
> 
>  src/PVE/Storage/LVMPlugin.pm | 43 ++++++++++++++++++++++--------------
>  1 file changed, 26 insertions(+), 17 deletions(-)
> 
> diff --git a/src/PVE/Storage/LVMPlugin.pm b/src/PVE/Storage/LVMPlugin.pm
> index 97f7bf4..102cf22 100644
> --- a/src/PVE/Storage/LVMPlugin.pm
> +++ b/src/PVE/Storage/LVMPlugin.pm
> @@ -1117,23 +1117,17 @@ sub volume_rollback_is_possible {
>  }
>  
>  my sub volume_snapshot_rollback_locked {
> -    my ($class, $scfg, $storeid, $volname, $snap) = @_;
> +    my ($class, $scfg, $storeid, $volname, $snap, $cleanup_worker) = @_;
>  
>      my $format = ($class->parse_volname($volname))[6];
>  
>      die "can't rollback snapshot for '$format' volume\n" if $format ne 'qcow2';
>  
> -    my $cleanup_worker = eval { free_snap_image($class, $storeid, $scfg, $volname, 'current'); };
> +    $cleanup_worker->$* = eval { free_snap_image($class, $storeid, $scfg, $volname, 'current'); };
>      die "error deleting snapshot $snap $@\n" if $@;
>  
>      eval { alloc_snap_image($class, $storeid, $scfg, $volname, $snap) };
> -    my $alloc_err = $@;
> -
> -    fork_cleanup_worker($cleanup_worker);
> -
> -    if ($alloc_err) {
> -        die "can't allocate new volume $volname: $alloc_err\n";
> -    }
> +    die "can't allocate new volume $volname: $@\n" if $@;
>  
>      return undef;
>  }
> @@ -1141,14 +1135,29 @@ my sub volume_snapshot_rollback_locked {
>  sub volume_snapshot_rollback {
>      my ($class, $scfg, $storeid, $volname, $snap) = @_;
>  
> -    return $class->cluster_lock_storage(
> -        $storeid,
> -        $scfg->{shared},
> -        undef,
> -        sub {
> -            return volume_snapshot_rollback_locked($class, $scfg, $storeid, $volname, $snap);
> -        },
> -    );
> +    my $cleanup_worker;
> +
> +    eval {
> +        $class->cluster_lock_storage(
> +            $storeid,
> +            $scfg->{shared},
> +            undef,
> +            sub {
> +                volume_snapshot_rollback_locked(
> +                    $class, $scfg, $storeid, $volname, $snap, \$cleanup_worker,
> +                );
> +            },
> +        );
> +    };
> +    my $err = $@;
> +
> +    # Spawn outside of the locked section, because with 'saferemove', the cleanup worker also needs
> +    # to obtain the lock, and in CLI context, it will be awaited synchronously, see fork_worker().
> +    fork_cleanup_worker($cleanup_worker);

I mean, this just worked because the forked cleanup sub waited long
enough by chance for the lock/before acquiring the lock to allow the
parent to release it in the non-CLI case, right? i.e., it was still
wrong, because volume_snapshot_rollback_locked might have done more work
after forking the cleanup worker while still holding the lock..

all the other code paths here have this pattern:

my $cleanup_worker = eval { lock and do something };
fork_cleanup_worker($cleanup_worker)

which is exactly what this patch switches to, so consider this

Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>

> +
> +    die $err if $err;
> +
> +    return;
>  }
>  
>  sub volume_snapshot_delete {
> -- 
> 2.47.3
> 
> 
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> 
> 
> 


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

  reply	other threads:[~2025-11-20 13:21 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-20 10:17 Fiona Ebner
2025-11-20 13:21 ` Fabian Grünbichler [this message]
2025-11-20 13:35   ` Fiona Ebner
2025-11-20 14:06 ` [pve-devel] applied: " Thomas Lamprecht

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1763644497.6m8gy24914.astroid@yuna.none \
    --to=f.gruenbichler@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal