public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [PATCH docs/manager/storage v7 0/4] fix #7339: lvmthick: add option to free storage for deleted VMs
@ 2026-06-16 10:13 Lukas Sichert
  2026-06-16 10:13 ` [PATCH storage v7 1/4] lvm: saferemove: zero out volumes range by range Lukas Sichert
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Lukas Sichert @ 2026-06-16 10:13 UTC (permalink / raw)
  To: pve-devel; +Cc: Lukas Sichert

Logical volumes (LV) in an LVM (thick) volume group (VG) are
thick-provisioned, but the underlying backing storage can be
thin-provisioned. In particular, this can be the case if the VG resides
on a LUN provided by a SAN via ISCSI/FC/SAS [1], where the LUN may be
thin-provisioned on the SAN side.

In such setups, one usually wants that deleting an LV (e.g. VM disk)
frees up space on the SAN side, especially when using
snapshots-as-volume-chains, because snapshot LVs are thick-provisioned
LVs from the LVM point of view, so users may want to over-provision the
LUN on the SAN side.

One option to free up space when deleting an LV is to set
`issue_discards = 1` in the LVM config. With this setting, `lvremove`
will send discards for the regions previously used by the LV, which will
(if the SAN supports it) inform the SAN that the space is not in use
anymore and can be freed up. Since 'lvremove' modifies LVM metadata, it
has to be issued while holding a cluster-wide lock on the storage.
Unfortunately, depending on the setup, 'issue_discards = 1' can make
`lvremove` take very long for big disks (due to the large number of
discards being issued), so that it eventually hits the 60s timeout of
the cluster lock. The 60s are a hard-coded limit and cannot be easily
changed [2].

A better option is to issue discard before the final `lvremove`. This
informs the backing storage that the LV's blocks are no longer in use
without tying the potentially long-running discard operation to the
metadata update done by `lvremove`.

There is already a setting for `saferemove`, which zeroes out
to-be-deleted LVs before removing them. This series reworks that worker
to process the LV range by range instead of zeroing the whole LV in one
separate pass. This allows zero-out and discard to be combined: if both
actions are enabled, the worker zeroes one range, discards that same
range, and only then continues with the next range. This avoids forcing a
thin-provisioned SAN to allocate the whole LV with zeroes before the
space can be reclaimed again.

This series adds a new `on-volume-remove` property string with an
initial `discard` action. Following Fabian's feedback [4], the frontend
serializes the selected option into that property string, which is then
passed to the backend and parsed there. If only discard is enabled, the
renamed LV is discarded before the final remove. If `saferemove` is
enabled too, the worker performs the range-by-range zero-out and discard
described above.

Changes from v6 to v7 (thanks @Friedrich and Michael):
-rework the cleanup-worker to zero-out range by range
-rework the log and frontend messages
-document new 'saferemove' and 'discard' behavior
-make 'discard' opt-in and not default for newly created storages

Changes from v5 to v6:
-replace old storage config usage in the 'if'/'else' branches with the
new API variables

Changes from v4 to v5 (thanks @Friedrich and Fabian):
-rework the API layout to use a property string for volume-removal
options
-use 'if'/'else' instead of nested '?:' expressions for task log output
-revert renaming 'cleanup worker' to `discard worker`

Changes from v3 to v4 (thanks @Friedrich and Thomas):
-rework the worker-starting logic to avoid breaking abstraction layers
-rewrite the 'imgdel' task description in the UI to better match the
worker's behavior
-extend the code comment to also describe discard handling
-add additional syslog logging

Changes from v2 to v3 (thanks @Michael):
-correct issue_blkdiscard -> 'issue-blkdiscard' in the commit message
for the pve-manager
-replace 'previous commit' with a more obvious reference to first commit
of this series in the commit message for pve-manager

Changes from v1 to v2 (thanks @Michael, Maximiliano, Fabian):
-add more explicit descriptions in front- and backend, specifically
mentioning discard (TRIM)
-add a verbose description in the backend explaining the mechanism and
why it should be used for thin-provisioned storage
-add a forked fallback worker execution to allow other plugins to
issue workers without these config options
-rename variable issue_blkdiscard -> 'issue-blkdiscard' to conform to
newer style

[1] https://pve.proxmox.com/wiki/Migrate_to_Proxmox_VE#Storage_boxes_(SAN/NAS)
[2] https://forum.proxmox.com/threads/175849/post-820043
[3] https://man7.org/linux/man-pages/man8/blkdiscard.8.html
[4] https://lore.proxmox.com/all/177885528916.1932366.10236780530533306479@yuna.proxmox.com/
Link: bugzilla.proxmox.com/show_bug.cgi?id=7339


storage:

Lukas Sichert (2):
  lvm: saferemove: zero out volumes range by range
  fix #7339: lvm: add discard action for removed volumes

 src/PVE/Storage/LVMPlugin.pm | 257 +++++++++++++++++++++++++++--------
 1 file changed, 199 insertions(+), 58 deletions(-)


manager:

Lukas Sichert (1):
  fix #7339: lvmthick: ui: add UI option to free storage

 www/manager6/Utils.js           |  2 +-
 www/manager6/storage/LVMEdit.js | 45 +++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+), 1 deletion(-)


docs:

Lukas Sichert (1):
  fix #7339: lvm: document discard option

 pve-storage-lvm.adoc | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)


Summary over all repositories:
  4 files changed, 264 insertions(+), 65 deletions(-)

-- 
Generated by murpp 0.12.0




^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH storage v7 1/4] lvm: saferemove: zero out volumes range by range
  2026-06-16 10:13 [PATCH docs/manager/storage v7 0/4] fix #7339: lvmthick: add option to free storage for deleted VMs Lukas Sichert
@ 2026-06-16 10:13 ` Lukas Sichert
  2026-06-30 11:49   ` Fiona Ebner
  2026-06-16 10:13 ` [PATCH storage v7 2/4] fix #7339: lvm: add discard action for removed volumes Lukas Sichert
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Lukas Sichert @ 2026-06-16 10:13 UTC (permalink / raw)
  To: pve-devel; +Cc: Lukas Sichert

'saferemove' currently uses different full-volume zero-out paths:
`blkdiscard --zeroout` for devices with write-zeroes support and
`cstream` otherwise. This makes progress and throttling inconsistent and
does not allow future discard cleanup to be interleaved with zeroing. On
thin-provisioned backing storage, zeroing the whole LV before discarding
it can also force unnecessary allocation.

Move zeroing into an explicit range loop. Use BLKZEROOUT when supported,
capped to the device limit, and fall back to manual zero writes
otherwise. Keep progress and throttling in the shared loop, and keep the
renamed LV if zeroing fails so cleanup can be retried.

Signed-off-by: Lukas Sichert <l.sichert@proxmox.com>
---
 src/PVE/Storage/LVMPlugin.pm | 168 ++++++++++++++++++++++++-----------
 1 file changed, 117 insertions(+), 51 deletions(-)

diff --git a/src/PVE/Storage/LVMPlugin.pm b/src/PVE/Storage/LVMPlugin.pm
index 443d292..f0a7a80 100644
--- a/src/PVE/Storage/LVMPlugin.pm
+++ b/src/PVE/Storage/LVMPlugin.pm
@@ -13,11 +13,16 @@ use PVE::Tools qw(run_command file_read_firstline trim);
 
 use PVE::Storage::Common;
 use PVE::Storage::Plugin;
+use PVE::RESTEnvironment qw(log_warn);
 
 use base qw(PVE::Storage::Plugin);
 
 # lvm helper functions
 
+use constant {
+    BLKZEROOUT => 0x127f,
+};
+
 my $ignore_no_medium_warnings = sub {
     my $line = shift;
     # ignore those, most of the time they're from (virtual) IPMI/iKVM devices
@@ -279,6 +284,13 @@ sub lvm_list_volumes {
     return $lvs;
 }
 
+my sub blockdev_ioctl_range {
+    my ($fh, $ioctl_name, $ioctl, $offset, $length) = @_;
+
+    my $range = pack('QQ', $offset, $length);
+    ioctl($fh, $ioctl, $range) or die "$ioctl_name failed - $!\n";
+}
+
 my sub free_lvm_volumes_locked {
     my ($class, $scfg, $storeid, $volnames) = @_;
 
@@ -304,49 +316,97 @@ my sub free_lvm_volumes_locked {
             file_read_firstline("$sysdir/queue/write_zeroes_max_bytes") // 0;
         ($write_zeroes_max_bytes) = $write_zeroes_max_bytes =~ m/^(\d+)$/; #untaint
 
+        my $size = file_read_firstline("$sysdir/size") // 0;
+        ($size) = $size =~ m/^(\d+)$/; # untaint
+        $size *= 512; # sysfs size is in 512-byte sectors
+
+        my $zeroout_variant = 'blkzeroout';
+
+        # If the storage does not support write_zeroes fall back to writing zeroes manually using
+        # syswrite. Otherwise if the storage supports write_zeroes but stepsize is too big, reduce the stepsize to
+        # the maximum supported by the storage.
         if ($write_zeroes_max_bytes == 0) {
-            # If the storage does not support 'write zeroes', we fallback to cstream.
-            # wipe throughput up to 10MB/s by default; may be overwritten with saferemove_throughput
-            my $throughput = '-10485760';
-            if ($scfg->{saferemove_throughput}) {
-                $throughput = $scfg->{saferemove_throughput};
-            }
+            print "falling back to syswrite to zero-out '$lvmpath'\n";
+            $stepsize = 1024 * 1024; # 1 MiB
+            $zeroout_variant = 'syswrite';
+        } elsif ($stepsize > $write_zeroes_max_bytes) {
+            print "reduce stepsize to the maximum supported by the storage:"
+                . " $write_zeroes_max_bytes bytes\n";
+            $stepsize = $write_zeroes_max_bytes;
+        }
+        my $zeroes = "\0" x $stepsize;
+        my $throughput = -1;
+        if ($scfg->{saferemove_throughput}) {
+            # use abs as legacy cstream accepted negative values
+            $throughput = abs($scfg->{saferemove_throughput});
+        }
 
-            my $cmd = [
-                '/usr/bin/cstream',
-                '-i',
-                '/dev/zero',
-                '-o',
-                $lvmpath,
-                '-T',
-                '10',
-                '-v',
-                '1',
-                '-b',
-                '1048576',
-                '-t',
-                "$throughput",
-            ];
-            eval {
-                run_command(
-                    $cmd,
-                    errmsg => "zero out finished (note: 'No space left on device' is ok here)",
-                );
-            };
-            warn $@ if $@;
-        } else {
-            # If the storage supports write_zeroes but stepsize is too big, reduce the stepsize to
-            # the maximum supported by the storage.
-            if ($write_zeroes_max_bytes > 0 && $stepsize > $write_zeroes_max_bytes) {
-                print "reduce stepsize to the maximum supported by the storage:"
-                    . " $write_zeroes_max_bytes bytes\n";
+        open(my $fh, '+<', $lvmpath) or die "can't open '$lvmpath' - $!\n";
 
-                $stepsize = $write_zeroes_max_bytes;
-            }
+        #eval block, so that filehandle is closed even if something fails below
+        eval {
+            my $start = time();
+            my $written_total = 0;
+            my $lastprint = -1;
+            for (my $offset = 0; $offset < $size; $offset += $stepsize) {
+
+                if ($offset + $stepsize > $size) {
+                    $stepsize = $size - $offset;
+                }
+
+                if ($zeroout_variant eq 'blkzeroout') {
+                    eval {
+                        blockdev_ioctl_range($fh, 'BLKZEROOUT', BLKZEROOUT, $offset, $stepsize);
+                    };
+                    if ($@) {
+                        die "blkzeroout failed: $@";
+                    }
+                } elsif ($zeroout_variant eq 'syswrite') {
+                    # if the $offset is 0, sysseek can return 0, therefore use // to only
+                    # throw an error, if it returns undef
+                    sysseek($fh, $offset, 0) // die "sysseek failed: $!\n";
 
-            my $cmd = ['blkdiscard', $lvmpath, '-v', '--zeroout', '--step', "${stepsize}"];
-            eval { run_command($cmd); };
-            warn $@ if $@;
+                    my $written = syswrite($fh, $zeroes, $stepsize)
+                        // die "syswrite failed: $!\n";
+
+                    if ($written != $stepsize) {
+                        die "short syswrite: wrote $written of $stepsize bytes\n";
+                    }
+                }
+                $written_total += $stepsize;
+
+                my $curr_time = time();
+                if (($curr_time - $lastprint) >= 3) {
+                    my $percent_finished = 100 * $written_total / $size;
+                    my $written_gb = $written_total / (1024**3);
+                    my $size_gb = $size / (1024**3);
+                    my $curr_seconds = $curr_time - $start;
+                    printf(
+                        "zeroed out %.2f GiB of %.2f GiB (%.2f%%) using %s in %d seconds\n",
+                        $written_gb,
+                        $size_gb,
+                        $percent_finished,
+                        $zeroout_variant,
+                        $curr_seconds,
+                    );
+                    $lastprint = $curr_time;
+                }
+
+                if ($throughput > 0) {
+                    my $expected_elapsed = $written_total / $throughput;
+                    my $actual_elapsed = $curr_time - $start;
+                    my $delay = $expected_elapsed - $actual_elapsed;
+                    if ($delay > 0) {
+                        sleep($delay);
+                    }
+                }
+            }
+        };
+        # close filehandle before throwing an error
+        my $err = $@;
+        close($fh);
+        if ($err) {
+            die "zeroing out failed: $err";
         }
     };
 
@@ -368,18 +428,24 @@ my sub free_lvm_volumes_locked {
                 errmsg => "can't refresh LV '$lvmpath' to zero-out its data",
             );
 
-            $secure_delete_cmd->($lvmpath);
-
-            $class->cluster_lock_storage(
-                $storeid,
-                $scfg->{shared},
-                undef,
-                sub {
-                    my $cmd = ['/sbin/lvremove', '-f', "$vg/del-$name"];
-                    run_command($cmd, errmsg => "lvremove '$vg/del-$name' error");
-                },
-            );
-            print "successfully removed volume $name ($vg/del-$name)\n";
+            my $err = undef;
+            eval { $secure_delete_cmd->($lvmpath); };
+            $err = $@ if $@;
+
+            if (!$err) {
+                $class->cluster_lock_storage(
+                    $storeid,
+                    $scfg->{shared},
+                    undef,
+                    sub {
+                        my $cmd = ['/sbin/lvremove', '-f', "$vg/del-$name"];
+                        run_command($cmd, errmsg => "lvremove '$vg/del-$name' error");
+                    },
+                );
+                print "successfully removed volume $name ($vg/del-$name)\n";
+            } else {
+                log_warn("$vg/del-$name is not being removed: $err");
+            }
         }
     };
 
-- 
2.47.3





^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH storage v7 2/4] fix #7339: lvm: add discard action for removed volumes
  2026-06-16 10:13 [PATCH docs/manager/storage v7 0/4] fix #7339: lvmthick: add option to free storage for deleted VMs Lukas Sichert
  2026-06-16 10:13 ` [PATCH storage v7 1/4] lvm: saferemove: zero out volumes range by range Lukas Sichert
@ 2026-06-16 10:13 ` Lukas Sichert
  2026-06-30 10:55   ` Michael Köppl
  2026-06-30 12:14   ` Fiona Ebner
  2026-06-16 10:13 ` [PATCH manager v7 3/4] fix #7339: lvmthick: ui: add UI option to free storage Lukas Sichert
  2026-06-16 10:13 ` [PATCH docs v7 4/4] fix #7339: lvm: document discard option Lukas Sichert
  3 siblings, 2 replies; 15+ messages in thread
From: Lukas Sichert @ 2026-06-16 10:13 UTC (permalink / raw)
  To: pve-devel; +Cc: Lukas Sichert

On LVM storages backed by thin-provisioned SAN LUNs, removing an LV does
not release the allocated space on the backing storage. Using LVM's
`issue_discards` can avoid that, but makes `lvremove` issue the discards
while holding the cluster-wide storage lock, which can hit the lock
timeout for large volumes.

Add an `on-volume-remove` property with an initial `discard` action. The
cleanup worker issues the discard for the renamed LV before the final
remove, so the long-running operation happens outside the storage lock.
When combined with `saferemove`, the worker zeroes and discards the LV
range by range, avoiding allocation of the whole LV with zeroes on
thin-provisioned backing storage.

Signed-off-by: Lukas Sichert <l.sichert@proxmox.com>
Link: bugzilla.proxmox.com/show_bug.cgi?id=7339
---
 src/PVE/Storage/LVMPlugin.pm | 95 ++++++++++++++++++++++++++++++++----
 1 file changed, 85 insertions(+), 10 deletions(-)

diff --git a/src/PVE/Storage/LVMPlugin.pm b/src/PVE/Storage/LVMPlugin.pm
index f0a7a80..ee07543 100644
--- a/src/PVE/Storage/LVMPlugin.pm
+++ b/src/PVE/Storage/LVMPlugin.pm
@@ -13,6 +13,7 @@ use PVE::Tools qw(run_command file_read_firstline trim);
 
 use PVE::Storage::Common;
 use PVE::Storage::Plugin;
+use PVE::SafeSyslog;
 use PVE::RESTEnvironment qw(log_warn);
 
 use base qw(PVE::Storage::Plugin);
@@ -20,6 +21,7 @@ use base qw(PVE::Storage::Plugin);
 # lvm helper functions
 
 use constant {
+    BLKDISCARD => 0x1277,
     BLKZEROOUT => 0x127f,
 };
 
@@ -296,6 +298,12 @@ my sub free_lvm_volumes_locked {
 
     my $vg = $scfg->{vgname};
 
+    my $on_remove_opts;
+    if ($scfg->{'on-volume-remove'}) {
+        $on_remove_opts =
+            PVE::JSONSchema::parse_property_string('on-volume-remove', $scfg->{'on-volume-remove'});
+    }
+
     my $secure_delete_cmd = sub {
         my ($lvmpath) = @_;
 
@@ -373,6 +381,14 @@ my sub free_lvm_volumes_locked {
                         die "short syswrite: wrote $written of $stepsize bytes\n";
                     }
                 }
+                if ($on_remove_opts->{discard}) {
+                    eval {
+                        blockdev_ioctl_range($fh, 'BLKDISCARD', BLKDISCARD, $offset, $stepsize);
+                    };
+                    if ($@) {
+                        log_warn("blkdiscard failed: $@");
+                    }
+                }
                 $written_total += $stepsize;
 
                 my $curr_time = time();
@@ -409,28 +425,49 @@ my sub free_lvm_volumes_locked {
             die "zeroing out failed: $err";
         }
     };
-
     # we need to zero out LVM data for security reasons
-    # and to allow thin provisioning
-    my $zero_out_worker = sub {
+    # and discard images to free storage space to allow
+    # thin provisioning
+    my $cleanup_worker = sub {
+
         for my $name (@$volnames) {
             my $lvmpath = "/dev/$vg/del-$name";
-            print "zero-out data on image $name ($lvmpath)\n";
+
+            my $discard_action;
+            if ($scfg->{saferemove} && $on_remove_opts->{discard}) {
+                $discard_action = 'zero-out and discard (TRIM)';
+            } elsif ($scfg->{saferemove}) {
+                $discard_action = 'zero-out';
+            } elsif ($on_remove_opts->{discard}) {
+                $discard_action = 'discard (TRIM)';
+            }
+            print "$discard_action data on image $name ($lvmpath)\n";
 
             my $cmd_activate = ['/sbin/lvchange', '-aly', $lvmpath];
             run_command(
                 $cmd_activate,
-                errmsg => "can't activate LV '$lvmpath' to zero-out its data",
+                errmsg => "can't activate LV '$lvmpath' to $discard_action its data",
             );
             $cmd_activate = ['/sbin/lvchange', '--refresh', $lvmpath];
             run_command(
                 $cmd_activate,
-                errmsg => "can't refresh LV '$lvmpath' to zero-out its data",
+                errmsg => "can't refresh LV '$lvmpath' to $discard_action its data",
             );
+            syslog('info', "starting to $discard_action $name ($lvmpath)");
 
             my $err = undef;
-            eval { $secure_delete_cmd->($lvmpath); };
-            $err = $@ if $@;
+            if ($scfg->{saferemove}) {
+                eval { $secure_delete_cmd->($lvmpath); };
+                $err = $@ if $@;
+            }
+
+            # also don't remove the volume if blkdiscard fails, because otherwise the
+            # storage on a thinly-backed storage can't be reclaimed without removing the
+            # complete pool
+            if ($on_remove_opts->{discard} && !$scfg->{saferemove}) {
+                eval { run_command(['/sbin/blkdiscard', $lvmpath]); };
+                $err = $@ if $@;
+            }
 
             if (!$err) {
                 $class->cluster_lock_storage(
@@ -449,13 +486,13 @@ my sub free_lvm_volumes_locked {
         }
     };
 
-    if ($scfg->{saferemove}) {
+    if ($scfg->{saferemove} || $on_remove_opts->{discard}) {
         for my $name (@$volnames) {
             # avoid long running task, so we only rename here
             my $cmd = ['/sbin/lvrename', $vg, $name, "del-$name"];
             run_command($cmd, errmsg => "lvrename '$vg/$name' error");
         }
-        return $zero_out_worker;
+        return $cleanup_worker;
     } else {
         for my $name (@$volnames) {
             my $cmd = ['/sbin/lvremove', '-f', "$vg/$name"];
@@ -478,6 +515,36 @@ sub plugindata {
     };
 }
 
+my $on_volume_remove_format = {
+    discard => {
+        description => "Issue discard (TRIM) requests for LVs before removing them.",
+        type => 'boolean',
+        optional => 1,
+        verbose_description => "If enabled, blkdiscard is issued for the LV before removing it."
+            . " This sends discard (TRIM) requests for the LV's block range, allowing"
+            . " thin-provisioned storage to reclaim previously allocated physical"
+            . " space, provided the storage supports discard.",
+    },
+};
+
+sub verify_on_volume_remove {
+    my ($value, $noerr) = @_;
+
+    return undef if !defined($value);
+
+    if (!keys %$value) {
+        return undef if $noerr;
+        die "at least one on-volume-remove option must be specified if the property is set\n";
+    }
+    return $value;
+}
+
+PVE::JSONSchema::register_format(
+    'on-volume-remove',
+    $on_volume_remove_format,
+    \&verify_on_volume_remove,
+);
+
 sub properties {
     return {
         vgname => {
@@ -494,6 +561,13 @@ sub properties {
             description => "Zero-out data when removing LVs.",
             type => 'boolean',
         },
+        'on-volume-remove' => {
+            description => "Optional actions when removing LVs.",
+            type => 'string',
+            format => 'on-volume-remove',
+            verbose_description => "Configure actions performed before removing an LV."
+                . " Use 'discard=1' to issue discard (TRIM) requests before removal.",
+        },
         'saferemove-stepsize' => {
             description => "Wipe step size in MiB."
                 . " It will be capped to the maximum supported by the storage.",
@@ -519,6 +593,7 @@ sub options {
         shared => { optional => 1 },
         disable => { optional => 1 },
         saferemove => { optional => 1 },
+        'on-volume-remove' => { optional => 1 },
         'saferemove-stepsize' => { optional => 1 },
         saferemove_throughput => { optional => 1 },
         content => { optional => 1 },
-- 
2.47.3





^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH manager v7 3/4] fix #7339: lvmthick: ui: add UI option to free storage
  2026-06-16 10:13 [PATCH docs/manager/storage v7 0/4] fix #7339: lvmthick: add option to free storage for deleted VMs Lukas Sichert
  2026-06-16 10:13 ` [PATCH storage v7 1/4] lvm: saferemove: zero out volumes range by range Lukas Sichert
  2026-06-16 10:13 ` [PATCH storage v7 2/4] fix #7339: lvm: add discard action for removed volumes Lukas Sichert
@ 2026-06-16 10:13 ` Lukas Sichert
  2026-06-30 11:38   ` Michael Köppl
  2026-06-16 10:13 ` [PATCH docs v7 4/4] fix #7339: lvm: document discard option Lukas Sichert
  3 siblings, 1 reply; 15+ messages in thread
From: Lukas Sichert @ 2026-06-16 10:13 UTC (permalink / raw)
  To: pve-devel; +Cc: Lukas Sichert

The storage commit 'fix #7339: lvm: add discard action for removed
volumes' adds backend support to discard allocated space of a VM disk on
a SAN, when a VM is deleted. The backend checks whether to use this
option by parsing storage.cfg for the 'on-volume-remove' property string
and checking if the 'discard' property is set. The variable
'on-volume-remove' will automatically be stored into the config file if
it is present in the 'PUT' API request.

Expose this option in the GUI by adding a checkbox mapped to the local
form field `on-remove-discard`. Map `on-remove-discard` to the `discard`
property of the `on-volume-remove` property string. If at least one
`on-volume-remove` option is enabled, serialize the selected options
into `on-volume-remove` and include it in the returned API parameters.
Also rename the 'imgdel' task description from 'Erase data' to 'Destroy
image', since the task can now also discard storage blocks in addition
to zeroing data.

Signed-off-by: Lukas Sichert <l.sichert@proxmox.com>
Link: bugzilla.proxmox.com/show_bug.cgi?id=7339
---
 www/manager6/Utils.js           |  2 +-
 www/manager6/storage/LVMEdit.js | 45 +++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/www/manager6/Utils.js b/www/manager6/Utils.js
index bbf59d8f..db867743 100644
--- a/www/manager6/Utils.js
+++ b/www/manager6/Utils.js
@@ -2182,7 +2182,7 @@ Ext.define('PVE.Utils', {
             hastart: ['HA', gettext('Start')],
             hastop: ['HA', gettext('Stop')],
             imgcopy: ['', gettext('Copy data')],
-            imgdel: ['', gettext('Erase data')],
+            imgdel: ['', gettext('Destroy image')],
             lvmcreate: [gettext('LVM Storage'), gettext('Create')],
             lvmremove: ['Volume Group', gettext('Remove')],
             lvmthincreate: [gettext('LVM-Thin Storage'), gettext('Create')],
diff --git a/www/manager6/storage/LVMEdit.js b/www/manager6/storage/LVMEdit.js
index 148f0601..30ae275b 100644
--- a/www/manager6/storage/LVMEdit.js
+++ b/www/manager6/storage/LVMEdit.js
@@ -148,6 +148,39 @@ Ext.define('PVE.storage.LVMInputPanel', {
 
     onlineHelp: 'storage_lvm',
 
+    onGetValues: function (values) {
+        let me = this;
+
+        let onRemove = {};
+        if (values['on-remove-discard']) {
+            onRemove.discard = 1;
+        }
+        delete values['on-remove-discard'];
+
+        let onRemoveString = PVE.Parser.printPropertyString(onRemove);
+        if (onRemoveString !== '') {
+            values['on-volume-remove'] = onRemoveString;
+        } else if (!me.isCreate) {
+            if (!values.delete) {
+                values.delete = [];
+            }
+            values.delete.push('on-volume-remove');
+        }
+
+        return me.callParent([values]);
+    },
+
+    setValues: function (values) {
+        if (values['on-volume-remove']) {
+            let onRemove = PVE.Parser.parsePropertyString(values['on-volume-remove']);
+            values['on-remove-discard'] = onRemove.discard;
+        }
+
+        delete values['on-volume-remove'];
+
+        return this.callParent([values]);
+    },
+
     column1: [
         {
             xtype: 'pveBaseStorageSelector',
@@ -241,5 +274,17 @@ Ext.define('PVE.storage.LVMInputPanel', {
             uncheckedValue: 0,
             fieldLabel: gettext('Wipe Removed Volumes'),
         },
+        {
+            xtype: 'proxmoxcheckbox',
+            name: 'on-remove-discard',
+            uncheckedValue: 0,
+            fieldLabel: gettext('Discard Removed Volumes'),
+            autoEl: {
+                tag: 'div',
+                'data-qtip': gettext(
+                    'Enable to issue discard (TRIM) requests for logical volumes before removing them.',
+                ),
+            },
+        },
     ],
 });
-- 
2.47.3





^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH docs v7 4/4] fix #7339: lvm: document discard option
  2026-06-16 10:13 [PATCH docs/manager/storage v7 0/4] fix #7339: lvmthick: add option to free storage for deleted VMs Lukas Sichert
                   ` (2 preceding siblings ...)
  2026-06-16 10:13 ` [PATCH manager v7 3/4] fix #7339: lvmthick: ui: add UI option to free storage Lukas Sichert
@ 2026-06-16 10:13 ` Lukas Sichert
  2026-06-30 11:48   ` Michael Köppl
  2026-06-30 12:50   ` Fiona Ebner
  3 siblings, 2 replies; 15+ messages in thread
From: Lukas Sichert @ 2026-06-16 10:13 UTC (permalink / raw)
  To: pve-devel; +Cc: Lukas Sichert

Document the new `on-volume-remove` property for LVM storage and its
initial `discard` action.

Also update the `saferemove` description to match the range-based
zero-out worker and avoid referring to the old command-specific
implementation details.

Signed-off-by: Lukas Sichert <l.sichert@proxmox.com>
Link: bugzilla.proxmox.com/show_bug.cgi?id=7339
---
 pve-storage-lvm.adoc | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/pve-storage-lvm.adoc b/pve-storage-lvm.adoc
index ba78663..a297f31 100644
--- a/pve-storage-lvm.adoc
+++ b/pve-storage-lvm.adoc
@@ -44,18 +44,31 @@ accessed by other LVs created later (which happen to be assigned the same
 physical extents). This is a costly operation, but may be required as a security
 measure in certain environments.
 +
-Storage devices that support the "write zeroes" operation will use `blkdiscard`
-to zero blocks. Otherwise, a fallback to `cstream` is performed.
+Storage devices that support the "write zeroes" operation use it to zero blocks.
+Otherwise, zeroes are written manually. The volume is processed range by range,
+according to `saferemove-stepsize`.
+
+`on-volume-remove`::
+
+Configure additional actions to run before removing an LV.
++
+The initial supported option is `discard=1`. This issues discard requests for
+the removed LV so thin-provisioned backing storage, for example a SAN LUN, can
+reclaim the space used by the LV.
++
+If `saferemove` is enabled too, the LV is processed range by range: one range is
+zeroed out and then discarded before continuing with the next range. This avoids
+allocating the whole LV with zeroes on thin-provisioned backing storage before
+the space can be reclaimed again.
 
 `saferemove-stepsize`::
 
-Wipe step size in MiB (`blkdiscard -p` parameter value), capped to the maximum
-step size supported by the underlying storage. Up to 32 MiB (maximum) by
-default.
+Wipe step size in MiB, capped to the maximum step size supported by the
+underlying storage. Up to 32 MiB (maximum) by default.
 
 `saferemove_throughput`::
 
-Wipe throughput (`cstream -t` parameter value), up to 10 MiB/s by default.
+Wipe throughput, unlimited by default.
 
 `snapshot-as-volume-chain`::
 
-- 
2.47.3





^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH storage v7 2/4] fix #7339: lvm: add discard action for removed volumes
  2026-06-16 10:13 ` [PATCH storage v7 2/4] fix #7339: lvm: add discard action for removed volumes Lukas Sichert
@ 2026-06-30 10:55   ` Michael Köppl
  2026-06-30 12:14   ` Fiona Ebner
  1 sibling, 0 replies; 15+ messages in thread
From: Michael Köppl @ 2026-06-30 10:55 UTC (permalink / raw)
  To: Lukas Sichert, pve-devel

On Tue Jun 16, 2026 at 12:13 PM CEST, Lukas Sichert wrote:

[snip]

> +            # also don't remove the volume if blkdiscard fails, because otherwise the
> +            # storage on a thinly-backed storage can't be reclaimed without removing the
> +            # complete pool
> +            if ($on_remove_opts->{discard} && !$scfg->{saferemove}) {
> +                eval { run_command(['/sbin/blkdiscard', $lvmpath]); };
> +                $err = $@ if $@;

I'm not quite sure regarding this one. I understand why it's there, but
I think the case where discard is simply not supported should maybe be
covered separately. Consider the following scenario:

- Block device that does not support discard
  (/sys/block/<device>/queue/discard_max_bytes is 0)
- Create LVM storage on it (discard on, saferemove off)
    - pvcreate /dev/<device>
    - vgcreate vgtest <device>
    - pvesm add lvm test --vgname vgtest --content images
    - pvesm set test --on-volume-remove discard=1
- Allocate a disk image (pvesm alloc test 9999 vm-disk-0 1G)

If you now call `pvesm free test:vm-disk-0`, the task will show an
"Operation not supported" warning for blkdiscard. dev-vm-disk-0
remains, the space is not cleared, which is expected. But if I now try
to `alloc` again and `pvesm free` again, this will always fail due to
a naming collision with the leftover `del-vm-disk-0`.

One could argue that enabling discard for a device that does not support
it is a user error, but on the other hand the "fallout" from this
mistake is IMO at least annoying and it's relatively easy to prevent by
e.g. explicitly checking /sys/block/<dev>/queue/discard_max_bytes and
warning the user that the operation is not supported, but carrying on
with the regular removal of the volume? Because if the block device does
not support discard, there's nothing to be done anyway to reclaim the
space later, no? If the device does not support discard today, it won't
support it tomorrow. Although the best UX would probably be to perform
this check when the storage is created to prevent the discard option
from being set if it's not supported?

> +            }
>  
>              if (!$err) {
>                  $class->cluster_lock_storage(
> @@ -449,13 +486,13 @@ my sub free_lvm_volumes_locked {
>          }
>      };

[snip]




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH manager v7 3/4] fix #7339: lvmthick: ui: add UI option to free storage
  2026-06-16 10:13 ` [PATCH manager v7 3/4] fix #7339: lvmthick: ui: add UI option to free storage Lukas Sichert
@ 2026-06-30 11:38   ` Michael Köppl
  0 siblings, 0 replies; 15+ messages in thread
From: Michael Köppl @ 2026-06-30 11:38 UTC (permalink / raw)
  To: Lukas Sichert, pve-devel

Tested this on my local setup. As suggested in my comment on your other
patch, I think UX-wise it'd be nice if there was a check for discard
support when the checkbox is set. It's just something I noticed while
testing the UI side, but other than that this lgtm.

Reviewed-by: Michael Köppl <m.koeppl@proxmox.com>
Tested-by: Michael Köppl <m.koeppl@proxmox.com>

On Tue Jun 16, 2026 at 12:13 PM CEST, Lukas Sichert wrote:
> The storage commit 'fix #7339: lvm: add discard action for removed
> volumes' adds backend support to discard allocated space of a VM disk on
> a SAN, when a VM is deleted. The backend checks whether to use this

[snip]




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH docs v7 4/4] fix #7339: lvm: document discard option
  2026-06-16 10:13 ` [PATCH docs v7 4/4] fix #7339: lvm: document discard option Lukas Sichert
@ 2026-06-30 11:48   ` Michael Köppl
  2026-06-30 12:50   ` Fiona Ebner
  1 sibling, 0 replies; 15+ messages in thread
From: Michael Köppl @ 2026-06-30 11:48 UTC (permalink / raw)
  To: Lukas Sichert, pve-devel

Added 1 suggestion inline. Other than that, consider this:

Reviewed-by: Michael Köppl <m.koeppl@proxmox.com>

On Tue Jun 16, 2026 at 12:13 PM CEST, Lukas Sichert wrote:

[snip]

> +
> +`on-volume-remove`::
> +
> +Configure additional actions to run before removing an LV.
> ++
> +The initial supported option is `discard=1`. This issues discard requests for

IMO the way this is phrased reads a bit "roadmappy". Yes, there might be
more options to come, but I think something like this would work better:

"Configure additional actions to run before an LV is removed. Set
`discard=1` to issue a discard (TRIM) request for the LV's blocks before
the LV is remove, so thin-provisioned backing storage, such as a SAN
LUN, can reclaim space the LV occupied."

Just a suggestion, though.

> +the removed LV so thin-provisioned backing storage, for example a SAN LUN, can
> +reclaim the space used by the LV.
> ++
> +If `saferemove` is enabled too, the LV is processed range by range: one range is

[snip]




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH storage v7 1/4] lvm: saferemove: zero out volumes range by range
  2026-06-16 10:13 ` [PATCH storage v7 1/4] lvm: saferemove: zero out volumes range by range Lukas Sichert
@ 2026-06-30 11:49   ` Fiona Ebner
  2026-06-30 14:16     ` Lukas Sichert
  0 siblings, 1 reply; 15+ messages in thread
From: Fiona Ebner @ 2026-06-30 11:49 UTC (permalink / raw)
  To: Lukas Sichert, pve-devel

Am 16.06.26 um 12:12 PM schrieb Lukas Sichert:
> 'saferemove' currently uses different full-volume zero-out paths:
> `blkdiscard --zeroout` for devices with write-zeroes support and
> `cstream` otherwise. This makes progress and throttling inconsistent and
> does not allow future discard cleanup to be interleaved with zeroing. On
> thin-provisioned backing storage, zeroing the whole LV before discarding
> it can also force unnecessary allocation.
> 
> Move zeroing into an explicit range loop. Use BLKZEROOUT when supported,
> capped to the device limit, and fall back to manual zero writes
> otherwise. Keep progress and throttling in the shared loop, and keep the
> renamed LV if zeroing fails so cleanup can be retried.
> 
> Signed-off-by: Lukas Sichert <l.sichert@proxmox.com>
> ---
>  src/PVE/Storage/LVMPlugin.pm | 168 ++++++++++++++++++++++++-----------
>  1 file changed, 117 insertions(+), 51 deletions(-)
> 
> diff --git a/src/PVE/Storage/LVMPlugin.pm b/src/PVE/Storage/LVMPlugin.pm
> index 443d292..f0a7a80 100644
> --- a/src/PVE/Storage/LVMPlugin.pm
> +++ b/src/PVE/Storage/LVMPlugin.pm
> @@ -13,11 +13,16 @@ use PVE::Tools qw(run_command file_read_firstline trim);
>  
>  use PVE::Storage::Common;
>  use PVE::Storage::Plugin;
> +use PVE::RESTEnvironment qw(log_warn);

Style nit: please adhere to the style guide for module ordering:
https://pve.proxmox.com/wiki/Perl_Style_Guide#Module_Dependencies

>  
>  use base qw(PVE::Storage::Plugin);
>  
>  # lvm helper functions
>  
> +use constant {
> +    BLKZEROOUT => 0x127f,
> +};
> +
>  my $ignore_no_medium_warnings = sub {
>      my $line = shift;
>      # ignore those, most of the time they're from (virtual) IPMI/iKVM devices
> @@ -279,6 +284,13 @@ sub lvm_list_volumes {
>      return $lvs;
>  }
>  
> +my sub blockdev_ioctl_range {
> +    my ($fh, $ioctl_name, $ioctl, $offset, $length) = @_;
> +
> +    my $range = pack('QQ', $offset, $length);
> +    ioctl($fh, $ioctl, $range) or die "$ioctl_name failed - $!\n";
> +}
> +
>  my sub free_lvm_volumes_locked {
>      my ($class, $scfg, $storeid, $volnames) = @_;
>  
> @@ -304,49 +316,97 @@ my sub free_lvm_volumes_locked {
>              file_read_firstline("$sysdir/queue/write_zeroes_max_bytes") // 0;
>          ($write_zeroes_max_bytes) = $write_zeroes_max_bytes =~ m/^(\d+)$/; #untaint
>  
> +        my $size = file_read_firstline("$sysdir/size") // 0;
> +        ($size) = $size =~ m/^(\d+)$/; # untaint

Shouldn't we rather die when the size cannot be read or parsed?

> +        $size *= 512; # sysfs size is in 512-byte sectors
> +
> +        my $zeroout_variant = 'blkzeroout';
> +
> +        # If the storage does not support write_zeroes fall back to writing zeroes manually using
> +        # syswrite. Otherwise if the storage supports write_zeroes but stepsize is too big, reduce the stepsize to

Style nit: line too long

> +        # the maximum supported by the storage.
>          if ($write_zeroes_max_bytes == 0) {
> -            # If the storage does not support 'write zeroes', we fallback to cstream.
> -            # wipe throughput up to 10MB/s by default; may be overwritten with saferemove_throughput
> -            my $throughput = '-10485760';
> -            if ($scfg->{saferemove_throughput}) {
> -                $throughput = $scfg->{saferemove_throughput};
> -            }
> +            print "falling back to syswrite to zero-out '$lvmpath'\n";

Nit: maybe also mention the why, e.g. "WRITE_ZEROES operation not
supported, falling back ..."

> +            $stepsize = 1024 * 1024; # 1 MiB
> +            $zeroout_variant = 'syswrite';
> +        } elsif ($stepsize > $write_zeroes_max_bytes) {
> +            print "reduce stepsize to the maximum supported by the storage:"
> +                . " $write_zeroes_max_bytes bytes\n";
> +            $stepsize = $write_zeroes_max_bytes;
> +        }
> +        my $zeroes = "\0" x $stepsize;
> +        my $throughput = -1;

The old default was 10 MiB/s, we should continue using it for the
syswrite case. For the BLKZEROOUT case, we can use unlimited as a default.

> +        if ($scfg->{saferemove_throughput}) {
> +            # use abs as legacy cstream accepted negative values
> +            $throughput = abs($scfg->{saferemove_throughput});
> +        }
>  
> -            my $cmd = [
> -                '/usr/bin/cstream',
> -                '-i',
> -                '/dev/zero',
> -                '-o',
> -                $lvmpath,
> -                '-T',
> -                '10',
> -                '-v',
> -                '1',
> -                '-b',
> -                '1048576',
> -                '-t',
> -                "$throughput",
> -            ];
> -            eval {
> -                run_command(
> -                    $cmd,
> -                    errmsg => "zero out finished (note: 'No space left on device' is ok here)",
> -                );
> -            };
> -            warn $@ if $@;
> -        } else {
> -            # If the storage supports write_zeroes but stepsize is too big, reduce the stepsize to
> -            # the maximum supported by the storage.
> -            if ($write_zeroes_max_bytes > 0 && $stepsize > $write_zeroes_max_bytes) {
> -                print "reduce stepsize to the maximum supported by the storage:"
> -                    . " $write_zeroes_max_bytes bytes\n";
> +        open(my $fh, '+<', $lvmpath) or die "can't open '$lvmpath' - $!\n";
>  
> -                $stepsize = $write_zeroes_max_bytes;
> -            }
> +        #eval block, so that filehandle is closed even if something fails below
> +        eval {
> +            my $start = time();
> +            my $written_total = 0;
> +            my $lastprint = -1;
> +            for (my $offset = 0; $offset < $size; $offset += $stepsize) {
> +
> +                if ($offset + $stepsize > $size) {
> +                    $stepsize = $size - $offset;
> +                }
> +
> +                if ($zeroout_variant eq 'blkzeroout') {
> +                    eval {
> +                        blockdev_ioctl_range($fh, 'BLKZEROOUT', BLKZEROOUT, $offset, $stepsize);
> +                    };
> +                    if ($@) {
> +                        die "blkzeroout failed: $@";

I would also log offset and length of the request, so that a failure can
be analyzed better in practice.

> +                    }
> +                } elsif ($zeroout_variant eq 'syswrite') {
> +                    # if the $offset is 0, sysseek can return 0, therefore use // to only
> +                    # throw an error, if it returns undef
> +                    sysseek($fh, $offset, 0) // die "sysseek failed: $!\n";

You can import the constant for the WHENCE from Fcntl like is done in
PVE::File, then it's more readable than 0.

>  
> -            my $cmd = ['blkdiscard', $lvmpath, '-v', '--zeroout', '--step', "${stepsize}"];
> -            eval { run_command($cmd); };
> -            warn $@ if $@;
> +                    my $written = syswrite($fh, $zeroes, $stepsize)
> +                        // die "syswrite failed: $!\n";
> +
> +                    if ($written != $stepsize) {
> +                        die "short syswrite: wrote $written of $stepsize bytes\n";

We should re-attempt to write the rest after a short write. I'd only die
if there was really no progress at all, i.e. $written == 0.

> +                    }
> +                }
> +                $written_total += $stepsize;
> +
> +                my $curr_time = time();
> +                if (($curr_time - $lastprint) >= 3) {
> +                    my $percent_finished = 100 * $written_total / $size;
> +                    my $written_gb = $written_total / (1024**3);
> +                    my $size_gb = $size / (1024**3);

You can use render_bytes() and render_duration() from PVE::Format

> +                    my $curr_seconds = $curr_time - $start;
> +                    printf(
> +                        "zeroed out %.2f GiB of %.2f GiB (%.2f%%) using %s in %d seconds\n",
> +                        $written_gb,
> +                        $size_gb,
> +                        $percent_finished,
> +                        $zeroout_variant,
> +                        $curr_seconds,
> +                    );
> +                    $lastprint = $curr_time;
> +                }
> +
> +                if ($throughput > 0) {
> +                    my $expected_elapsed = $written_total / $throughput;
> +                    my $actual_elapsed = $curr_time - $start;
> +                    my $delay = $expected_elapsed - $actual_elapsed;
> +                    if ($delay > 0) {
> +                        sleep($delay);
> +                    }
> +                }
> +            }
> +        };
> +        # close filehandle before throwing an error
> +        my $err = $@;
> +        close($fh);
> +        if ($err) {
> +            die "zeroing out failed: $err";
>          }
>      };
>  
> @@ -368,18 +428,24 @@ my sub free_lvm_volumes_locked {
>                  errmsg => "can't refresh LV '$lvmpath' to zero-out its data",
>              );
>  
> -            $secure_delete_cmd->($lvmpath);
> -
> -            $class->cluster_lock_storage(
> -                $storeid,
> -                $scfg->{shared},
> -                undef,
> -                sub {
> -                    my $cmd = ['/sbin/lvremove', '-f', "$vg/del-$name"];
> -                    run_command($cmd, errmsg => "lvremove '$vg/del-$name' error");
> -                },
> -            );
> -            print "successfully removed volume $name ($vg/del-$name)\n";
> +            my $err = undef;
> +            eval { $secure_delete_cmd->($lvmpath); };
> +            $err = $@ if $@;
> +
> +            if (!$err) {

Style nit: you could switch the branches and just write
eval { ... };
if (my $err = $@) {

> +                $class->cluster_lock_storage(
> +                    $storeid,
> +                    $scfg->{shared},
> +                    undef,
> +                    sub {
> +                        my $cmd = ['/sbin/lvremove', '-f', "$vg/del-$name"];
> +                        run_command($cmd, errmsg => "lvremove '$vg/del-$name' error");
> +                    },
> +                );
> +                print "successfully removed volume $name ($vg/del-$name)\n";
> +            } else {
> +                log_warn("$vg/del-$name is not being removed: $err");
> +            }
>          }
>      };
>  





^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH storage v7 2/4] fix #7339: lvm: add discard action for removed volumes
  2026-06-16 10:13 ` [PATCH storage v7 2/4] fix #7339: lvm: add discard action for removed volumes Lukas Sichert
  2026-06-30 10:55   ` Michael Köppl
@ 2026-06-30 12:14   ` Fiona Ebner
  2026-06-30 12:20     ` Fiona Ebner
  2026-06-30 12:37     ` Fiona Ebner
  1 sibling, 2 replies; 15+ messages in thread
From: Fiona Ebner @ 2026-06-30 12:14 UTC (permalink / raw)
  To: Lukas Sichert, pve-devel

Am 16.06.26 um 12:13 PM schrieb Lukas Sichert:
> On LVM storages backed by thin-provisioned SAN LUNs, removing an LV does
> not release the allocated space on the backing storage. Using LVM's
> `issue_discards` can avoid that, but makes `lvremove` issue the discards
> while holding the cluster-wide storage lock, which can hit the lock
> timeout for large volumes.
> 
> Add an `on-volume-remove` property with an initial `discard` action. The
> cleanup worker issues the discard for the renamed LV before the final
> remove, so the long-running operation happens outside the storage lock.
> When combined with `saferemove`, the worker zeroes and discards the LV
> range by range, avoiding allocation of the whole LV with zeroes on
> thin-provisioned backing storage.

If a storage has
on-volume-remove discard=1
saferemove 0
then zero-out is still done. I would expect only discard to be done then.

> 
> Signed-off-by: Lukas Sichert <l.sichert@proxmox.com>
> Link: bugzilla.proxmox.com/show_bug.cgi?id=7339
> ---
>  src/PVE/Storage/LVMPlugin.pm | 95 ++++++++++++++++++++++++++++++++----
>  1 file changed, 85 insertions(+), 10 deletions(-)
> 
> diff --git a/src/PVE/Storage/LVMPlugin.pm b/src/PVE/Storage/LVMPlugin.pm
> index f0a7a80..ee07543 100644
> --- a/src/PVE/Storage/LVMPlugin.pm
> +++ b/src/PVE/Storage/LVMPlugin.pm
> @@ -13,6 +13,7 @@ use PVE::Tools qw(run_command file_read_firstline trim);
>  
>  use PVE::Storage::Common;
>  use PVE::Storage::Plugin;
> +use PVE::SafeSyslog;

Same nit as in patch 1

>  use PVE::RESTEnvironment qw(log_warn);
>  
>  use base qw(PVE::Storage::Plugin);
> @@ -20,6 +21,7 @@ use base qw(PVE::Storage::Plugin);
>  # lvm helper functions
>  
>  use constant {
> +    BLKDISCARD => 0x1277,
>      BLKZEROOUT => 0x127f,
>  };
>  
> @@ -296,6 +298,12 @@ my sub free_lvm_volumes_locked {
>  
>      my $vg = $scfg->{vgname};
>  
> +    my $on_remove_opts;
> +    if ($scfg->{'on-volume-remove'}) {
> +        $on_remove_opts =
> +            PVE::JSONSchema::parse_property_string('on-volume-remove', $scfg->{'on-volume-remove'});
> +    }
> +
>      my $secure_delete_cmd = sub {
>          my ($lvmpath) = @_;
>  
> @@ -373,6 +381,14 @@ my sub free_lvm_volumes_locked {
>                          die "short syswrite: wrote $written of $stepsize bytes\n";
>                      }
>                  }
> +                if ($on_remove_opts->{discard}) {
> +                    eval {
> +                        blockdev_ioctl_range($fh, 'BLKDISCARD', BLKDISCARD, $offset, $stepsize);
> +                    };
> +                    if ($@) {
> +                        log_warn("blkdiscard failed: $@");

Similar to the last patch, I would log the offset and length for
completeness. I also wonder if we should

> +                    }
> +                }
>                  $written_total += $stepsize;
>  
>                  my $curr_time = time();




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH storage v7 2/4] fix #7339: lvm: add discard action for removed volumes
  2026-06-30 12:14   ` Fiona Ebner
@ 2026-06-30 12:20     ` Fiona Ebner
  2026-06-30 12:37     ` Fiona Ebner
  1 sibling, 0 replies; 15+ messages in thread
From: Fiona Ebner @ 2026-06-30 12:20 UTC (permalink / raw)
  To: Lukas Sichert, pve-devel

Am 30.06.26 um 2:15 PM schrieb Fiona Ebner:
> Am 16.06.26 um 12:13 PM schrieb Lukas Sichert:
>> @@ -373,6 +381,14 @@ my sub free_lvm_volumes_locked {
>>                          die "short syswrite: wrote $written of $stepsize bytes\n";
>>                      }
>>                  }
>> +                if ($on_remove_opts->{discard}) {
>> +                    eval {
>> +                        blockdev_ioctl_range($fh, 'BLKDISCARD', BLKDISCARD, $offset, $stepsize);
>> +                    };
>> +                    if ($@) {
>> +                        log_warn("blkdiscard failed: $@");
> 
> Similar to the last patch, I would log the offset and length for
> completeness. I also wonder if we should

Oh, and I wonder if we should really log it for every failure
independently. Maybe just the first one and then count and log how many
operations failed in total? Depending on the scenario, it might be a
huge number of messages.

Together with checking for actual support first like Michael suggested.




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH storage v7 2/4] fix #7339: lvm: add discard action for removed volumes
  2026-06-30 12:14   ` Fiona Ebner
  2026-06-30 12:20     ` Fiona Ebner
@ 2026-06-30 12:37     ` Fiona Ebner
  1 sibling, 0 replies; 15+ messages in thread
From: Fiona Ebner @ 2026-06-30 12:37 UTC (permalink / raw)
  To: Lukas Sichert, pve-devel

Am 30.06.26 um 2:15 PM schrieb Fiona Ebner:
> Am 16.06.26 um 12:13 PM schrieb Lukas Sichert:
>> On LVM storages backed by thin-provisioned SAN LUNs, removing an LV does
>> not release the allocated space on the backing storage. Using LVM's
>> `issue_discards` can avoid that, but makes `lvremove` issue the discards
>> while holding the cluster-wide storage lock, which can hit the lock
>> timeout for large volumes.
>>
>> Add an `on-volume-remove` property with an initial `discard` action. The
>> cleanup worker issues the discard for the renamed LV before the final
>> remove, so the long-running operation happens outside the storage lock.
>> When combined with `saferemove`, the worker zeroes and discards the LV
>> range by range, avoiding allocation of the whole LV with zeroes on
>> thin-provisioned backing storage.
> 
> If a storage has
> on-volume-remove discard=1
> saferemove 0
> then zero-out is still done. I would expect only discard to be done then.

My bad. As Lukas pointed out off-list, this is actually the case. It's
just done outside the $secure_delete_cmd helper.




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH docs v7 4/4] fix #7339: lvm: document discard option
  2026-06-16 10:13 ` [PATCH docs v7 4/4] fix #7339: lvm: document discard option Lukas Sichert
  2026-06-30 11:48   ` Michael Köppl
@ 2026-06-30 12:50   ` Fiona Ebner
  1 sibling, 0 replies; 15+ messages in thread
From: Fiona Ebner @ 2026-06-30 12:50 UTC (permalink / raw)
  To: Lukas Sichert, pve-devel

Am 16.06.26 um 12:13 PM schrieb Lukas Sichert:
>  `saferemove_throughput`::
>  
> -Wipe throughput (`cstream -t` parameter value), up to 10 MiB/s by default.
> +Wipe throughput, unlimited by default.

As mentioned in patch 1, we should keep the default for the syswrite
case. We can mention that if the more efficient zeroing operation is
supported, then the default throughput is unlimited.

And as discussed off-list: currently we do not look at the throughput at
all when BLKZEROOUT is supported. People might be surprised if they have
a (left-over) throughput limit configured which suddenly gets applied
after we roll out this series. But it can also be considered a bug that
it wasn't applied. What we should do is adding a log message that the
limit gets applied so that people notice. And adding a recommendation to
the docs that if BLKZEROOUT is supported, then removing the limit is
recommended (and how to check for support).

>  
>  `snapshot-as-volume-chain`::
>  





^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH storage v7 1/4] lvm: saferemove: zero out volumes range by range
  2026-06-30 11:49   ` Fiona Ebner
@ 2026-06-30 14:16     ` Lukas Sichert
  2026-06-30 14:28       ` Fiona Ebner
  0 siblings, 1 reply; 15+ messages in thread
From: Lukas Sichert @ 2026-06-30 14:16 UTC (permalink / raw)
  To: Fiona Ebner, pve-devel

On 2026-06-30 13:49, Fiona Ebner <f.ebner@proxmox.com> wrote:

> Am 16.06.26 um 12:12 PM schrieb Lukas Sichert:
>> 'saferemove' currently uses different full-volume zero-out paths:
>> `blkdiscard --zeroout` for devices with write-zeroes support and
>> `cstream` otherwise. This makes progress and throttling inconsistent and
>> does not allow future discard cleanup to be interleaved with zeroing. On
>> thin-provisioned backing storage, zeroing the whole LV before discarding
>> it can also force unnecessary allocation.
>> 
>> Move zeroing into an explicit range loop. Use BLKZEROOUT when supported,
>> capped to the device limit, and fall back to manual zero writes
>> otherwise. Keep progress and throttling in the shared loop, and keep the
>> renamed LV if zeroing fails so cleanup can be retried.
>> 
>> Signed-off-by: Lukas Sichert <l.sichert@proxmox.com>
>> ---
>>  src/PVE/Storage/LVMPlugin.pm | 168 ++++++++++++++++++++++++-----------
>>  1 file changed, 117 insertions(+), 51 deletions(-)
>> 
>> diff --git a/src/PVE/Storage/LVMPlugin.pm b/src/PVE/Storage/LVMPlugin.pm
>> index 443d292..f0a7a80 100644
>> --- a/src/PVE/Storage/LVMPlugin.pm
>> +++ b/src/PVE/Storage/LVMPlugin.pm
>> @@ -13,11 +13,16 @@ use PVE::Tools qw(run_command file_read_firstline trim);
>>  
>>  use PVE::Storage::Common;
>>  use PVE::Storage::Plugin;
>> +use PVE::RESTEnvironment qw(log_warn);
>
> Style nit: please adhere to the style guide for module ordering:
> https://pve.proxmox.com/wiki/Perl_Style_Guide#Module_Dependencies
>
>>  
>>  use base qw(PVE::Storage::Plugin);
>>  
>>  # lvm helper functions
>>  
>> +use constant {
>> +    BLKZEROOUT => 0x127f,
>> +};
>> +
>>  my $ignore_no_medium_warnings = sub {
>>      my $line = shift;
>>      # ignore those, most of the time they're from (virtual) IPMI/iKVM devices
>> @@ -279,6 +284,13 @@ sub lvm_list_volumes {
>>      return $lvs;
>>  }
>>  
>> +my sub blockdev_ioctl_range {
>> +    my ($fh, $ioctl_name, $ioctl, $offset, $length) = @_;
>> +
>> +    my $range = pack('QQ', $offset, $length);
>> +    ioctl($fh, $ioctl, $range) or die "$ioctl_name failed - $!\n";
>> +}
>> +
>>  my sub free_lvm_volumes_locked {
>>      my ($class, $scfg, $storeid, $volnames) = @_;
>>  
>> @@ -304,49 +316,97 @@ my sub free_lvm_volumes_locked {
>>              file_read_firstline("$sysdir/queue/write_zeroes_max_bytes") // 0;
>>          ($write_zeroes_max_bytes) = $write_zeroes_max_bytes =~ m/^(\d+)$/; #untaint
>>  
>> +        my $size = file_read_firstline("$sysdir/size") // 0;
>> +        ($size) = $size =~ m/^(\d+)$/; # untaint
>
> Shouldn't we rather die when the size cannot be read or parsed?
>
>> +        $size *= 512; # sysfs size is in 512-byte sectors
>> +
>> +        my $zeroout_variant = 'blkzeroout';
>> +
>> +        # If the storage does not support write_zeroes fall back to writing zeroes manually using
>> +        # syswrite. Otherwise if the storage supports write_zeroes but stepsize is too big, reduce the stepsize to
>
> Style nit: line too long
>
>> +        # the maximum supported by the storage.
>>          if ($write_zeroes_max_bytes == 0) {
>> -            # If the storage does not support 'write zeroes', we fallback to cstream.
>> -            # wipe throughput up to 10MB/s by default; may be overwritten with saferemove_throughput
>> -            my $throughput = '-10485760';
>> -            if ($scfg->{saferemove_throughput}) {
>> -                $throughput = $scfg->{saferemove_throughput};
>> -            }
>> +            print "falling back to syswrite to zero-out '$lvmpath'\n";
>
> Nit: maybe also mention the why, e.g. "WRITE_ZEROES operation not
> supported, falling back ..."
>
>> +            $stepsize = 1024 * 1024; # 1 MiB
>> +            $zeroout_variant = 'syswrite';
>> +        } elsif ($stepsize > $write_zeroes_max_bytes) {
>> +            print "reduce stepsize to the maximum supported by the storage:"
>> +                . " $write_zeroes_max_bytes bytes\n";
>> +            $stepsize = $write_zeroes_max_bytes;
>> +        }
>> +        my $zeroes = "\0" x $stepsize;
>> +        my $throughput = -1;
>
> The old default was 10 MiB/s, we should continue using it for the
> syswrite case. For the BLKZEROOUT case, we can use unlimited as a default.
>
>> +        if ($scfg->{saferemove_throughput}) {
>> +            # use abs as legacy cstream accepted negative values
>> +            $throughput = abs($scfg->{saferemove_throughput});
>> +        }
>>  
>> -            my $cmd = [
>> -                '/usr/bin/cstream',
>> -                '-i',
>> -                '/dev/zero',
>> -                '-o',
>> -                $lvmpath,
>> -                '-T',
>> -                '10',
>> -                '-v',
>> -                '1',
>> -                '-b',
>> -                '1048576',
>> -                '-t',
>> -                "$throughput",
>> -            ];
>> -            eval {
>> -                run_command(
>> -                    $cmd,
>> -                    errmsg => "zero out finished (note: 'No space left on device' is ok here)",
>> -                );
>> -            };
>> -            warn $@ if $@;
>> -        } else {
>> -            # If the storage supports write_zeroes but stepsize is too big, reduce the stepsize to
>> -            # the maximum supported by the storage.
>> -            if ($write_zeroes_max_bytes > 0 && $stepsize > $write_zeroes_max_bytes) {
>> -                print "reduce stepsize to the maximum supported by the storage:"
>> -                    . " $write_zeroes_max_bytes bytes\n";
>> +        open(my $fh, '+<', $lvmpath) or die "can't open '$lvmpath' - $!\n";
>>  
>> -                $stepsize = $write_zeroes_max_bytes;
>> -            }
>> +        #eval block, so that filehandle is closed even if something fails below
>> +        eval {
>> +            my $start = time();
>> +            my $written_total = 0;
>> +            my $lastprint = -1;
>> +            for (my $offset = 0; $offset < $size; $offset += $stepsize) {
>> +
>> +                if ($offset + $stepsize > $size) {
>> +                    $stepsize = $size - $offset;
>> +                }
>> +
>> +                if ($zeroout_variant eq 'blkzeroout') {
>> +                    eval {
>> +                        blockdev_ioctl_range($fh, 'BLKZEROOUT', BLKZEROOUT, $offset, $stepsize);
>> +                    };
>> +                    if ($@) {
>> +                        die "blkzeroout failed: $@";
>
> I would also log offset and length of the request, so that a failure can
> be analyzed better in practice.
>
>> +                    }
>> +                } elsif ($zeroout_variant eq 'syswrite') {
>> +                    # if the $offset is 0, sysseek can return 0, therefore use // to only
>> +                    # throw an error, if it returns undef
>> +                    sysseek($fh, $offset, 0) // die "sysseek failed: $!\n";
>
> You can import the constant for the WHENCE from Fcntl like is done in
> PVE::File, then it's more readable than 0.
>
>>  
>> -            my $cmd = ['blkdiscard', $lvmpath, '-v', '--zeroout', '--step', "${stepsize}"];
>> -            eval { run_command($cmd); };
>> -            warn $@ if $@;
>> +                    my $written = syswrite($fh, $zeroes, $stepsize)
>> +                        // die "syswrite failed: $!\n";
>> +
>> +                    if ($written != $stepsize) {
>> +                        die "short syswrite: wrote $written of $stepsize bytes\n";
>
> We should re-attempt to write the rest after a short write. I'd only die
> if there was really no progress at all, i.e. $written == 0.
>
>> +                    }
>> +                }
>> +                $written_total += $stepsize;
>> +
>> +                my $curr_time = time();
>> +                if (($curr_time - $lastprint) >= 3) {
>> +                    my $percent_finished = 100 * $written_total / $size;
>> +                    my $written_gb = $written_total / (1024**3);
>> +                    my $size_gb = $size / (1024**3);
>
> You can use render_bytes() and render_duration() from PVE::Format
>
>> +                    my $curr_seconds = $curr_time - $start;
>> +                    printf(
>> +                        "zeroed out %.2f GiB of %.2f GiB (%.2f%%) using %s in %d seconds\n",
>> +                        $written_gb,
>> +                        $size_gb,
>> +                        $percent_finished,
>> +                        $zeroout_variant,
>> +                        $curr_seconds,
>> +                    );
>> +                    $lastprint = $curr_time;
>> +                }
>> +
>> +                if ($throughput > 0) {
>> +                    my $expected_elapsed = $written_total / $throughput;
>> +                    my $actual_elapsed = $curr_time - $start;
>> +                    my $delay = $expected_elapsed - $actual_elapsed;
>> +                    if ($delay > 0) {
>> +                        sleep($delay);
>> +                    }
>> +                }
>> +            }
>> +        };
>> +        # close filehandle before throwing an error
>> +        my $err = $@;
>> +        close($fh);
>> +        if ($err) {
>> +            die "zeroing out failed: $err";
>>          }
>>      };
>>  
>> @@ -368,18 +428,24 @@ my sub free_lvm_volumes_locked {
>>                  errmsg => "can't refresh LV '$lvmpath' to zero-out its data",
>>              );
>>  
>> -            $secure_delete_cmd->($lvmpath);
>> -
>> -            $class->cluster_lock_storage(
>> -                $storeid,
>> -                $scfg->{shared},
>> -                undef,
>> -                sub {
>> -                    my $cmd = ['/sbin/lvremove', '-f', "$vg/del-$name"];
>> -                    run_command($cmd, errmsg => "lvremove '$vg/del-$name' error");
>> -                },
>> -            );
>> -            print "successfully removed volume $name ($vg/del-$name)\n";
>> +            my $err = undef;
>> +            eval { $secure_delete_cmd->($lvmpath); };
>> +            $err = $@ if $@;
>> +
>> +            if (!$err) {
>
> Style nit: you could switch the branches and just write
> eval { ... };
> if (my $err = $@) {

Yes, for this patch alone the inverted form would be nicer. I kept the
separate '$err' variable intentionally because patch 2/4 adds a discard-only
path that reuses the same error handling. That avoids reshuffling this
block again in the follow-up patch.

>
>> +                $class->cluster_lock_storage(
>> +                    $storeid,
>> +                    $scfg->{shared},
>> +                    undef,
>> +                    sub {
>> +                        my $cmd = ['/sbin/lvremove', '-f', "$vg/del-$name"];
>> +                        run_command($cmd, errmsg => "lvremove '$vg/del-$name' error");
>> +                    },
>> +                );
>> +                print "successfully removed volume $name ($vg/del-$name)\n";
>> +            } else {
>> +                log_warn("$vg/del-$name is not being removed: $err");
>> +            }
>>          }
>>      };
>>  





^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH storage v7 1/4] lvm: saferemove: zero out volumes range by range
  2026-06-30 14:16     ` Lukas Sichert
@ 2026-06-30 14:28       ` Fiona Ebner
  0 siblings, 0 replies; 15+ messages in thread
From: Fiona Ebner @ 2026-06-30 14:28 UTC (permalink / raw)
  To: Lukas Sichert, pve-devel

Am 30.06.26 um 4:16 PM schrieb Lukas Sichert:
> On 2026-06-30 13:49, Fiona Ebner <f.ebner@proxmox.com> wrote:
>> Am 16.06.26 um 12:12 PM schrieb Lukas Sichert:
>>> +            my $err = undef;
>>> +            eval { $secure_delete_cmd->($lvmpath); };
>>> +            $err = $@ if $@;
>>> +
>>> +            if (!$err) {
>>
>> Style nit: you could switch the branches and just write
>> eval { ... };
>> if (my $err = $@) {
> 
> Yes, for this patch alone the inverted form would be nicer. I kept the
> separate '$err' variable intentionally because patch 2/4 adds a discard-only
> path that reuses the same error handling. That avoids reshuffling this
> block again in the follow-up patch.

Ah, makes sense :)




^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2026-06-30 14:28 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-16 10:13 [PATCH docs/manager/storage v7 0/4] fix #7339: lvmthick: add option to free storage for deleted VMs Lukas Sichert
2026-06-16 10:13 ` [PATCH storage v7 1/4] lvm: saferemove: zero out volumes range by range Lukas Sichert
2026-06-30 11:49   ` Fiona Ebner
2026-06-30 14:16     ` Lukas Sichert
2026-06-30 14:28       ` Fiona Ebner
2026-06-16 10:13 ` [PATCH storage v7 2/4] fix #7339: lvm: add discard action for removed volumes Lukas Sichert
2026-06-30 10:55   ` Michael Köppl
2026-06-30 12:14   ` Fiona Ebner
2026-06-30 12:20     ` Fiona Ebner
2026-06-30 12:37     ` Fiona Ebner
2026-06-16 10:13 ` [PATCH manager v7 3/4] fix #7339: lvmthick: ui: add UI option to free storage Lukas Sichert
2026-06-30 11:38   ` Michael Köppl
2026-06-16 10:13 ` [PATCH docs v7 4/4] fix #7339: lvm: document discard option Lukas Sichert
2026-06-30 11:48   ` Michael Köppl
2026-06-30 12:50   ` Fiona Ebner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal