all lists on lists.proxmox.com
 help / color / mirror / Atom feed
From: Fiona Ebner <f.ebner@proxmox.com>
To: Lukas Sichert <l.sichert@proxmox.com>, pve-devel@lists.proxmox.com
Subject: Re: [PATCH storage v7 1/4] lvm: saferemove: zero out volumes range by range
Date: Tue, 30 Jun 2026 13:49:22 +0200	[thread overview]
Message-ID: <9d099f5a-3121-4617-be7d-b3aa5e2c6c81@proxmox.com> (raw)
In-Reply-To: <20260616101323.24981-2-l.sichert@proxmox.com>

Am 16.06.26 um 12:12 PM schrieb Lukas Sichert:
> 'saferemove' currently uses different full-volume zero-out paths:
> `blkdiscard --zeroout` for devices with write-zeroes support and
> `cstream` otherwise. This makes progress and throttling inconsistent and
> does not allow future discard cleanup to be interleaved with zeroing. On
> thin-provisioned backing storage, zeroing the whole LV before discarding
> it can also force unnecessary allocation.
> 
> Move zeroing into an explicit range loop. Use BLKZEROOUT when supported,
> capped to the device limit, and fall back to manual zero writes
> otherwise. Keep progress and throttling in the shared loop, and keep the
> renamed LV if zeroing fails so cleanup can be retried.
> 
> Signed-off-by: Lukas Sichert <l.sichert@proxmox.com>
> ---
>  src/PVE/Storage/LVMPlugin.pm | 168 ++++++++++++++++++++++++-----------
>  1 file changed, 117 insertions(+), 51 deletions(-)
> 
> diff --git a/src/PVE/Storage/LVMPlugin.pm b/src/PVE/Storage/LVMPlugin.pm
> index 443d292..f0a7a80 100644
> --- a/src/PVE/Storage/LVMPlugin.pm
> +++ b/src/PVE/Storage/LVMPlugin.pm
> @@ -13,11 +13,16 @@ use PVE::Tools qw(run_command file_read_firstline trim);
>  
>  use PVE::Storage::Common;
>  use PVE::Storage::Plugin;
> +use PVE::RESTEnvironment qw(log_warn);

Style nit: please adhere to the style guide for module ordering:
https://pve.proxmox.com/wiki/Perl_Style_Guide#Module_Dependencies

>  
>  use base qw(PVE::Storage::Plugin);
>  
>  # lvm helper functions
>  
> +use constant {
> +    BLKZEROOUT => 0x127f,
> +};
> +
>  my $ignore_no_medium_warnings = sub {
>      my $line = shift;
>      # ignore those, most of the time they're from (virtual) IPMI/iKVM devices
> @@ -279,6 +284,13 @@ sub lvm_list_volumes {
>      return $lvs;
>  }
>  
> +my sub blockdev_ioctl_range {
> +    my ($fh, $ioctl_name, $ioctl, $offset, $length) = @_;
> +
> +    my $range = pack('QQ', $offset, $length);
> +    ioctl($fh, $ioctl, $range) or die "$ioctl_name failed - $!\n";
> +}
> +
>  my sub free_lvm_volumes_locked {
>      my ($class, $scfg, $storeid, $volnames) = @_;
>  
> @@ -304,49 +316,97 @@ my sub free_lvm_volumes_locked {
>              file_read_firstline("$sysdir/queue/write_zeroes_max_bytes") // 0;
>          ($write_zeroes_max_bytes) = $write_zeroes_max_bytes =~ m/^(\d+)$/; #untaint
>  
> +        my $size = file_read_firstline("$sysdir/size") // 0;
> +        ($size) = $size =~ m/^(\d+)$/; # untaint

Shouldn't we rather die when the size cannot be read or parsed?

> +        $size *= 512; # sysfs size is in 512-byte sectors
> +
> +        my $zeroout_variant = 'blkzeroout';
> +
> +        # If the storage does not support write_zeroes fall back to writing zeroes manually using
> +        # syswrite. Otherwise if the storage supports write_zeroes but stepsize is too big, reduce the stepsize to

Style nit: line too long

> +        # the maximum supported by the storage.
>          if ($write_zeroes_max_bytes == 0) {
> -            # If the storage does not support 'write zeroes', we fallback to cstream.
> -            # wipe throughput up to 10MB/s by default; may be overwritten with saferemove_throughput
> -            my $throughput = '-10485760';
> -            if ($scfg->{saferemove_throughput}) {
> -                $throughput = $scfg->{saferemove_throughput};
> -            }
> +            print "falling back to syswrite to zero-out '$lvmpath'\n";

Nit: maybe also mention the why, e.g. "WRITE_ZEROES operation not
supported, falling back ..."

> +            $stepsize = 1024 * 1024; # 1 MiB
> +            $zeroout_variant = 'syswrite';
> +        } elsif ($stepsize > $write_zeroes_max_bytes) {
> +            print "reduce stepsize to the maximum supported by the storage:"
> +                . " $write_zeroes_max_bytes bytes\n";
> +            $stepsize = $write_zeroes_max_bytes;
> +        }
> +        my $zeroes = "\0" x $stepsize;
> +        my $throughput = -1;

The old default was 10 MiB/s, we should continue using it for the
syswrite case. For the BLKZEROOUT case, we can use unlimited as a default.

> +        if ($scfg->{saferemove_throughput}) {
> +            # use abs as legacy cstream accepted negative values
> +            $throughput = abs($scfg->{saferemove_throughput});
> +        }
>  
> -            my $cmd = [
> -                '/usr/bin/cstream',
> -                '-i',
> -                '/dev/zero',
> -                '-o',
> -                $lvmpath,
> -                '-T',
> -                '10',
> -                '-v',
> -                '1',
> -                '-b',
> -                '1048576',
> -                '-t',
> -                "$throughput",
> -            ];
> -            eval {
> -                run_command(
> -                    $cmd,
> -                    errmsg => "zero out finished (note: 'No space left on device' is ok here)",
> -                );
> -            };
> -            warn $@ if $@;
> -        } else {
> -            # If the storage supports write_zeroes but stepsize is too big, reduce the stepsize to
> -            # the maximum supported by the storage.
> -            if ($write_zeroes_max_bytes > 0 && $stepsize > $write_zeroes_max_bytes) {
> -                print "reduce stepsize to the maximum supported by the storage:"
> -                    . " $write_zeroes_max_bytes bytes\n";
> +        open(my $fh, '+<', $lvmpath) or die "can't open '$lvmpath' - $!\n";
>  
> -                $stepsize = $write_zeroes_max_bytes;
> -            }
> +        #eval block, so that filehandle is closed even if something fails below
> +        eval {
> +            my $start = time();
> +            my $written_total = 0;
> +            my $lastprint = -1;
> +            for (my $offset = 0; $offset < $size; $offset += $stepsize) {
> +
> +                if ($offset + $stepsize > $size) {
> +                    $stepsize = $size - $offset;
> +                }
> +
> +                if ($zeroout_variant eq 'blkzeroout') {
> +                    eval {
> +                        blockdev_ioctl_range($fh, 'BLKZEROOUT', BLKZEROOUT, $offset, $stepsize);
> +                    };
> +                    if ($@) {
> +                        die "blkzeroout failed: $@";

I would also log offset and length of the request, so that a failure can
be analyzed better in practice.

> +                    }
> +                } elsif ($zeroout_variant eq 'syswrite') {
> +                    # if the $offset is 0, sysseek can return 0, therefore use // to only
> +                    # throw an error, if it returns undef
> +                    sysseek($fh, $offset, 0) // die "sysseek failed: $!\n";

You can import the constant for the WHENCE from Fcntl like is done in
PVE::File, then it's more readable than 0.

>  
> -            my $cmd = ['blkdiscard', $lvmpath, '-v', '--zeroout', '--step', "${stepsize}"];
> -            eval { run_command($cmd); };
> -            warn $@ if $@;
> +                    my $written = syswrite($fh, $zeroes, $stepsize)
> +                        // die "syswrite failed: $!\n";
> +
> +                    if ($written != $stepsize) {
> +                        die "short syswrite: wrote $written of $stepsize bytes\n";

We should re-attempt to write the rest after a short write. I'd only die
if there was really no progress at all, i.e. $written == 0.

> +                    }
> +                }
> +                $written_total += $stepsize;
> +
> +                my $curr_time = time();
> +                if (($curr_time - $lastprint) >= 3) {
> +                    my $percent_finished = 100 * $written_total / $size;
> +                    my $written_gb = $written_total / (1024**3);
> +                    my $size_gb = $size / (1024**3);

You can use render_bytes() and render_duration() from PVE::Format

> +                    my $curr_seconds = $curr_time - $start;
> +                    printf(
> +                        "zeroed out %.2f GiB of %.2f GiB (%.2f%%) using %s in %d seconds\n",
> +                        $written_gb,
> +                        $size_gb,
> +                        $percent_finished,
> +                        $zeroout_variant,
> +                        $curr_seconds,
> +                    );
> +                    $lastprint = $curr_time;
> +                }
> +
> +                if ($throughput > 0) {
> +                    my $expected_elapsed = $written_total / $throughput;
> +                    my $actual_elapsed = $curr_time - $start;
> +                    my $delay = $expected_elapsed - $actual_elapsed;
> +                    if ($delay > 0) {
> +                        sleep($delay);
> +                    }
> +                }
> +            }
> +        };
> +        # close filehandle before throwing an error
> +        my $err = $@;
> +        close($fh);
> +        if ($err) {
> +            die "zeroing out failed: $err";
>          }
>      };
>  
> @@ -368,18 +428,24 @@ my sub free_lvm_volumes_locked {
>                  errmsg => "can't refresh LV '$lvmpath' to zero-out its data",
>              );
>  
> -            $secure_delete_cmd->($lvmpath);
> -
> -            $class->cluster_lock_storage(
> -                $storeid,
> -                $scfg->{shared},
> -                undef,
> -                sub {
> -                    my $cmd = ['/sbin/lvremove', '-f', "$vg/del-$name"];
> -                    run_command($cmd, errmsg => "lvremove '$vg/del-$name' error");
> -                },
> -            );
> -            print "successfully removed volume $name ($vg/del-$name)\n";
> +            my $err = undef;
> +            eval { $secure_delete_cmd->($lvmpath); };
> +            $err = $@ if $@;
> +
> +            if (!$err) {

Style nit: you could switch the branches and just write
eval { ... };
if (my $err = $@) {

> +                $class->cluster_lock_storage(
> +                    $storeid,
> +                    $scfg->{shared},
> +                    undef,
> +                    sub {
> +                        my $cmd = ['/sbin/lvremove', '-f', "$vg/del-$name"];
> +                        run_command($cmd, errmsg => "lvremove '$vg/del-$name' error");
> +                    },
> +                );
> +                print "successfully removed volume $name ($vg/del-$name)\n";
> +            } else {
> +                log_warn("$vg/del-$name is not being removed: $err");
> +            }
>          }
>      };
>  





  reply	other threads:[~2026-06-30 11:50 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-16 10:13 [PATCH docs/manager/storage v7 0/4] fix #7339: lvmthick: add option to free storage for deleted VMs Lukas Sichert
2026-06-16 10:13 ` [PATCH storage v7 1/4] lvm: saferemove: zero out volumes range by range Lukas Sichert
2026-06-30 11:49   ` Fiona Ebner [this message]
2026-06-30 14:16     ` Lukas Sichert
2026-06-30 14:28       ` Fiona Ebner
2026-06-16 10:13 ` [PATCH storage v7 2/4] fix #7339: lvm: add discard action for removed volumes Lukas Sichert
2026-06-30 10:55   ` Michael Köppl
2026-06-30 12:14   ` Fiona Ebner
2026-06-30 12:20     ` Fiona Ebner
2026-06-30 12:37     ` Fiona Ebner
2026-06-16 10:13 ` [PATCH manager v7 3/4] fix #7339: lvmthick: ui: add UI option to free storage Lukas Sichert
2026-06-30 11:38   ` Michael Köppl
2026-06-16 10:13 ` [PATCH docs v7 4/4] fix #7339: lvm: document discard option Lukas Sichert
2026-06-30 11:48   ` Michael Köppl
2026-06-30 12:50   ` Fiona Ebner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9d099f5a-3121-4617-be7d-b3aa5e2c6c81@proxmox.com \
    --to=f.ebner@proxmox.com \
    --cc=l.sichert@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal