public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: "Lukas Sichert" <l.sichert@proxmox.com>
To: "Fiona Ebner" <f.ebner@proxmox.com>, <pve-devel@lists.proxmox.com>
Subject: Re: [PATCH storage v7 1/4] lvm: saferemove: zero out volumes range by range
Date: Tue, 30 Jun 2026 16:16:56 +0200	[thread overview]
Message-ID: <DJMG539MX7J8.25O7YMZEYEE3M@proxmox.com> (raw)
In-Reply-To: <9d099f5a-3121-4617-be7d-b3aa5e2c6c81@proxmox.com>

On 2026-06-30 13:49, Fiona Ebner <f.ebner@proxmox.com> wrote:

> Am 16.06.26 um 12:12 PM schrieb Lukas Sichert:
>> 'saferemove' currently uses different full-volume zero-out paths:
>> `blkdiscard --zeroout` for devices with write-zeroes support and
>> `cstream` otherwise. This makes progress and throttling inconsistent and
>> does not allow future discard cleanup to be interleaved with zeroing. On
>> thin-provisioned backing storage, zeroing the whole LV before discarding
>> it can also force unnecessary allocation.
>> 
>> Move zeroing into an explicit range loop. Use BLKZEROOUT when supported,
>> capped to the device limit, and fall back to manual zero writes
>> otherwise. Keep progress and throttling in the shared loop, and keep the
>> renamed LV if zeroing fails so cleanup can be retried.
>> 
>> Signed-off-by: Lukas Sichert <l.sichert@proxmox.com>
>> ---
>>  src/PVE/Storage/LVMPlugin.pm | 168 ++++++++++++++++++++++++-----------
>>  1 file changed, 117 insertions(+), 51 deletions(-)
>> 
>> diff --git a/src/PVE/Storage/LVMPlugin.pm b/src/PVE/Storage/LVMPlugin.pm
>> index 443d292..f0a7a80 100644
>> --- a/src/PVE/Storage/LVMPlugin.pm
>> +++ b/src/PVE/Storage/LVMPlugin.pm
>> @@ -13,11 +13,16 @@ use PVE::Tools qw(run_command file_read_firstline trim);
>>  
>>  use PVE::Storage::Common;
>>  use PVE::Storage::Plugin;
>> +use PVE::RESTEnvironment qw(log_warn);
>
> Style nit: please adhere to the style guide for module ordering:
> https://pve.proxmox.com/wiki/Perl_Style_Guide#Module_Dependencies
>
>>  
>>  use base qw(PVE::Storage::Plugin);
>>  
>>  # lvm helper functions
>>  
>> +use constant {
>> +    BLKZEROOUT => 0x127f,
>> +};
>> +
>>  my $ignore_no_medium_warnings = sub {
>>      my $line = shift;
>>      # ignore those, most of the time they're from (virtual) IPMI/iKVM devices
>> @@ -279,6 +284,13 @@ sub lvm_list_volumes {
>>      return $lvs;
>>  }
>>  
>> +my sub blockdev_ioctl_range {
>> +    my ($fh, $ioctl_name, $ioctl, $offset, $length) = @_;
>> +
>> +    my $range = pack('QQ', $offset, $length);
>> +    ioctl($fh, $ioctl, $range) or die "$ioctl_name failed - $!\n";
>> +}
>> +
>>  my sub free_lvm_volumes_locked {
>>      my ($class, $scfg, $storeid, $volnames) = @_;
>>  
>> @@ -304,49 +316,97 @@ my sub free_lvm_volumes_locked {
>>              file_read_firstline("$sysdir/queue/write_zeroes_max_bytes") // 0;
>>          ($write_zeroes_max_bytes) = $write_zeroes_max_bytes =~ m/^(\d+)$/; #untaint
>>  
>> +        my $size = file_read_firstline("$sysdir/size") // 0;
>> +        ($size) = $size =~ m/^(\d+)$/; # untaint
>
> Shouldn't we rather die when the size cannot be read or parsed?
>
>> +        $size *= 512; # sysfs size is in 512-byte sectors
>> +
>> +        my $zeroout_variant = 'blkzeroout';
>> +
>> +        # If the storage does not support write_zeroes fall back to writing zeroes manually using
>> +        # syswrite. Otherwise if the storage supports write_zeroes but stepsize is too big, reduce the stepsize to
>
> Style nit: line too long
>
>> +        # the maximum supported by the storage.
>>          if ($write_zeroes_max_bytes == 0) {
>> -            # If the storage does not support 'write zeroes', we fallback to cstream.
>> -            # wipe throughput up to 10MB/s by default; may be overwritten with saferemove_throughput
>> -            my $throughput = '-10485760';
>> -            if ($scfg->{saferemove_throughput}) {
>> -                $throughput = $scfg->{saferemove_throughput};
>> -            }
>> +            print "falling back to syswrite to zero-out '$lvmpath'\n";
>
> Nit: maybe also mention the why, e.g. "WRITE_ZEROES operation not
> supported, falling back ..."
>
>> +            $stepsize = 1024 * 1024; # 1 MiB
>> +            $zeroout_variant = 'syswrite';
>> +        } elsif ($stepsize > $write_zeroes_max_bytes) {
>> +            print "reduce stepsize to the maximum supported by the storage:"
>> +                . " $write_zeroes_max_bytes bytes\n";
>> +            $stepsize = $write_zeroes_max_bytes;
>> +        }
>> +        my $zeroes = "\0" x $stepsize;
>> +        my $throughput = -1;
>
> The old default was 10 MiB/s, we should continue using it for the
> syswrite case. For the BLKZEROOUT case, we can use unlimited as a default.
>
>> +        if ($scfg->{saferemove_throughput}) {
>> +            # use abs as legacy cstream accepted negative values
>> +            $throughput = abs($scfg->{saferemove_throughput});
>> +        }
>>  
>> -            my $cmd = [
>> -                '/usr/bin/cstream',
>> -                '-i',
>> -                '/dev/zero',
>> -                '-o',
>> -                $lvmpath,
>> -                '-T',
>> -                '10',
>> -                '-v',
>> -                '1',
>> -                '-b',
>> -                '1048576',
>> -                '-t',
>> -                "$throughput",
>> -            ];
>> -            eval {
>> -                run_command(
>> -                    $cmd,
>> -                    errmsg => "zero out finished (note: 'No space left on device' is ok here)",
>> -                );
>> -            };
>> -            warn $@ if $@;
>> -        } else {
>> -            # If the storage supports write_zeroes but stepsize is too big, reduce the stepsize to
>> -            # the maximum supported by the storage.
>> -            if ($write_zeroes_max_bytes > 0 && $stepsize > $write_zeroes_max_bytes) {
>> -                print "reduce stepsize to the maximum supported by the storage:"
>> -                    . " $write_zeroes_max_bytes bytes\n";
>> +        open(my $fh, '+<', $lvmpath) or die "can't open '$lvmpath' - $!\n";
>>  
>> -                $stepsize = $write_zeroes_max_bytes;
>> -            }
>> +        #eval block, so that filehandle is closed even if something fails below
>> +        eval {
>> +            my $start = time();
>> +            my $written_total = 0;
>> +            my $lastprint = -1;
>> +            for (my $offset = 0; $offset < $size; $offset += $stepsize) {
>> +
>> +                if ($offset + $stepsize > $size) {
>> +                    $stepsize = $size - $offset;
>> +                }
>> +
>> +                if ($zeroout_variant eq 'blkzeroout') {
>> +                    eval {
>> +                        blockdev_ioctl_range($fh, 'BLKZEROOUT', BLKZEROOUT, $offset, $stepsize);
>> +                    };
>> +                    if ($@) {
>> +                        die "blkzeroout failed: $@";
>
> I would also log offset and length of the request, so that a failure can
> be analyzed better in practice.
>
>> +                    }
>> +                } elsif ($zeroout_variant eq 'syswrite') {
>> +                    # if the $offset is 0, sysseek can return 0, therefore use // to only
>> +                    # throw an error, if it returns undef
>> +                    sysseek($fh, $offset, 0) // die "sysseek failed: $!\n";
>
> You can import the constant for the WHENCE from Fcntl like is done in
> PVE::File, then it's more readable than 0.
>
>>  
>> -            my $cmd = ['blkdiscard', $lvmpath, '-v', '--zeroout', '--step', "${stepsize}"];
>> -            eval { run_command($cmd); };
>> -            warn $@ if $@;
>> +                    my $written = syswrite($fh, $zeroes, $stepsize)
>> +                        // die "syswrite failed: $!\n";
>> +
>> +                    if ($written != $stepsize) {
>> +                        die "short syswrite: wrote $written of $stepsize bytes\n";
>
> We should re-attempt to write the rest after a short write. I'd only die
> if there was really no progress at all, i.e. $written == 0.
>
>> +                    }
>> +                }
>> +                $written_total += $stepsize;
>> +
>> +                my $curr_time = time();
>> +                if (($curr_time - $lastprint) >= 3) {
>> +                    my $percent_finished = 100 * $written_total / $size;
>> +                    my $written_gb = $written_total / (1024**3);
>> +                    my $size_gb = $size / (1024**3);
>
> You can use render_bytes() and render_duration() from PVE::Format
>
>> +                    my $curr_seconds = $curr_time - $start;
>> +                    printf(
>> +                        "zeroed out %.2f GiB of %.2f GiB (%.2f%%) using %s in %d seconds\n",
>> +                        $written_gb,
>> +                        $size_gb,
>> +                        $percent_finished,
>> +                        $zeroout_variant,
>> +                        $curr_seconds,
>> +                    );
>> +                    $lastprint = $curr_time;
>> +                }
>> +
>> +                if ($throughput > 0) {
>> +                    my $expected_elapsed = $written_total / $throughput;
>> +                    my $actual_elapsed = $curr_time - $start;
>> +                    my $delay = $expected_elapsed - $actual_elapsed;
>> +                    if ($delay > 0) {
>> +                        sleep($delay);
>> +                    }
>> +                }
>> +            }
>> +        };
>> +        # close filehandle before throwing an error
>> +        my $err = $@;
>> +        close($fh);
>> +        if ($err) {
>> +            die "zeroing out failed: $err";
>>          }
>>      };
>>  
>> @@ -368,18 +428,24 @@ my sub free_lvm_volumes_locked {
>>                  errmsg => "can't refresh LV '$lvmpath' to zero-out its data",
>>              );
>>  
>> -            $secure_delete_cmd->($lvmpath);
>> -
>> -            $class->cluster_lock_storage(
>> -                $storeid,
>> -                $scfg->{shared},
>> -                undef,
>> -                sub {
>> -                    my $cmd = ['/sbin/lvremove', '-f', "$vg/del-$name"];
>> -                    run_command($cmd, errmsg => "lvremove '$vg/del-$name' error");
>> -                },
>> -            );
>> -            print "successfully removed volume $name ($vg/del-$name)\n";
>> +            my $err = undef;
>> +            eval { $secure_delete_cmd->($lvmpath); };
>> +            $err = $@ if $@;
>> +
>> +            if (!$err) {
>
> Style nit: you could switch the branches and just write
> eval { ... };
> if (my $err = $@) {

Yes, for this patch alone the inverted form would be nicer. I kept the
separate '$err' variable intentionally because patch 2/4 adds a discard-only
path that reuses the same error handling. That avoids reshuffling this
block again in the follow-up patch.

>
>> +                $class->cluster_lock_storage(
>> +                    $storeid,
>> +                    $scfg->{shared},
>> +                    undef,
>> +                    sub {
>> +                        my $cmd = ['/sbin/lvremove', '-f', "$vg/del-$name"];
>> +                        run_command($cmd, errmsg => "lvremove '$vg/del-$name' error");
>> +                    },
>> +                );
>> +                print "successfully removed volume $name ($vg/del-$name)\n";
>> +            } else {
>> +                log_warn("$vg/del-$name is not being removed: $err");
>> +            }
>>          }
>>      };
>>  





  reply	other threads:[~2026-06-30 14:17 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-16 10:13 [PATCH docs/manager/storage v7 0/4] fix #7339: lvmthick: add option to free storage for deleted VMs Lukas Sichert
2026-06-16 10:13 ` [PATCH storage v7 1/4] lvm: saferemove: zero out volumes range by range Lukas Sichert
2026-06-30 11:49   ` Fiona Ebner
2026-06-30 14:16     ` Lukas Sichert [this message]
2026-06-30 14:28       ` Fiona Ebner
2026-06-16 10:13 ` [PATCH storage v7 2/4] fix #7339: lvm: add discard action for removed volumes Lukas Sichert
2026-06-30 10:55   ` Michael Köppl
2026-06-30 12:14   ` Fiona Ebner
2026-06-30 12:20     ` Fiona Ebner
2026-06-30 12:37     ` Fiona Ebner
2026-06-16 10:13 ` [PATCH manager v7 3/4] fix #7339: lvmthick: ui: add UI option to free storage Lukas Sichert
2026-06-30 11:38   ` Michael Köppl
2026-06-16 10:13 ` [PATCH docs v7 4/4] fix #7339: lvm: document discard option Lukas Sichert
2026-06-30 11:48   ` Michael Köppl
2026-06-30 12:50   ` Fiona Ebner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DJMG539MX7J8.25O7YMZEYEE3M@proxmox.com \
    --to=l.sichert@proxmox.com \
    --cc=f.ebner@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal