From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9]) by lore.proxmox.com (Postfix) with ESMTPS id 7DB691FF185 for ; Mon, 18 Aug 2025 14:07:53 +0200 (CEST) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id D7A3A36EB2; Mon, 18 Aug 2025 14:09:34 +0200 (CEST) To: pve-devel@lists.proxmox.com Date: Mon, 18 Aug 2025 14:08:54 +0200 MIME-Version: 1.0 Message-ID: List-Id: Proxmox VE development discussion List-Post: From: Alexandre Derumier via pve-devel Precedence: list Cc: Alexandre Derumier X-Mailman-Version: 2.1.29 X-BeenThere: pve-devel@lists.proxmox.com List-Subscribe: , List-Unsubscribe: , List-Archive: Reply-To: Proxmox VE development discussion List-Help: Subject: [pve-devel] [PATCH V3 pve-storage] lvm: use blkdiscard instead cstream to saferemove drive Content-Type: multipart/mixed; boundary="===============0600966502695385794==" Errors-To: pve-devel-bounces@lists.proxmox.com Sender: "pve-devel" --===============0600966502695385794== Content-Type: message/rfc822 Content-Disposition: inline Return-Path: X-Original-To: pve-devel@lists.proxmox.com Delivered-To: pve-devel@lists.proxmox.com Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by lists.proxmox.com (Postfix) with ESMTPS id 4E14FD2201 for ; Mon, 18 Aug 2025 14:09:34 +0200 (CEST) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 2BD7F36E9F for ; Mon, 18 Aug 2025 14:09:04 +0200 (CEST) Received: from bastiontest.odiso.net (unknown [IPv6:2a0a:1580:2000:6700::14]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by firstgate.proxmox.com (Proxmox) with ESMTPS for ; Mon, 18 Aug 2025 14:09:02 +0200 (CEST) Received: from formationkvm1.odiso.net (unknown [10.11.201.57]) by bastiontest.odiso.net (Postfix) with ESMTP id 2F506860F15; Mon, 18 Aug 2025 14:08:55 +0200 (CEST) Received: by formationkvm1.odiso.net (Postfix, from userid 0) id 1F81B1115FB8; Mon, 18 Aug 2025 14:08:55 +0200 (CEST) From: Alexandre Derumier To: pve-devel@lists.proxmox.com Subject: [PATCH V3 pve-storage] lvm: use blkdiscard instead cstream to saferemove drive Date: Mon, 18 Aug 2025 14:08:54 +0200 Message-ID: <20250818120854.2668135-1-alexandre.derumier@groupe-cyllene.com> X-Mailer: git-send-email 2.47.2 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-SPAM-LEVEL: Spam detection results: 0 AWL 0.124 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_NONE 0.1 DMARC none policy HEADER_FROM_DIFFERENT_DOMAINS 0.059 From and EnvelopeFrom 2nd level mail domains are different KAM_DMARC_NONE 0.25 DKIM has Failed or SPF has failed on the message and the domain has no DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment KAM_LAZY_DOMAIN_SECURITY 1 Sending domain does not have any anti-forgery methods RDNS_NONE 0.793 Delivered to internal network by a host with no rDNS SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_NONE 0.001 SPF: sender does not publish an SPF Record Current cstream implementation is pretty slow as hell, even without throttling. use blkdiscard --zeroout instead, which is lot of magnetude faster Another benefit is that blkdiscard is skipping already zeroed block, so for empty temp images like snapshot, is pretty fast. blkdiscard don't have throttling like cstream, but we can tune the step size of zeroes pushed to the storage. I'm using 32MB stepsize by default , like ovirt, where it seem to be the best balance between speed and load. https://github.com/oVirt/vdsm/commit/79f1d79058aad863ca4b6672d4a5ce2be8e48986 but it can be reduce with "saferemove_stepsize" option. stepsize is also autoreduce to sysfs write_zeroes_max_bytes, which is the maximum zeroing batch supported by the storage Also adding an option "saferemove_discard", to use discard instead zeroing. test with a 100G volume (empty): time /usr/bin/cstream -i /dev/zero -o /dev/test/vm-100-disk-0.qcow2 -T 10 -v 1 -b 1048576 13561233408 B 12.6 GB 10.00 s 1356062979 B/s 1.26 GB/s 26021462016 B 24.2 GB 20.00 s 1301029969 B/s 1.21 GB/s 38585499648 B 35.9 GB 30.00 s 1286135343 B/s 1.20 GB/s 50998542336 B 47.5 GB 40.00 s 1274925312 B/s 1.19 GB/s 63702765568 B 59.3 GB 50.00 s 1274009877 B/s 1.19 GB/s 76721885184 B 71.5 GB 60.00 s 1278640698 B/s 1.19 GB/s 89126539264 B 83.0 GB 70.00 s 1273178488 B/s 1.19 GB/s 101666459648 B 94.7 GB 80.00 s 1270779024 B/s 1.18 GB/s 107390959616 B 100.0 GB 84.39 s 1272531142 B/s 1.19 GB/s write: No space left on device real 1m24.394s user 0m0.171s sys 1m24.052s time blkdiscard --zeroout /dev/test/vm-100-disk-0.qcow2 -v /dev/test/vm-100-disk-0.qcow2: Zero-filled 107390959616 bytes from the offset 0 real 0m3.641s user 0m0.001s sys 0m3.433s test with a 100G volume with random data: time blkdiscard --zeroout /dev/test/vm-100-disk-0.qcow2 -v /dev/test/vm-112-disk-1: Zero-filled 4764729344 bytes from the offset 0 /dev/test/vm-112-disk-1: Zero-filled 4664066048 bytes from the offset 4764729344 /dev/test/vm-112-disk-1: Zero-filled 4831838208 bytes from the offset 9428795392 /dev/test/vm-112-disk-1: Zero-filled 4831838208 bytes from the offset 14260633600 /dev/test/vm-112-disk-1: Zero-filled 4831838208 bytes from the offset 19092471808 /dev/test/vm-112-disk-1: Zero-filled 4865392640 bytes from the offset 23924310016 /dev/test/vm-112-disk-1: Zero-filled 4596957184 bytes from the offset 28789702656 /dev/test/vm-112-disk-1: Zero-filled 4731174912 bytes from the offset 33386659840 /dev/test/vm-112-disk-1: Zero-filled 4294967296 bytes from the offset 38117834752 /dev/test/vm-112-disk-1: Zero-filled 4664066048 bytes from the offset 42412802048 /dev/test/vm-112-disk-1: Zero-filled 4697620480 bytes from the offset 47076868096 /dev/test/vm-112-disk-1: Zero-filled 4664066048 bytes from the offset 51774488576 /dev/test/vm-112-disk-1: Zero-filled 4261412864 bytes from the offset 56438554624 /dev/test/vm-112-disk-1: Zero-filled 4362076160 bytes from the offset 60699967488 /dev/test/vm-112-disk-1: Zero-filled 4127195136 bytes from the offset 65062043648 /dev/test/vm-112-disk-1: Zero-filled 4328521728 bytes from the offset 69189238784 /dev/test/vm-112-disk-1: Zero-filled 4731174912 bytes from the offset 73517760512 /dev/test/vm-112-disk-1: Zero-filled 4026531840 bytes from the offset 78248935424 /dev/test/vm-112-disk-1: Zero-filled 4194304000 bytes from the offset 82275467264 /dev/test/vm-112-disk-1: Zero-filled 4664066048 bytes from the offset 86469771264 /dev/test/vm-112-disk-1: Zero-filled 4395630592 bytes from the offset 91133837312 /dev/test/vm-112-disk-1: Zero-filled 3623878656 bytes from the offset 95529467904 /dev/test/vm-112-disk-1: Zero-filled 4462739456 bytes from the offset 99153346560 /dev/test/vm-112-disk-1: Zero-filled 3758096384 bytes from the offset 103616086016 real 0m23.969s user 0m0.030s sys 0m0.144s Signed-off-by: Alexandre Derumier --- src/PVE/Storage/LVMPlugin.pm | 74 +++++++++++++++++++++++------------- 1 file changed, 48 insertions(+), 26 deletions(-) diff --git a/src/PVE/Storage/LVMPlugin.pm b/src/PVE/Storage/LVMPlugin.pm index 0416c9e..c3a864e 100644 --- a/src/PVE/Storage/LVMPlugin.pm +++ b/src/PVE/Storage/LVMPlugin.pm @@ -6,7 +6,7 @@ use warnings; use File::Basename; use IO::File; -use PVE::Tools qw(run_command trim); +use PVE::Tools qw(run_command file_read_firstline trim); use PVE::Storage::Plugin; use PVE::JSONSchema qw(get_standard_option); @@ -287,35 +287,46 @@ my sub free_lvm_volumes { # we need to zero out LVM data for security reasons # and to allow thin provisioning my $zero_out_worker = sub { - # wipe throughput up to 10MB/s by default; may be overwritten with saferemove_throughput - my $throughput = '-10485760'; - if ($scfg->{saferemove_throughput}) { - $throughput = $scfg->{saferemove_throughput}; - } for my $name (@$volnames) { - print "zero-out data on image $name (/dev/$vg/del-$name)\n"; + my $lvmpath = "/dev/$vg/del-$name"; + print "zero-out data on image $name ($lvmpath)\n"; + + warn + "The 'saferemove_throughput' option set for '${storeid}' is deprecated and will be removed." + if $scfg->{saferemove_throughput}; + + my $stepsize = $scfg->{saferemove_stepsize} // 32; + $stepsize = $stepsize * 1048576; + + my $bdev = readlink($lvmpath); + if ($bdev && $bdev =~ m|^../(dm-(\d+))|) { + my $sysdir = "/sys/block/$1"; + + my $write_zeroes_max_bytes = + file_read_firstline("$sysdir/queue/write_zeroes_max_bytes") // 0; + ($write_zeroes_max_bytes) = $write_zeroes_max_bytes =~ m/^(\d+)$/; #untaint + + #if the storage support write_zeroes but stepsize is too big, reduce the stepsize to the max possible + if ($write_zeroes_max_bytes > 0 && $stepsize > $write_zeroes_max_bytes) { + warn + "reduce stepsize to the maximum supported by the storage : $write_zeroes_max_bytes bytes\n"; + $stepsize = $write_zeroes_max_bytes; + } + + my $discard_zeroes_data = + file_read_firstline("$sysdir/queue/discard_zeroes_data") // 0; + if ($scfg->{saferemove_discard} && $discard_zeroes_data == 0) { + warn "disabling saferemove_discard as the storage don't support it\n"; + $scfg->{saferemove_discard} = undef; + } + } my $cmd = [ - '/usr/bin/cstream', - '-i', - '/dev/zero', - '-o', - "/dev/$vg/del-$name", - '-T', - '10', - '-v', - '1', - '-b', - '1048576', - '-t', - "$throughput", + '/usr/sbin/blkdiscard', $lvmpath, '-v', '--step', "${stepsize}", ]; - eval { - run_command( - $cmd, - errmsg => "zero out finished (note: 'No space left on device' is ok here)", - ); - }; + push @$cmd, '--zeroout' if !$scfg->{saferemove_discard}; + + eval { run_command($cmd); }; warn $@ if $@; $class->cluster_lock_storage( @@ -376,6 +387,15 @@ sub properties { description => "Zero-out data when removing LVs.", type => 'boolean', }, + saferemove_discard => { + description => "Wipe with discard instead zeroing.", + type => 'boolean', + default => 0, + }, + saferemove_stepsize => { + description => "Wipe step size (default 32MB).", + enum => [qw(1 2 4 8 16 32)], + }, saferemove_throughput => { description => "Wipe throughput (cstream -t parameter value).", type => 'string', @@ -394,6 +414,8 @@ sub options { shared => { optional => 1 }, disable => { optional => 1 }, saferemove => { optional => 1 }, + saferemove_discard => { optional => 1 }, + saferemove_stepsize => { optional => 1 }, saferemove_throughput => { optional => 1 }, content => { optional => 1 }, base => { fixed => 1, optional => 1 }, -- 2.47.2 --===============0600966502695385794== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel --===============0600966502695385794==--