From: Fiona Ebner
To: Proxmox VE development discussion, Aaron Lauterer
Date: Mon, 21 Aug 2023 17:05:37 +0200
References: <20230614111022.1432946-1-a.lauterer@proxmox.com>
In-Reply-To: <20230614111022.1432946-1-a.lauterer@proxmox.com>
Subject:
 Re: [pve-devel] [PATCH v2 storage 1/2] rbd: improve handling of missing images

On 14.06.23 at 13:10, Aaron Lauterer wrote:
> It can happen, that an RBD image isn't cleaned up 100%. Calling 'rbd ls
> -l' will then show errors that it is not possible to open the image in
> question:
> ```
> rbd: error opening vm-103-disk-1: (2) No such file or directory
> rbd: listing images failed: (2) No such file or directory
> ```
>
> Originally we only showed the last error line which is too generic and
> doesn't give a good hint what is actually wrong.
>
> We can improve that by catching these specific errors and add the
> problematic disk images to the returned list with a size of '-1'.
>

What do you think about logging a warning instead, hinting that it might
be a partially removed image? The thing I'm a bit worried about is that
existing scripts/tools interacting with our API might get confused by
the -1.

And if I use the UI, I don't see it with either approach, because your
next patch hides it. If I use the CLI, I'll see either the warning or
the -1 depending on the approach.

> @@ -207,13 +209,28 @@ sub rbd_ls {
>      my $raw = '';
>      my $parser = sub { $raw .= shift };
>
> +    my $show_err = 1;
> +    my $missing_images = {};
> +    my $err_parser = sub {
> +        my $line = shift;
> +        if ($line =~ m/$missing_image_err_regex/) {
> +            $show_err = 0;

While both might be edge cases: What if there was some other error
before this one that we should die on? Or what if another error happens
in such a way that I don't get another stderr log line? Then $show_err
will still be 0 below and the function doesn't die.

It might be slightly better to do:
1. if there was any stderr log line we don't want to ignore, die
2. if there was none, base the decision off whether the final log line
was the "rbd: listing images failed: (2) No such file or directory"

> +            $missing_images->{$1} = 1;
> +        } elsif ($line ne "rbd: listing images failed: (2) No such file or directory") {
> +            # this generic error is shown after the image specific "No such file..." one,
> +            # ignore it but not other errors
> +            $show_err = 1;
> +            die $line;
> +        }
> +    };
> +
>      my $cmd = $rbd_cmd->($scfg, $storeid, 'ls', '-l', '--format', 'json');
>      eval {
> -        run_rbd_command($cmd, errmsg => "rbd error", errfunc => sub {}, outfunc => $parser);
> +        run_rbd_command($cmd, errmsg => "rbd error", errfunc => $err_parser, outfunc => $parser);
>      };
>      my $err = $@;
>
> -    die $err if $err && $err !~ m/doesn't contain rbd images/ ;
> +    die $err if $err && $show_err && $err !~ m/doesn't contain rbd images/ ;
>

The "doesn't contain rbd images" bit could also be added to the
err_parser() :)

>      my $result;
>      if ($raw eq '') {
> @@ -224,6 +241,13 @@ sub rbd_ls {
>          die "got unexpected data from rbd ls: '$raw'\n";
>      }
>
> +    for my $image (keys %$missing_images) {
> +        push @$result, {
> +            image => $image,
> +            size => -1,
> +        };
> +    }
> +
>      my $list = {};
>
>      foreach my $el (@$result) {
> @@ -251,7 +275,20 @@ sub rbd_ls_snap {
>      my $cmd = $rbd_cmd->($scfg, $storeid, 'snap', 'ls', $name, '--format', 'json');
>
>      my $raw = '';
> -    run_rbd_command($cmd, errmsg => "rbd error", errfunc => sub {}, outfunc => sub { $raw .= shift; });
> +    my $show_err = 0;

Similar to the above, but this can happen more easily I think: What if
there is no stderr log line, but the command fails?

Slightly better:
1. if we got no log lines at all, but command failed, die
2. if there was any stderr log line we don't want to ignore, also die
3. if we only got log lines we want to ignore, don't die

> +    my $err_parser = sub {
> +        my $line = shift;
> +        if ($line !~ m/$missing_image_err_regex/) {
> +            $show_err = 1;
> +            die $line;
> +        }
> +    };
> +    eval {
> +        run_rbd_command($cmd, errmsg => "rbd error", errfunc => $err_parser, outfunc => sub { $raw .= shift; });
> +    };
> +    my $err = $@;
> +    die $err if $err && $show_err;
> +    return {} if $err && !$show_err; # could not open image, probably missing
>
>      my $list;
>      if ($raw =~ m/^(\[.*\])$/s) { # untaint
> @@ -633,10 +670,13 @@ sub free_image {
>
>      $class->deactivate_volume($storeid, $scfg, $volname);
>
> -    my $cmd = $rbd_cmd->($scfg, $storeid, 'snap', 'purge', $name);
> -    run_rbd_command($cmd, errmsg => "rbd snap purge '$name' error");
>
> -    $cmd = $rbd_cmd->($scfg, $storeid, 'rm', $name);
> +    if (keys %{$snaps}) {
> +        my $cmd = $rbd_cmd->($scfg, $storeid, 'snap', 'purge', $name);
> +        run_rbd_command($cmd, errmsg => "rbd snap purge '$name' error");
> +    }
> +
> +    my $cmd = $rbd_cmd->($scfg, $storeid, 'rm', $name);
>      run_rbd_command($cmd, errmsg => "rbd rm '$name' error");
>
>      return undef;

Does the 'snap purge' command on such a partially removed image also
fail? If that was the motivation for this change, please mention it in
the commit message. Otherwise, it can be its own patch ;)
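
For reference, the three rules suggested for rbd_ls_snap() could look
roughly like the untested sketch below. It assumes errfunc only collects
the stderr lines and that the decision is made once the command has
returned. The helper name failure_is_fatal() and the regex are
placeholders for illustration only: $missing_image_err_regex is not
quoted in this mail, so the pattern is guessed from the error message in
the commit message.

```perl
use strict;
use warnings;

# Assumption: stand-in for $missing_image_err_regex from the patch,
# modelled on "rbd: error opening vm-103-disk-1: (2) No such file or directory".
my $missing_image_re = qr/^rbd: error opening (\S+): \(2\) No such file or directory$/;

# Hypothetical helper, not part of the patch.
# $failed       - true if the rbd command failed
# $stderr_lines - array ref with every stderr line collected by errfunc
# Returns 1 if the caller should die, 0 if the failure can be ignored.
sub failure_is_fatal {
    my ($failed, $stderr_lines) = @_;

    return 0 if !$failed;

    # 1. command failed without any stderr line: nothing says it is harmless
    return 1 if !scalar(@$stderr_lines);

    # 2. any line that is not explicitly ignorable is fatal
    for my $line (@$stderr_lines) {
        return 1 if $line !~ $missing_image_re;
    }

    # 3. only ignorable lines were seen
    return 0;
}
```

For rbd_ls() the generic "rbd: listing images failed" line would have to
be accepted as ignorable too, but only when it follows an image-specific
error.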