public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pve-devel] [PATCH v2 storage 1/2] rbd: improve handling of missing images
@ 2023-06-14 11:10 Aaron Lauterer
  2023-06-14 11:10 ` [pve-devel] [PATCH v2 manager 2/2] ui: ContentView: don't show size if it is -1 Aaron Lauterer
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Aaron Lauterer @ 2023-06-14 11:10 UTC (permalink / raw)
  To: pve-devel

It can happen, that an RBD image isn't cleaned up 100%. Calling 'rbd ls
-l' will then show errors that it is not possible to open the image in
question:
```
rbd: error opening vm-103-disk-1: (2) No such file or directory
rbd: listing images failed: (2) No such file or directory
```

Originally we only showed the last error line which is too generic and
doesn't give a good hint what is actually wrong.

We can improve that by catching these specific errors and add the
problematic disk images to the returned list with a size of '-1'.

When the 'rbd rm' command is used on such an image, it will clean up
whatever is still left.
But for that to work, we also need to handle these errors in the
'rbd_ls_snap' sub as it is called from 'free_image'.

Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
---
no changes since v1

 src/PVE/Storage/RBDPlugin.pm | 52 +++++++++++++++++++++++++++++++-----
 1 file changed, 46 insertions(+), 6 deletions(-)

diff --git a/src/PVE/Storage/RBDPlugin.pm b/src/PVE/Storage/RBDPlugin.pm
index f45ad3f..c4e4467 100644
--- a/src/PVE/Storage/RBDPlugin.pm
+++ b/src/PVE/Storage/RBDPlugin.pm
@@ -169,6 +169,8 @@ my $krbd_feature_update = sub {
     }
 };
 
+my $missing_image_err_regex = '((?:vm|base)-\d+-.*): \(2\) No such file or directory$';
+
 sub run_rbd_command {
     my ($cmd, %args) = @_;
 
@@ -207,13 +209,28 @@ sub rbd_ls {
     my $raw = '';
     my $parser = sub { $raw .= shift };
 
+    my $show_err = 1;
+    my $missing_images = {};
+    my $err_parser = sub {
+	my $line = shift;
+	if ($line =~ m/$missing_image_err_regex/) {
+	    $show_err = 0;
+	    $missing_images->{$1} = 1;
+	} elsif ($line ne "rbd: listing images failed: (2) No such file or directory") {
+	    # this generic error is shown after the image specific "No such file..." one,
+	    # ignore it but not other errors
+	    $show_err = 1;
+	    die $line;
+	}
+    };
+
     my $cmd = $rbd_cmd->($scfg, $storeid, 'ls', '-l', '--format', 'json');
     eval {
-	run_rbd_command($cmd, errmsg => "rbd error", errfunc => sub {}, outfunc => $parser);
+	run_rbd_command($cmd, errmsg => "rbd error", errfunc => $err_parser, outfunc => $parser);
     };
     my $err = $@;
 
-    die $err if $err && $err !~ m/doesn't contain rbd images/ ;
+    die $err if $err && $show_err && $err !~ m/doesn't contain rbd images/ ;
 
     my $result;
     if ($raw eq '') {
@@ -224,6 +241,13 @@ sub rbd_ls {
 	die "got unexpected data from rbd ls: '$raw'\n";
     }
 
+    for my $image (keys %$missing_images) {
+	push @$result, {
+	    image => $image,
+	    size => -1,
+	};
+    }
+
     my $list = {};
 
     foreach my $el (@$result) {
@@ -251,7 +275,20 @@ sub rbd_ls_snap {
     my $cmd = $rbd_cmd->($scfg, $storeid, 'snap', 'ls', $name, '--format', 'json');
 
     my $raw = '';
-    run_rbd_command($cmd, errmsg => "rbd error", errfunc => sub {}, outfunc => sub { $raw .= shift; });
+    my $show_err = 0;
+    my $err_parser = sub {
+	my $line = shift;
+	if ($line !~ m/$missing_image_err_regex/) {
+	    $show_err = 1;
+	    die $line;
+	}
+    };
+    eval {
+	run_rbd_command($cmd, errmsg => "rbd error", errfunc => $err_parser, outfunc => sub { $raw .= shift; });
+    };
+    my $err = $@;
+    die $err if $err && $show_err;
+    return {} if $err && !$show_err; # could not open image, probably missing
 
     my $list;
     if ($raw =~ m/^(\[.*\])$/s) { # untaint
@@ -633,10 +670,13 @@ sub free_image {
 
     $class->deactivate_volume($storeid, $scfg, $volname);
 
-    my $cmd = $rbd_cmd->($scfg, $storeid, 'snap', 'purge',  $name);
-    run_rbd_command($cmd, errmsg => "rbd snap purge '$name' error");
 
-    $cmd = $rbd_cmd->($scfg, $storeid, 'rm', $name);
+    if (keys %{$snaps}) {
+	my $cmd = $rbd_cmd->($scfg, $storeid, 'snap', 'purge',  $name);
+	run_rbd_command($cmd, errmsg => "rbd snap purge '$name' error");
+    }
+
+    my $cmd = $rbd_cmd->($scfg, $storeid, 'rm', $name);
     run_rbd_command($cmd, errmsg => "rbd rm '$name' error");
 
     return undef;
-- 
2.39.2





^ permalink raw reply	[flat|nested] 5+ messages in thread

* [pve-devel] [PATCH v2 manager 2/2] ui: ContentView: don't show size if it is -1
  2023-06-14 11:10 [pve-devel] [PATCH v2 storage 1/2] rbd: improve handling of missing images Aaron Lauterer
@ 2023-06-14 11:10 ` Aaron Lauterer
  2023-07-18  7:26 ` [pve-devel] [PATCH v2 storage 1/2] rbd: improve handling of missing images Aaron Lauterer
  2023-08-21 15:05 ` Fiona Ebner
  2 siblings, 0 replies; 5+ messages in thread
From: Aaron Lauterer @ 2023-06-14 11:10 UTC (permalink / raw)
  To: pve-devel

If a disk image reports a size of '-1', something is most likely amiss.
The RBD storage plugin for example returns it, if the image is broken
and only remnants remain.

In such a situation, instead of showing '-1 B', we bette show nothing.

Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
---
This patch is not necessary, but I think will make it nicer to look at
and see broken images.

changes since v1:
instead of changing the size formatter in the widget toolkit, we handle
the case of size=-1 in the renderer directly

 www/manager6/storage/ContentView.js | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/www/manager6/storage/ContentView.js b/www/manager6/storage/ContentView.js
index 2761b48e..29926612 100644
--- a/www/manager6/storage/ContentView.js
+++ b/www/manager6/storage/ContentView.js
@@ -182,7 +182,12 @@ Ext.define('PVE.storage.ContentView', {
 	    'size': {
 		header: gettext('Size'),
 		width: 100,
-		renderer: Proxmox.Utils.format_size,
+		renderer: function(size) {
+		    if (Number(size) === -1) {
+			return '';
+		    }
+		    return Proxmox.Utils.format_size(size);
+		},
 		dataIndex: 'size',
 	    },
 	};
-- 
2.39.2





^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [pve-devel] [PATCH v2 storage 1/2] rbd: improve handling of missing images
  2023-06-14 11:10 [pve-devel] [PATCH v2 storage 1/2] rbd: improve handling of missing images Aaron Lauterer
  2023-06-14 11:10 ` [pve-devel] [PATCH v2 manager 2/2] ui: ContentView: don't show size if it is -1 Aaron Lauterer
@ 2023-07-18  7:26 ` Aaron Lauterer
  2023-08-21 15:05 ` Fiona Ebner
  2 siblings, 0 replies; 5+ messages in thread
From: Aaron Lauterer @ 2023-07-18  7:26 UTC (permalink / raw)
  To: pve-devel

ping?

On 6/14/23 13:10, Aaron Lauterer wrote:
> It can happen, that an RBD image isn't cleaned up 100%. Calling 'rbd ls
> -l' will then show errors that it is not possible to open the image in
> question:
> ```
> rbd: error opening vm-103-disk-1: (2) No such file or directory
> rbd: listing images failed: (2) No such file or directory
> ```
> 
> Originally we only showed the last error line which is too generic and
> doesn't give a good hint what is actually wrong.
> 
> We can improve that by catching these specific errors and add the
> problematic disk images to the returned list with a size of '-1'.
> 
> When the 'rbd rm' command is used on such an image, it will clean up
> whatever is still left.
> But for that to work, we also need to handle these errors in the
> 'rbd_ls_snap' sub as it is called from 'free_image'.
> 
> Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
> ---
> no changes since v1
> 
>   src/PVE/Storage/RBDPlugin.pm | 52 +++++++++++++++++++++++++++++++-----
>   1 file changed, 46 insertions(+), 6 deletions(-)
> 
> diff --git a/src/PVE/Storage/RBDPlugin.pm b/src/PVE/Storage/RBDPlugin.pm
> index f45ad3f..c4e4467 100644
> --- a/src/PVE/Storage/RBDPlugin.pm
> +++ b/src/PVE/Storage/RBDPlugin.pm
> @@ -169,6 +169,8 @@ my $krbd_feature_update = sub {
>       }
>   };
>   
> +my $missing_image_err_regex = '((?:vm|base)-\d+-.*): \(2\) No such file or directory$';
> +
>   sub run_rbd_command {
>       my ($cmd, %args) = @_;
>   
> @@ -207,13 +209,28 @@ sub rbd_ls {
>       my $raw = '';
>       my $parser = sub { $raw .= shift };
>   
> +    my $show_err = 1;
> +    my $missing_images = {};
> +    my $err_parser = sub {
> +	my $line = shift;
> +	if ($line =~ m/$missing_image_err_regex/) {
> +	    $show_err = 0;
> +	    $missing_images->{$1} = 1;
> +	} elsif ($line ne "rbd: listing images failed: (2) No such file or directory") {
> +	    # this generic error is shown after the image specific "No such file..." one,
> +	    # ignore it but not other errors
> +	    $show_err = 1;
> +	    die $line;
> +	}
> +    };
> +
>       my $cmd = $rbd_cmd->($scfg, $storeid, 'ls', '-l', '--format', 'json');
>       eval {
> -	run_rbd_command($cmd, errmsg => "rbd error", errfunc => sub {}, outfunc => $parser);
> +	run_rbd_command($cmd, errmsg => "rbd error", errfunc => $err_parser, outfunc => $parser);
>       };
>       my $err = $@;
>   
> -    die $err if $err && $err !~ m/doesn't contain rbd images/ ;
> +    die $err if $err && $show_err && $err !~ m/doesn't contain rbd images/ ;
>   
>       my $result;
>       if ($raw eq '') {
> @@ -224,6 +241,13 @@ sub rbd_ls {
>   	die "got unexpected data from rbd ls: '$raw'\n";
>       }
>   
> +    for my $image (keys %$missing_images) {
> +	push @$result, {
> +	    image => $image,
> +	    size => -1,
> +	};
> +    }
> +
>       my $list = {};
>   
>       foreach my $el (@$result) {
> @@ -251,7 +275,20 @@ sub rbd_ls_snap {
>       my $cmd = $rbd_cmd->($scfg, $storeid, 'snap', 'ls', $name, '--format', 'json');
>   
>       my $raw = '';
> -    run_rbd_command($cmd, errmsg => "rbd error", errfunc => sub {}, outfunc => sub { $raw .= shift; });
> +    my $show_err = 0;
> +    my $err_parser = sub {
> +	my $line = shift;
> +	if ($line !~ m/$missing_image_err_regex/) {
> +	    $show_err = 1;
> +	    die $line;
> +	}
> +    };
> +    eval {
> +	run_rbd_command($cmd, errmsg => "rbd error", errfunc => $err_parser, outfunc => sub { $raw .= shift; });
> +    };
> +    my $err = $@;
> +    die $err if $err && $show_err;
> +    return {} if $err && !$show_err; # could not open image, probably missing
>   
>       my $list;
>       if ($raw =~ m/^(\[.*\])$/s) { # untaint
> @@ -633,10 +670,13 @@ sub free_image {
>   
>       $class->deactivate_volume($storeid, $scfg, $volname);
>   
> -    my $cmd = $rbd_cmd->($scfg, $storeid, 'snap', 'purge',  $name);
> -    run_rbd_command($cmd, errmsg => "rbd snap purge '$name' error");
>   
> -    $cmd = $rbd_cmd->($scfg, $storeid, 'rm', $name);
> +    if (keys %{$snaps}) {
> +	my $cmd = $rbd_cmd->($scfg, $storeid, 'snap', 'purge',  $name);
> +	run_rbd_command($cmd, errmsg => "rbd snap purge '$name' error");
> +    }
> +
> +    my $cmd = $rbd_cmd->($scfg, $storeid, 'rm', $name);
>       run_rbd_command($cmd, errmsg => "rbd rm '$name' error");
>   
>       return undef;




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [pve-devel] [PATCH v2 storage 1/2] rbd: improve handling of missing images
  2023-06-14 11:10 [pve-devel] [PATCH v2 storage 1/2] rbd: improve handling of missing images Aaron Lauterer
  2023-06-14 11:10 ` [pve-devel] [PATCH v2 manager 2/2] ui: ContentView: don't show size if it is -1 Aaron Lauterer
  2023-07-18  7:26 ` [pve-devel] [PATCH v2 storage 1/2] rbd: improve handling of missing images Aaron Lauterer
@ 2023-08-21 15:05 ` Fiona Ebner
  2023-08-22  9:42   ` Aaron Lauterer
  2 siblings, 1 reply; 5+ messages in thread
From: Fiona Ebner @ 2023-08-21 15:05 UTC (permalink / raw)
  To: Proxmox VE development discussion, Aaron Lauterer

Am 14.06.23 um 13:10 schrieb Aaron Lauterer:
> It can happen, that an RBD image isn't cleaned up 100%. Calling 'rbd ls
> -l' will then show errors that it is not possible to open the image in
> question:
> ```
> rbd: error opening vm-103-disk-1: (2) No such file or directory
> rbd: listing images failed: (2) No such file or directory
> ```
> 
> Originally we only showed the last error line which is too generic and
> doesn't give a good hint what is actually wrong.
> 
> We can improve that by catching these specific errors and add the
> problematic disk images to the returned list with a size of '-1'.
> 

What do you think about logging a warning instead, hinting that it might
be a partially removed image? The thing I'm a bit worried about is that
existing scripts/tools interacting with our API might get confused by
the -1. And if I use the UI, I don't see it with either approach,
because your next patch hides it. If I use the CLI, I'll see either the
warning or the -1 depending on the approach.

> @@ -207,13 +209,28 @@ sub rbd_ls {
>      my $raw = '';
>      my $parser = sub { $raw .= shift };
>  
> +    my $show_err = 1;
> +    my $missing_images = {};
> +    my $err_parser = sub {
> +	my $line = shift;
> +	if ($line =~ m/$missing_image_err_regex/) {
> +	    $show_err = 0;

While both might be edge cases: What if there was some other error
before this one that we should die on? Or what if another error happens
in such a way that I don't get another stderr log line? Then $show_err
will still be 0 below and the function doesn't die.

It might be slightly better to do:
1. if there was any stderr log line we don't want to ignore, die
2. if there was none, base the decision off whether the final log line
was the "rbd: listing images failed: (2) No such file or directory"

> +	    $missing_images->{$1} = 1;
> +	} elsif ($line ne "rbd: listing images failed: (2) No such file or directory") {
> +	    # this generic error is shown after the image specific "No such file..." one,
> +	    # ignore it but not other errors
> +	    $show_err = 1;
> +	    die $line;
> +	}
> +    };
> +
>      my $cmd = $rbd_cmd->($scfg, $storeid, 'ls', '-l', '--format', 'json');
>      eval {
> -	run_rbd_command($cmd, errmsg => "rbd error", errfunc => sub {}, outfunc => $parser);
> +	run_rbd_command($cmd, errmsg => "rbd error", errfunc => $err_parser, outfunc => $parser);
>      };
>      my $err = $@;
>  
> -    die $err if $err && $err !~ m/doesn't contain rbd images/ ;
> +    die $err if $err && $show_err && $err !~ m/doesn't contain rbd images/ ;
>  
The "doesn't contain rbd images" bit could also be added to the
err_parser() :)

>      my $result;
>      if ($raw eq '') {
> @@ -224,6 +241,13 @@ sub rbd_ls {
>  	die "got unexpected data from rbd ls: '$raw'\n";
>      }
>  
> +    for my $image (keys %$missing_images) {
> +	push @$result, {
> +	    image => $image,
> +	    size => -1,
> +	};
> +    }
> +
>      my $list = {};
>  
>      foreach my $el (@$result) {
> @@ -251,7 +275,20 @@ sub rbd_ls_snap {
>      my $cmd = $rbd_cmd->($scfg, $storeid, 'snap', 'ls', $name, '--format', 'json');
>  
>      my $raw = '';
> -    run_rbd_command($cmd, errmsg => "rbd error", errfunc => sub {}, outfunc => sub { $raw .= shift; });
> +    my $show_err = 0;

Similar to the above, but this can happen more easily I think: What if
there is no stderr log line, but the command fails?

Slightly better:
1. if we got no log lines at all, but command failed, die
2. if there was any stderr log line we don't want to ignore, also die
3. If we only got log lines we want to ignore, don't die

> +    my $err_parser = sub {
> +	my $line = shift;
> +	if ($line !~ m/$missing_image_err_regex/) {
> +	    $show_err = 1;
> +	    die $line;
> +	}
> +    };
> +    eval {
> +	run_rbd_command($cmd, errmsg => "rbd error", errfunc => $err_parser, outfunc => sub { $raw .= shift; });
> +    };
> +    my $err = $@;
> +    die $err if $err && $show_err;
> +    return {} if $err && !$show_err; # could not open image, probably missing
>  
>      my $list;
>      if ($raw =~ m/^(\[.*\])$/s) { # untaint
> @@ -633,10 +670,13 @@ sub free_image {
>  
>      $class->deactivate_volume($storeid, $scfg, $volname);
>  
> -    my $cmd = $rbd_cmd->($scfg, $storeid, 'snap', 'purge',  $name);
> -    run_rbd_command($cmd, errmsg => "rbd snap purge '$name' error");
>  
> -    $cmd = $rbd_cmd->($scfg, $storeid, 'rm', $name);
> +    if (keys %{$snaps}) {
> +	my $cmd = $rbd_cmd->($scfg, $storeid, 'snap', 'purge',  $name);
> +	run_rbd_command($cmd, errmsg => "rbd snap purge '$name' error");
> +    }
> +
> +    my $cmd = $rbd_cmd->($scfg, $storeid, 'rm', $name);
>      run_rbd_command($cmd, errmsg => "rbd rm '$name' error");
>  
>      return undef;

Does the 'snap purge' command on such a partially removed image also
fail? If that was the motivation for this change, please mention it in
the commit message. Otherwise, it can be it's own patch ;)




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [pve-devel] [PATCH v2 storage 1/2] rbd: improve handling of missing images
  2023-08-21 15:05 ` Fiona Ebner
@ 2023-08-22  9:42   ` Aaron Lauterer
  0 siblings, 0 replies; 5+ messages in thread
From: Aaron Lauterer @ 2023-08-22  9:42 UTC (permalink / raw)
  To: Fiona Ebner, Proxmox VE development discussion

Thanks. I'll look into your comments regarding the actual error handling. But 
there is one important question that we need to decide on. See below.

On 8/21/23 17:05, Fiona Ebner wrote:
> Am 14.06.23 um 13:10 schrieb Aaron Lauterer:
>> It can happen, that an RBD image isn't cleaned up 100%. Calling 'rbd ls
>> -l' will then show errors that it is not possible to open the image in
>> question:
>> ```
>> rbd: error opening vm-103-disk-1: (2) No such file or directory
>> rbd: listing images failed: (2) No such file or directory
>> ```
>>
>> Originally we only showed the last error line which is too generic and
>> doesn't give a good hint what is actually wrong.
>>
>> We can improve that by catching these specific errors and add the
>> problematic disk images to the returned list with a size of '-1'.
>>
> 
> What do you think about logging a warning instead, hinting that it might
> be a partially removed image? The thing I'm a bit worried about is that
> existing scripts/tools interacting with our API might get confused by
> the -1. And if I use the UI, I don't see it with either approach,
> because your next patch hides it. If I use the CLI, I'll see either the
> warning or the -1 depending on the approach.
> 

Showing warnings on the CLI is a good idea either way, but the question is, do 
we want to list a broken image? If yes, then the users are more likely to notice 
that something is amiss. If we only log it, then chances are rather low as only 
users who use the CLI will see the warnings.
The downside though is, as you mentioned, that some external tools/scripts might 
be confused about it.

I am not sure how easy it is to pass through a dedicated parameter to the 
storage plugin. If, we could indicate that we want the disk images listed when 
we call it from the GUI for example. Though that might introduce much more 
complexity and discrepancy depending on which tool is used. Therefore probably 
not a good idea.

How we render a broken image in the GUI is something that can then be done 
almost any way we seem fit. It could be even something like "-1 (broken?)" in 
the Size column.





^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-08-22  9:42 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-14 11:10 [pve-devel] [PATCH v2 storage 1/2] rbd: improve handling of missing images Aaron Lauterer
2023-06-14 11:10 ` [pve-devel] [PATCH v2 manager 2/2] ui: ContentView: don't show size if it is -1 Aaron Lauterer
2023-07-18  7:26 ` [pve-devel] [PATCH v2 storage 1/2] rbd: improve handling of missing images Aaron Lauterer
2023-08-21 15:05 ` Fiona Ebner
2023-08-22  9:42   ` Aaron Lauterer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal