* [pve-devel] [PATCH-SERIES qemu-server] improve error detection/messages for some block jobs
From: Fiona Ebner @ 2024-03-12 11:59 UTC
  To: pve-devel

When auto-dismiss=true (the default), a failed job can disappear very
quickly from the job list and there might not be any chance to see the
error in the result of 'query-block-jobs'. For jobs with $completion
being 'auto', like 'block-stream', it couldn't even be detected that
the job failed.

Jobs with auto-dismiss=false, on the other hand, wait in 'concluded'
state until they are manually dismissed. For those, the error can
still be queried if the job failed.

This series makes 'drive-mirror' and 'block-stream' jobs do just that.

There doesn't seem to be a way to have only failed jobs stay around,
e.g. something like auto-dismiss=on-success.
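
To illustrate the difference described above, here is a minimal,
self-contained sketch. It does not talk to QEMU: make_mock_qemu() and
check_job() are hypothetical helpers invented for this example, and the
mock only imitates the behavior described in this series (a failed job
vanishing with auto-dismiss=true, and staying in 'concluded' state with
auto-dismiss=false).

```perl
use strict;
use warnings;

# Hypothetical mock of the job list that 'query-block-jobs' would report.
# It is NOT a real QEMU interface, only an imitation for illustration.
sub make_mock_qemu {
    my ($auto_dismiss) = @_;
    my $jobs = {};
    return {
        # Simulate a job that fails right after being started.
        start_failing_job => sub {
            my ($id, $error) = @_;
            # With auto-dismiss=true the failed job is gone immediately,
            # with auto-dismiss=false it is kept in 'concluded' state.
            $jobs->{$id} = { device => $id, status => 'concluded', error => $error }
                if !$auto_dismiss;
        },
        query_block_jobs => sub { return [ values %$jobs ] },
        job_dismiss => sub { my ($id) = @_; delete $jobs->{$id}; },
    };
}

# Hypothetical monitor step: look up the job and report what can be learned.
sub check_job {
    my ($qemu, $job_id) = @_;

    my ($info) = grep { $_->{device} eq $job_id } @{ $qemu->{query_block_jobs}->() };
    return "$job_id: job vanished, outcome unknown" if !defined($info);

    if ($info->{status} eq 'concluded') {
        # A concluded job has to be dismissed manually.
        $qemu->{job_dismiss}->($job_id);
        return "$job_id: $info->{error}" if $info->{error};
        return "$job_id: done";
    }
    return "$job_id: still running";
}

for my $auto_dismiss (1, 0) {
    my $qemu = make_mock_qemu($auto_dismiss);
    $qemu->{start_failing_job}->('drive-scsi0', 'No space left on device');
    print check_job($qemu, 'drive-scsi0'), "\n";
}
```

In the first iteration (auto-dismiss=true) the caller can only observe
that the job is gone. In the second (auto-dismiss=false) it can read
the error before dismissing the job.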


Fiona Ebner (3):
  blockjob: anticipate jobs with auto-dismiss=false for better error
    messages and detection
  mirror: do not auto-dismiss to allow getting error message from job
  live restore: do not auto-dismiss stream job to improve error message
    and detection

 PVE/QemuServer.pm | 34 ++++++++++++++++++++++++++++++++--
 1 file changed, 32 insertions(+), 2 deletions(-)

-- 
2.39.2






* [pve-devel] [PATCH qemu-server 1/3] blockjob: anticipate jobs with auto-dismiss=false for better error messages and detection
From: Fiona Ebner @ 2024-03-12 11:59 UTC
  To: pve-devel

When auto-dismiss=true (the default), a failed job can disappear very
quickly from the job list and there might not be any chance to see the
error in the result of 'query-block-jobs'. For jobs with $completion
being 'auto', like 'block-stream', it couldn't even be detected that
the job failed.

Jobs with auto-dismiss=false, on the other hand, wait in 'concluded'
state until they are manually dismissed. For those, the error can
still be queried if the job failed.

There doesn't seem to be a way to have only failed jobs stay around,
e.g. something like auto-dismiss=on-success.

This is planned to be used for the 'drive-mirror' and 'block-stream'
jobs initially.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 PVE/QemuServer.pm | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index ed8b054e..07e005eb 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -7883,6 +7883,9 @@ sub qemu_drive_mirror_monitor {
 
 		die "$job_id: '$op' has been cancelled\n" if !defined($job);
 
+		qemu_handle_concluded_blockjob($vmid, $job_id, $job)
+		    if $job && $job->{status} eq 'concluded';
+
 		my $busy = $job->{busy};
 		my $ready = $job->{ready};
 		if (my $total = $job->{len}) {
@@ -7983,6 +7986,19 @@ sub qemu_drive_mirror_monitor {
     }
 }
 
+# If the job was started with auto-dismiss=false, it's necessary to dismiss it manually. Using this
+# option is useful to get the error for failed jobs here. QEMU's job lock should make it impossible
+# to see a job in 'concluded' state when auto-dismiss=true.
+# $info is the 'BlockJobInfo' for the job returned by query-block-jobs.
+sub qemu_handle_concluded_blockjob {
+    my ($vmid, $job_id, $info) = @_;
+
+    eval { mon_cmd($vmid, 'job-dismiss', id => $job_id); };
+    log_warn("$job_id: failed to dismiss job - $@") if $@;
+
+    die "$job_id: $info->{error} (io-status: $info->{'io-status'})\n" if $info->{error};
+}
+
 sub qemu_blockjobs_cancel {
     my ($vmid, $jobs) = @_;
 
@@ -8001,8 +8017,14 @@ sub qemu_blockjobs_cancel {
 	}
 
 	foreach my $job (keys %$jobs) {
+	    my $info = $running_jobs->{$job};
+	    eval {
+		qemu_handle_concluded_blockjob($vmid, $job, $info)
+		    if $info && $info->{status} eq 'concluded';
+	    };
+	    log_warn($@) if $@; # only warn and proceed with canceling other jobs
 
-	    if (defined($jobs->{$job}->{cancel}) && !defined($running_jobs->{$job})) {
+	    if (defined($jobs->{$job}->{cancel}) && !defined($info)) {
 		print "$job: Done.\n";
 		delete $jobs->{$job};
 	    }
-- 
2.39.2






* [pve-devel] [PATCH qemu-server 2/3] mirror: do not auto-dismiss to allow getting error message from job
From: Fiona Ebner @ 2024-03-12 11:59 UTC
  To: pve-devel

Do not auto-dismiss the mirror job, so that its error message can be
queried upon failure. Otherwise, the job would disappear from the job
list too quickly and could no longer be queried for the actual error.

Relevant part of the error in actual examples (note that the fact that
it's a mirror job is already mentioned earlier in the full error, with
"block job (mirror) error:"):

Before:
> 'mirror' has been cancelled
> 'mirror' has been cancelled

After:
> Source and target image have different sizes (io-status: ok)
> No space left on device (io-status: ok)

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 PVE/QemuServer.pm | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 07e005eb..664ae38e 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -7814,7 +7814,14 @@ sub qemu_drive_mirror {
 	$qemu_target = $is_zero_initialized ? "zeroinit:$dst_path" : $dst_path;
     }
 
-    my $opts = { timeout => 10, device => "drive-$drive", mode => "existing", sync => "full", target => $qemu_target };
+    my $opts = {
+	timeout => 10,
+	device => "drive-$drive",
+	mode => "existing",
+	sync => "full",
+	target => $qemu_target,
+	'auto-dismiss' => JSON::false,
+    };
     $opts->{format} = $format if $format;
 
     if (defined($src_bitmap)) {
-- 
2.39.2






* [pve-devel] [PATCH qemu-server 3/3] live restore: do not auto-dismiss stream job to improve error message and detection
From: Fiona Ebner @ 2024-03-12 11:59 UTC
  To: pve-devel

Do not auto-dismiss the stream job, so that a failure can be detected
and its error message queried. Otherwise, the job would disappear from
the job list too quickly and could no longer be queried for the actual
error.

Relevant part of the error in an actual example:

Before:
> VM 112 qmp command 'blockdev-del' failed - Node 'drive-scsi0-pbs' is busy: node is used as backing hd of '#block046'

After:
> block job (stream) error: restore-drive-scsi0: No space left on device (io-status: ok)

Note that previously, the failure of the stream job was not even
detected; the error message shown only resulted from the subsequent
cleanup failing.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 PVE/QemuServer.pm | 1 +
 1 file changed, 1 insertion(+)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 664ae38e..51dfc0d9 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -7252,6 +7252,7 @@ sub pbs_live_restore {
 	    mon_cmd($vmid, 'block-stream',
 		'job-id' => $job_id,
 		device => "$ds",
+		'auto-dismiss' => JSON::false,
 	    );
 	    $jobs->{$job_id} = {};
 	}
-- 
2.39.2






* Re: [pve-devel] [PATCH-SERIES qemu-server] improve error detection/messages for some block jobs
From: Fiona Ebner @ 2024-04-11 11:17 UTC
  To: pve-devel

On 12.03.24 at 12:59, Fiona Ebner wrote:
> When auto-dismiss=true (the default), a failed job can disappear very
> quickly from the job list and there might not be any chance to see the
> error in the result of 'query-block-jobs'. For jobs with $completion
> being 'auto', like 'block-stream', it couldn't even be detected that
> the job failed.
> 
> Jobs with auto-dismiss=false on the other hand, will wait in
> 'concluded' state until manually dismissed. For those, it will be
> possible to query the error if the job failed.
> 
> This series makes 'drive-mirror' and 'block-stream' jobs do just that.
> 
> There doesn't seem to be a way to have only failed jobs stay around,
> e.g. something like auto-dismiss=on-success.
> 

Superseded by a v2 also covering the new live-import:
https://lists.proxmox.com/pipermail/pve-devel/2024-April/062845.html



