public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH-SERIES v2 qemu-server] improve error detection/messages for some block jobs
@ 2024-04-11 11:16 Fiona Ebner
From: Fiona Ebner @ 2024-04-11 11:16 UTC (permalink / raw)
  To: pve-devel

Changes in v2:
    * Also do not auto-dismiss for the stream job for the new
      live-import feature.

When auto-dismiss=true (the default), a failed job can disappear very
quickly from the job list, so there might not be any chance to see the
error in the result of 'query-block-jobs'. For jobs with $completion
being 'auto', like 'block-stream', the failure could not even be
detected.

Jobs with auto-dismiss=false, on the other hand, wait in the
'concluded' state until manually dismissed. For those, the error can
still be queried if the job failed.

This series makes 'drive-mirror' and 'block-stream' jobs do just that.

There doesn't seem to be a way to have only failed jobs stay around,
e.g. something like auto-dismiss=on-success.
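
For illustration only, the polling behaviour described above can be
sketched as a small self-contained Perl script. This is not code from
the series: fake_mon_cmd(), %fake_jobs and poll_job() are hypothetical
stand-ins that simulate the QMP monitor in memory, and the job entries
are simplified 'BlockJobInfo' hashes using only the 'device', 'status',
'error' and 'io-status' fields.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for QEMU's job list (assumption for this sketch).
# With auto-dismiss=false, a finished job stays around in 'concluded' state.
my %fake_jobs = (
    'restore-drive-scsi0' => {
        device => 'restore-drive-scsi0', # BlockJobInfo carries the job ID in 'device'
        status => 'concluded',
        error => 'No space left on device',
        'io-status' => 'ok',
    },
    'mirror-scsi1' => {
        device => 'mirror-scsi1',
        status => 'running',
        'io-status' => 'ok',
    },
);

# Hypothetical replacement for a real QMP call; only the two commands
# needed for this sketch are simulated.
sub fake_mon_cmd {
    my ($cmd, %args) = @_;
    if ($cmd eq 'query-block-jobs') {
        return [ map { +{ %{ $fake_jobs{$_} } } } sort keys %fake_jobs ];
    }
    if ($cmd eq 'job-dismiss') {
        my $job = $fake_jobs{ $args{id} } or die "job '$args{id}' not found\n";
        die "job '$args{id}' is not concluded\n" if $job->{status} ne 'concluded';
        delete $fake_jobs{ $args{id} };
        return {};
    }
    die "unsupported command '$cmd'\n";
}

# Query a job once. A concluded job is dismissed manually and its error,
# if any, is reported to the caller instead of being lost.
sub poll_job {
    my ($job_id) = @_;
    my ($info) = grep { $_->{device} eq $job_id } @{ fake_mon_cmd('query-block-jobs') };
    return { state => 'gone' } if !$info;
    return { state => 'running' } if $info->{status} ne 'concluded';
    fake_mon_cmd('job-dismiss', id => $job_id);
    return defined($info->{error})
        ? { state => 'failed', error => $info->{error} }
        : { state => 'ok' };
}

my $result = poll_job('restore-drive-scsi0');
print "restore-drive-scsi0: $result->{state}"
    . (defined($result->{error}) ? " ($result->{error})" : '') . "\n";
```

The point of the sketch is the ordering: the error is read from the
queried job info first, and only then does the job leave the list,
because the caller dismisses it explicitly.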


Fiona Ebner (4):
  blockjob: anticipate jobs with auto-dismiss=false for better error
    messages and detection
  mirror: do not auto-dismiss to allow getting error message from job
  live restore: do not auto-dismiss stream job to improve error message
    and detection
  live import: do not auto-dismiss stream job to improve error message
    and detection

 PVE/QemuServer.pm | 35 +++++++++++++++++++++++++++++++++--
 1 file changed, 33 insertions(+), 2 deletions(-)

-- 
2.39.2






* [pve-devel] [PATCH v2 qemu-server 1/4] blockjob: anticipate jobs with auto-dismiss=false for better error messages and detection
@ 2024-04-11 11:16 ` Fiona Ebner
From: Fiona Ebner @ 2024-04-11 11:16 UTC (permalink / raw)
  To: pve-devel

When auto-dismiss=true (the default), a failed job can disappear very
quickly from the job list, so there might not be any chance to see the
error in the result of 'query-block-jobs'. For jobs with $completion
being 'auto', like 'block-stream', the failure could not even be
detected.

Jobs with auto-dismiss=false, on the other hand, wait in the
'concluded' state until manually dismissed. For those, the error can
still be queried if the job failed.

There doesn't seem to be a way to have only failed jobs stay around,
e.g. something like auto-dismiss=on-success.

This is planned to be used for the 'drive-mirror' and 'block-stream'
jobs initially.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 PVE/QemuServer.pm | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index abe175a4..e5543237 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -7976,6 +7976,9 @@ sub qemu_drive_mirror_monitor {
 
 		die "$job_id: '$op' has been cancelled\n" if !defined($job);
 
+		qemu_handle_concluded_blockjob($vmid, $job_id, $job)
+		    if $job && $job->{status} eq 'concluded';
+
 		my $busy = $job->{busy};
 		my $ready = $job->{ready};
 		if (my $total = $job->{len}) {
@@ -8076,6 +8079,19 @@ sub qemu_drive_mirror_monitor {
     }
 }
 
+# If the job was started with auto-dismiss=false, it's necessary to dismiss it manually. Using this
+# option is useful to get the error for failed jobs here. QEMU's job lock should make it impossible
+# to see a job in 'concluded' state when auto-dismiss=true.
+# $info is the 'BlockJobInfo' for the job returned by query-block-jobs.
+sub qemu_handle_concluded_blockjob {
+    my ($vmid, $job_id, $info) = @_;
+
+    eval { mon_cmd($vmid, 'job-dismiss', id => $job_id); };
+    log_warn("$job_id: failed to dismiss job - $@") if $@;
+
+    die "$job_id: $info->{error} (io-status: $info->{'io-status'})\n" if $info->{error};
+}
+
 sub qemu_blockjobs_cancel {
     my ($vmid, $jobs) = @_;
 
@@ -8094,8 +8110,14 @@ sub qemu_blockjobs_cancel {
 	}
 
 	foreach my $job (keys %$jobs) {
+	    my $info = $running_jobs->{$job};
+	    eval {
+		qemu_handle_concluded_blockjob($vmid, $job, $info)
+		    if $info && $info->{status} eq 'concluded';
+	    };
+	    log_warn($@) if $@; # only warn and proceed with canceling other jobs
 
-	    if (defined($jobs->{$job}->{cancel}) && !defined($running_jobs->{$job})) {
+	    if (defined($jobs->{$job}->{cancel}) && !defined($info)) {
 		print "$job: Done.\n";
 		delete $jobs->{$job};
 	    }
-- 
2.39.2






* [pve-devel] [PATCH v2 qemu-server 2/4] mirror: do not auto-dismiss to allow getting error message from job
@ 2024-04-11 11:16 ` Fiona Ebner
From: Fiona Ebner @ 2024-04-11 11:16 UTC (permalink / raw)
  To: pve-devel

Do not auto-dismiss the mirror job, so that its error message can be
queried upon failure. Otherwise, the job would disappear too quickly
from the job list and could no longer be queried for the actual error.

Relevant part of the error in actual examples (note that the fact that
it's a mirror job is already mentioned earlier in the full error, with
"block job (mirror) error:"):

Before:
> 'mirror' has been cancelled
> 'mirror' has been cancelled

After:
> Source and target image have different sizes (io-status: ok)
> No space left on device (io-status: ok)

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 PVE/QemuServer.pm | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index e5543237..77aaf718 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -7907,7 +7907,14 @@ sub qemu_drive_mirror {
 	$qemu_target = $is_zero_initialized ? "zeroinit:$dst_path" : $dst_path;
     }
 
-    my $opts = { timeout => 10, device => "drive-$drive", mode => "existing", sync => "full", target => $qemu_target };
+    my $opts = {
+	timeout => 10,
+	device => "drive-$drive",
+	mode => "existing",
+	sync => "full",
+	target => $qemu_target,
+	'auto-dismiss' => JSON::false,
+    };
     $opts->{format} = $format if $format;
 
     if (defined($src_bitmap)) {
-- 
2.39.2






* [pve-devel] [PATCH v2 qemu-server 3/4] live restore: do not auto-dismiss stream job to improve error message and detection
@ 2024-04-11 11:16 ` Fiona Ebner
From: Fiona Ebner @ 2024-04-11 11:16 UTC (permalink / raw)
  To: pve-devel

Do not auto-dismiss the stream job, so that its error can be detected
and queried upon failure. Otherwise, the job would disappear too
quickly from the job list and could no longer be queried for the
actual error.

Relevant part of the error in an actual example:

Before:
> VM 112 qmp command 'blockdev-del' failed - Node 'drive-scsi0-pbs' is busy: node is used as backing hd of '#block046'

After:
> block job (stream) error: restore-drive-scsi0: No space left on device (io-status: ok)

Note that previously, the failure of the stream job was not even
detected; the error message shown came from the subsequent cleanup
failing.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 PVE/QemuServer.pm | 1 +
 1 file changed, 1 insertion(+)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 77aaf718..a3d7d727 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -7254,6 +7254,7 @@ sub pbs_live_restore {
 	    mon_cmd($vmid, 'block-stream',
 		'job-id' => $job_id,
 		device => "$ds",
+		'auto-dismiss' => JSON::false,
 	    );
 	    $jobs->{$job_id} = {};
 	}
-- 
2.39.2






* [pve-devel] [PATCH v2 qemu-server 4/4] live import: do not auto-dismiss stream job to improve error message and detection
@ 2024-04-11 11:16 ` Fiona Ebner
From: Fiona Ebner @ 2024-04-11 11:16 UTC (permalink / raw)
  To: pve-devel

Do not auto-dismiss the stream job, so that its error can be detected
and queried upon failure. Otherwise, the job would disappear too
quickly from the job list and could no longer be queried for the
actual error.

Relevant part of the error in an actual example:

Before:
> VM 106 qmp command 'blockdev-del' failed - Node 'drive-scsi0-restore' is busy: node is used as backing hd of '#block655'

After:
> block job (stream) error: restore-scsi0: No space left on device (io-status: ok)

Note that previously, the failure of the stream job was not even
detected; the error message shown came from the subsequent cleanup
failing.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

New in v2.

 PVE/QemuServer.pm | 1 +
 1 file changed, 1 insertion(+)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index a3d7d727..73d95687 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -7341,6 +7341,7 @@ sub live_import_from_files {
 	    mon_cmd($vmid, 'block-stream',
 		'job-id' => $job_id,
 		device => "drive-$ds",
+		'auto-dismiss' => JSON::false,
 	    );
 	    $jobs->{$job_id} = {};
 	}
-- 
2.39.2






* [pve-devel] applied-series: [PATCH-SERIES v2 qemu-server] improve error detection/messages for some block jobs
@ 2024-07-02 14:08 ` Fabian Grünbichler
From: Fabian Grünbichler @ 2024-07-02 14:08 UTC (permalink / raw)
  To: Proxmox VE development discussion


