* [pve-devel] [PATCH qemu] PVE: fix aborting multiple 'CREATED' jobs in sequential transaction
From: Stefan Reiter @ 2021-01-04 13:49 UTC
  To: pve-devel

Deadlocks could occur in the AIO_WAIT_WHILE loop in job_finish_sync,
which would wait for CREATED but not running jobs to complete, even
though job_enter is a no-op in that scenario. Mark offending jobs as
ABORTING immediately via job_update_rc if required.

Manifested itself in cancelling or failing backups with more than 2
drives.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

Potential fix for #3225 and related forum threads:
https://forum.proxmox.com/threads/problem-mit-backup.80418/
https://forum.proxmox.com/threads/vm-hard-freezes-on-backup.81752/

 job.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/job.c b/job.c
index 97ee97a192..51984e557c 100644
--- a/job.c
+++ b/job.c
@@ -1035,6 +1035,13 @@ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
         return -EBUSY;
     }
 
+    /* in a sequential transaction jobs with status CREATED can appear at time
+     * of cancelling, these have not begun work so job_enter won't do anything,
+     * let's ensure they are marked as ABORTING if required */
+    if (job->status == JOB_STATUS_CREATED && job->txn->sequential) {
+        job_update_rc(job);
+    }
+
     AIO_WAIT_WHILE(job->aio_context,
                    (job_enter(job), !job_is_completed(job)));
 
-- 
2.20.1
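
To make the failure mode described in the commit message concrete, here is a
minimal toy model in C. It is a sketch, not QEMU code: ToyJob, toy_enter,
toy_update_rc, toy_is_completed and toy_finish_sync are hypothetical
stand-ins, and the real AIO_WAIT_WHILE loop is replaced by a bounded poll
loop so that the "hang" shows up as a false return value. It assumes, as the
commit message states, that entering a CREATED job does nothing and that
job_update_rc marks a cancelled job as ABORTING.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are NOT QEMU's real types. */
typedef enum {
    TOY_CREATED,
    TOY_RUNNING,
    TOY_ABORTING,
    TOY_CONCLUDED
} ToyStatus;

typedef struct {
    ToyStatus status;
    bool cancelled;
    bool txn_sequential; /* models job->txn->sequential from the patch */
} ToyJob;

static bool toy_is_completed(const ToyJob *job)
{
    return job->status == TOY_ABORTING || job->status == TOY_CONCLUDED;
}

/* Models job_enter: a job that was never started makes no progress. */
static void toy_enter(ToyJob *job)
{
    if (job->status == TOY_RUNNING) {
        job->status = job->cancelled ? TOY_ABORTING : TOY_CONCLUDED;
    }
}

/* Models job_update_rc as used by the patch (assumed behaviour):
 * a cancelled job is moved to ABORTING. */
static void toy_update_rc(ToyJob *job)
{
    if (job->cancelled) {
        job->status = TOY_ABORTING;
    }
}

/* Bounded stand-in for the wait loop in job_finish_sync.
 * Returns true if the job completed within max_polls iterations;
 * a false return models the deadlock. */
static bool toy_finish_sync(ToyJob *job, bool with_fix, int max_polls)
{
    assert(job != NULL);

    job->cancelled = true; /* stands in for the 'finish' callback (cancel) */

    if (with_fix && job->status == TOY_CREATED && job->txn_sequential) {
        toy_update_rc(job);
    }

    for (int i = 0; i < max_polls; i++) {
        toy_enter(job);
        if (toy_is_completed(job)) {
            return true;
        }
    }
    return false;
}
```

Under these assumptions, calling toy_finish_sync() with with_fix set to false
on a CREATED job in a sequential transaction never completes within the poll
budget, while the same call with with_fix set to true returns immediately
with the job in TOY_ABORTING.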






* Re: [pve-devel] [PATCH qemu] PVE: fix aborting multiple 'CREATED' jobs in sequential transaction
From: Mira Limbeck @ 2021-01-05  9:52 UTC
  To: pve-devel

I tested Stefan's prebuilt qemu package with this patch applied, using my
VM that exhibits the issue.

I also tested the case of a full backup target
(https://forum.proxmox.com/threads/vm-hard-freezes-on-backup.81752/), and
it no longer hangs.

Looks good in my tests, so:

Tested-by: Mira Limbeck <m.limbeck@proxmox.com>






* [pve-devel] applied: [PATCH qemu] PVE: fix aborting multiple 'CREATED' jobs in sequential transaction
From: Wolfgang Bumiller @ 2021-01-07 10:23 UTC
  To: Stefan Reiter; +Cc: pve-devel

applied




