Date: Tue, 27 Oct 2020 15:17:34 +0100
From: Wolfgang Bumiller
To: Stefan Reiter
Cc: pve-devel@lists.proxmox.com
Message-ID: <20201027141734.x2puokunzh3nkww3@olga.proxmox.com>
References: <20201022121118.5504-1-s.reiter@proxmox.com>
 <20201022121118.5504-3-s.reiter@proxmox.com>
In-Reply-To: <20201022121118.5504-3-s.reiter@proxmox.com>
Subject: Re: [pve-devel] [PATCH qemu 2/2] PVE: Don't call job_cancel in coroutines

On Thu, Oct 22, 2020 at 02:11:18PM +0200, Stefan Reiter wrote:
> ...because it hangs on cancelling other jobs in the txn if you do.
>
> Signed-off-by: Stefan Reiter
> ---
>  pve-backup.c | 26 +++++++++++++++++++++++++-
>  1 file changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/pve-backup.c b/pve-backup.c
> index 9179754dcb..af2db0d4b9 100644
> --- a/pve-backup.c
> +++ b/pve-backup.c
> @@ -82,6 +82,12 @@ typedef struct PVEBackupDevInfo {
>      BlockJob *job;
>  } PVEBackupDevInfo;
>
> +typedef struct JobCancelData {
> +    AioContext *ctx;
> +    Coroutine *co;
> +    Job *job;
> +} JobCancelData;
> +
>  static void pvebackup_propagate_error(Error *err)
>  {
>      qemu_mutex_lock(&backup_state.stat.lock);
> @@ -332,6 +338,18 @@ static void pvebackup_complete_cb(void *opaque, int ret)
>      aio_co_enter(qemu_get_aio_context(), co);
>  }
>
> +/*
> + * job_cancel(_sync) does not like to be called from coroutines, so defer to
> + * main loop processing via a bottom half.
> + */
> +static void job_cancel_bh(void *opaque) {
> +    JobCancelData *data = (JobCancelData*)opaque;
> +    aio_context_acquire(data->job->aio_context);
> +    job_cancel_sync(data->job);
> +    aio_context_release(data->job->aio_context);
> +    aio_co_schedule(data->ctx, data->co);
> +}
> +
>  static void coroutine_fn pvebackup_co_cancel(void *opaque)
>  {
>      Error *cancel_err = NULL;
> @@ -357,7 +375,13 @@ static void coroutine_fn pvebackup_co_cancel(void *opaque)
>          NULL;
>
>      if (cancel_job) {
> -        job_cancel(&cancel_job->job, false);
> +        JobCancelData data = {
> +            .ctx = qemu_get_current_aio_context(),
> +            .co = qemu_coroutine_self(),
> +            .job = &cancel_job->job,
> +        };
> +        aio_bh_schedule_oneshot(data.ctx, job_cancel_bh, &data);
> +        qemu_coroutine_yield();

Don't we need some kind of synchronization here? The yield alone does not
guarantee that we aren't resumed before the BH has actually run, or does it?
Maybe a condvar to trigger the coroutine after the job cancel BH?

>      }
>
>      qemu_co_mutex_unlock(&backup_state.backup_mutex);
> --
> 2.20.1
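
For illustration, a rough, untested sketch of the kind of explicit hand-off
I mean (not necessarily a condvar): let the BH set a flag once the cancel has
finished and loop on that flag around the yield, so an early or spurious
wake-up is not mistaken for completion. The `done' member below is
hypothetical and not part of the patch; the fragment otherwise relies on the
includes already present in pve-backup.c.

/*
 * Untested sketch, not part of the patch: the BH publishes completion via
 * a flag before waking the coroutine, and the coroutine yields until the
 * flag is set.
 */
typedef struct JobCancelData {
    AioContext *ctx;
    Coroutine *co;
    Job *job;
    bool done;              /* hypothetical: set by the BH when finished */
} JobCancelData;

static void job_cancel_bh(void *opaque)
{
    JobCancelData *data = opaque;

    aio_context_acquire(data->job->aio_context);
    job_cancel_sync(data->job);
    aio_context_release(data->job->aio_context);

    data->done = true;                      /* publish completion first */
    aio_co_schedule(data->ctx, data->co);   /* then wake the coroutine */
}

/* ... in pvebackup_co_cancel(), instead of a bare yield: */
    JobCancelData data = {
        .ctx = qemu_get_current_aio_context(),
        .co = qemu_coroutine_self(),
        .job = &cancel_job->job,
        .done = false,
    };
    aio_bh_schedule_oneshot(data.ctx, job_cancel_bh, &data);
    while (!data.done) {
        qemu_coroutine_yield();
    }

Since the BH is scheduled on the coroutine's own context the ordering may
well already be fine, but the flag would make the hand-off explicit either
way.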