From: Stefan Reiter
To: Wolfgang Bumiller
Cc: pve-devel@lists.proxmox.com
Subject: Re: [pve-devel] [PATCH qemu 2/2] PVE: Don't call job_cancel in coroutines
Date: Tue, 27 Oct 2020 15:57:06 +0100
Message-ID: <9223e62b-4fee-3106-fe69-51d36a30e0b6@proxmox.com>
In-Reply-To: <20201027141734.x2puokunzh3nkww3@olga.proxmox.com>
References: <20201022121118.5504-1-s.reiter@proxmox.com>
 <20201022121118.5504-3-s.reiter@proxmox.com>
 <20201027141734.x2puokunzh3nkww3@olga.proxmox.com>

On 10/27/20 3:17 PM, Wolfgang Bumiller wrote:
> On Thu, Oct 22, 2020 at 02:11:18PM +0200, Stefan Reiter wrote:
>> ...because it hangs on cancelling other jobs in the txn if you do.
>>
>> Signed-off-by: Stefan Reiter
>> ---
>>  pve-backup.c | 26 +++++++++++++++++++++++++-
>>  1 file changed, 25 insertions(+), 1 deletion(-)
>>
>> diff --git a/pve-backup.c b/pve-backup.c
>> index 9179754dcb..af2db0d4b9 100644
>> --- a/pve-backup.c
>> +++ b/pve-backup.c
>> @@ -82,6 +82,12 @@ typedef struct PVEBackupDevInfo {
>>      BlockJob *job;
>>  } PVEBackupDevInfo;
>>
>> +typedef struct JobCancelData {
>> +    AioContext *ctx;
>> +    Coroutine *co;
>> +    Job *job;
>> +} JobCancelData;
>> +
>>  static void pvebackup_propagate_error(Error *err)
>>  {
>>      qemu_mutex_lock(&backup_state.stat.lock);
>> @@ -332,6 +338,18 @@ static void pvebackup_complete_cb(void *opaque, int ret)
>>      aio_co_enter(qemu_get_aio_context(), co);
>>  }
>>
>> +/*
>> + * job_cancel(_sync) does not like to be called from coroutines, so defer to
>> + * main loop processing via a bottom half.
>> + */
>> +static void job_cancel_bh(void *opaque) {
>> +    JobCancelData *data = (JobCancelData*)opaque;
>> +    aio_context_acquire(data->job->aio_context);
>> +    job_cancel_sync(data->job);
>> +    aio_context_release(data->job->aio_context);
>> +    aio_co_schedule(data->ctx, data->co);
>> +}
>> +
>>  static void coroutine_fn pvebackup_co_cancel(void *opaque)
>>  {
>>      Error *cancel_err = NULL;
>> @@ -357,7 +375,13 @@ static void coroutine_fn pvebackup_co_cancel(void *opaque)
>>          NULL;
>>
>>      if (cancel_job) {
>> -        job_cancel(&cancel_job->job, false);
>> +        JobCancelData data = {
>> +            .ctx = qemu_get_current_aio_context(),
>> +            .co = qemu_coroutine_self(),
>> +            .job = &cancel_job->job,
>> +        };
>> +        aio_bh_schedule_oneshot(data.ctx, job_cancel_bh, &data);
>> +        qemu_coroutine_yield();
>
> Don't we need some kind of synchronization here? The yield does not
> guarantee we don't run before the bh is run, or does it? Maybe a condvar
> to trigger the coro after the job cancel bh?
>

No, it cannot race, since we execute the BH in the same context as the
coroutine (qemu_get_current_aio_context()). The coroutine thus blocks
execution of the BH until it yields.

See also code and comment in aio_co_reschedule_self() from 'util/async.c'.

>>      }
>>
>>      qemu_co_mutex_unlock(&backup_state.backup_mutex);
>> --
>> 2.20.1
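
To spell the pattern out: stripped of the backup-specific parts, this is the
same trick as aio_co_reschedule_self() in util/async.c. A minimal sketch,
not part of the patch; the helper names wakeup_bh/defer_to_main_loop are
made up, but the QEMU primitives are the same ones the patch uses:

#include "qemu/osdep.h"
#include "qemu/coroutine.h"
#include "block/aio.h"

/* Runs as a bottom half, outside coroutine context. */
static void wakeup_bh(void *opaque)
{
    Coroutine *co = opaque;

    /* ... do work here that must not run in a coroutine ... */

    /* Wake the coroutine that scheduled us. */
    aio_co_schedule(qemu_get_current_aio_context(), co);
}

static void coroutine_fn defer_to_main_loop(void)
{
    /*
     * The BH is queued in the *current* AioContext, so it cannot run
     * before this coroutine yields back to the event loop below; the
     * wakeup therefore cannot be lost.
     */
    aio_bh_schedule_oneshot(qemu_get_current_aio_context(), wakeup_bh,
                            qemu_coroutine_self());
    qemu_coroutine_yield(); /* resumed by wakeup_bh() via aio_co_schedule() */
}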