public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [RFC] PVE-Backup: create jobs in a drained section
@ 2023-02-08 14:07 Fiona Ebner
  2023-09-05 13:18 ` Fiona Ebner
  0 siblings, 1 reply; 2+ messages in thread
From: Fiona Ebner @ 2023-02-08 14:07 UTC (permalink / raw)
  To: pve-devel

With the drive-backup QMP command, upstream QEMU uses a drained
section for the source drive when creating the backup job. Do the same
here to avoid potential subtle bugs in the future if upstream starts
relying on that behavior.

There, the drained section extends until after the job is started, but
this cannot be done here for multi-disk backups (could at most start
the first job). The important thing is that the copy-before-write node
is in place and the bcs bitmap is initialized, which both happen
during job creation (ensured by the "block/backup: move bcs bitmap
initialization to job creation" PVE patch).
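
Roughly, the upstream flow looks like this (a simplified sketch of the
pattern, not the literal blockdev.c code; arguments and error handling
omitted):

    /* Upstream drive-backup: the source node stays drained from before
     * job creation until after the job has been started. */
    bdrv_drained_begin(bs);

    job = backup_job_create(/* ... */);
    if (job) {
        job_start(&job->job);
    }

    bdrv_drained_end(bs);

Here, only the job creation itself is wrapped, once per disk, as the
hunk below shows.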

Inserting the copy-before-write node is already protected with a
drained section, which is why there should be no actual issue right
now. While that drained section does not extend until the bcs bitmap
initialization, it should also not be an issue currently, because the
job is not created from a coroutine (and even if, there would need to
be a yield point in between).

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

@Thomas: The stuff with making sure source and target use the same
AioContext I talked about off-list doesn't seem to be required, or
to be more precise, it already happens: when copy-before-write opens
the target, it is opened in the same AioContext. There even is an
assertion in block-copy for source and target to use the same
AioContext, so we couldn't mess it up without noticing.
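
For illustration, that invariant amounts to something like the
following (simplified sketch; the actual assertion in
block/block-copy.c may be spelled differently and the variable names
here are placeholders):

    assert(bdrv_get_aio_context(source_bs) == bdrv_get_aio_context(target_bs));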

 pve-backup.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/pve-backup.c b/pve-backup.c
index 3ca4f74cb8..75791c2313 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -508,12 +508,16 @@ static void create_backup_jobs_bh(void *opaque) {
         AioContext *aio_context = bdrv_get_aio_context(di->bs);
         aio_context_acquire(aio_context);
 
+        bdrv_drained_begin(di->bs);
+
         BlockJob *job = backup_job_create(
             NULL, di->bs, di->target, backup_state.speed, sync_mode, di->bitmap,
             bitmap_mode, false, NULL, &backup_state.perf, BLOCKDEV_ON_ERROR_REPORT,
             BLOCKDEV_ON_ERROR_REPORT, JOB_DEFAULT, pvebackup_complete_cb, di, backup_state.txn,
             &local_err);
 
+        bdrv_drained_end(di->bs);
+
         aio_context_release(aio_context);
 
         di->job = job;
-- 
2.30.2

* Re: [pve-devel] [RFC] PVE-Backup: create jobs in a drained section
  2023-02-08 14:07 [pve-devel] [RFC] PVE-Backup: create jobs in a drained section Fiona Ebner
@ 2023-09-05 13:18 ` Fiona Ebner
  0 siblings, 0 replies; 2+ messages in thread
From: Fiona Ebner @ 2023-09-05 13:18 UTC (permalink / raw)
  To: pve-devel

On 08.02.23 at 15:07, Fiona Ebner wrote:
> 
> Inserting the copy-before-write node is already protected with a
> drained section, which is why there should be no actual issue right
> now. While that drained section does not extend until the bcs bitmap
> initialization, it should also not be an issue currently, because the
> job is not created from a coroutine (and even if, there would need to
> be a yield point in between).
> 

My explanation for why the drained section is currently not required is
wrong: there can be an IO thread that interacts with the bitmap, and
that is what happens (with my reproducer) for the crash reported here
[0]. With this RFC applied, I could not reproduce the crash anymore.

A sketch of what happens is:

Notes:
* Each time a block-copy request is created, it resets the dirty bitmap
at the corresponding range.
* Thread 1 is still doing the backup setup; cbw_open() and
backup_init_bcs_bitmap() both happen in backup_job_create()
* The check if a request can be created for a given range relies on the
dirty bitmap.

  Thread 1 bdrv_dirty_bitmap_merge_internal() as part of cbw_open()
A Thread 3 bdrv_reset_dirty_bitmap(offset=x, bytes=4MiB)
  Thread 1 bdrv_clear_dirty_bitmap() as part of backup_init_bcs_bitmap()
  Thread 1 bdrv_dirty_bitmap_merge_internal() as part of
           backup_init_bcs_bitmap()
B
  Thread 3 bdrv_reset_dirty_bitmap(offset=0, bytes=4MiB)
  Thread 3 bdrv_reset_dirty_bitmap(offset=4MiB, bytes=4MiB)
  ....
C Thread 3 bdrv_reset_dirty_bitmap(offset=x, bytes=4MiB)

Note that at time B, the bitmap and the request list can be out of
sync: unless the merge by chance leaves the bits for request A's range
cleared, that range is marked dirty again even though request A is
still in flight.

Then, if C happens before A has finished, an assert will trigger,
because there already is a request in the same range:

> block_copy_task_create: Assertion `!reqlist_find_conflict(&s->reqs, offset, bytes)' failed

Of course, C doesn't need to cover exactly the same range as A, it just
needs to overlap, but in my case it did.
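
To make the failure mode concrete, here is a rough sketch of the check
that trips (pieced together from the assertion message and the notes
above; apart from reqlist_find_conflict(), the names are assumptions,
not the literal block/block-copy.c code):

    /* A copy task is only created for a range the bcs bitmap reports
     * as dirty, and block-copy assumes such a range cannot overlap an
     * in-flight request, because creating a request clears its bits. */
    if (!bdrv_dirty_bitmap_get(s->copy_bitmap, offset)) {
        return NULL; /* range already handled, nothing to do */
    }
    /* The clear+merge in backup_init_bcs_bitmap() can set the bits for
     * request A's range again while A is still in flight, so creating
     * the overlapping request C trips this assertion: */
    assert(!reqlist_find_conflict(&s->reqs, offset, bytes));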

That said, I also tried reproducing the issue with QEMU 7.2, but
haven't managed to yet. I'll take another look and then re-send this for QEMU 8.0
with a fixed commit message.

[0]: https://forum.proxmox.com/threads/133149/
