[pbs-devel] RFC: Scheduler for PBS

From: "Max Carrara" <m.carrara@proxmox.com>
To: <pbs-devel@lists.proxmox.com>
Subject: [pbs-devel] RFC: Scheduler for PBS
Date: Fri, 09 Aug 2024 11:31:19 +0200
Message-ID: <D3B9YIIHKRP2.2EX6MKWE3C0NP@proxmox.com>

RFC: Scheduler for PBS
======================

Introduction
------------

Gabriel and I have been prototyping a new scheduler for PBS, mostly in
order to address #3086 [1]. We will first summarize this Bugzilla issue
and elaborate on why we think that implementing a scheduler inside PBS
is the best way to solve it.

Furthermore, this RFC provides a high-level overview of our plans and
of what we have implemented thus far. Additionally, we outline a couple
of other problems that the scheduler could solve in the future,
particularly given our current architecture.

We are doing this mostly to gather early feedback on our design and our
plans, and on whether we should continue in this direction at all.

We would also like to hear your thoughts on a few specific problems that
we think must be addressed adequately for the scheduler to work
efficiently and robustly.

Summary of #3086: Limiting the Number of Parallel Backups
---------------------------------------------------------

The RFE #3086 [1] can be summarized as follows:

Currently, there is no way to limit the number of backup jobs that may
run in parallel.

This is not necessarily a problem for smaller setups or clusters, where
the administrator can schedule the backup jobs of all PVE hosts to run
at different times.

However, trying to manually coordinate backup jobs of multiple hosts
becomes increasingly cumbersome as one's infrastructure (number of
hosts) scales up. An administrator would need to be able to accurately
estimate how long each job would take, for each VM, on each host.

Additionally, running many backups in parallel risks oversaturating the
network, which reduces the bandwidth available to each running backup;
this in turn can affect running VMs [2].

Why a Scheduler is Necessary
----------------------------

The above issue seemingly cannot be solved through bandwidth limits or
similar mechanisms alone, which is why we believe that implementing a
scheduler inside PBS is the correct approach.

This belief was reinforced when our prototypes successfully queued and
scheduled many backups that had been launched in parallel (from the CLI)
at once. No other limits or "workarounds" were necessary; all backup
jobs eventually completed.

This makes it much easier for administrators to ensure that their
backups actually happen (and succeed), while removing the need to
manually stagger backup times so that none of them overlap.

Architectural Overview
----------------------

Internally, the scheduler holds a job queue whose type is configurable;
in our case, it is a simple FIFO queue. We also used HTTP long-polling
[3] to schedule backup jobs, responding to the client's request only
once its backup job has been started.
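To make the queueing behaviour concrete, here is a minimal sketch of a
FIFO queue with a concurrency limit. The names (`FifoScheduler`,
`Admission`, `JobId`, `max_running`) are hypothetical and not taken from
the prototypes. A submitted job is either started immediately or parked
at the back of the queue, and `finish` is the point at which a
long-polled request for the next waiting job would finally be answered.

```rust
use std::collections::VecDeque;

/// Hypothetical job identifier; the prototypes' real types are not shown in the RFC.
type JobId = u64;

/// Outcome of submitting a job to the scheduler.
#[derive(Debug, PartialEq)]
enum Admission {
    /// A slot was free, so the job starts right away.
    Started,
    /// All slots are busy; the job waits at this queue position (0 = next).
    Queued(usize),
}

/// Minimal FIFO scheduler that runs at most `max_running` jobs at once.
struct FifoScheduler {
    max_running: usize,
    running: Vec<JobId>,
    queue: VecDeque<JobId>,
}

impl FifoScheduler {
    fn new(max_running: usize) -> Self {
        Self { max_running, running: Vec::new(), queue: VecDeque::new() }
    }

    /// Start the job if a slot is free, otherwise append it to the back of the queue.
    fn submit(&mut self, job: JobId) -> Admission {
        if self.running.len() < self.max_running {
            self.running.push(job);
            Admission::Started
        } else {
            self.queue.push_back(job);
            Admission::Queued(self.queue.len() - 1)
        }
    }

    /// Mark `job` as finished and start the oldest waiting job, if any.
    /// Returns the newly started job, i.e. the client whose long-polled
    /// request can now be answered.
    fn finish(&mut self, job: JobId) -> Option<JobId> {
        self.running.retain(|&j| j != job);
        if self.running.len() < self.max_running {
            if let Some(next) = self.queue.pop_front() {
                self.running.push(next);
                return Some(next);
            }
        }
        None
    }
}
```

For example, with `FifoScheduler::new(2)`, submitting jobs 1 to 4 starts
jobs 1 and 2 and queues jobs 3 and 4; finishing job 1 then starts job 3.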

While long-polling appears to work fine for our current purposes, we
still want to test whether alternatives (e.g. "short-polling", i.e.
regular polling) are more robust.

The main way to communicate with the scheduler is via its event loop.
This is a plain tokio task with an inner `loop` that matches on an enum
representing the different events / messages the scheduler may handle.
Such an event would be e.g. `NewBackupRequest` or `ConfigUpdate`.
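A rough sketch of such an event loop follows. Only `NewBackupRequest`
and `ConfigUpdate` come from the description above; the `Shutdown`
event, the field names and `SchedulerState` are assumptions. To stay
dependency-free, the sketch uses `std::sync::mpsc` with a blocking
`recv` instead of a tokio task. Inside a tokio task, the equivalent loop
head would be `while let Some(event) = rx.recv().await`.

```rust
use std::sync::mpsc::Receiver;

/// Hypothetical event type modelled on the two events named in the RFC.
enum SchedulerEvent {
    /// A client asks for a new backup job to be scheduled.
    NewBackupRequest { job_id: u64 },
    /// The administrator changed the scheduler configuration.
    ConfigUpdate { max_running: usize },
    /// Hypothetical event used here only to end the loop explicitly.
    Shutdown,
}

/// Minimal state that the loop mutates in response to events.
#[derive(Debug, Default)]
struct SchedulerState {
    max_running: usize,
    received_jobs: Vec<u64>,
}

/// Event loop: receive one event at a time and match on its variant.
fn run_event_loop(rx: Receiver<SchedulerEvent>, mut state: SchedulerState) -> SchedulerState {
    loop {
        // `recv` returns an error once every sender has been dropped,
        // which also ends the loop.
        let event = match rx.recv() {
            Ok(event) => event,
            Err(_) => break,
        };
        match event {
            SchedulerEvent::NewBackupRequest { job_id } => state.received_jobs.push(job_id),
            SchedulerEvent::ConfigUpdate { max_running } => state.max_running = max_running,
            SchedulerEvent::Shutdown => break,
        }
    }
    state
}
```

Because all state changes go through this single loop, no locking is
needed around the scheduler's internal state.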

The event loop receives events via an mpsc channel and may respond to
individual events via oneshot channels, which are set up when those
events are created. A benefit of tokio's channels is that they also work
in blocking (non-async) contexts, so it would be possible, for example,
to isolate the scheduler completely in a separate thread if needed.
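The request/response pattern can be sketched as follows, again with
`std` channels in place of tokio's. Each request carries its own reply
`Sender`, which is used exactly once like a oneshot sender, and the
scheduler runs on a dedicated OS thread. `Event`, `JobFinished` and
`spawn_scheduler` are hypothetical names. A client blocks on its reply
receiver until the scheduler starts its job, which mirrors how the
long-polled HTTP request is answered only once the backup may begin.

```rust
use std::collections::VecDeque;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Events carrying a reply channel; each `reply` sender is used at most once.
enum Event {
    /// Ask for a backup slot; the scheduler replies with the job ID once it may start.
    NewBackupRequest { job_id: u64, reply: Sender<u64> },
    /// A running job finished, freeing its slot.
    JobFinished { job_id: u64 },
}

/// Spawn the scheduler on its own thread, allowing at most `max_running` jobs.
/// The thread exits once all event senders have been dropped.
fn spawn_scheduler(max_running: usize) -> (Sender<Event>, JoinHandle<()>) {
    let (tx, rx): (Sender<Event>, Receiver<Event>) = channel();
    let handle = thread::spawn(move || {
        let mut running: Vec<u64> = Vec::new();
        // Waiting jobs keep their reply channel so they can be answered later.
        let mut waiting: VecDeque<(u64, Sender<u64>)> = VecDeque::new();
        for event in rx {
            match event {
                Event::NewBackupRequest { job_id, reply } => {
                    if running.len() < max_running {
                        // Only count the job as running if the client is still listening.
                        if reply.send(job_id).is_ok() {
                            running.push(job_id);
                        }
                    } else {
                        waiting.push_back((job_id, reply));
                    }
                }
                Event::JobFinished { job_id } => {
                    running.retain(|&j| j != job_id);
                    // Start the oldest waiting job whose client has not hung up.
                    while let Some((next, reply)) = waiting.pop_front() {
                        if reply.send(next).is_ok() {
                            running.push(next);
                            break;
                        }
                    }
                }
            }
        }
    });
    (tx, handle)
}
```

With tokio, the reply side would be a `tokio::sync::oneshot` channel,
whose receiver provides `blocking_recv` for use outside of an async
context.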

Because users should also be able to configure the scheduler
dynamically, configuration changes are handled via the `ConfigUpdate`
event. This even allows the type of the queue to be changed on the fly,
which one of our prototypes already supports.
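One way to swap the queue type at runtime is to hide the queue behind a
trait object and, when a `ConfigUpdate` selects a different type, move
all waiting jobs into the new queue. This sketch rests on that
assumption. The `JobQueue` trait and `swap_queue` are hypothetical, and
`LowestIdFirst` merely stands in for whatever alternative queue types
the prototypes actually support.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, VecDeque};

/// Hypothetical abstraction over queue types.
trait JobQueue {
    fn push(&mut self, job: u64);
    fn pop(&mut self) -> Option<u64>;

    /// Remove all waiting jobs in the order this queue would have served them.
    fn drain_all(&mut self) -> Vec<u64> {
        let mut jobs = Vec::new();
        while let Some(job) = self.pop() {
            jobs.push(job);
        }
        jobs
    }
}

/// First in, first out: the queue type used by the prototypes.
#[derive(Default)]
struct Fifo(VecDeque<u64>);

impl JobQueue for Fifo {
    fn push(&mut self, job: u64) {
        self.0.push_back(job);
    }
    fn pop(&mut self) -> Option<u64> {
        self.0.pop_front()
    }
}

/// Hypothetical alternative that always serves the smallest job ID first,
/// standing in for any priority-based policy.
#[derive(Default)]
struct LowestIdFirst(BinaryHeap<Reverse<u64>>);

impl JobQueue for LowestIdFirst {
    fn push(&mut self, job: u64) {
        self.0.push(Reverse(job));
    }
    fn pop(&mut self) -> Option<u64> {
        self.0.pop().map(|Reverse(job)| job)
    }
}

/// Hypothetical `ConfigUpdate` handler: move every waiting job into the new
/// queue so that no job is lost when the queue type changes.
fn swap_queue(old: &mut Box<dyn JobQueue>, mut new: Box<dyn JobQueue>) {
    for job in old.drain_all() {
        new.push(job);
    }
    *old = new;
}
```

Swapping a FIFO queue that holds jobs 3, 1 and 2 for `LowestIdFirst`
keeps all three jobs but serves them as 1, 2, 3. Jobs that are already
running are unaffected, since only waiting jobs live in the queue.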

Furthermore, our prototypes currently run inside `proxmox-backup-proxy`
and are reasonably decoupled from the rest of PBS, because the scheduler
is event-based.

Backward Compatibility Considerations
-------------------------------------

We are still working on handling backward compatibility adequately. At
the moment, HTTP long-polling also lets us support older clients; no
issues have appeared thus far.

However, we are also considering at least one separate API endpoint for
polling purposes and overall better client support, as we feel that this
might be safer and would allow us to handle errors more gracefully.
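In contrast to long-polling, a separate polling endpoint would let the
client ask repeatedly for the state of its job instead of holding one
request open. The sketch below shows the kind of lookup such an endpoint
might perform. The `JobStatus` variants and `job_status` are
hypothetical and do not describe an existing PBS API. An explicit
`Unknown` state is one way a client could detect that its job was lost,
rather than waiting on a request that never completes.

```rust
use std::collections::VecDeque;

/// Hypothetical response of a short-polling status endpoint.
#[derive(Debug, PartialEq)]
enum JobStatus {
    /// Still waiting; `position` 0 means the job is next in line.
    Queued { position: usize },
    /// The job has been started and the client may begin the backup.
    Running,
    /// The scheduler does not know this job (e.g. it finished or never existed).
    Unknown,
}

/// Look up a job's status from the scheduler's running set and FIFO queue.
fn job_status(running: &[u64], queue: &VecDeque<u64>, job_id: u64) -> JobStatus {
    if running.contains(&job_id) {
        JobStatus::Running
    } else if let Some(position) = queue.iter().position(|&j| j == job_id) {
        JobStatus::Queued { position }
    } else {
        JobStatus::Unknown
    }
}
```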

Future Plans & Possibilities
----------------------------

1. Because the scheduler is keeping track of which jobs are currently
   running, it is relatively straightforward to check whether a job for
   the same group on the same datastore is running already. This makes
   it possible to queue the conflicting job, instead of having it fail
   immediately when trying to acquire the lock.

2. The scheduler should be in full control over which `WorkerTask`s are
   spawned and when, as that makes it much easier to handle errors. At
   the same time, the overall architecture of PBS would become much
   cleaner by clearly separating concerns, instead of having e.g. large
   API methods that do many things all at once [4].

3. The architecture of the scheduler is flexible enough to support
   different kinds of jobs in the future, so that e.g. prune, GC and sync
   jobs may also be queued. This is definitely something we are
   considering implementing as well.

4. Should more types of jobs be implemented in the scheduler, separate
   limits could also be set for each job type. For example, the global
   job limit could be set to 10, while allowing at most 10 backup jobs
   and 2 sync jobs to run concurrently. That way users can prioritize
   backup jobs over other jobs, or vice versa.

5. In addition to a global limit, limits could also be set for
   individual users and API tokens. This would allow for even
   finer-grained job control, but is more costly to implement.
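Items 1 and 4 both come down to an admission check that runs before a
queued job is started. The sketch below combines a global limit,
optional per-kind limits and the same-group, same-datastore conflict
check. `Job`, `JobKind`, `Limits` and `may_start` are hypothetical
names. A per-user or per-token limit (item 5) could be added as one more
counter of the same shape.

```rust
use std::collections::HashMap;

/// Job kinds mentioned in the list of future plans.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum JobKind {
    Backup,
    Sync,
    Prune,
    GarbageCollection,
}

/// Hypothetical job description; `group` is only meaningful for backups.
#[derive(Debug, Clone)]
struct Job {
    kind: JobKind,
    datastore: String,
    group: Option<String>,
}

/// A global cap plus optional per-kind caps. Kinds without an entry are
/// bound only by the global cap.
struct Limits {
    global: usize,
    per_kind: HashMap<JobKind, usize>,
}

/// Decide whether `candidate` may start now, given the currently running jobs.
/// `false` means the job stays queued instead of failing.
fn may_start(limits: &Limits, running: &[Job], candidate: &Job) -> bool {
    if running.len() >= limits.global {
        return false;
    }
    if let Some(&max) = limits.per_kind.get(&candidate.kind) {
        let same_kind = running.iter().filter(|j| j.kind == candidate.kind).count();
        if same_kind >= max {
            return false;
        }
    }
    // Item 1: a backup of the same group on the same datastore would fail to
    // acquire the group lock, so keep it queued until the other one finishes.
    if candidate.kind == JobKind::Backup && candidate.group.is_some() {
        let conflict = running.iter().any(|j| {
            j.kind == JobKind::Backup
                && j.datastore == candidate.datastore
                && j.group == candidate.group
        });
        if conflict {
            return false;
        }
    }
    true
}
```

With the limits from item 4 (global 10, backup 10, sync 2), a third sync
job stays queued, and a second backup of the same group on the same
datastore waits instead of failing immediately on the lock.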

Final Thoughts
--------------

Please let us know what you think. We believe that a scheduler could
solve many of the issues faced by users who aim to scale up their
infrastructure.

Thank you for reading! :)

References
----------

[1]: https://bugzilla.proxmox.com/show_bug.cgi?id=3086
[2]: https://bugzilla.proxmox.com/show_bug.cgi?id=3086#c2
[3]: https://www.rfc-editor.org/rfc/rfc6202#section-2.1
[4]: https://git.proxmox.com/?p=proxmox-backup.git;a=blob;f=src/api2/backup/mod.rs;h=ea0d0292ec587382b154d436ce358e78fc723d0a;hb=refs/heads/master#l71
