From: "Max Carrara" <m.carrara@proxmox.com>
To: "Dominik Csapak" <d.csapak@proxmox.com>,
"Proxmox Backup Server development discussion"
<pbs-devel@lists.proxmox.com>
Subject: Re: [pbs-devel] RFC: Scheduler for PBS
Date: Fri, 09 Aug 2024 16:20:30 +0200
Message-ID: <D3BG3XJ5W0JS.XJ852MRIKQ16@proxmox.com>
In-Reply-To: <ff31232c-7f29-4859-9204-6c579b2ef7a5@proxmox.com>
On Fri Aug 9, 2024 at 2:52 PM CEST, Dominik Csapak wrote:
> Hi,
>
> great to see that you tackle this!
>
> I read through the overview, which sounds fine, but I think it should
> reflect the actual issues more, namely limitations in memory, threads,
> disk IO and network.
>
> The actual reason people want to schedule things is to not overload the system
> (because of timeouts, hangs, etc.), so any scheduling system should consider
> not only the number of jobs, but also how many resources each job will/can
> utilize.
>
> E.g. when I tried to introduce multi-threaded tape backup (configurable threads
> per tape job), Thomas rightfully said that it's probably not a good idea, since
> running multiple tape backup jobs in parallel increases the load by much more than before.
>
> I generally like the approach, but I personally would like to see some
> work on resource constraints; for example, one could imagine a configurable
> amount of available threads and a (configurable?) number of threads used per job type,
>
> so I can set my available threads to e.g. 10, and if my tape backup jobs
> then get 4 each, I can start 2 in parallel but not more.
>
> Such a system does not have to be included from the beginning IMO, but the
> architecture should be prepared for such things
>
> Does that make sense?
That does make sense, yes! Thanks for bringing this to our attention.

We've just discussed this off-list a bit and mostly agree on things like
the thread limit per worker - though just to be sure, do you mean the
number of threads that are passed to e.g. a `ParallelHandler` and the
like?
The scheduler doesn't currently have a way to *really* enforce any
limits, though with the event-based architecture, it should be fairly
trivial to add new fields to the scheduler's config.
We want to have a kind of "top-down control": once the scheduler can
actually spawn and manage tasks itself (unlike how it's done right now,
see my response to Chris), it could give each task a separate thread
pool for the things it wants to run in parallel. There could even be
different "types" of thread pools depending on the purpose.
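
To illustrate what I mean, here's a rough sketch - the `TaskPool`
wrapper and the parameters are made up, this isn't existing code - of
the scheduler handing each task a dedicated tokio runtime:

    use anyhow::Error;
    use tokio::runtime::{Builder, Runtime};

    /// Hypothetical wrapper the scheduler would hand to a spawned task.
    struct TaskPool {
        runtime: Runtime,
    }

    impl TaskPool {
        /// Build a dedicated pool with a fixed number of worker threads.
        fn new(name: &str, worker_threads: usize) -> Result<Self, Error> {
            let runtime = Builder::new_multi_thread()
                .worker_threads(worker_threads)
                .thread_name(name.to_owned())
                .enable_all()
                .build()?;

            Ok(Self { runtime })
        }
    }

The scheduler would then decide the pool's size (and "type") from its
config before spawning the task.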
This is much easier said than done, but I'm honestly rather confident
that we can get it to work. I would prefer to have the resource checking
and management decoupled and walled off, so that the scheduler itself
isn't really concerned with it. Rather, it should ask the (e.g.)
`ResourceManager` whether there are enough threads available for a
`JobType::TapeBackup` or something of the sort.
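
Roughly something like this - to be clear, `ResourceManager`, `JobType`
and the thread costs below are all hypothetical, just for illustration:

    /// Hypothetical job classification.
    enum JobType {
        TapeBackup,
        Sync,
        Verify,
    }

    /// Hypothetical manager the scheduler queries before spawning anything.
    struct ResourceManager {
        available_threads: usize,
    }

    impl ResourceManager {
        /// How many threads a job of the given type is assumed to need;
        /// in practice this would come from the (user-editable) config.
        fn threads_for(&self, job_type: &JobType) -> usize {
            match job_type {
                JobType::TapeBackup => 4,
                JobType::Sync => 2,
                JobType::Verify => 1,
            }
        }

        /// The scheduler only asks; it never inspects resources itself.
        fn can_spawn(&self, job_type: &JobType) -> bool {
            self.threads_for(job_type) <= self.available_threads
        }
    }

With your example (10 available threads, tape backups costing 4 each),
and assuming `available_threads` is decremented while jobs run, this
would allow two parallel tape backups but not a third.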
Another thing we've been discussing just now was to give the spawned
task a struct representing the limits it should abide by - that would
only be a soft limit, but it would probably make things a lot easier.
(After all, passing a thread pool to the task also doesn't mean the task
*has* to use that thread pool...)
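
As a sketch, such a struct could look like this (all fields are made up,
just to give an idea):

    /// Hypothetical soft limits handed to a spawned task. The task is
    /// trusted to respect these; nothing actually enforces them.
    struct TaskLimits {
        /// Maximum number of worker threads the task should spin up.
        max_threads: usize,
        /// Rough memory budget in bytes, if any.
        max_memory: Option<u64>,
        /// Network bandwidth budget in bytes per second, if any.
        max_bandwidth: Option<u64>,
    }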
One thing I just discovered is tokio's `Semaphore` [1], which we could use
to keep track of the resources we've been handing out.
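
For example (the total budget of 10 permits mirrors your numbers above;
the rest is just a sketch):

    use std::sync::Arc;

    use tokio::sync::Semaphore;

    #[tokio::main]
    async fn main() {
        // The total "thread budget" the scheduler hands out, e.g. 10.
        let threads = Arc::new(Semaphore::new(10));

        // A tape backup job needs 4 threads; this waits until 4 permits
        // are free, so with a budget of 10, at most two such jobs can
        // hold their permits at the same time.
        let permit = threads.clone().acquire_many_owned(4).await.unwrap();

        // ... run the job while holding the permit ...

        // Dropping the permit returns the 4 threads to the budget.
        drop(permit);
    }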
So, IMO this is a good idea and something we should definitely consider
in the future, though I have a couple of questions:
1. How would you track & enforce memory limits? I think this is a much
harder problem, to be honest.
2. In the same vein, how could one find out how much memory a given task
will use? There's nothing that prevents tasks from just allocating
more memory at will, obviously.
Do you mean instead that if e.g. >90% of memory is in use (the
threshold could be made configurable), we don't spawn any additional
tasks?
3. How would you limit disk IO? We definitely want to add a limit for
the number of jobs that can run on a datastore at a time, so I guess
that would also be covered indirectly there..?
(It could probably also be done with tokio's `Semaphore` [1], but
we'd need some kind of abstraction on top of it, because tasks can
still just read / write / open / close at will. We would need one
uniform way of accessing disk resources and would have to avoid
performing disk IO any other way, which will be *hard*.)
4. I guess network limits (e.g. bandwidth limits for sync jobs etc.)
could just be enforced on the TCP socket, so this shouldn't be too
hard. That way you could enforce individual rate limits for
individual tasks. Though, that's probably also easier said than done;
a crude sketch of what I mean follows below. Can you elaborate some
more on this, too?
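
Regarding 4., here's the kind of throttling I have in mind - a
deliberately crude sketch that chunks writes and sleeps in between,
purely for illustration (a real implementation would want a proper
token bucket):

    use std::time::Duration;

    use tokio::io::{AsyncWrite, AsyncWriteExt};
    use tokio::time::sleep;

    /// Write `data` to `writer` while staying below the given average
    /// rate, by sleeping proportionally after each chunk.
    async fn throttled_write<W: AsyncWrite + Unpin>(
        writer: &mut W,
        data: &[u8],
        limit_bytes_per_sec: usize,
    ) -> std::io::Result<()> {
        const CHUNK_SIZE: usize = 64 * 1024;

        for chunk in data.chunks(CHUNK_SIZE) {
            writer.write_all(chunk).await?;
            // Sleep long enough that the average rate stays below the limit.
            let secs = chunk.len() as f64 / limit_bytes_per_sec as f64;
            sleep(Duration::from_secs_f64(secs)).await;
        }

        writer.flush().await
    }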
Thanks a lot for your input, you've given us lots of ideas as well! :)
[1]: https://docs.rs/tokio/latest/tokio/sync/struct.Semaphore.html