From: "Max Carrara"
To: "Christian Ebner", "Proxmox Backup Server development discussion"
Date: Fri, 09 Aug 2024 14:33:49 +0200
Subject: Re: [pbs-devel] RFC: Scheduler for PBS

On Fri Aug 9, 2024 at 1:22 PM CEST, Christian Ebner wrote:
>
> On 09.08.2024 11:31 CEST Max Carrara wrote:
> >
> > Architectural Overview
> > ----------------------
> >
> > The scheduler internally contains the type of job queue that is being
> > used, which in our case is a simple FIFO queue. We also used HTTP
> > long-polling [3] to schedule backup jobs, responding to the client only
> > when the backup job is started.
> >
> > While long-polling appears to work fine for our current intents and
> > purposes, we still want to test if any alternatives (e.g.
> > "short-polling", as in normal polling) are more robust.
> >
> > The main way to communicate with the scheduler is via its event loop.
> > This is a plain tokio task with an inner `loop` that matches on an enum
> > representing the different events / messages the scheduler may handle.
> > Such an event would be e.g. `NewBackupRequest` or `ConfigUpdate`.
> >
> > The event loop receives events via an mpsc channel and may respond to
> > them individually via oneshot channels which are set up when certain
> > events are created. The benefit of tokio's channels is that they can
> > also work in blocking contexts, so it is possible to completely isolate
> > the scheduler in a separate thread if needed, for example.
> >
> > Because users should also be able to dynamically configure the
> > scheduler, configuration changes are handled via the `ConfigUpdate`
> > event. That way even the type of the queue can be changed on the fly,
> > which one prototype is able to do.
> >
> > Furthermore, our prototypes currently run inside `proxmox-backup-proxy`
> > and are reasonably decoupled from the rest of PBS, due to the scheduler
> > being event-based.
>
> Thanks for the write-up, this does sound interesting!

Thanks for reading! Glad you like it!
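(To make the above a bit more concrete for anyone following along, the
skeleton of the event loop looks roughly like this - emphasis on
*roughly*; the event variants, types and dispatch logic below are made-up
placeholders, not the prototype's actual definitions:

    use tokio::sync::{mpsc, oneshot};

    // Placeholder types - not the prototype's actual definitions.
    struct BackupRequest;
    struct BackupPermit;
    struct SchedulerConfig;

    // The events the scheduler can handle; adding functionality means
    // adding a variant here and a match arm below.
    enum SchedulerEvent {
        // A client requests a backup; the oneshot sender is stored and
        // only answered once the job is started (long-polling).
        NewBackupRequest(BackupRequest, oneshot::Sender<BackupPermit>),
        // The user changed the scheduler configuration at runtime.
        ConfigUpdate(SchedulerConfig),
    }

    async fn event_loop(mut rx: mpsc::Receiver<SchedulerEvent>) {
        // Stand-in for the FIFO queue; entries keep their response
        // channel around until the job is allowed to start.
        let mut queue: Vec<(BackupRequest, oneshot::Sender<BackupPermit>)> =
            Vec::new();

        loop {
            match rx.recv().await {
                Some(SchedulerEvent::NewBackupRequest(request, respond_to)) => {
                    queue.push((request, respond_to));
                    // ...dispatch elided: pop jobs and answer their
                    // oneshot channels once they may run.
                }
                Some(SchedulerEvent::ConfigUpdate(_config)) => {
                    // Swap out the queue type, limits, etc. on the fly.
                }
                None => break, // all senders dropped -> shut down
            }
        }
    }

The "works in blocking contexts" bit is thanks to
`mpsc::Sender::blocking_send` and `oneshot::Receiver::blocking_recv`.)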
>
> Do you plan to also include the notification system, e.g. by sending out
> notification events based on events/messages handled by the scheduler?
> Or will that solely be handled by the worker tasks?

We haven't considered this yet, but that does actually sound pretty
interesting - we could probably use the tokio broadcast channel for that.
That way other components could react to whatever is going on inside the
scheduler. Damn, I like this idea. I'll definitely keep it in mind.

Regarding worker tasks - I assume you mean `WorkerTask` here - I would
personally like to rewrite those, as they currently don't return a handle,
which is needed to check whether a task panicked or was cancelled somehow.
(If you hit CTRL+C on the CLI, the finish-event will never reach the
scheduler, so the job is never removed from the set of running jobs.)

We'll probably need to change the return type of `WorkerTask::spawn` and
`WorkerTask::new_thread`, but I'd personally like to have the scheduler do
all the spawning of tasks itself and introduce a type of worker task
that's more integrated with the scheduler, so we don't need to needlessly
pass the `JoinHandle`s around (and also don't use `String`s for everything
in the universe).

I hope that we could perhaps gradually transition from `WorkerTask` to
`WhateverTheNewTaskIsCalled`, as that would make things much less painful,
but it would still involve a lot of churn, I think.

So yes, I think I'd prefer the scheduler to emit events itself. A worker
task should IMO just focus on what it's supposed to do, after all.

>
> What about periodic tasks that should be run at a given time, e.g. for
> server side alerts/monitoring tasks [0]? From your description I suppose
> these would simply be a different job type, and therefore be
> queued/executed based on their priority?

Currently we check every minute (IIRC) whether a periodic task (like GC,
Sync, etc.) needs to be run, which again is something that the scheduler
(or rather a new component of it) could handle. The current loop we have
for that is actually pretty fine - it could just send jobs to the
scheduler instead of launching any periodic jobs itself.

For such alerts and monitoring things this could be done in a similar way,
I believe - if we add the "broadcasting idea" from above into the mix, we
could have some kind of "monitoring service" that listens to the stuff the
scheduler does. If e.g. the scheduler hasn't emitted a `BackupJobCompleted`
event (or whatever) for a while, the monitoring service could send out an
alert to the admin. What do you think about that?
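Just to illustrate the idea (nothing like this exists yet - the event
type, names and alert mechanism below are entirely made up):

    use std::time::Duration;
    use tokio::sync::broadcast;

    // Hypothetical event type the scheduler would broadcast.
    #[derive(Clone)]
    enum SchedulerNotification {
        BackupJobStarted { job_id: String },
        BackupJobCompleted { job_id: String },
    }

    // Monitoring service: subscribe to the scheduler's broadcast channel
    // and alert the admin if no backup completes within some window.
    async fn monitor(
        mut events: broadcast::Receiver<SchedulerNotification>,
        window: Duration,
    ) {
        loop {
            match tokio::time::timeout(window, events.recv()).await {
                // A job completed in time - wait for the next one.
                Ok(Ok(SchedulerNotification::BackupJobCompleted { .. })) => {}
                // Other events, or we lagged behind the channel; for
                // simplicity these also restart the window here.
                Ok(Ok(_)) | Ok(Err(broadcast::error::RecvError::Lagged(_))) => {}
                // The scheduler dropped its sender - stop monitoring.
                Ok(Err(broadcast::error::RecvError::Closed)) => break,
                // Nothing completed within the window - alert the admin
                // (via the notification system, eventually).
                Err(_elapsed) => eprintln!("no backup completed for {window:?}"),
            }
        }
    }

The scheduler side would then just hold a `broadcast::Sender` that it
fires events into from within its event loop.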
We don't have any handling for job priorities or anything of the sort yet,
as we mostly focused on getting a basic FIFO queue working (while trying
to remain backward-compatible with the current behaviour). However, this
should be fairly trivial to implement as well - each job could e.g. get a
default priority of `0`, with higher values meaning that a job has a
higher priority (or we use an enum to represent those prios); rough sketch
in the P.S. below. The rest could be done by the queue - we could just
integrate that with the FIFO queue (or even introduce a new queue type,
just because we can now ;P /j)

Or we could add a separate queue for periodic jobs - the scheduler could
simply prefer those over "requested" jobs. Lots of possibilities, really.

Also, because of the event loop, it's really easy to just add more events
and `match` on them. In fact, I like this pattern so much that I think we
should adopt it in other places too.

>
> Can you already share some code (maybe of one of the prototypes), so one
> can have a closer look and do some initial testing, or is it still too
> experimental for that?

Yes -- Gabriel and I both have our prototypes in our staff repos! :)

When you test things, do keep in mind that it still doesn't play too
nicely with PVE (regarding e.g. when to fs-freeze / fs-thaw and a bunch of
other things) - that in particular is one of the reasons why we think that
we'll need at least one new endpoint for the scheduling stuff. (Probably
with some kind of (long-)polling mechanism as well.)

Thanks again for reading our RFC - you've given me *lots* of new ideas.
I'm really curious what others have to say about this as well. :)

>
> Cheers,
> Chris
>
> [0] https://bugzilla.proxmox.com/show_bug.cgi?id=5108
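P.S.: Regarding the job priorities mentioned above, here's roughly what I
have in mind - a sketch only; whether we'd actually use a `BinaryHeap`, an
enum for the priorities, or a whole separate queue type is completely
open:

    use std::cmp::Ordering;
    use std::collections::BinaryHeap;

    // Hypothetical queue entry - nothing like this exists yet.
    struct QueuedJob {
        priority: u8, // default 0; higher value = runs earlier
        seq: u64,     // insertion counter, keeps FIFO order per priority
    }

    impl PartialEq for QueuedJob {
        fn eq(&self, other: &Self) -> bool {
            self.cmp(other) == Ordering::Equal
        }
    }

    impl Eq for QueuedJob {}

    impl PartialOrd for QueuedJob {
        fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
            Some(self.cmp(other))
        }
    }

    impl Ord for QueuedJob {
        fn cmp(&self, other: &Self) -> Ordering {
            // `BinaryHeap` is a max-heap: higher priority wins; among
            // equal priorities, the older job (lower `seq`) wins.
            self.priority
                .cmp(&other.priority)
                .then(other.seq.cmp(&self.seq))
        }
    }

    fn main() {
        let mut queue = BinaryHeap::new();
        queue.push(QueuedJob { priority: 0, seq: 1 });
        queue.push(QueuedJob { priority: 1, seq: 2 }); // jumps the line
        queue.push(QueuedJob { priority: 0, seq: 3 });

        let order: Vec<u64> =
            std::iter::from_fn(|| queue.pop().map(|job| job.seq)).collect();
        assert_eq!(order, vec![2, 1, 3]); // priority first, then FIFO
    }

A plain FIFO queue then simply falls out as the special case where every
job keeps the default priority.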