public inbox for pdm-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: Wolfgang Bumiller <w.bumiller@proxmox.com>
To: Lukas Wagner <l.wagner@proxmox.com>
Cc: pdm-devel@lists.proxmox.com, Thomas Lamprecht <t.lamprecht@proxmox.com>
Subject: Re: [pdm-devel] [PATCH proxmox-datacenter-manager 00/15] change task cache mechanism from time-based to max-size FIFO
Date: Fri, 31 Jan 2025 14:51:09 +0100	[thread overview]
Message-ID: <kbgks7vsitxj6jc5rolaf2rmzipatdkof3se2qvhjqu7kfkvnm@igccxcf6thro> (raw)
In-Reply-To: <fi3ol436pv4oghojose6ucwe3c2swo5knmasqc5gvnnpdczudg@wt5vnlglambd>

On Fri, Jan 31, 2025 at 02:36:07PM +0100, Wolfgang Bumiller wrote:
> On Fri, Jan 31, 2025 at 10:35:03AM +0100, Lukas Wagner wrote:
> > 
> > 
> > On  2025-01-28 13:25, Lukas Wagner wrote:
> > > This patch series changes the remote task caching behavior from being purely
> > > time-based cache to a FIFO cache-replacement policy with a maximum number of
> > > cached tasks per remote. If the maximum number is exceeded, the oldest tasks
> > > are dropped from the cache.
> > > 
> > > When calling the remote-tasks API, the latest missing task data which is not
> > > yet in the cache is requested from the remotes. There we limit this to once
> > > every 5 minutes at the moment, with the option for a force-refresh (to be triggered
> > > by a refresh button in the UI). As before, we augment the cached task data
> > > with the currently running tracked tasks which were started by PDM.
> > > 
> > > Some words about the cache storage implementation:
> > > Note that the storage backend for this cache probably needs some more love in
> > > the future. Right now its just a single JSON file for everything, mainly because this
> > > was the quickest approach to implement to unblock UI development work. 
> > > The problem with the approach is that it does not perform well with setups
> > > with a large number of remotes, since every update to the cache rewrites
> > > the entire cache file when the cache is persisted, causing additional
> > > CPU and IO load.
> > > 
> > > In the future, we should use a similar mechanism as the task archive in PBS.
> > > I'm not sure if the exact same mechanism can be used due to some different
> > > requirements, but the general direct probably fits quite well.
> > > If we can reuse it 1:1, we have to break it out of (iirc) the WorkerTask struct
> > > to be reusable.
> > > It will probably require some experimentation and benchmarking to find an
> > > ideal approach.
> > > We probably don't want a separate archive per remote, since we do not want to
> > > read hundres/thousands of files when we request an aggregated remote task history
> > > via the API. But having the complete archive in a single file also seems quite
> > > challenging - we need to keep the data sorted, while we also need to
> > > handle task data arriving out of order from different remotes. Keeping
> > > the data sorted when new data arrives leads us to the same problem as with
> > > JSON file, being that we have to rewrite the file over and over again, causing
> > > load and writes.
> > > 
> > > The good news is that since this is just a cache, we are pretty free to change
> > > the storage implementation without too much trouble; we don't even have to
> > > migrate the old data, since it should not be an issue to simply request the
> > > data from the remotes again. This is the main reason why I've opted
> > > to keep the JSON file for now; I or somebody else can revisit this at a later
> > > time.
> > > 
> > 
> > Some additional context as to explain the 'why', since @Thomas and @Wolfgang requested it:
> > 
> > The status quo for the task cache is to fetch a certain time-based range of tasks
> > (iirc the last seven days) from every remote and cache this for a certain period
> > of time (max-age). If the cached data is too old, we discard the task data
> > and fetch the same time range again.
> > My initial reasoning behind designing it like this was to keep the
> > 'ground truth' completely on the remote side, so *if* somebody were to mess with
> 
> I don't think "messing with the task archive" is something we want to
> worry about unless it's "easy enough".
> 
> > the task archive, we would be consistent after a refresh. Also this allowed to
> > keep the caching logic on PDM side much simpler, since we not doing much more
> > than caching the API response from the remote - the same way we already do it
> > for resources, subscription status, etc.
> > 
> > The downside is we had some unnecessary traffic, since we keep on fetching old tasks
> > that we already received.
> > 
> > I originally posted this as an RFC right before the holidays to get an initial approach
> > out to get some feedback on the approach (I wasn't too sure myself if the original approach
> > was a good idea) and also to somewhat unblock UI development for the remote task view.
> 
> There are definitely some things which need to be improved regardless of
> which version we use.

Since this does not fit into the current patches as it is already an
issue in the original code:

Also mentioned this off list, but this is something general to remember:

Whenever we `spawn()` a longer running task we also need to take into
account the possibility that the daemons might be reloaded, where these
tasks would prevent the original one from shutting down (and potentially
race against file locks in the reloaded one), so they should `select()`
on a `proxmox_daemon::shutdown_future()`.

For this case, this means the list of tasks which are currently being
polled in futures needs to be persisted somewhere so the reloaded daemon
can pick it up and continue the polling while the polling task in the
old daemon just gets cancelled. AFAICT this should be fairly
unproblematic here.


_______________________________________________
pdm-devel mailing list
pdm-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pdm-devel


  reply	other threads:[~2025-01-31 13:51 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-28 12:25 Lukas Wagner
2025-01-28 12:25 ` [pdm-devel] [PATCH proxmox-datacenter-manager 01/15] pdm-api-types: derive Debug and PartialEq for TaskListItem Lukas Wagner
2025-01-28 12:25 ` [pdm-devel] [PATCH proxmox-datacenter-manager 02/15] test support: add NamedTempFile helper Lukas Wagner
2025-01-28 12:25 ` [pdm-devel] [PATCH proxmox-datacenter-manager 03/15] task cache: add basic test for TaskCache Lukas Wagner
2025-01-28 12:25 ` [pdm-devel] [PATCH proxmox-datacenter-manager 04/15] task cache: remove max-age machanism Lukas Wagner
2025-01-29 18:27   ` Thomas Lamprecht
2025-01-30  8:01     ` Lukas Wagner
2025-01-30 16:06       ` Thomas Lamprecht
2025-01-28 12:25 ` [pdm-devel] [PATCH proxmox-datacenter-manager 05/15] task cache: add FIFO cache replacement policy Lukas Wagner
2025-01-28 12:25 ` [pdm-devel] [PATCH proxmox-datacenter-manager 06/15] remote tasks: move to dir based module Lukas Wagner
2025-01-28 12:25 ` [pdm-devel] [PATCH proxmox-datacenter-manager 07/15] task cache: move to its own submodule Lukas Wagner
2025-01-28 12:25 ` [pdm-devel] [PATCH proxmox-datacenter-manager 08/15] task cache: fetch every 5mins, requesting only missing tasks Lukas Wagner
2025-01-31 13:42   ` Wolfgang Bumiller
2025-01-28 12:25 ` [pdm-devel] [PATCH proxmox-datacenter-manager 09/15] remote tasks: return tasks in stable order Lukas Wagner
2025-01-28 12:25 ` [pdm-devel] [PATCH proxmox-datacenter-manager 10/15] remote tasks: allow to force-fetch latest tasks Lukas Wagner
2025-01-28 12:25 ` [pdm-devel] [PATCH proxmox-datacenter-manager 11/15] fake remote: add missing fields to make the debug feature compile again Lukas Wagner
2025-01-28 12:25 ` [pdm-devel] [PATCH proxmox-datacenter-manager 12/15] fake remote: generate fake task data Lukas Wagner
2025-01-28 12:25 ` [pdm-devel] [PATCH proxmox-datacenter-manager 13/15] task cache: tests: improve test coverage Lukas Wagner
2025-01-28 12:25 ` [pdm-devel] [PATCH proxmox-datacenter-manager 14/15] remote tasks: fix unused variable warning Lukas Wagner
2025-01-28 12:25 ` [pdm-devel] [PATCH proxmox-datacenter-manager 15/15] remote-tasks: restrict function visibility Lukas Wagner
2025-01-31  9:35 ` [pdm-devel] [PATCH proxmox-datacenter-manager 00/15] change task cache mechanism from time-based to max-size FIFO Lukas Wagner
2025-01-31 13:36   ` Wolfgang Bumiller
2025-01-31 13:51     ` Wolfgang Bumiller [this message]
2025-02-05 15:34   ` Thomas Lamprecht
2025-02-06 10:13     ` Lukas Wagner
2025-02-12  9:19       ` Thomas Lamprecht

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=kbgks7vsitxj6jc5rolaf2rmzipatdkof3se2qvhjqu7kfkvnm@igccxcf6thro \
    --to=w.bumiller@proxmox.com \
    --cc=l.wagner@proxmox.com \
    --cc=pdm-devel@lists.proxmox.com \
    --cc=t.lamprecht@proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal