From: "Lukas Wagner" <l.wagner@proxmox.com>
To: "Dominik Csapak" <d.csapak@proxmox.com>,
"Lukas Wagner" <l.wagner@proxmox.com>,
<pdm-devel@lists.proxmox.com>
Subject: Re: [RFC datacenter-manager 0/4] add generic, per-remote (and global) cache for remote API responses
Date: Tue, 12 May 2026 10:39:35 +0200 [thread overview]
Message-ID: <DIGKA3QJ7I26.7IOC5IYD48OR@proxmox.com> (raw)
In-Reply-To: <11d83bb3-d8e7-430f-8af8-64181d78154c@proxmox.com>
Hey Dominik,
thanks for your feedback.
On Mon May 11, 2026 at 2:37 PM CEST, Dominik Csapak wrote:
> hi,
>
> IMO the code looks mostly good to me, as well as the approach, a
> few comments regardless (all here, as the patches themselves are
> not that big ;) )
>
> * using the filesystem can be good, it can reduce the memory footprint
> since we're not automatically store everything in memory
>
> this can also be a downside though, since now the IOPS requirements
> rise, and for frequent update cycles it could have a negative impact
> on the disks (ssd wearout)
>
> this could be mitigated by using e.g. /var/run instead of /var/cache
> but then it's back to using memory all the time, with less
> clear accounting.
>
> I'm actually not sure which is better, maybe having that configurable
> could make sense? (as in, whats the cache path? this way users can
> decide if they want it on disk or in memory)
FWIW, I did some back-of-the-envelope calculations. I assume that the
cache for the cluster resources would likely be the entry with the
most writes by far, since it scales linearly with the number of
resources *and* has to be refreshed periodically (at the moment, we have
a max-age of 30 seconds for the dashboard).
A guest resource (e.g. a VM) is roughly 260 bytes of minified JSON.
If we assume a huge setup with, say, 50000 guests, that would be:
260 bytes * 50000 = 13 MB
13 MB of writes every 30 seconds boils down to
13 MB * 2 * 60 * 24 = ~37.5 GB of writes per day (not accounting for
write amplification effects).
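To sanity-check the arithmetic (using the hypothetical numbers from above, not measured values):

```rust
// Back-of-the-envelope write-volume estimate. All constants are the
// assumed example values from the text, not measurements.
const BYTES_PER_GUEST: u64 = 260; // approx. minified JSON per guest resource
const GUESTS: u64 = 50_000;
const REFRESH_INTERVAL_SECS: u64 = 30;

fn writes_per_day_mb() -> u64 {
    let per_refresh_mb = BYTES_PER_GUEST * GUESTS / 1_000_000; // ~13 MB
    let refreshes_per_day = 86_400 / REFRESH_INTERVAL_SECS; // 2 * 60 * 24 = 2880
    per_refresh_mb * refreshes_per_day // ~37_440 MB, i.e. ~37.5 GB
}
```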
A PDM host for a setup this large would (hopefully) use
enterprise-grade SSDs, so in reality this amount of writes
*should* be fine, I think. The average PDM setup is likely going to be
at least one order of magnitude smaller than this, I would assume.
Also, it's worth mentioning that this is the worst-case scenario, where
the dashboard is open 24/7; if it is not, we only request resource data
every couple of minutes.
That being said, since we are only looking at 13 MB of data, we could,
as you said, just put that into /var/run for now and live with the data
loss on reboot.
> I'm actually not sure which is better, maybe having that configurable
> could make sense? (as in, whats the cache path? this way users can
> decide if they want it on disk or in memory)
I think both options are fine. Storing it on disk has the slight
advantage that more expensive, rarely fetched API responses can survive
reboots of the host (e.g. update status: requesting the list of
available updates can, as of now, trigger an 'apt update' on PVE nodes).
In practice, since reboots should hopefully not be that common, this
really should not matter too much, I think.
With regards to making it configurable: yes, that could make sense, but
I'd only add it once users actually request it.
I think I'd put it in /var/run for the next version; going from
non-persistent to persistent seems to be the easier step, in case we
want to change the location in the future.
>
> * currently you expire values if the timestamp is in the future,
> could it be better to return an error here? IMO there had to be one
> at some point, either the time was wrong in the past, or the time
> is wrong now (neither of which is good?)
The question is, do we want to put the burden on the caller to handle
this error variant correctly?
With the current implementation, a timestamp from the future simply
means the caller gets a `None`, which in practice means that the caller
will retrieve fresh data from the remote. That data is then put into
the cache, overwriting the entry with the bogus timestamp, resulting in
a self-healing effect.
If an error were returned, what would the caller do with it? Just
bubble it up to the API level and return a failure? Or handle it in a
way that heals the cache manually?
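Roughly what I have in mind, as an illustrative sketch (the type and names are made up for this example, not the actual pdm_cache API):

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Hypothetical cache entry; field names are illustrative only.
struct CacheEntry {
    timestamp: u64, // unix epoch seconds at which the value was stored
    value: String,
}

fn now_epoch() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0)
}

/// Returns `None` for expired entries *and* for timestamps from the
/// future, so the caller transparently falls back to fetching fresh
/// data, which then overwrites the bogus entry (self-healing).
fn get_if_fresh(entry: &CacheEntry, max_age_secs: u64) -> Option<&str> {
    let now = now_epoch();
    // short-circuit: the subtraction only runs when timestamp <= now
    if entry.timestamp > now || now - entry.timestamp > max_age_secs {
        return None;
    }
    Some(entry.value.as_str())
}
```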
>
> * not sure if the pdm_cache::instance() abstraction is gaining us
> anything here, we could simply have freestanding functions?
I guess at the moment we do not gain anything here, true.
I'm leaning towards this style because I always have my proposal
from https://lore.proxmox.com/pdm-devel/20260129134418.307552-1-l.wagner@proxmox.com/T/#t
in mind, which I will hopefully start working on again once I have
the time for it.
This cache would also be something that would need to be injected into
the API handler via some application context object, to ensure that we
can drive API handler code in integration tests.
But since this is 100% PDM-internal code, I can just put it behind
helpers for now and refactor it again once we actually need it.
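For now the helpers would just hide a lazily initialized process-wide instance, along these lines (illustrative sketch; the type, path, and names are made up, not the real pdm_cache code):

```rust
use std::sync::OnceLock;

// Illustrative stand-in for the cache type; not the real pdm_cache API.
struct Cache {
    base_dir: String,
}

impl Cache {
    fn new(base_dir: &str) -> Self {
        Self { base_dir: base_dir.into() }
    }

    fn describe(&self) -> String {
        format!("cache at {}", self.base_dir)
    }
}

// Freestanding helper hiding a lazily initialized process-wide instance.
// Call sites stay the same if this is later swapped for an instance
// injected via an application context object.
fn instance() -> &'static Cache {
    static INSTANCE: OnceLock<Cache> = OnceLock::new();
    INSTANCE.get_or_init(|| Cache::new("/var/cache/pdm"))
}
```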
>
> * for the subscription info you chose to implement an async wrapper
> maybe we could/should have that directly in the interface
> (of pdm_cache for example) ? otherwise people might use it
> without a 'spawn_blocking'. having both an async and sync interface
> could be beneficial
>
Yes, I wondered the same, actually. Would you add the async interface to
`pdm_cache`, or to the generic implementation in `namespaced_cache`? If
we do the former, we need to add wrappers for the read/write guard types
as well, so that would be quite a bit more code.
If we put it in `namespaced_cache`, we should probably feature-gate it
as soon as we move the cache into a shared crate in proxmox.git.
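Something along these lines, sketched here with a plain thread and channel standing in for what the real wrapper would do with tokio's `spawn_blocking` (all names are illustrative, not the actual interface):

```rust
use std::sync::mpsc;
use std::thread;

// Stand-in for a synchronous, filesystem-backed cache lookup.
fn read_blocking(key: &str) -> String {
    format!("value-for-{key}")
}

// "Async-style" wrapper: the blocking call runs off the caller's thread
// and the result is delivered through a channel. In the real code this
// would be an async fn using tokio::task::spawn_blocking, so callers
// can't accidentally block an executor thread via the sync API.
fn read_offloaded(key: String) -> mpsc::Receiver<String> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = tx.send(read_blocking(&key));
    });
    rx
}
```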
Thread overview: 10+ messages
2026-05-08 15:03 [RFC datacenter-manager 0/4] add generic, per-remote (and global) cache for remote API responses Lukas Wagner
2026-05-08 15:03 ` [PATCH datacenter-manager 1/4] add persistent, generic, namespaced key-value cache implementation Lukas Wagner
2026-05-12 12:29 ` Wolfgang Bumiller
2026-05-13 8:45 ` Lukas Wagner
2026-05-08 15:03 ` [PATCH datacenter-manager 2/4] add pdm_cache cache as a specialized wrapper around the namespaced cache Lukas Wagner
2026-05-08 15:03 ` [PATCH datacenter-manager 3/4] api: resources: subscriptions: switch over to pdm_cache Lukas Wagner
2026-05-08 15:03 ` [PATCH datacenter-manager 4/4] remote-updates: " Lukas Wagner
2026-05-11 12:37 ` [RFC datacenter-manager 0/4] add generic, per-remote (and global) cache for remote API responses Dominik Csapak
2026-05-12 8:39 ` Lukas Wagner [this message]
2026-05-13 13:56 ` superseded: " Lukas Wagner