Date: Tue, 12 May 2026 10:39:35 +0200
Subject: Re: [RFC datacenter-manager 0/4] add generic, per-remote (and global) cache for remote API responses
From: "Lukas Wagner" <l.wagner@proxmox.com>
To: "Dominik Csapak"
References: <20260508150330.363622-1-l.wagner@proxmox.com> <11d83bb3-d8e7-430f-8af8-64181d78154c@proxmox.com>
In-Reply-To: <11d83bb3-d8e7-430f-8af8-64181d78154c@proxmox.com>

Hey Dominik, thanks for your feedback.

On Mon May 11, 2026 at 2:37 PM CEST, Dominik Csapak wrote:
> hi,
>
> IMO the code looks mostly good to me, as well as the approach, a
> few comments regardless (all here, as the patches themselves are
> not that big ;) )
>
> * using the filesystem can be good, it can reduce the memory footprint
>   since we don't automatically store everything in memory
>
>   this can also be a downside though, since now the IOPS requirements
>   rise, and for frequent update cycles it could have a negative impact
>   on the disks (ssd wearout)
>
>   this could be mitigated by using e.g. /var/run instead of /var/cache
>   but then it's back to using memory all the time, with less
>   clear accounting.
>
> I'm actually not sure which is better, maybe having that configurable
> could make sense? (as in, what's the cache path? this way users can
> decide if they want it on disk or in memory)

FWIW, I did some back-of-the-envelope calculations. I assume that the
cache for the cluster resources would likely be the entry with the most
writes by far, since that one scales linearly with the number of
resources *and* has to be refreshed periodically (at the moment, we
have a max-age of 30 seconds for the dashboard).

A guest resource (e.g. a VM) is roughly 260 bytes of minified JSON.
If we assume a huge setup with, say, 50000 guests, that would be

  260 bytes * 50000 = 13 MB

of writes every 30 seconds, which boils down to

  13 MB * 2 * 60 * 24 = ~37.5 GB

of writes per day (without accounting for write amplification effects).
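Or, as a tiny sanity-check program (the numbers are just the
assumptions from above, not measured values):

    fn main() {
        let bytes_per_guest: u64 = 260;           // minified JSON per guest resource
        let guests: u64 = 50_000;
        let refreshes_per_day: u64 = 2 * 60 * 24; // one refresh every 30 seconds

        let per_refresh = bytes_per_guest * guests;    // 13_000_000 bytes ~= 13 MB
        let per_day = per_refresh * refreshes_per_day; // ~37.4 GB
        println!("{per_refresh} bytes per refresh, {per_day} bytes per day");
    }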
A PDM host for huge setups like these would (hopefully) use
enterprise-grade SSDs, so in reality this amount of writes *should* be
fine, I think. The average PDM setup is likely going to be at least one
order of magnitude smaller than this, I would assume. Also, it's worth
mentioning that this is the worst-case scenario where the dashboard is
open 24/7; if that is not the case, we only request resource data every
couple of minutes.

That being said, since we only have 13 MB of data, we could, as you
said, just put that into /var/run for now and live with the data loss
on reboot.

> I'm actually not sure which is better, maybe having that configurable
> could make sense? (as in, what's the cache path? this way users can
> decide if they want it on disk or in memory)

I think both options are fine. Storing it on disk would have the slight
advantage that more expensive, rarely fetched API responses (e.g.
update status; requesting the list of available updates can, as of now,
trigger an 'apt update' on PVE nodes) survive reboots of the host. In
practice, as reboots should hopefully not be that common, this really
should not matter too much, I think.

With regards to making it configurable: yes, that could make sense, but
I'd only add it once we actually have users requesting it.

I think I'd put the cache in /var/run for the next version; going from
non-persistent to persistent seems to be the easier step, in case we
want to change the location in the future.

> * currently you expire values if the timestamp is in the future,
>   could it be better to return an error here? IMO there had to be one
>   at some point, either the time was wrong in the past, or the time
>   is wrong now (neither of which is good?)

The question is, do we want to put the burden on the caller to handle
this error variant correctly? With the current implementation, a
timestamp from the future simply means that the caller gets a `None`,
which in practice means that the caller will retrieve fresh data from
the remote; that data is then put into the cache, overriding the entry
with the bogus timestamp, so the cache heals itself.
If an error is returned, what would you do with it in the caller? Just
bubble it up to the API level and return a failure? Or handle it in
some way to heal the cache manually?
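To illustrate (hypothetical types and names, not the actual code from
the series), the read path can simply treat a timestamp from the future
like an expired entry:

    // Sketch only -- names and types are made up for illustration.
    struct CacheEntry<T> {
        data: T,
        timestamp: i64, // unix epoch, in seconds
    }

    fn still_valid<T>(entry: &CacheEntry<T>, now: i64, max_age: i64) -> bool {
        // a timestamp in the future counts as invalid, just like an expired one
        entry.timestamp <= now && now - entry.timestamp <= max_age
    }

    fn get<T: Clone>(entry: Option<&CacheEntry<T>>, now: i64, max_age: i64) -> Option<T> {
        match entry {
            Some(e) if still_valid(e, now, max_age) => Some(e.data.clone()),
            // missing, expired, or bogus timestamp: the caller fetches fresh
            // data and overwrites the entry, so the cache heals itself
            _ => None,
        }
    }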
> * not sure if the pdm_cache::instance() abstraction is gaining us
>   anything here, we could simply have freestanding functions?

I guess at the moment we do not gain anything here, true. I'm leaning
towards this style because I still have my proposal from
https://lore.proxmox.com/pdm-devel/20260129134418.307552-1-l.wagner@proxmox.com/T/#t
in mind, which I will hopefully start working on again once I have the
time for it. This cache would also be something that needs to be
injected into the API handlers via some application context object, to
ensure that we can drive API handler code in integration tests.
But since this is 100% PDM-internal code, I can just put it behind
helpers for now and refactor it again once we actually need it.

> * for the subscription info you chose to implement an async wrapper
>   maybe we could/should have that directly in the interface
>   (of pdm_cache for example)? otherwise people might use it
>   without a 'spawn_blocking'. having both an async and sync interface
>   could be beneficial

Yes, I wondered the same actually. Would you add the async interface to
`pdm_cache`, or to the generic implementation in `namespaced_cache`? If
we do the former, we need to add wrappers for the read/write guard
types as well, so that would be quite a bit more code. If we put it in
`namespaced_cache`, we should probably feature-gate it as soon as we
put the cache into a shared crate in proxmox.git.
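Roughly what I have in mind for such a wrapper (hypothetical names, not
the code from the series; assumes a tokio runtime):

    // Hypothetical sync read path, standing in for the real cache accessor.
    fn get_cached_sync(remote: &str, key: &str) -> anyhow::Result<Option<Vec<u8>>> {
        let _ = (remote, key);
        // ... open the cache file, take the lock, read and deserialize ...
        Ok(None)
    }

    // Async wrapper: move the blocking file/lock access onto tokio's blocking
    // thread pool so callers cannot accidentally block the executor by
    // forgetting to use spawn_blocking themselves.
    async fn get_cached(remote: String, key: String) -> anyhow::Result<Option<Vec<u8>>> {
        tokio::task::spawn_blocking(move || get_cached_sync(&remote, &key)).await?
    }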