From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 15 May 2026 14:56:55 +0200
From: "Lukas Wagner" <l.wagner@proxmox.com>
To: "Thomas Lamprecht", "Lukas Wagner"
CC: pdm-devel@lists.proxmox.com
Subject: Re: [PATCH datacenter-manager 4/4] remote-updates: switch over to new api_cache
References: <20260513135457.573414-1-l.wagner@proxmox.com> <20260513135457.573414-5-l.wagner@proxmox.com> <20260515090637.950992-4-t.lamprecht@proxmox.com>
In-Reply-To: <20260515090637.950992-4-t.lamprecht@proxmox.com>
List-Id: Proxmox Datacenter Manager development discussion
On Fri May 15, 2026 at 11:06 AM CEST, Thomas Lamprecht wrote:
> On Wed, 13 May 2026 15:54:57 +0200, Lukas Wagner wrote:
>> diff --git a/server/src/remote_updates.rs b/server/src/remote_updates.rs
>> --- a/server/src/remote_updates.rs
>> +++ b/server/src/remote_updates.rs
>> @@ -179,10 +167,11 @@ async fn update_cached_summary_for_node(
>>      node: String,
>>      node_data: NodeUpdateSummary,
>>  ) -> Result<(), Error> {
>> -    let mut file = File::open(UPDATE_CACHE)?;
>> -    let mut cache_content: UpdateSummary = serde_json::from_reader(&mut file)?;
>> -    let remote_entry =
>> -        cache_content
>> +    let cache = api_cache::write_global().await?;
>> +    let cache_content = cache.get::<UpdateSummary>(UPDATE_SUMMARY_CACHE_KEY).await?;
>> +
>> +    if let Some(mut entry) = cache_content {
>> +        let remote_entry = entry
>>              .remotes
>>              .entry(remote.id)
>>              .or_insert_with(|| RemoteUpdateSummary {
>> @@ -191,15 +180,9 @@ async fn update_cached_summary_for_node(
>>                  status: RemoteUpdateStatus::Success,
>>              });
>>
>> -        remote_entry.nodes.insert(node, node_data);
> [...]
>> +        remote_entry.nodes.insert(node, node_data);
>> +        cache.set(UPDATE_SUMMARY_CACHE_KEY, entry).await?;
>> +    }
>>
>>      Ok(())
>>  }
>
> Small behaviour change worth a second look: the old code did
> `File::open(UPDATE_CACHE)?`, which returned an error if the cache file
> did not exist.

Yeah, that actually was problematic before. It did not really surface,
since at the moment we trigger the remote update fetching for all
remotes very soon after daemon startup.

> The new code uses `cache.get(..)`, which returns `Ok(None)` for that
> case, and the `if let Some(..)` then silently skips the write.
> So, if `list_available_updates` is called before any refresh has
> populated the cache, its result is now thrown away silently instead of
> surfaced as an error.
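A minimal synchronous sketch of that difference, with a hypothetical
`Summary` type standing in for the real `UpdateSummary` and the cache
lookup reduced to a plain `Option`:

```rust
// Hypothetical stand-in for `UpdateSummary`; only the shape of the
// `Option` handling matters here, not the actual cache types.
#[derive(Default, Debug, PartialEq)]
struct Summary {
    nodes: Vec<String>,
}

// Shape of the new code: `cache.get(..)` yields `Ok(None)` for a missing
// entry, and the `if let Some(..)` silently skips the write-back.
fn update_if_present(cache_content: Option<Summary>, node: String) -> Option<Summary> {
    if let Some(mut entry) = cache_content {
        entry.nodes.push(node);
        Some(entry) // would be written back via `cache.set(..)`
    } else {
        None // node data thrown away without any error
    }
}

// Fallback shape: a missing entry starts from `Summary::default()`, so
// this code path can also create the initial cache entry.
fn update_or_create(cache_content: Option<Summary>, node: String) -> Summary {
    let mut entry = cache_content.unwrap_or_default();
    entry.nodes.push(node);
    entry
}
```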
> If you want this code path to be able to create the initial cache entry
> as well, replacing the `if let Some(..)` with
> `let mut entry = cache_content.unwrap_or_default();` would do it.

Did exactly that, thanks!

>> @@ -212,7 +195,7 @@ pub async fn refresh_update_summary_cache(remotes: Vec<Remote>) -> Result<(), Error>
>>          .do_for_all_remote_nodes(remotes.clone().into_iter(), fetch_available_updates)
>>          .await;
>>
>> -    let mut content = get_cached_summary_or_default()?;
>> +    let mut content = get_cached_summary_or_default().await?;
>>
>>      // Clean out any remotes that might have been removed from the remote config in the meanwhile.
>>      content
>> @@ -275,8 +258,28 @@ pub async fn refresh_update_summary_cache(remotes: Vec<Remote>) -> Result<(), Error>
>>          }
>>      }
>>
>> -    let options = proxmox_product_config::default_create_options();
>> -    proxmox_sys::fs::replace_file(UPDATE_CACHE, &serde_json::to_vec(&content)?, options, true)?;
>> +    cleanup_old_cachefile().await?;
>> +
>> +    let cache = api_cache::write_global().await?;
>> +    cache.set(UPDATE_SUMMARY_CACHE_KEY, content).await?;
>> +
>> +    Ok(())
>> +}
>
> Two things on the final write:
>
> - `get_cached_summary_or_default()` above takes a read lock and drops
>   it again before `write_global()` is called down here. If
>   `update_cached_summary_for_node` runs in that gap, the entry it just
>   wrote will be overwritten by the `cache.set` below. Holding a single
>   write lock for the whole function would prevent that. Not sure if we
>   strictly need that guarantee though.

Moved the write_global() up top to make sure that the lock is held while
updating the values in the UpdateSummary data structure. Thanks!

> - `cleanup_old_cachefile` runs before the new `cache.set`.
> If the `set` ever fails, the old file is already gone, so the next
> refresh starts from an empty cache.

We don't really migrate contents over from the old cachefile, but start
from a clean slate anyway with the new cache, so I don't think this
makes any difference?

> Either move the cleanup after the successful write, or check whether
> the old file exists first so the cleanup only runs once.

The cleanup literally just makes sure that the old, unused cachefile
does not litter /var/cache/... - it's also fine to run
'cleanup_old_cachefile' multiple times. It would re-attempt to delete
the file, but explicitly catches the 'file does not exist' case and does
not log or throw an error then.

I changed the order of operations anyway (so, set before cleanup), but I
don't really think it makes a difference. (But maybe I'm missing
something?)
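For illustration, an idempotent cleanup along those lines could look as
follows. The real `cleanup_old_cachefile` is async and hardcodes its
path; this is only a sketch of the "swallow the missing-file case"
behaviour described above, with the path taken as a parameter:

```rust
use std::io;
use std::path::Path;

// Remove a legacy cache file, treating "already gone" as success so the
// function can safely run on every refresh without logging or erroring.
fn cleanup_old_cachefile(path: &Path) -> io::Result<()> {
    match std::fs::remove_file(path) {
        Ok(()) => Ok(()),
        // The old file not existing is the expected steady state after
        // the first cleanup; neither log nor propagate an error for it.
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(()),
        Err(err) => Err(err),
    }
}
```

Running it twice in a row is fine: the second call hits the `NotFound`
arm and still returns `Ok(())`.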