Date: Thu, 07 May 2026 15:23:39 +0200
From: "Lukas Wagner" <l.wagner@proxmox.com>
To: "Thomas Lamprecht" <t.lamprecht@proxmox.com>
Subject: Re: [PATCH datacenter-manager v2 4/8] subscription: add key pool and node status API endpoints
References: <20260507082943.2749725-1-t.lamprecht@proxmox.com>
 <20260507082943.2749725-5-t.lamprecht@proxmox.com>
In-Reply-To: <20260507082943.2749725-5-t.lamprecht@proxmox.com>
List-Id: Proxmox Datacenter Manager development discussion

Hi!

not a deep review of the code yet, I mostly focused on checking out the
UX and API design so far. Some notes inline.

On Thu May 7, 2026 at 10:26 AM CEST, Thomas Lamprecht wrote:

[...]

> +
> +const KEYS_ROUTER: Router = Router::new()
> +    .get(&API_METHOD_LIST_KEYS)
> +    .post(&API_METHOD_ADD_KEYS)
> +    .match_all("key", &KEY_ITEM_ROUTER);
> +
> +const KEY_ITEM_ROUTER: Router = Router::new()
> +    .get(&API_METHOD_GET_KEY)
> +    .put(&API_METHOD_ASSIGN_KEY)

Maybe instead of PUTing a key to assign/unassign it, we could rather have a
  POST .../keys/<key>/assignment
and
  DELETE .../keys/<key>/assignment

Then the POST could have remote and node as non-optional params, I think?
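
For illustration, the item router could then look roughly like this
(untested sketch; `ASSIGNMENT_ROUTER` and `API_METHOD_UNASSIGN_KEY` are
just names I made up, not something from your series):

  const KEY_ITEM_ROUTER: Router = Router::new()
      .get(&API_METHOD_GET_KEY)
      .delete(&API_METHOD_DELETE_KEY)
      .subdirs(&[("assignment", &ASSIGNMENT_ROUTER)]);

  // POST takes 'remote' + 'node' as required params, DELETE clears the binding
  const ASSIGNMENT_ROUTER: Router = Router::new()
      .post(&API_METHOD_ASSIGN_KEY)
      .delete(&API_METHOD_UNASSIGN_KEY);

That would keep assigning and unassigning as two clearly separate
operations instead of overloading a single PUT.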

> +    .delete(&API_METHOD_DELETE_KEY);
> +
> +/// Force-fresh node-status query so the next view reflects the new state instead of returning a
> +/// cached entry up to 5 minutes later. Used by auto-assign / apply-pending / clear-pending to
> +/// avoid double-driving a node that has already moved to Active in the cache window.
> +const FRESH_NODE_STATUS_MAX_AGE: u64 = 0;
> +
> +/// Cached node-status freshness used by read-only views. Five minutes matches the resource-cache
> +/// convention and is short enough that admins rarely see stale data on the panel.
> +const PANEL_NODE_STATUS_MAX_AGE: u64 = 5 * 60;
> +

[...]

> +
> +/// Pre-lock check for [`assign_key`]'s unassign path: returns the (remote, node) the entry is
> +/// currently active on, if any, so the lock-protected branch can refuse the unassign and prompt
> +/// the operator to Reissue Key instead. Returns `None` for entries with no binding, no live
> +/// subscription, or a live subscription whose key does not match the entry.
> +async fn check_synced_assignment_for_unassign(
> +    key: &str,
> +) -> Result<Option<(String, String)>, Error> {
> +    let (config, _) = pdm_config::subscriptions::config()?;
> +    let Some(entry) = config.get(key) else {
> +        return Ok(None);
> +    };
> +    let (Some(prev_remote), Some(prev_node)) = (entry.remote.clone(), entry.node.clone()) else {
> +        return Ok(None);
> +    };
> +    let (remotes_config, _) = pdm_config::remotes::config()?;
> +    let Some(remote_entry) = remotes_config.get(&prev_remote) else {
> +        return Ok(None);
> +    };
> +    let live = match get_subscription_info_for_remote(remote_entry, FRESH_NODE_STATUS_MAX_AGE).await
> +    {
> +        Ok(v) => v,
> +        Err(_) => return Ok(None),
> +    };
> +    let synced = live
> +        .get(&prev_node)
> +        .and_then(|info| info.as_ref())
> +        .map(|info| {
> +            info.status == proxmox_subscription::SubscriptionStatus::Active
> +                && info.key.as_deref() == Some(key)
> +        })
> +        .unwrap_or(false);
> +    Ok(synced.then_some((prev_remote, prev_node)))
> +}
> +
> +/// Push a single key to its assigned remote node. Operates on a borrowed `Remote` so the
> +/// caller can fetch the remotes-config once and reuse it.
> +async fn push_key_to_remote(remote: &Remote, key: &str, node_name: &str) -> Result<(), Error> {
> +    let product_type =
> +        ProductType::from_key(key).ok_or_else(|| format_err!("unrecognised key format: {key}"))?;
> +
> +    // PVE and PBS share `proxmox_client::Client`, so `make_pbs_client_and_login` works for both;
> +    // only the PUT path differs.
> +    let path = match product_type {
> +        ProductType::Pve => format!("/api2/extjs/nodes/{node_name}/subscription"),
> +        ProductType::Pbs => "/api2/extjs/nodes/localhost/subscription".to_string(),
> +        ProductType::Pmg | ProductType::Pom => {
> +            bail!("PDM cannot push '{product_type}' keys: no remote support yet");
> +        }
> +    };
> +
> +    let client = crate::connection::make_pbs_client_and_login(remote).await?;
> +
> +    client
> +        .0
> +        .put(&path, &serde_json::json!({ "key": key }))
> +        .await?;
> +    client.0.post(&path, &serde_json::json!({})).await?;
> +
> +    info!("pushed key '{key}' to {}/{node_name}", remote.id);
> +    Ok(())

naturally this should use proper API bindings from pbs_client and
pve-api-types at some point. FWIW, I have patches ready for both and can
supply them if desired (this can also happen as a follow-up, since your
approach obviously also works for now).

> +}
> +
> +#[api(
> +    input: {
> +        properties: {
> +            "max-age": {
> +                type: u64,
> +                optional: true,
> +                description: "Override the cache freshness window in seconds. \
> +                    Default 300 for panel views; pass 0 to force a fresh query.",
> +            },
> +        },
> +    },
> +    returns: {
> +        type: Array,
> +        description: "Subscription status of all remote nodes the user can audit.",
> +        items: { type: RemoteNodeStatus },
> +    },
> +    access: {
> +        permission: &Permission::Privilege(&["system"], PRIV_SYS_AUDIT, false),
> +    },
> +)]
> +/// Get the subscription status of every remote node the caller can audit, combined with key pool
> +/// assignment information.
> +///
> +/// Per-remote `PRIV_RESOURCE_AUDIT` is enforced inside the handler so users only see remotes
> +/// they may audit.
> +async fn node_status(
> +    max_age: Option<u64>,
> +    rpcenv: &mut dyn RpcEnvironment,
> +) -> Result<Vec<RemoteNodeStatus>, Error> {
> +    collect_node_status(max_age.unwrap_or(PANEL_NODE_STATUS_MAX_AGE), rpcenv).await
> +}
> +
> +/// Shared helper: fan out subscription queries to all remotes the caller has audit privilege on,
> +/// in parallel, reusing the per-remote `SUBSCRIPTION_CACHE` via `get_subscription_info_for_remote`.
> +/// Joins the results with the key-pool assignment table.
> +async fn collect_node_status(
> +    max_age: u64,
> +    rpcenv: &mut dyn RpcEnvironment,
> +) -> Result<Vec<RemoteNodeStatus>, Error> {
> +    let auth_id: Authid = rpcenv
> +        .get_auth_id()
> +        .context("no authid available")?
> +        .parse()?;
> +    let user_info = CachedUserInfo::new()?;
> +
> +    let visible_remotes: Vec<(String, Remote)> = crate::api::remotes::RemoteIterator::new()?
> +        .any_privs(&user_info, &auth_id, PRIV_RESOURCE_AUDIT)
> +        .into_iter()
> +        .collect();
> +
> +    let (keys_config, _) = pdm_config::subscriptions::config()?;
> +
> +    // `get_subscription_info_for_remote` re-uses the per-remote `SUBSCRIPTION_CACHE` so this
> +    // fan-out is safe to run concurrently.
> +    let fetch = visible_remotes.iter().map(|(name, remote)| async move {
> +        let res = get_subscription_info_for_remote(remote, max_age).await;
> +        (name.clone(), remote.ty, res)
> +    });
> +    let results = join_all(fetch).await;
> +
> +    let mut out = Vec::new();
> +    for (remote_name, remote_ty, result) in results {
> +        let node_infos = match result {
> +            Ok(info) => info,
> +            Err(err) => {
> +                warn!("failed to query subscription for remote {remote_name}: {err}");
> +                continue;
> +            }
> +        };
> +
> +        for (node_name, node_info) in &node_infos {
> +            let (status, level, sockets, current_key) = match node_info {
> +                Some(info) => (info.status, info.level, info.sockets, info.key.clone()),
> +                None => (
> +                    proxmox_subscription::SubscriptionStatus::NotFound,
> +                    SubscriptionLevel::None,
> +                    None,
> +                    None,
> +                ),
> +            };
> +
> +            let assigned_key = keys_config
> +                .iter()
> +                .find(|(_id, entry)| {
> +                    entry.remote.as_deref() == Some(remote_name.as_str())
> +                        && entry.node.as_deref() == Some(node_name.as_str())
> +                })
> +                .map(|(_id, entry)| entry.key.clone());
> +
> +            out.push(RemoteNodeStatus {
> +                remote: remote_name.clone(),
> +                ty: remote_ty,
> +                node: node_name.to_string(),
> +                sockets,
> +                status,
> +                level,
> +                assigned_key,
> +                current_key,
> +            });
> +        }
> +    }
> +
> +    out.sort_by(|a, b| (&a.remote, &a.node).cmp(&(&b.remote, &b.node)));
> +    Ok(out)
> +}
> +
> +#[api(
> +    input: {
> +        properties: {
> +            apply: {

I'd rather call this 'assign', to avoid confusion with the 'apply-pending'
action? (but with my comment down below this would become obsolete)

> +                type: bool,
> +                optional: true,
> +                default: false,
> +                description: "Actually apply the proposed assignments. Without this, only a preview is returned.",
> +            },
> +        },
> +    },
> +    returns: {
> +        type: Array,
> +        description: "List of proposed or applied assignments.",
> +        items: { type: ProposedAssignment },
> +    },
> +    access: {
> +        permission: &Permission::Privilege(&["system"], PRIV_SYS_MODIFY, false),
> +    },
> +)]
> +/// Propose or apply automatic key-to-node assignments.
> +///
> +/// Matches unused pool keys to remote nodes that do not yet have a pool-assigned key, picking
> +/// the smallest PVE key that covers each node's socket count.
> +/// When `apply=true`, the live node statuses are fetched first (without holding the config
> +/// lock - sync locks must not span awaits), then proposals are computed and persisted under
> +/// the lock with a per-key re-check against the now-current pool state, so a parallel admin
> +/// edit between fetch and apply does not get silently overwritten.
> +async fn auto_assign(
> +    apply: Option<bool>,

AFAIK, this can just be 'apply: bool' (or rather 'assign: bool'); the
default should be injected automatically ...

> +    rpcenv: &mut dyn RpcEnvironment,
> +) -> Result<Vec<ProposedAssignment>, Error> {
> +    let apply = apply.unwrap_or(false);

... then you can drop this assignment here.

> +
> +    let auth_id: Authid = rpcenv
> +        .get_auth_id()
> +        .context("no authid available")?
> +        .parse()?;
> +    let user_info = CachedUserInfo::new()?;
> +
> +    let node_statuses = collect_node_status(FRESH_NODE_STATUS_MAX_AGE, rpcenv).await?;
> +
> +    if !apply {
> +        let (config, _digest) = pdm_config::subscriptions::config()?;
> +        return Ok(compute_proposals(&config, &node_statuses));
> +    }

Maybe I'm missing something, but how do we actually ensure that the
preview is the same as what is assigned later when 'apply' is set to
true?

The proposal is computed from scratch both times, so if anything changes
in the meantime (available keys, node subscription status), the outcome
of POST .../auto-assign?apply=1 might not be what was shown in the UI
earlier?

It's probably not *that* big of an issue in real life (after all, the
time between seeing the preview and pressing 'Apply' is usually rather
short), but I guess with a slightly different API design this could be
eliminated?

Maybe the auto-assign endpoint could always just return a proposal, and
then there could be a separate 'bulk assign' endpoint which the UI uses
to submit the (confirmed) proposal. The 'bulk assign' endpoint could then
fail if any of the assumptions from the proposal no longer hold (e.g.
digests changed). I've put a rough sketch of what I mean at the end of
this mail.

That approach would also allow the user to edit the proposal in the
preview dialog before it is applied/assigned.

What do you think?

> +
> +    let _lock = pdm_config::subscriptions::lock_config()?;
> +    let (mut config, _digest) = pdm_config::subscriptions::config()?;
> +    let mut proposals = compute_proposals(&config, &node_statuses);
> +
> +    // Audit-only callers may see a remote in the preview but must not be able to stage a write
> +    // for it that another admin would later push on their behalf.
> +    proposals.retain(|p| {
> +        user_info.lookup_privs(&auth_id, &["resource", &p.remote]) & PRIV_RESOURCE_MODIFY != 0
> +    });
> +
> +    for p in &proposals {
> +        if let Some(entry) = config.get_mut(&p.key) {
> +            // Skip keys that another writer assigned between the preview and the lock.
> +            if entry.remote.is_none() {
> +                entry.remote = Some(p.remote.clone());
> +                entry.node = Some(p.node.clone());
> +            }
> +        }
> +    }
> +    pdm_config::subscriptions::save_config(&config)?;
> +
> +    Ok(proposals)
> +}
> +
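
As mentioned above, here is a very rough sketch of the 'bulk assign' idea
(completely untested; the name `bulk_assign` and the exact parameter layout
are made up, router wiring, per-remote PRIV_RESOURCE_MODIFY checks and the
digest handling are omitted and would need more thought):

  #[api(
      input: {
          properties: {
              assignments: {
                  type: Array,
                  description: "Confirmed (and possibly edited) proposal from auto-assign.",
                  items: { type: ProposedAssignment },
              },
          },
      },
      access: {
          permission: &Permission::Privilege(&["system"], PRIV_SYS_MODIFY, false),
      },
  )]
  /// Apply a previously proposed set of key-to-node assignments, failing if any
  /// assumption from the proposal no longer holds.
  async fn bulk_assign(assignments: Vec<ProposedAssignment>) -> Result<(), Error> {
      let _lock = pdm_config::subscriptions::lock_config()?;
      let (mut config, _digest) = pdm_config::subscriptions::config()?;
      // could additionally take the config digest from the proposal step and
      // bail out here if it changed in the meantime

      for a in &assignments {
          let entry = config
              .get_mut(&a.key)
              .ok_or_else(|| format_err!("key '{}' is no longer in the pool", a.key))?;
          if entry.remote.is_some() {
              // somebody else assigned this key since the proposal was made
              bail!("key '{}' was assigned in the meantime, please re-run auto-assign", a.key);
          }
          entry.remote = Some(a.remote.clone());
          entry.node = Some(a.node.clone());
      }
      pdm_config::subscriptions::save_config(&config)?;
      Ok(())
  }

The important difference to the current apply path would be that the
assignments written here are exactly the ones the user confirmed, instead
of a freshly recomputed proposal, and that any mismatch turns into an
error instead of being silently skipped.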