public inbox for pdm-devel@lists.proxmox.com
From: "Lukas Wagner" <l.wagner@proxmox.com>
To: "Thomas Lamprecht" <t.lamprecht@proxmox.com>,
	<pdm-devel@lists.proxmox.com>
Subject: Re: [PATCH datacenter-manager v2 4/8] subscription: add key pool and node status API endpoints
Date: Thu, 07 May 2026 15:23:39 +0200	[thread overview]
Message-ID: <DICH6VKB9KZW.3FTPRM1NTBJML@proxmox.com> (raw)
In-Reply-To: <20260507082943.2749725-5-t.lamprecht@proxmox.com>

Hi!

Not a deep review of the code yet; I mostly focused on the UX and API
design so far. Some notes inline.

On Thu May 7, 2026 at 10:26 AM CEST, Thomas Lamprecht wrote:
[...]
> +
> +const KEYS_ROUTER: Router = Router::new()
> +    .get(&API_METHOD_LIST_KEYS)
> +    .post(&API_METHOD_ADD_KEYS)
> +    .match_all("key", &KEY_ITEM_ROUTER);
> +
> +const KEY_ITEM_ROUTER: Router = Router::new()
> +    .get(&API_METHOD_GET_KEY)
> +    .put(&API_METHOD_ASSIGN_KEY)

Maybe instead of PUTing a key to assign/unassign it, we could rather
have

POST .../keys/<id>/assignment
  and
DELETE .../keys/<id>/assignment

The POST could then have remote and node as non-optional params, I
think?
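To illustrate with a rough, std-only sketch (the request shapes are
hypothetical, not the actual proxmox schema types):

```rust
// Hypothetical payload shapes, just to illustrate the point: a dedicated
// POST .../keys/<id>/assignment endpoint can require both parameters,
// while PUT on the key itself must keep them optional so the same call
// can also express "unassign".
#[derive(Debug, PartialEq)]
struct AssignRequest {
    remote: String, // required
    node: String,   // required
}

#[derive(Debug, PartialEq)]
struct UpdateKeyRequest {
    // optional: `None` is ambiguous between "unassign" and "leave unchanged"
    remote: Option<String>,
    node: Option<String>,
}

fn main() {
    // DELETE .../keys/<id>/assignment would need no body at all; the
    // intent is unambiguous from method + path alone.
    let req = AssignRequest {
        remote: "pve-cluster1".into(),
        node: "node1".into(),
    };
    assert_eq!(req.node, "node1");

    let _ambiguous = UpdateKeyRequest { remote: None, node: None };
}
```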


> +    .delete(&API_METHOD_DELETE_KEY);
> +
> +/// Force-fresh node-status query so the next view reflects the new state instead of returning a
> +/// cached entry up to 5 minutes later. Used by auto-assign / apply-pending / clear-pending to
> +/// avoid double-driving a node that has already moved to Active in the cache window.
> +const FRESH_NODE_STATUS_MAX_AGE: u64 = 0;
> +
> +/// Cached node-status freshness used by read-only views. Five minutes matches the resource-cache
> +/// convention and is short enough that admins rarely see stale data on the panel.
> +const PANEL_NODE_STATUS_MAX_AGE: u64 = 5 * 60;
> +

[...]

> +
> +/// Pre-lock check for [`assign_key`]'s unassign path: returns the (remote, node) the entry is
> +/// currently active on, if any, so the lock-protected branch can refuse the unassign and prompt
> +/// the operator to Reissue Key instead. Returns `None` for entries with no binding, no live
> +/// subscription, or a live subscription whose key does not match the entry.
> +async fn check_synced_assignment_for_unassign(
> +    key: &str,
> +) -> Result<Option<(String, String)>, Error> {
> +    let (config, _) = pdm_config::subscriptions::config()?;
> +    let Some(entry) = config.get(key) else {
> +        return Ok(None);
> +    };
> +    let (Some(prev_remote), Some(prev_node)) = (entry.remote.clone(), entry.node.clone()) else {
> +        return Ok(None);
> +    };
> +    let (remotes_config, _) = pdm_config::remotes::config()?;
> +    let Some(remote_entry) = remotes_config.get(&prev_remote) else {
> +        return Ok(None);
> +    };
> +    let live = match get_subscription_info_for_remote(remote_entry, FRESH_NODE_STATUS_MAX_AGE).await
> +    {
> +        Ok(v) => v,
> +        Err(_) => return Ok(None),
> +    };
> +    let synced = live
> +        .get(&prev_node)
> +        .and_then(|info| info.as_ref())
> +        .map(|info| {
> +            info.status == proxmox_subscription::SubscriptionStatus::Active
> +                && info.key.as_deref() == Some(key)
> +        })
> +        .unwrap_or(false);
> +    Ok(synced.then_some((prev_remote, prev_node)))
> +}
> +
> +/// Push a single key to its assigned remote node. Operates on a borrowed `Remote` so the
> +/// caller can fetch the remotes-config once and reuse it.
> +async fn push_key_to_remote(remote: &Remote, key: &str, node_name: &str) -> Result<(), Error> {
> +    let product_type =
> +        ProductType::from_key(key).ok_or_else(|| format_err!("unrecognised key format: {key}"))?;
> +
> +    // PVE and PBS share `proxmox_client::Client`, so `make_pbs_client_and_login` works for both;
> +    // only the PUT path differs.
> +    let path = match product_type {
> +        ProductType::Pve => format!("/api2/extjs/nodes/{node_name}/subscription"),
> +        ProductType::Pbs => "/api2/extjs/nodes/localhost/subscription".to_string(),
> +        ProductType::Pmg | ProductType::Pom => {
> +            bail!("PDM cannot push '{product_type}' keys: no remote support yet");
> +        }
> +    };
> +
> +    let client = crate::connection::make_pbs_client_and_login(remote).await?;
> +
> +    client
> +        .0
> +        .put(&path, &serde_json::json!({ "key": key }))
> +        .await?;
> +    client.0.post(&path, &serde_json::json!({})).await?;
> +
> +    info!("pushed key '{key}' to {}/{node_name}", remote.id);
> +    Ok(())

Naturally, this should use proper API bindings from pbs_client and
pve-api-types at some point. FWIW, I have patches ready for both and can
supply them if desired (this can also happen as a follow-up, since your
approach obviously works for now).

> +}
> +
> +#[api(
> +    input: {
> +        properties: {
> +            "max-age": {
> +                type: u64,
> +                optional: true,
> +                description: "Override the cache freshness window in seconds. \
> +                              Default 300 for panel views; pass 0 to force a fresh query.",
> +            },
> +        },
> +    },
> +    returns: {
> +        type: Array,
> +        description: "Subscription status of all remote nodes the user can audit.",
> +        items: { type: RemoteNodeStatus },
> +    },
> +    access: {
> +        permission: &Permission::Privilege(&["system"], PRIV_SYS_AUDIT, false),
> +    },
> +)]
> +/// Get the subscription status of every remote node the caller can audit, combined with key pool
> +/// assignment information.
> +///
> +/// Per-remote `PRIV_RESOURCE_AUDIT` is enforced inside the handler so users only see remotes
> +/// they may audit.
> +async fn node_status(
> +    max_age: Option<u64>,
> +    rpcenv: &mut dyn RpcEnvironment,
> +) -> Result<Vec<RemoteNodeStatus>, Error> {
> +    collect_node_status(max_age.unwrap_or(PANEL_NODE_STATUS_MAX_AGE), rpcenv).await
> +}
> +
> +/// Shared helper: fan out subscription queries to all remotes the caller has audit privilege on,
> +/// in parallel, reusing the per-remote `SUBSCRIPTION_CACHE` via `get_subscription_info_for_remote`.
> +/// Joins the results with the key-pool assignment table.
> +async fn collect_node_status(
> +    max_age: u64,
> +    rpcenv: &mut dyn RpcEnvironment,
> +) -> Result<Vec<RemoteNodeStatus>, Error> {
> +    let auth_id: Authid = rpcenv
> +        .get_auth_id()
> +        .context("no authid available")?
> +        .parse()?;
> +    let user_info = CachedUserInfo::new()?;
> +
> +    let visible_remotes: Vec<(String, Remote)> = crate::api::remotes::RemoteIterator::new()?
> +        .any_privs(&user_info, &auth_id, PRIV_RESOURCE_AUDIT)
> +        .into_iter()
> +        .collect();
> +
> +    let (keys_config, _) = pdm_config::subscriptions::config()?;
> +
> +    // `get_subscription_info_for_remote` re-uses the per-remote `SUBSCRIPTION_CACHE` so this
> +    // fan-out is safe to run concurrently.
> +    let fetch = visible_remotes.iter().map(|(name, remote)| async move {
> +        let res = get_subscription_info_for_remote(remote, max_age).await;
> +        (name.clone(), remote.ty, res)
> +    });
> +    let results = join_all(fetch).await;
> +
> +    let mut out = Vec::new();
> +    for (remote_name, remote_ty, result) in results {
> +        let node_infos = match result {
> +            Ok(info) => info,
> +            Err(err) => {
> +                warn!("failed to query subscription for remote {remote_name}: {err}");
> +                continue;
> +            }
> +        };
> +
> +        for (node_name, node_info) in &node_infos {
> +            let (status, level, sockets, current_key) = match node_info {
> +                Some(info) => (info.status, info.level, info.sockets, info.key.clone()),
> +                None => (
> +                    proxmox_subscription::SubscriptionStatus::NotFound,
> +                    SubscriptionLevel::None,
> +                    None,
> +                    None,
> +                ),
> +            };
> +
> +            let assigned_key = keys_config
> +                .iter()
> +                .find(|(_id, entry)| {
> +                    entry.remote.as_deref() == Some(remote_name.as_str())
> +                        && entry.node.as_deref() == Some(node_name.as_str())
> +                })
> +                .map(|(_id, entry)| entry.key.clone());
> +
> +            out.push(RemoteNodeStatus {
> +                remote: remote_name.clone(),
> +                ty: remote_ty,
> +                node: node_name.to_string(),
> +                sockets,
> +                status,
> +                level,
> +                assigned_key,
> +                current_key,
> +            });
> +        }
> +    }
> +
> +    out.sort_by(|a, b| (&a.remote, &a.node).cmp(&(&b.remote, &b.node)));
> +    Ok(out)
> +}
> +
> +#[api(
> +    input: {
> +        properties: {
> +            apply: {

I'd rather call this 'assign', so as to avoid confusion with the
'apply-pending' action?

(though with my comment further down this would become obsolete)

> +                type: bool,
> +                optional: true,
> +                default: false,
> +                description: "Actually apply the proposed assignments. Without this, only a preview is returned.",
> +            },
> +        },
> +    },
> +    returns: {
> +        type: Array,
> +        description: "List of proposed or applied assignments.",
> +        items: { type: ProposedAssignment },
> +    },
> +    access: {
> +        permission: &Permission::Privilege(&["system"], PRIV_SYS_MODIFY, false),
> +    },
> +)]
> +/// Propose or apply automatic key-to-node assignments.
> +///
> +/// Matches unused pool keys to remote nodes that do not yet have a pool-assigned key, picking
> +/// the smallest PVE key that covers each node's socket count. When `apply=true`, the live node
> +/// statuses are fetched first (without holding the config lock - sync locks must not span
> +/// awaits), then proposals are computed and persisted under the lock with a per-key re-check
> +/// against the now-current pool state, so a parallel admin edit between fetch and apply does
> +/// not get silently overwritten.
> +async fn auto_assign(
> +    apply: Option<bool>,

AFAIK, this can just be 'apply: bool' (or rather 'assign: bool'); the
default should be injected automatically

> +    rpcenv: &mut dyn RpcEnvironment,
> +) -> Result<Vec<ProposedAssignment>, Error> {
> +    let apply = apply.unwrap_or(false);

... then you can drop this assignment here.
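i.e. (a plain-function sketch, assuming the api macro injects the schema
default for non-optional params; the handler names are just for
illustration):

```rust
// With `apply: Option<bool>` the handler has to apply the default itself:
fn handler_with_option(apply: Option<bool>) -> bool {
    apply.unwrap_or(false)
}

// With `apply: bool` plus `default: false` in the schema, the API layer
// should already have injected the default before the handler runs:
fn handler_with_injected_default(apply: bool) -> bool {
    apply
}

fn main() {
    assert!(!handler_with_option(None));
    assert!(handler_with_option(Some(true)));
    assert!(!handler_with_injected_default(false));
}
```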

> +
> +    let auth_id: Authid = rpcenv
> +        .get_auth_id()
> +        .context("no authid available")?
> +        .parse()?;
> +    let user_info = CachedUserInfo::new()?;
> +
> +    let node_statuses = collect_node_status(FRESH_NODE_STATUS_MAX_AGE, rpcenv).await?;
> +
> +    if !apply {
> +        let (config, _digest) = pdm_config::subscriptions::config()?;
> +        return Ok(compute_proposals(&config, &node_statuses));
> +    }

Maybe I'm missing something, but how do we actually ensure that the
preview is then the same thing as what is assigned later when 'apply' is
set to true?

The proposal is computed from scratch both times, so if anything changes
in the meantime (available keys, node subscription status), the outcome
of POST .../auto-assign?apply=1 is not the same as what was shown in the
UI earlier?

It's probably not *that* big of an issue in real life (after all, the
time between seeing the preview and pressing 'Apply' is usually rather
short), but I guess with a slightly different API design this could be
eliminated?

Maybe the auto-assign endpoint could always just return a proposal, and
there could be a separate 'bulk assign' endpoint which the UI then uses
to submit the (confirmed) proposal. The 'bulk assign' endpoint could
then fail if any of the assumptions from the proposal no longer hold
(e.g. digests changed).

That design would also allow editing the proposal in the preview dialog
before it is applied/assigned.

What do you think?
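To sketch the idea in plain (std-only) Rust, with all names hypothetical:
the proposal carries a digest of the pool state it was computed against,
and the bulk-assign step refuses to apply if that digest no longer
matches:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// key -> current (remote, node) binding, if any
type Pool = BTreeMap<String, Option<(String, String)>>;

/// Stand-in for the real config digest.
fn pool_digest(pool: &Pool) -> u64 {
    let mut h = DefaultHasher::new();
    pool.hash(&mut h);
    h.finish()
}

struct Proposal {
    /// digest of the pool state the proposal was computed against
    expected_digest: u64,
    /// (key, remote, node) triples the user confirmed (possibly edited)
    assignments: Vec<(String, String, String)>,
}

fn bulk_assign(pool: &mut Pool, proposal: &Proposal) -> Result<(), String> {
    // Fail if anything changed between preview and confirmation.
    if pool_digest(pool) != proposal.expected_digest {
        return Err("pool changed since the proposal was computed".into());
    }
    for (key, remote, node) in &proposal.assignments {
        match pool.get_mut(key) {
            Some(slot) if slot.is_none() => *slot = Some((remote.clone(), node.clone())),
            _ => return Err(format!("key '{key}' is no longer unassigned")),
        }
    }
    Ok(())
}

fn main() {
    let mut pool = Pool::new();
    pool.insert("key-a".to_string(), None);

    let proposal = Proposal {
        expected_digest: pool_digest(&pool),
        assignments: vec![("key-a".into(), "remote1".into(), "node1".into())],
    };
    assert!(bulk_assign(&mut pool, &proposal).is_ok());
    // Replaying the stale proposal fails: the digest has changed.
    assert!(bulk_assign(&mut pool, &proposal).is_err());
}
```

The digest check means stale proposals get rejected outright instead of
silently re-deriving a different outcome under the lock.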

> +
> +    let _lock = pdm_config::subscriptions::lock_config()?;
> +    let (mut config, _digest) = pdm_config::subscriptions::config()?;
> +    let mut proposals = compute_proposals(&config, &node_statuses);
> +
> +    // Audit-only callers may see a remote in the preview but must not be able to stage a write
> +    // for it that another admin would later push on their behalf.
> +    proposals.retain(|p| {
> +        user_info.lookup_privs(&auth_id, &["resource", &p.remote]) & PRIV_RESOURCE_MODIFY != 0
> +    });
> +
> +    for p in &proposals {
> +        if let Some(entry) = config.get_mut(&p.key) {
> +            // Skip keys that another writer assigned between the preview and the lock.
> +            if entry.remote.is_none() {
> +                entry.remote = Some(p.remote.clone());
> +                entry.node = Some(p.node.clone());
> +            }
> +        }
> +    }
> +    pdm_config::subscriptions::save_config(&config)?;
> +
> +    Ok(proposals)
> +}
> +

Thread overview: 15+ messages
2026-05-07  8:26 [PATCH datacenter-manager v2 0/8] subscription: add central key pool registry with reissue support Thomas Lamprecht
2026-05-07  8:26 ` [PATCH datacenter-manager v2 1/8] api: subscription cache: ensure max_age=0 forces a fresh fetch Thomas Lamprecht
2026-05-07 13:23   ` Lukas Wagner
2026-05-08 12:43   ` applied: " Lukas Wagner
2026-05-07  8:26 ` [PATCH datacenter-manager v2 2/8] api types: subscription level: render full names Thomas Lamprecht
2026-05-07 13:23   ` Lukas Wagner
2026-05-07  8:26 ` [PATCH datacenter-manager v2 3/8] subscription: add key pool data model and config layer Thomas Lamprecht
2026-05-07  8:26 ` [PATCH datacenter-manager v2 4/8] subscription: add key pool and node status API endpoints Thomas Lamprecht
2026-05-07 13:23   ` Lukas Wagner [this message]
2026-05-07  8:26 ` [PATCH datacenter-manager v2 5/8] ui: add subscription registry with key pool and node status Thomas Lamprecht
2026-05-07  8:26 ` [PATCH datacenter-manager v2 6/8] cli: add subscription key pool management subcommands Thomas Lamprecht
2026-05-07  8:26 ` [PATCH datacenter-manager v2 7/8] docs: add subscription registry chapter Thomas Lamprecht
2026-05-07  8:26 ` [PATCH datacenter-manager v2 8/8] subscription: add Reissue Key action with pending-reissue queue Thomas Lamprecht
2026-05-07  8:34 ` [PATCH datacenter-manager v2 9/9] fixup! ui: add subscription registry with key pool and node status Thomas Lamprecht
2026-05-07 13:23 ` [PATCH datacenter-manager v2 0/8] subscription: add central key pool registry with reissue support Lukas Wagner
