From: "Dominik Rusovac" <d.rusovac@proxmox.com>
To: "Daniel Kral" <d.kral@proxmox.com>, <pve-devel@lists.proxmox.com>
Subject: Re: [PATCH proxmox v2 05/40] resource-scheduling: implement generic cluster usage implementation
Date: Thu, 26 Mar 2026 11:28:53 +0100	[thread overview]
Message-ID: <DHCN66Y3FTZP.2A9UB5HSV2G68@proxmox.com> (raw)
In-Reply-To: <20260324183029.1274972-6-d.kral@proxmox.com>

lgtm

Please find my comments inline; they are mostly nits or tiny things.

On Tue Mar 24, 2026 at 7:29 PM CET, Daniel Kral wrote:
> This is a more generic version of the `Usage` implementation from the
> pve_static bindings in the pve_rs repository.
>
> As the upcoming load balancing scheduler actions and dynamic resource
> scheduler will need more information about each resource, this further
> improves on the state tracking of each resource:
>
> In this implementation, a resource is composed of its usage statistics
> and its two essential states: the running state and the node placement.
> The non_exhaustive attribute ensures that users need to construct a
> Resource instance through its API.
>
> Users can repeatedly use the current state of Usage to make scheduling
> decisions with the to_scheduler() method. This method takes an
> implementation of UsageAggregator, which dictates how the usage
> information is represented to the Scheduler.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v1 -> v2:
> - new!
>
> This patch is added to move the handling of specific usage stats and
> their (de)serialization to the pve-rs bindings and have the general
> functionality in this crate.

[snip]

nit: imo, it's more convenient to expose the more ergonomic `&str` type
here, using:

       pub fn resources_iter(&self) -> impl Iterator<Item = &str> {
           self.resources.iter().map(String::as_str)
       }

> +    pub fn resources_iter(&self) -> impl Iterator<Item = &String> {
> +        self.resources.iter()
> +    }
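To illustrate the call-site difference, here is a standalone sketch with a hypothetical minimal `Usage`-like struct (not the actual crate code):

```rust
// Hypothetical minimal struct, not the crate code: shows why exposing
// `impl Iterator<Item = &str>` is more ergonomic than `&String`.
struct Usage {
    resources: Vec<String>,
}

impl Usage {
    // Mapping with String::as_str hides the internal String storage and
    // hands callers plain string slices.
    fn resources_iter(&self) -> impl Iterator<Item = &str> {
        self.resources.iter().map(String::as_str)
    }
}

fn main() {
    let usage = Usage {
        resources: vec!["vm:101".to_string(), "ct:200".to_string()],
    };

    // Callers get a Vec<&str> directly, with no extra map step.
    let sids: Vec<&str> = usage.resources_iter().collect();
    assert_eq!(sids, ["vm:101", "ct:200"]);
    println!("ok");
}
```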

[snip] 

> +    pub fn moving_to(&mut self, target_node: String) -> Result<(), Error> {
> +        match &self.placement {
> +            ResourcePlacement::Stationary { current_node } => {
> +                self.placement = ResourcePlacement::Moving {
> +                    current_node: current_node.to_string(),

nit: 

                       current_node: current_node.to_owned(), 

represents the intention best, that is, owning rather than converting
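For context, a standalone sketch (unrelated to the patch itself): both calls produce an equal `String` from a `&String`, so the nit is only about which trait path states the intent:

```rust
fn main() {
    let current_node = String::from("node1");
    let borrowed: &String = &current_node;

    // to_string() reaches String via the Display/ToString machinery,
    // while to_owned() goes through Clone/ToOwned: "give me an owned copy".
    // Both yield an equal String here; the difference is stated intent.
    let via_to_string: String = borrowed.to_string();
    let via_to_owned: String = borrowed.to_owned();

    assert_eq!(via_to_string, via_to_owned);
    println!("ok");
}
```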

[snip]

> +    /// Handles the external removal of a node.
> +    ///
> +    /// Returns whether the resource does not have any node left.

Considering what it does (for a stationary resource it removes nothing and
only reports whether the node matches), I find the name of this function a
bit confusing.

> +    pub fn remove_node(&mut self, nodename: &str) -> bool {
> +        match &self.placement {
> +            ResourcePlacement::Stationary { current_node } => current_node == nodename,
> +            ResourcePlacement::Moving {
> +                current_node,
> +                target_node,
> +            } => {
> +                if current_node == nodename {
> +                    self.placement = ResourcePlacement::Stationary {
> +                        current_node: target_node.to_string(),

nit: to_owned() represents the intention best

> +                    };
> +                } else if target_node == nodename {
> +                    self.placement = ResourcePlacement::Stationary {
> +                        current_node: current_node.to_string(),

nit: to_owned() represents the intention best

> +                    };
> +                }
> +
> +                false
> +            }
> +        }
> +    }
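To spell out the behavior I found confusing, here is a condensed standalone model of the quoted function (types simplified, not the actual crate code):

```rust
// Condensed model of the quoted remove_node(): for a stationary resource it
// only reports, for a moving one it collapses onto the surviving node.
#[derive(Debug, PartialEq)]
enum Placement {
    Stationary { current_node: String },
    Moving { current_node: String, target_node: String },
}

struct Resource {
    placement: Placement,
}

impl Resource {
    /// Returns whether the resource is left without any node.
    fn remove_node(&mut self, nodename: &str) -> bool {
        match &self.placement {
            // Stationary: nothing is mutated; the method only *reports*
            // whether the removed node was the resource's last one.
            Placement::Stationary { current_node } => current_node == nodename,
            // Moving: collapse onto the surviving node; never orphaned.
            Placement::Moving { current_node, target_node } => {
                if current_node == nodename {
                    self.placement = Placement::Stationary {
                        current_node: target_node.to_owned(),
                    };
                } else if target_node == nodename {
                    self.placement = Placement::Stationary {
                        current_node: current_node.to_owned(),
                    };
                }
                false
            }
        }
    }
}

fn main() {
    let mut resource = Resource {
        placement: Placement::Moving {
            current_node: "node1".to_string(),
            target_node: "node2".to_string(),
        },
    };
    // Removing the current node of a moving resource keeps the target...
    assert!(!resource.remove_node("node1"));
    assert_eq!(
        resource.placement,
        Placement::Stationary { current_node: "node2".to_string() }
    );
    // ...but removing a stationary resource's only node just reports true.
    assert!(resource.remove_node("node2"));
    println!("ok");
}
```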

[snip]

> +    /// Add a node to the cluster usage.
> +    ///
> +    /// This method fails if a node with the same `nodename` already exists.
> +    pub fn add_node(&mut self, nodename: String, stats: NodeStats) -> Result<(), Error> {
> +        if self.nodes.contains_key(&nodename) {
> +            bail!("node '{}' already exists", nodename);

nit: 

               bail!("node '{nodename}' already exists");

> +        }

[snip]

We are only reading here, so consider taking a slice for `nodenames` (just
like `remove_resource_from_nodes` does):

       fn add_resource_to_nodes(&mut self, sid: &str, nodenames: &[&str]) -> Result<(), Error> {

Please find the related call-site changes at [0] and [1] below.

> +    fn add_resource_to_nodes(&mut self, sid: &str, nodenames: Vec<&str>) -> Result<(), Error> {
> +        if nodenames
> +            .iter()
> +            .any(|nodename| !self.nodes.contains_key(*nodename))
> +        {
> +            bail!("resource nodes do not exist");
> +        }
> +
> +        nodenames.iter().for_each(|nodename| {
> +            if let Some(node) = self.nodes.get_mut(*nodename) {
> +                node.add_resource(sid);
> +            }
> +        });
> +
> +        Ok(())
> +    }
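A quick illustration of the flexibility gained by the slice parameter, with a hypothetical stand-in helper (not the crate function):

```rust
// Taking &[&str] instead of Vec<&str> avoids forcing callers to build an
// owned Vec just for read-only access: borrowed vectors and plain arrays
// both coerce to a slice.
fn add_resource_to_nodes(sid: &str, nodenames: &[&str]) -> Vec<String> {
    // Stand-in body: pair each node with the resource id.
    nodenames
        .iter()
        .map(|nodename| format!("{nodename}/{sid}"))
        .collect()
}

fn main() {
    // A Vec<&str> coerces to &[&str] via a plain borrow [0]...
    let nodes = vec!["node1", "node2"];
    assert_eq!(add_resource_to_nodes("vm:101", &nodes).len(), 2);

    // ...and a fixed-size array literal works without any allocation [1].
    assert_eq!(
        add_resource_to_nodes("vm:101", &["node3"]),
        ["node3/vm:101"]
    );
    println!("ok");
}
```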

[snip]

> +    /// Add `resource` with identifier `sid` to cluster usage.
> +    ///
> +    /// This method fails if a resource with the same `sid` already exists or the resource's nodes
> +    /// do not exist in the cluster usage.
> +    pub fn add_resource(&mut self, sid: String, resource: Resource) -> Result<(), Error> {
> +        if self.resources.contains_key(&sid) {
> +            bail!("resource '{}' already exists", sid);
> +        }
> +
> +        self.add_resource_to_nodes(&sid, resource.nodenames())?;

[0]:

           self.add_resource_to_nodes(&sid, &resource.nodenames())?;

> +
> +        self.resources.insert(sid.to_string(), resource);

nit: to_owned() instead of to_string() represents the intention best

[snip]

> +    pub fn add_resource_usage_to_node(
> +        &mut self,
> +        nodename: &str,
> +        sid: &str,
> +        stats: ResourceStats,
> +    ) -> Result<(), Error> {
> +        if let Some(resource) = self.resources.get_mut(sid) {
> +            resource.moving_to(nodename.to_string())?;
> +
> +            self.add_resource_to_nodes(sid, vec![nodename])

[1]:

               self.add_resource_to_nodes(sid, &[nodename])

[snip]

> +#[test]
> +fn test_no_duplicate_nodes() -> Result<(), Error> {
> +    let mut usage = Usage::new();
> +
> +    usage.add_node("node1".to_string(), NodeStats::default())?;
> +
> +    match usage.add_node("node1".to_string(), NodeStats::default()) {
> +        Ok(_) => bail!("cluster usage does allow duplicate node entries"),
> +        Err(_) => Ok(()),
> +    }

since this is supposed to be a test case, I would rather assert instead
of bail, using:

    assert!(
        usage
            .add_node("node1".to_string(), NodeStats::default())
            .is_err(),
        "cluster usage allows duplicate node entries"
    );
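For context, a standalone sketch (not patch code) of why asserting reads better in tests: `assert!` panics with its message on failure, so the test runner prints the reason directly, whereas a bail!-style Err only surfaces as a generic "returned Err" in a Result-returning test:

```rust
use std::panic;

fn main() {
    // Silence the default panic output for this demonstration only.
    panic::set_hook(Box::new(|_| {}));

    // On failure, assert! unwinds with the given message, which the test
    // harness includes in the failure report.
    let failed = panic::catch_unwind(|| {
        assert!(false, "cluster usage allows duplicate node entries");
    });
    assert!(failed.is_err());
    println!("ok");
}
```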

> +}
> +
> +#[test]
> +fn test_no_duplicate_resources() -> Result<(), Error> {
> +    let mut usage = Usage::new();
> +
> +    usage.add_node("node1".to_string(), NodeStats::default())?;
> +
> +    let placement = ResourcePlacement::Stationary {
> +        current_node: "node1".to_string(),
> +    };
> +    let resource = Resource::new(ResourceStats::default(), ResourceState::Stopped, placement);
> +
> +    usage.add_resource("vm:101".to_string(), resource.clone())?;
> +
> +    match usage.add_resource("vm:101".to_string(), resource) {
> +        Ok(_) => bail!("cluster usage does allow duplicate resource entries"),
> +        Err(_) => Ok(()),
> +    }

assert instead of bail:

       assert!(
           usage.add_resource("vm:101".to_string(), resource).is_err(),
           "cluster usage allows duplicate resource entries"
       );

> +}
> +
> +#[test]
> +#[allow(deprecated)]
> +fn test_add_resource_usage_to_node() -> Result<(), Error> {
> +    let mut usage = Usage::new();
> +
> +    usage.add_node("node1".to_string(), NodeStats::default())?;
> +    usage.add_node("node2".to_string(), NodeStats::default())?;
> +    usage.add_node("node3".to_string(), NodeStats::default())?;
> +
> +    usage.add_resource_usage_to_node("node1", "vm:101", ResourceStats::default())?;
> +    usage.add_resource_usage_to_node("node2", "vm:101", ResourceStats::default())?;
> +
> +    if usage
> +        .add_resource_usage_to_node("node3", "vm:101", ResourceStats::default())
> +        .is_ok()
> +    {
> +        bail!("add_resource_usage_to_node() allows adding resource to more than two nodes");
> +    }

assert instead of bail:

       assert!(
           usage
               .add_resource_usage_to_node("node3", "vm:101", ResourceStats::default())
               .is_err(),
           "add_resource_usage_to_node() allows adding resource to more than two nodes"
       );

> +
> +    Ok(())
> +}
> +
> +#[test]
> +fn test_add_remove_stationary_resource() -> Result<(), Error> {
> +    let mut usage = Usage::new();
> +
> +    let (sid, nodename) = ("vm:101", "node1");
> +
> +    usage.add_node(nodename.to_string(), NodeStats::default())?;
> +
> +    let placement = ResourcePlacement::Stationary {
> +        current_node: nodename.to_string(),
> +    };
> +    let resource = Resource::new(ResourceStats::default(), ResourceState::Stopped, placement);
> +
> +    usage.add_resource(sid.to_string(), resource)?;
> +
> +    match (usage.get_resource(sid), usage.get_node(nodename)) {
> +        (Some(_), Some(node)) => {
> +            if !node.contains_resource(sid) {
> +                bail!("resource '{sid}' was not added to node '{nodename}'");
> +            }
> +        }
> +        _ => bail!("resource '{sid}' or node '{nodename}' were not added"),
> +    }

assert instead of bail:

       assert!(
           usage.get_resource(sid).is_some(),
           "resource '{sid}' was not added"
       );
       assert!(
           usage
               .get_node(nodename)
               .map(|node| {
                   assert!(
                       node.contains_resource(sid),
                       "resource '{sid}' was not added to node '{nodename}'"
                   );
               })
               .is_some(),
           "node '{nodename}' was not added"
       );

> +
> +    usage.remove_resource(sid);
> +
> +    match (usage.get_resource(sid), usage.get_node(nodename)) {
> +        (None, Some(node)) => {
> +            if node.contains_resource(sid) {
> +                bail!("resource '{sid}' was not removed from node '{nodename}'");
> +            }
> +        }
> +        _ => bail!("resource '{sid}' was not removed"),
> +    }

assert instead of bail:

       assert!(
           usage.get_resource(sid).is_none(),
           "resource '{sid}' was not removed"
       );
       assert!(
           usage
               .get_node(nodename)
               .map(|node| {
                   assert!(
                       !node.contains_resource(sid),
                       "resource '{sid}' was not removed from node '{nodename}'"
                   );
               })
               .is_some(),
           "node '{nodename}' was not added"
       );

> +
> +    Ok(())
> +}
> +
> +#[test]
> +fn test_add_remove_moving_resource() -> Result<(), Error> {
> +    let mut usage = Usage::new();
> +
> +    let (sid, current_nodename, target_nodename) = ("vm:101", "node1", "node2");
> +
> +    usage.add_node(current_nodename.to_string(), NodeStats::default())?;
> +    usage.add_node(target_nodename.to_string(), NodeStats::default())?;
> +
> +    let placement = ResourcePlacement::Moving {
> +        current_node: current_nodename.to_string(),
> +        target_node: target_nodename.to_string(),
> +    };
> +    let resource = Resource::new(ResourceStats::default(), ResourceState::Stopped, placement);
> +
> +    usage.add_resource(sid.to_string(), resource)?;
> +

analogously, here I'd find asserting more appropriate than bailing

> +    match (
> +        usage.get_resource(sid),
> +        usage.get_node(current_nodename),
> +        usage.get_node(target_nodename),
> +    ) {
> +        (Some(_), Some(current_node), Some(target_node)) => {
> +            if !current_node.contains_resource("vm:101") {
> +                bail!("resource '{sid}' was not added to current node '{current_nodename}'");
> +            }
> +
> +            if !target_node.contains_resource("vm:101") {
> +                bail!("resource '{sid}' was not added to target node '{target_nodename}'");
> +            }
> +        }
> +        _ => bail!("resource '{sid}' or nodes were not added"),
> +    }
> +
> +    usage.remove_resource(sid);

analogously, here I'd find asserting more appropriate than bailing

> +
> +    match (
> +        usage.get_resource(sid),
> +        usage.get_node(current_nodename),
> +        usage.get_node(target_nodename),
> +    ) {
> +        (None, Some(current_node), Some(target_node)) => {
> +            if current_node.contains_resource(sid) {
> +                bail!("resource '{sid}' was not removed from current node '{current_nodename}'");
> +            }
> +
> +            if target_node.contains_resource(sid) {
> +                bail!("resource '{sid}' was not removed from target node '{target_nodename}'");
> +            }
> +        }
> +        _ => bail!("resource '{sid}' was not removed"),
> +    }
> +
> +    Ok(())
> +}





