public inbox for pve-devel@lists.proxmox.com
From: "Dominik Rusovac" <d.rusovac@proxmox.com>
To: "Daniel Kral" <d.kral@proxmox.com>, <pve-devel@lists.proxmox.com>
Subject: Re: [PATCH proxmox v2 05/40] resource-scheduling: implement generic cluster usage implementation
Date: Thu, 26 Mar 2026 11:28:53 +0100	[thread overview]
Message-ID: <DHCN66Y3FTZP.2A9UB5HSV2G68@proxmox.com> (raw)
In-Reply-To: <20260324183029.1274972-6-d.kral@proxmox.com>

LGTM.

Please find my comments inline, mostly nits and tiny things.

On Tue Mar 24, 2026 at 7:29 PM CET, Daniel Kral wrote:
> This is a more generic version of the `Usage` implementation from the
> pve_static bindings in the pve_rs repository.
>
> As the upcoming load balancing scheduler actions and dynamic resource
> scheduler will need more information about each resource, this further
> improves on the state tracking of each resource:
>
> In this implementation, a resource is composed of its usage statistics
> and its two essential states: the running state and the node placement.
> The non_exhaustive attribute ensures that users construct a Resource
> instance through its API.
>
> Users can repeatedly use the current state of Usage to make scheduling
> decisions with the to_scheduler() method. This method takes an
> implementation of UsageAggregator, which dictates how the usage
> information is represented to the Scheduler.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v1 -> v2:
> - new!
>
> This patch is added to move the handling of specific usage stats and
> their (de)serialization to the pve-rs bindings and have the general
> functionality in this crate.

[snip]

nit: imo, it would be more convenient to expose the more ergonomic
`&str` type here, using:

       pub fn resources_iter(&self) -> impl Iterator<Item = &str> {
           self.resources.iter().map(String::as_str)
       }

> +    pub fn resources_iter(&self) -> impl Iterator<Item = &String> {
> +        self.resources.iter()
> +    }
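
For illustration, a self-contained sketch (hypothetical free function, not
the actual method) of what callers gain from `&str` items: the iterator
output can be collected or handed to `&str`-taking APIs directly, without
an extra conversion step.

```rust
// Standalone sketch (hypothetical names): mapping String::as_str
// lets callers consume plain &str items from a String collection.
pub fn resources_iter(resources: &[String]) -> impl Iterator<Item = &str> {
    resources.iter().map(String::as_str)
}
```

A caller can then write e.g. `let v: Vec<&str> = resources_iter(&rs).collect();`
directly.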

[snip] 

> +    pub fn moving_to(&mut self, target_node: String) -> Result<(), Error> {
> +        match &self.placement {
> +            ResourcePlacement::Stationary { current_node } => {
> +                self.placement = ResourcePlacement::Moving {
> +                    current_node: current_node.to_string(),

nit:

                       current_node: current_node.to_owned(),

best represents the intention here, that is, owning rather than converting.
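
For reference, a minimal standalone sketch of the distinction (hypothetical
helper, not from the patch): both calls yield an owned `String` with
identical contents; `to_owned()` just goes through the `ToOwned` trait
(clone the borrowed data) instead of through `Display` formatting.

```rust
// Hypothetical helper: taking ownership of a borrowed node name.
// `to_owned()` clones the borrowed data via the ToOwned trait;
// `to_string()` would produce the same String via the Display impl.
fn own_node_name(current_node: &str) -> String {
    current_node.to_owned()
}
```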

[snip]

> +    /// Handles the external removal of a node.
> +    ///
> +    /// Returns whether the resource does not have any node left.

Considering what it does, I find the name of this function a bit confusing.

> +    pub fn remove_node(&mut self, nodename: &str) -> bool {
> +        match &self.placement {
> +            ResourcePlacement::Stationary { current_node } => current_node == nodename,
> +            ResourcePlacement::Moving {
> +                current_node,
> +                target_node,
> +            } => {
> +                if current_node == nodename {
> +                    self.placement = ResourcePlacement::Stationary {
> +                        current_node: target_node.to_string(),

nit: `to_owned()` best represents the intention here

> +                    };
> +                } else if target_node == nodename {
> +                    self.placement = ResourcePlacement::Stationary {
> +                        current_node: current_node.to_string(),

nit: `to_owned()` best represents the intention here

> +                    };
> +                }
> +
> +                false
> +            }
> +        }
> +    }

[snip]

> +    /// Add a node to the cluster usage.
> +    ///
> +    /// This method fails if a node with the same `nodename` already exists.
> +    pub fn add_node(&mut self, nodename: String, stats: NodeStats) -> Result<(), Error> {
> +        if self.nodes.contains_key(&nodename) {
> +            bail!("node '{}' already exists", nodename);

nit: 

               bail!("node '{nodename}' already exists");

> +        }

[snip]

We are only reading here; consider taking a slice for `nodenames` (just
like for `remove_resource_from_nodes`):

       fn add_resource_to_nodes(&mut self, sid: &str, nodenames: &[&str]) -> Result<(), Error> {

Please find the related changes in [0] and [1] below.

> +    fn add_resource_to_nodes(&mut self, sid: &str, nodenames: Vec<&str>) -> Result<(), Error> {
> +        if nodenames
> +            .iter()
> +            .any(|nodename| !self.nodes.contains_key(*nodename))
> +        {
> +            bail!("resource nodes do not exist");
> +        }
> +
> +        nodenames.iter().for_each(|nodename| {
> +            if let Some(node) = self.nodes.get_mut(*nodename) {
> +                node.add_resource(sid);
> +            }
> +        });
> +
> +        Ok(())
> +    }
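
A small standalone sketch (hypothetical stand-in for the real method) of
why `&[&str]` is convenient: the same parameter accepts both a borrowed
`Vec<&str>` and a fixed-size array, so the call sites in [0] and [1] work
without any conversion.

```rust
// Hypothetical stand-in that only reads the node names, as the
// real add_resource_to_nodes does; a slice suffices for that.
fn count_target_nodes(nodenames: &[&str]) -> usize {
    nodenames.iter().filter(|n| !n.is_empty()).count()
}
```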

[snip]

> +    /// Add `resource` with identifier `sid` to cluster usage.
> +    ///
> +    /// This method fails if a resource with the same `sid` already exists or the resource's nodes
> +    /// do not exist in the cluster usage.
> +    pub fn add_resource(&mut self, sid: String, resource: Resource) -> Result<(), Error> {
> +        if self.resources.contains_key(&sid) {
> +            bail!("resource '{}' already exists", sid);
> +        }
> +
> +        self.add_resource_to_nodes(&sid, resource.nodenames())?;

[0]:

           self.add_resource_to_nodes(&sid, &resource.nodenames())?;

> +
> +        self.resources.insert(sid.to_string(), resource);

nit: `to_owned()` instead of `to_string()` best represents the intention

[snip]

> +    pub fn add_resource_usage_to_node(
> +        &mut self,
> +        nodename: &str,
> +        sid: &str,
> +        stats: ResourceStats,
> +    ) -> Result<(), Error> {
> +        if let Some(resource) = self.resources.get_mut(sid) {
> +            resource.moving_to(nodename.to_string())?;
> +
> +            self.add_resource_to_nodes(sid, vec![nodename])

[1]:

               self.add_resource_to_nodes(sid, &[nodename])

[snip]

> +#[test]
> +fn test_no_duplicate_nodes() -> Result<(), Error> {
> +    let mut usage = Usage::new();
> +
> +    usage.add_node("node1".to_string(), NodeStats::default())?;
> +
> +    match usage.add_node("node1".to_string(), NodeStats::default()) {
> +        Ok(_) => bail!("cluster usage does allow duplicate node entries"),
> +        Err(_) => Ok(()),
> +    }

Since this is a test case, I would rather assert than bail, using:

    assert!(
        usage
            .add_node("node1".to_string(), NodeStats::default())
            .is_err(),
        "cluster usage allows duplicate node entries"
    );

> +}
> +
> +#[test]
> +fn test_no_duplicate_resources() -> Result<(), Error> {
> +    let mut usage = Usage::new();
> +
> +    usage.add_node("node1".to_string(), NodeStats::default())?;
> +
> +    let placement = ResourcePlacement::Stationary {
> +        current_node: "node1".to_string(),
> +    };
> +    let resource = Resource::new(ResourceStats::default(), ResourceState::Stopped, placement);
> +
> +    usage.add_resource("vm:101".to_string(), resource.clone())?;
> +
> +    match usage.add_resource("vm:101".to_string(), resource) {
> +        Ok(_) => bail!("cluster usage does allow duplicate resource entries"),
> +        Err(_) => Ok(()),
> +    }

assert instead of bail:

       assert!(
           usage.add_resource("vm:101".to_string(), resource).is_err(),
           "cluster usage allows duplicate resource entries"
       );

> +}
> +
> +#[test]
> +#[allow(deprecated)]
> +fn test_add_resource_usage_to_node() -> Result<(), Error> {
> +    let mut usage = Usage::new();
> +
> +    usage.add_node("node1".to_string(), NodeStats::default())?;
> +    usage.add_node("node2".to_string(), NodeStats::default())?;
> +    usage.add_node("node3".to_string(), NodeStats::default())?;
> +
> +    usage.add_resource_usage_to_node("node1", "vm:101", ResourceStats::default())?;
> +    usage.add_resource_usage_to_node("node2", "vm:101", ResourceStats::default())?;
> +
> +    if usage
> +        .add_resource_usage_to_node("node3", "vm:101", ResourceStats::default())
> +        .is_ok()
> +    {
> +        bail!("add_resource_usage_to_node() allows adding resource to more than two nodes");
> +    }

assert instead of bail:

       assert!(
           usage
               .add_resource_usage_to_node("node3", "vm:101", ResourceStats::default())
               .is_err(),
           "add_resource_usage_to_node() allows adding resource to more than two nodes"
       );

> +
> +    Ok(())
> +}
> +
> +#[test]
> +fn test_add_remove_stationary_resource() -> Result<(), Error> {
> +    let mut usage = Usage::new();
> +
> +    let (sid, nodename) = ("vm:101", "node1");
> +
> +    usage.add_node(nodename.to_string(), NodeStats::default())?;
> +
> +    let placement = ResourcePlacement::Stationary {
> +        current_node: nodename.to_string(),
> +    };
> +    let resource = Resource::new(ResourceStats::default(), ResourceState::Stopped, placement);
> +
> +    usage.add_resource(sid.to_string(), resource)?;
> +
> +    match (usage.get_resource(sid), usage.get_node(nodename)) {
> +        (Some(_), Some(node)) => {
> +            if !node.contains_resource(sid) {
> +                bail!("resource '{sid}' was not added to node '{nodename}'");
> +            }
> +        }
> +        _ => bail!("resource '{sid}' or node '{nodename}' were not added"),
> +    }

assert instead of bail:

       assert!(
           usage.get_resource(sid).is_some(),
           "resource '{sid}' was not added"
       );
       assert!(
           usage
               .get_node(nodename)
               .map(|node| {
                   assert!(
                       node.contains_resource(sid),
                       "resource '{sid}' was not added to node '{nodename}'"
                   );
               })
               .is_some(),
           "node '{nodename}' was not added"
       );

> +
> +    usage.remove_resource(sid);
> +
> +    match (usage.get_resource(sid), usage.get_node(nodename)) {
> +        (None, Some(node)) => {
> +            if node.contains_resource(sid) {
> +                bail!("resource '{sid}' was not removed from node '{nodename}'");
> +            }
> +        }
> +        _ => bail!("resource '{sid}' was not removed"),
> +    }

assert instead of bail:

       assert!(
           usage.get_resource(sid).is_none(),
           "resource '{sid}' was not removed"
       );
       assert!(
           usage
               .get_node(nodename)
               .map(|node| {
                   assert!(
                       !node.contains_resource(sid),
                       "resource '{sid}' was not removed from node '{nodename}'"
                   );
               })
               .is_some(),
           "node '{nodename}' was not added"
       );

> +
> +    Ok(())
> +}
> +
> +#[test]
> +fn test_add_remove_moving_resource() -> Result<(), Error> {
> +    let mut usage = Usage::new();
> +
> +    let (sid, current_nodename, target_nodename) = ("vm:101", "node1", "node2");
> +
> +    usage.add_node(current_nodename.to_string(), NodeStats::default())?;
> +    usage.add_node(target_nodename.to_string(), NodeStats::default())?;
> +
> +    let placement = ResourcePlacement::Moving {
> +        current_node: current_nodename.to_string(),
> +        target_node: target_nodename.to_string(),
> +    };
> +    let resource = Resource::new(ResourceStats::default(), ResourceState::Stopped, placement);
> +
> +    usage.add_resource(sid.to_string(), resource)?;
> +

analogously, here I'd find asserting more appropriate than bailing

> +    match (
> +        usage.get_resource(sid),
> +        usage.get_node(current_nodename),
> +        usage.get_node(target_nodename),
> +    ) {
> +        (Some(_), Some(current_node), Some(target_node)) => {
> +            if !current_node.contains_resource("vm:101") {
> +                bail!("resource '{sid}' was not added to current node '{current_nodename}'");
> +            }
> +
> +            if !target_node.contains_resource("vm:101") {
> +                bail!("resource '{sid}' was not added to target node '{target_nodename}'");
> +            }
> +        }
> +        _ => bail!("resource '{sid}' or nodes were not added"),
> +    }
> +
> +    usage.remove_resource(sid);

analogously, here I'd find asserting more appropriate than bailing

> +
> +    match (
> +        usage.get_resource(sid),
> +        usage.get_node(current_nodename),
> +        usage.get_node(target_nodename),
> +    ) {
> +        (None, Some(current_node), Some(target_node)) => {
> +            if current_node.contains_resource(sid) {
> +                bail!("resource '{sid}' was not removed from current node '{current_nodename}'");
> +            }
> +
> +            if target_node.contains_resource(sid) {
> +                bail!("resource '{sid}' was not removed from target node '{target_nodename}'");
> +            }
> +        }
> +        _ => bail!("resource '{sid}' was not removed"),
> +    }
> +
> +    Ok(())
> +}





