From: "Dominik Rusovac" <d.rusovac@proxmox.com>
To: "Daniel Kral" <d.kral@proxmox.com>, <pve-devel@lists.proxmox.com>
Subject: Re: [PATCH proxmox v2 05/40] resource-scheduling: implement generic cluster usage implementation
Date: Thu, 26 Mar 2026 11:28:53 +0100
Message-ID: <DHCN66Y3FTZP.2A9UB5HSV2G68@proxmox.com>
In-Reply-To: <20260324183029.1274972-6-d.kral@proxmox.com>
LGTM!
Please find my comments inline; mostly nits or tiny things.
On Tue Mar 24, 2026 at 7:29 PM CET, Daniel Kral wrote:
> This is a more generic version of the `Usage` implementation from the
> pve_static bindings in the pve_rs repository.
>
> As the upcoming load balancing scheduler actions and dynamic resource
> scheduler will need more information about each resource, this further
> improves on the state tracking of each resource:
>
> In this implementation, a resource is composed of its usage statistics
> and its two essential states: the running state and the node placement.
> The non_exhaustive attribute ensures that users need to construct a
> Resource instance through its API.
>
> Users can repeatedly use the current state of Usage to make scheduling
> decisions with the to_scheduler() method. This method takes an
> implementation of UsageAggregator, which dictates how the usage
> information is represented to the Scheduler.
>
> Signed-off-by: Daniel Kral <d.kral@proxmox.com>
> ---
> changes v1 -> v2:
> - new!
>
> This patch is added to move the handling of specific usage stats and
> their (de)serialization to the pve-rs bindings and have the general
> functionality in this crate.
[snip]
nit: imo, it'd be more ergonomic to expose `&str` instead of `&String`
here, using:
pub fn resources_iter(&self) -> impl Iterator<Item = &str> {
self.resources.iter().map(String::as_str)
}
> + pub fn resources_iter(&self) -> impl Iterator<Item = &String> {
> + self.resources.iter()
> + }
[snip]
> + pub fn moving_to(&mut self, target_node: String) -> Result<(), Error> {
> + match &self.placement {
> + ResourcePlacement::Stationary { current_node } => {
> + self.placement = ResourcePlacement::Moving {
> + current_node: current_node.to_string(),
nit:
current_node: current_node.to_owned(),
represents the intention best, that is, owning rather than converting
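to illustrate why I keep bringing this up, here is a toy snippet of mine
(not from the patch) -- both calls return an owned String, the difference
is only in the stated intent:

```rust
// Toy snippet, not from the patch: both calls below return an owned
// String, but `to_owned()` states "clone to take ownership", while
// `to_string()` reads like a conversion -- which a &str already is.
fn keep_node(current_node: &str) -> String {
    current_node.to_owned() // preferred over current_node.to_string()
}

fn main() {
    let owned = keep_node("node1");
    assert_eq!(owned, "node1");
}
```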
[snip]
> + /// Handles the external removal of a node.
> + ///
> + /// Returns whether the resource does not have any node left.
Considering what it does (it updates the placement as a side effect and
only reports whether the resource is left without any node), I find the
name of this function a bit confusing.
> + pub fn remove_node(&mut self, nodename: &str) -> bool {
> + match &self.placement {
> + ResourcePlacement::Stationary { current_node } => current_node == nodename,
> + ResourcePlacement::Moving {
> + current_node,
> + target_node,
> + } => {
> + if current_node == nodename {
> + self.placement = ResourcePlacement::Stationary {
> + current_node: target_node.to_string(),
nit: to_owned() represents the intention best
> + };
> + } else if target_node == nodename {
> + self.placement = ResourcePlacement::Stationary {
> + current_node: current_node.to_string(),
nit: to_owned() represents the intention best
> + };
> + }
> +
> + false
> + }
> + }
> + }
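a name describing the reaction rather than the action might read better,
e.g. `on_node_removed` (just a suggestion of mine, not in the patch; the
types are inlined below so the sketch stands alone):

```rust
// Sketch only: `on_node_removed` is a suggested name, not in the patch;
// the placement enum and resource struct are inlined so this compiles
// on its own.
enum ResourcePlacement {
    Stationary { current_node: String },
    Moving { current_node: String, target_node: String },
}

struct Resource {
    placement: ResourcePlacement,
}

impl Resource {
    /// Reacts to the external removal of `nodename`.
    ///
    /// Returns whether the resource is left without any node.
    fn on_node_removed(&mut self, nodename: &str) -> bool {
        match &self.placement {
            ResourcePlacement::Stationary { current_node } => current_node == nodename,
            ResourcePlacement::Moving {
                current_node,
                target_node,
            } => {
                if current_node == nodename {
                    // the current node is gone, the target remains
                    self.placement = ResourcePlacement::Stationary {
                        current_node: target_node.to_owned(),
                    };
                } else if target_node == nodename {
                    // the target node is gone, the resource stays put
                    self.placement = ResourcePlacement::Stationary {
                        current_node: current_node.to_owned(),
                    };
                }
                false
            }
        }
    }
}

fn main() {
    let mut moving = Resource {
        placement: ResourcePlacement::Moving {
            current_node: "node1".to_string(),
            target_node: "node2".to_string(),
        },
    };
    // losing the current node mid-move leaves the resource on the target
    assert!(!moving.on_node_removed("node1"));
    assert!(matches!(
        &moving.placement,
        ResourcePlacement::Stationary { current_node } if current_node == "node2"
    ));

    let mut stationary = Resource {
        placement: ResourcePlacement::Stationary {
            current_node: "node1".to_string(),
        },
    };
    // losing the only node leaves the resource without any node
    assert!(stationary.on_node_removed("node1"));
}
```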
[snip]
> + /// Add a node to the cluster usage.
> + ///
> + /// This method fails if a node with the same `nodename` already exists.
> + pub fn add_node(&mut self, nodename: String, stats: NodeStats) -> Result<(), Error> {
> + if self.nodes.contains_key(&nodename) {
> + bail!("node '{}' already exists", nodename);
nit:
bail!("node '{nodename}' already exists");
> + }
[snip]
we are only reading, so consider taking a slice for `nodenames` here (just
like `remove_resource_from_nodes` does):
fn add_resource_to_nodes(&mut self, sid: &str, nodenames: &[&str]) -> Result<(), Error> {
please find the related call-site changes at [0] and [1] below.
> + fn add_resource_to_nodes(&mut self, sid: &str, nodenames: Vec<&str>) -> Result<(), Error> {
> + if nodenames
> + .iter()
> + .any(|nodename| !self.nodes.contains_key(*nodename))
> + {
> + bail!("resource nodes do not exist");
> + }
> +
> + nodenames.iter().for_each(|nodename| {
> + if let Some(node) = self.nodes.get_mut(*nodename) {
> + node.add_resource(sid);
> + }
> + });
> +
> + Ok(())
> + }
[snip]
> + /// Add `resource` with identifier `sid` to cluster usage.
> + ///
> + /// This method fails if a resource with the same `sid` already exists or the resource's nodes
> + /// do not exist in the cluster usage.
> + pub fn add_resource(&mut self, sid: String, resource: Resource) -> Result<(), Error> {
> + if self.resources.contains_key(&sid) {
> + bail!("resource '{}' already exists", sid);
> + }
> +
> + self.add_resource_to_nodes(&sid, resource.nodenames())?;
[0]:
self.add_resource_to_nodes(&sid, &resource.nodenames())?;
> +
> + self.resources.insert(sid.to_string(), resource);
nit: to_owned() instead of to_string() represents the intention best
[snip]
> + pub fn add_resource_usage_to_node(
> + &mut self,
> + nodename: &str,
> + sid: &str,
> + stats: ResourceStats,
> + ) -> Result<(), Error> {
> + if let Some(resource) = self.resources.get_mut(sid) {
> + resource.moving_to(nodename.to_string())?;
> +
> + self.add_resource_to_nodes(sid, vec![nodename])
[1]:
self.add_resource_to_nodes(sid, &[nodename])
[snip]
> +#[test]
> +fn test_no_duplicate_nodes() -> Result<(), Error> {
> + let mut usage = Usage::new();
> +
> + usage.add_node("node1".to_string(), NodeStats::default())?;
> +
> + match usage.add_node("node1".to_string(), NodeStats::default()) {
> + Ok(_) => bail!("cluster usage does allow duplicate node entries"),
> + Err(_) => Ok(()),
> + }
since this is supposed to be a test case, I would rather assert instead
of bail, using:
assert!(
usage
.add_node("node1".to_string(), NodeStats::default())
.is_err(),
"cluster usage allows duplicate node entries"
);
> +}
> +
> +#[test]
> +fn test_no_duplicate_resources() -> Result<(), Error> {
> + let mut usage = Usage::new();
> +
> + usage.add_node("node1".to_string(), NodeStats::default())?;
> +
> + let placement = ResourcePlacement::Stationary {
> + current_node: "node1".to_string(),
> + };
> + let resource = Resource::new(ResourceStats::default(), ResourceState::Stopped, placement);
> +
> + usage.add_resource("vm:101".to_string(), resource.clone())?;
> +
> + match usage.add_resource("vm:101".to_string(), resource) {
> + Ok(_) => bail!("cluster usage does allow duplicate resource entries"),
> + Err(_) => Ok(()),
> + }
assert instead of bail:
assert!(
usage.add_resource("vm:101".to_string(), resource).is_err(),
"cluster usage allows duplicate resource entries"
);
> +}
> +
> +#[test]
> +#[allow(deprecated)]
> +fn test_add_resource_usage_to_node() -> Result<(), Error> {
> + let mut usage = Usage::new();
> +
> + usage.add_node("node1".to_string(), NodeStats::default())?;
> + usage.add_node("node2".to_string(), NodeStats::default())?;
> + usage.add_node("node3".to_string(), NodeStats::default())?;
> +
> + usage.add_resource_usage_to_node("node1", "vm:101", ResourceStats::default())?;
> + usage.add_resource_usage_to_node("node2", "vm:101", ResourceStats::default())?;
> +
> + if usage
> + .add_resource_usage_to_node("node3", "vm:101", ResourceStats::default())
> + .is_ok()
> + {
> + bail!("add_resource_usage_to_node() allows adding resource to more than two nodes");
> + }
assert instead of bail:
assert!(
usage
.add_resource_usage_to_node("node3", "vm:101", ResourceStats::default())
.is_err(),
"add_resource_usage_to_node() allows adding resource to more than two nodes"
);
> +
> + Ok(())
> +}
> +
> +#[test]
> +fn test_add_remove_stationary_resource() -> Result<(), Error> {
> + let mut usage = Usage::new();
> +
> + let (sid, nodename) = ("vm:101", "node1");
> +
> + usage.add_node(nodename.to_string(), NodeStats::default())?;
> +
> + let placement = ResourcePlacement::Stationary {
> + current_node: nodename.to_string(),
> + };
> + let resource = Resource::new(ResourceStats::default(), ResourceState::Stopped, placement);
> +
> + usage.add_resource(sid.to_string(), resource)?;
> +
> + match (usage.get_resource(sid), usage.get_node(nodename)) {
> + (Some(_), Some(node)) => {
> + if !node.contains_resource(sid) {
> + bail!("resource '{sid}' was not added to node '{nodename}'");
> + }
> + }
> + _ => bail!("resource '{sid}' or node '{nodename}' were not added"),
> + }
assert instead of bail:
assert!(
usage.get_resource(sid).is_some(),
"resource '{sid}' was not added"
);
assert!(
usage
.get_node(nodename)
.map(|node| {
assert!(
node.contains_resource(sid),
"resource '{sid}' was not added to node '{nodename}'"
);
})
.is_some(),
"node '{nodename}' was not added"
);
> +
> + usage.remove_resource(sid);
> +
> + match (usage.get_resource(sid), usage.get_node(nodename)) {
> + (None, Some(node)) => {
> + if node.contains_resource(sid) {
> + bail!("resource '{sid}' was not removed from node '{nodename}'");
> + }
> + }
> + _ => bail!("resource '{sid}' was not removed"),
> + }
assert instead of bail:
assert!(
usage.get_resource(sid).is_none(),
"resource '{sid}' was not removed"
);
assert!(
usage
.get_node(nodename)
.map(|node| {
assert!(
!node.contains_resource(sid),
"resource '{sid}' was not removed from node '{nodename}'"
);
})
.is_some(),
"node '{nodename}' was not added"
);
> +
> + Ok(())
> +}
> +
> +#[test]
> +fn test_add_remove_moving_resource() -> Result<(), Error> {
> + let mut usage = Usage::new();
> +
> + let (sid, current_nodename, target_nodename) = ("vm:101", "node1", "node2");
> +
> + usage.add_node(current_nodename.to_string(), NodeStats::default())?;
> + usage.add_node(target_nodename.to_string(), NodeStats::default())?;
> +
> + let placement = ResourcePlacement::Moving {
> + current_node: current_nodename.to_string(),
> + target_node: target_nodename.to_string(),
> + };
> + let resource = Resource::new(ResourceStats::default(), ResourceState::Stopped, placement);
> +
> + usage.add_resource(sid.to_string(), resource)?;
> +
analogously, here I'd find asserting more appropriate than bailing;
spelled out (untested), that could be:
assert!(
    usage.get_resource(sid).is_some(),
    "resource '{sid}' was not added"
);
for nodename in [current_nodename, target_nodename] {
    assert!(
        usage
            .get_node(nodename)
            .map(|node| {
                assert!(
                    node.contains_resource(sid),
                    "resource '{sid}' was not added to node '{nodename}'"
                );
            })
            .is_some(),
        "node '{nodename}' was not added"
    );
}
> + match (
> + usage.get_resource(sid),
> + usage.get_node(current_nodename),
> + usage.get_node(target_nodename),
> + ) {
> + (Some(_), Some(current_node), Some(target_node)) => {
> + if !current_node.contains_resource("vm:101") {
> + bail!("resource '{sid}' was not added to current node '{current_nodename}'");
> + }
> +
> + if !target_node.contains_resource("vm:101") {
> + bail!("resource '{sid}' was not added to target node '{target_nodename}'");
> + }
> + }
> + _ => bail!("resource '{sid}' or nodes were not added"),
> + }
> +
> + usage.remove_resource(sid);
analogously, asserting instead of bailing; spelled out (untested):
assert!(
    usage.get_resource(sid).is_none(),
    "resource '{sid}' was not removed"
);
for nodename in [current_nodename, target_nodename] {
    assert!(
        usage
            .get_node(nodename)
            .map(|node| {
                assert!(
                    !node.contains_resource(sid),
                    "resource '{sid}' was not removed from node '{nodename}'"
                );
            })
            .is_some(),
        "node '{nodename}' was not added"
    );
}
> +
> + match (
> + usage.get_resource(sid),
> + usage.get_node(current_nodename),
> + usage.get_node(target_nodename),
> + ) {
> + (None, Some(current_node), Some(target_node)) => {
> + if current_node.contains_resource(sid) {
> + bail!("resource '{sid}' was not removed from current node '{current_nodename}'");
> + }
> +
> + if target_node.contains_resource(sid) {
> + bail!("resource '{sid}' was not removed from target node '{target_nodename}'");
> + }
> + }
> + _ => bail!("resource '{sid}' was not removed"),
> + }
> +
> + Ok(())
> +}