* [pdm-devel] [RFC proxmox-datacenter-manager 0/2] add helper to fetch from many remote nodes in parallel
From: Lukas Wagner @ 2025-08-28 14:42 UTC
To: pdm-devel

Still work in progress (code style improvements, documentation, naming),
but sharing early because Stefan might need something similar.

Please do not merge yet.

proxmox-datacenter-manager:

Lukas Wagner (2):
  server: add convenience helper to fetch results from many remote nodes
    in parallel
  remote task fetching: use ParallelFetcher helper

 .../tasks/remote_tasks.rs      | 239 ++++++------------
 server/src/lib.rs              |   1 +
 server/src/parallel_fetcher.rs | 236 +++++++++++++++++
 3 files changed, 314 insertions(+), 162 deletions(-)
 create mode 100644 server/src/parallel_fetcher.rs

Summary over all repositories:
  3 files changed, 314 insertions(+), 162 deletions(-)

--
Generated by murpp 0.9.0
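For context, the concurrency pattern the whole series builds on is a tokio JoinSet whose tasks are throttled by a semaphore. A minimal, self-contained sketch of just that idea (illustration only, not code from the series; fetch_all and its types are invented):

    use std::sync::Arc;

    use tokio::sync::Semaphore;
    use tokio::task::JoinSet;

    // Spawn one task per item, but let at most `limit` tasks hold a permit
    // (i.e. an open connection) at any given time.
    async fn fetch_all(items: Vec<String>, limit: usize) -> Vec<Result<String, anyhow::Error>> {
        let semaphore = Arc::new(Semaphore::new(limit));
        let mut join_set = JoinSet::new();

        for item in items {
            let semaphore = Arc::clone(&semaphore);
            join_set.spawn(async move {
                // An owned permit ties the permit's lifetime to the spawned task.
                let _permit = semaphore.acquire_owned().await.unwrap();
                // ... the actual request would go here ...
                Ok::<String, anyhow::Error>(item)
            });
        }

        let mut results = Vec::new();
        while let Some(joined) = join_set.join_next().await {
            match joined {
                Ok(res) => results.push(res),
                Err(err) => log::error!("join error when waiting for future: {err}"),
            }
        }
        results
    }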
* [pdm-devel] [RFC proxmox-datacenter-manager 1/2] server: add convenience helper to fetch results from many remote nodes in parallel 2025-08-28 14:42 [pdm-devel] [RFC proxmox-datacenter-manager 0/2] add helper to fetch from many remote nodes in parallel Lukas Wagner @ 2025-08-28 14:42 ` Lukas Wagner 2025-08-29 8:57 ` Dominik Csapak 2025-08-28 14:42 ` [pdm-devel] [RFC proxmox-datacenter-manager 2/2] remote task fetching: use ParallelFetcher helper Lukas Wagner 2025-08-29 14:11 ` [pdm-devel] superseded: [RFC proxmox-datacenter-manager 0/2] add helper to fetch from many remote nodes in parallel Lukas Wagner 2 siblings, 1 reply; 10+ messages in thread From: Lukas Wagner @ 2025-08-28 14:42 UTC (permalink / raw) To: pdm-devel Implementing this over and over again is a lot of work; many parts of PDM need to fetch some kind of resource from all remotes and all nodes of these remotes. The general approach is the same as for remote task fetching (JoinSet and semaphores to control concurrency), it will be refactored in a later commit to use this new helper. Signed-off-by: Lukas Wagner <l.wagner@proxmox.com> --- server/src/lib.rs | 1 + server/src/parallel_fetcher.rs | 236 +++++++++++++++++++++++++++++++++ 2 files changed, 237 insertions(+) create mode 100644 server/src/parallel_fetcher.rs diff --git a/server/src/lib.rs b/server/src/lib.rs index 485bc792..3f8b7708 100644 --- a/server/src/lib.rs +++ b/server/src/lib.rs @@ -6,6 +6,7 @@ pub mod auth; pub mod context; pub mod env; pub mod metric_collection; +pub mod parallel_fetcher; pub mod remote_cache; pub mod remote_tasks; pub mod resource_cache; diff --git a/server/src/parallel_fetcher.rs b/server/src/parallel_fetcher.rs new file mode 100644 index 00000000..3fad68a0 --- /dev/null +++ b/server/src/parallel_fetcher.rs @@ -0,0 +1,236 @@ +use std::{ + collections::HashMap, + fmt::Debug, + future::Future, + sync::Arc, + time::{Duration, Instant}, +}; + +use anyhow::Error; +use pdm_api_types::remotes::{Remote, RemoteType}; +use pve_api_types::ClusterNodeIndexResponse; +use tokio::{ + sync::{OwnedSemaphorePermit, Semaphore}, + task::JoinSet, +}; + +use crate::connection; + +pub struct ParallelFetcher<C> { + pub max_connections: usize, + pub max_connections_per_remote: usize, + pub context: C, +} + +pub struct FetchResults<T> { + pub remote_results: HashMap<String, Result<RemoteResult<T>, Error>>, +} + +impl<T> Default for FetchResults<T> { + fn default() -> Self { + Self { + remote_results: Default::default(), + } + } +} + +#[derive(Debug)] +pub struct RemoteResult<T> { + pub node_results: HashMap<String, Result<NodeResults<T>, Error>>, +} + +impl<T> Default for RemoteResult<T> { + fn default() -> Self { + Self { + node_results: Default::default(), + } + } +} + +#[derive(Debug)] +pub struct NodeResults<T> { + pub data: T, + pub api_response_time: Duration, +} + +impl<C: Clone + Send + 'static> ParallelFetcher<C> { + pub fn new(context: C) -> Self { + Self { + max_connections: 20, + max_connections_per_remote: 5, + context, + } + } + + pub async fn do_for_all_remote_nodes<A, F, T, Ft>(self, remotes: A, func: F) -> FetchResults<T> + where + A: Iterator<Item = Remote>, + F: Fn(C, Remote, String) -> Ft + Clone + Send + 'static, + Ft: Future<Output = Result<T, Error>> + Send + 'static, + T: Send + Debug + 'static, + { + let total_connections_semaphore = Arc::new(Semaphore::new(self.max_connections)); + + let mut remote_join_set = JoinSet::new(); + + for remote in remotes { + let semaphore = Arc::clone(&total_connections_semaphore); + + let f = func.clone(); + + 
remote_join_set.spawn(Self::fetch_remote( + remote, + self.context.clone(), + semaphore, + f, + self.max_connections_per_remote, + )); + } + + let mut results = FetchResults::default(); + + while let Some(a) = remote_join_set.join_next().await { + match a { + Ok((remote_name, remote_result)) => { + results.remote_results.insert(remote_name, remote_result); + } + Err(err) => { + log::error!("join error when waiting for future: {err}") + } + } + } + + results + } + + async fn fetch_remote<F, Ft, T>( + remote: Remote, + context: C, + semaphore: Arc<Semaphore>, + func: F, + max_connections_per_remote: usize, + ) -> (String, Result<RemoteResult<T>, Error>) + where + F: Fn(C, Remote, String) -> Ft + Clone + Send + 'static, + Ft: Future<Output = Result<T, Error>> + Send + 'static, + T: Send + Debug + 'static, + { + let mut per_remote_results = RemoteResult::default(); + + let mut permit = Some(Arc::clone(&semaphore).acquire_owned().await.unwrap()); + let per_remote_semaphore = Arc::new(Semaphore::new(max_connections_per_remote)); + + match remote.ty { + RemoteType::Pve => { + let remote_clone = remote.clone(); + + let nodes = match async move { + let client = connection::make_pve_client(&remote_clone)?; + let nodes = client.list_nodes().await?; + + Ok::<Vec<ClusterNodeIndexResponse>, Error>(nodes) + } + .await + { + Ok(nodes) => nodes, + Err(err) => return (remote.id.clone(), Err(err)), + }; + + let mut nodes_join_set = JoinSet::new(); + + for node in nodes { + let permit = if let Some(permit) = permit.take() { + permit + } else { + Arc::clone(&semaphore).acquire_owned().await.unwrap() + }; + + let per_remote_connections_permit = Arc::clone(&per_remote_semaphore) + .acquire_owned() + .await + .unwrap(); + + let func_clone = func.clone(); + let remote_clone = remote.clone(); + let node_name = node.node.clone(); + let context_clone = context.clone(); + + nodes_join_set.spawn(Self::fetch_pve_node( + func_clone, + context_clone, + remote_clone, + node_name, + permit, + per_remote_connections_permit, + )); + } + + while let Some(join_result) = nodes_join_set.join_next().await { + match join_result { + Ok((node_name, per_node_result)) => { + per_remote_results + .node_results + .insert(node_name, per_node_result); + } + Err(e) => { + log::error!("join error when waiting for future: {e}") + } + } + } + } + RemoteType::Pbs => { + let node = "localhost".to_string(); + + let now = Instant::now(); + let result = func(context, remote.clone(), node.clone()).await; + let api_response_time = now.elapsed(); + + match result { + Ok(data) => { + per_remote_results.node_results.insert( + node, + Ok(NodeResults { + data, + api_response_time, + }), + ); + } + Err(err) => { + per_remote_results.node_results.insert(node, Err(err)); + } + } + } + } + + (remote.id, Ok(per_remote_results)) + } + + async fn fetch_pve_node<F, Ft, T>( + func: F, + context: C, + remote: Remote, + node: String, + _permit: OwnedSemaphorePermit, + _per_remote_connections_permit: OwnedSemaphorePermit, + ) -> (String, Result<NodeResults<T>, Error>) + where + F: Fn(C, Remote, String) -> Ft + Clone + Send + 'static, + Ft: Future<Output = Result<T, Error>> + Send + 'static, + T: Send + Debug + 'static, + { + let now = Instant::now(); + let result = func(context, remote.clone(), node.clone()).await; + let api_response_time = now.elapsed(); + + match result { + Ok(data) => ( + node, + Ok(NodeResults { + data, + api_response_time, + }), + ), + Err(err) => (node, Err(err)), + } + } +} -- 2.47.2 _______________________________________________ pdm-devel 
mailing list pdm-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pdm-devel ^ permalink raw reply [flat|nested] 10+ messages in thread
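To make the intended call-site ergonomics concrete, a rough usage sketch for the helper above (hypothetical code; it only assumes the types introduced in this patch, with `()` as the context and a dummy string as the fetched data):

    async fn example(remotes: Vec<Remote>) {
        let fetcher = ParallelFetcher::new(());

        let results = fetcher
            .do_for_all_remote_nodes(remotes.into_iter(), |_ctx, remote, node| async move {
                // any per-node API call can go here; the result type is generic
                Ok(format!("fetched from {}/{node}", remote.id))
            })
            .await;

        for (remote, remote_result) in results.remote_results {
            match remote_result {
                Ok(res) => {
                    for (node, node_result) in res.node_results {
                        match node_result {
                            Ok(r) => log::info!(
                                "{remote}/{node}: {} (took {:?})",
                                r.data,
                                r.api_response_time
                            ),
                            Err(err) => log::error!("{remote}/{node}: {err:#}"),
                        }
                    }
                }
                Err(err) => log::error!("{remote}: {err:#}"),
            }
        }
    }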
* Re: [pdm-devel] [RFC proxmox-datacenter-manager 1/2] server: add convenience helper to fetch results from many remote nodes in parallel 2025-08-28 14:42 ` [pdm-devel] [RFC proxmox-datacenter-manager 1/2] server: add convenience helper to fetch results " Lukas Wagner @ 2025-08-29 8:57 ` Dominik Csapak 2025-08-29 9:02 ` Stefan Hanreich 2025-08-29 14:05 ` Lukas Wagner 0 siblings, 2 replies; 10+ messages in thread From: Dominik Csapak @ 2025-08-29 8:57 UTC (permalink / raw) To: Proxmox Datacenter Manager development discussion, Lukas Wagner some minor comments inline, but looks good in general to me On 8/28/25 4:42 PM, Lukas Wagner wrote: > Implementing this over and over again is a lot of work; many parts of > PDM need to fetch some kind of resource from all remotes and all nodes > of these remotes. > > The general approach is the same as for remote task fetching (JoinSet > and semaphores to control concurrency), it will be refactored in a later > commit to use this new helper. > > Signed-off-by: Lukas Wagner <l.wagner@proxmox.com> > --- > server/src/lib.rs | 1 + > server/src/parallel_fetcher.rs | 236 +++++++++++++++++++++++++++++++++ > 2 files changed, 237 insertions(+) > create mode 100644 server/src/parallel_fetcher.rs > > diff --git a/server/src/lib.rs b/server/src/lib.rs > index 485bc792..3f8b7708 100644 > --- a/server/src/lib.rs > +++ b/server/src/lib.rs > @@ -6,6 +6,7 @@ pub mod auth; > pub mod context; > pub mod env; > pub mod metric_collection; > +pub mod parallel_fetcher; > pub mod remote_cache; > pub mod remote_tasks; > pub mod resource_cache; > diff --git a/server/src/parallel_fetcher.rs b/server/src/parallel_fetcher.rs > new file mode 100644 > index 00000000..3fad68a0 > --- /dev/null > +++ b/server/src/parallel_fetcher.rs > @@ -0,0 +1,236 @@ > +use std::{ > + collections::HashMap, > + fmt::Debug, > + future::Future, > + sync::Arc, > + time::{Duration, Instant}, > +}; > + > +use anyhow::Error; > +use pdm_api_types::remotes::{Remote, RemoteType}; > +use pve_api_types::ClusterNodeIndexResponse; > +use tokio::{ > + sync::{OwnedSemaphorePermit, Semaphore}, > + task::JoinSet, > +}; > + > +use crate::connection; > + > +pub struct ParallelFetcher<C> { > + pub max_connections: usize, > + pub max_connections_per_remote: usize, > + pub context: C, > +} > + > +pub struct FetchResults<T> { > + pub remote_results: HashMap<String, Result<RemoteResult<T>, Error>>, > +} > + > +impl<T> Default for FetchResults<T> { > + fn default() -> Self { > + Self { > + remote_results: Default::default(), > + } > + } > +} > + > +#[derive(Debug)] > +pub struct RemoteResult<T> { > + pub node_results: HashMap<String, Result<NodeResults<T>, Error>>, > +} > + > +impl<T> Default for RemoteResult<T> { > + fn default() -> Self { > + Self { > + node_results: Default::default(), > + } > + } > +} > + > +#[derive(Debug)] > +pub struct NodeResults<T> { > + pub data: T, > + pub api_response_time: Duration, > +} > + > +impl<C: Clone + Send + 'static> ParallelFetcher<C> { > + pub fn new(context: C) -> Self { > + Self { > + max_connections: 20, > + max_connections_per_remote: 5, you probably thought of it already, but we probably want to make this settable from outside in a seperate constructor, or at least put the values in const values for now > + context, > + } > + } > + > + pub async fn do_for_all_remote_nodes<A, F, T, Ft>(self, remotes: A, func: F) -> FetchResults<T> > + where > + A: Iterator<Item = Remote>, > + F: Fn(C, Remote, String) -> Ft + Clone + Send + 'static, > + Ft: Future<Output = Result<T, Error>> + Send + 
'static, > + T: Send + Debug + 'static, > + { > + let total_connections_semaphore = Arc::new(Semaphore::new(self.max_connections)); > + > + let mut remote_join_set = JoinSet::new(); > + > + for remote in remotes { > + let semaphore = Arc::clone(&total_connections_semaphore); > + > + let f = func.clone(); > + > + remote_join_set.spawn(Self::fetch_remote( > + remote, > + self.context.clone(), > + semaphore, > + f, > + self.max_connections_per_remote, > + )); not sure if it would work properly, but couldn't we put the semaphores into self, and clone that instead? then `fetch_remote` could become a normal method on self and we could implement a helper for getting permits and we wouldn't have to pass the semaphores around all the time. (maybe that does not work as i think though..) > + } > + > + let mut results = FetchResults::default(); > + > + while let Some(a) = remote_join_set.join_next().await { > + match a { > + Ok((remote_name, remote_result)) => { > + results.remote_results.insert(remote_name, remote_result); > + } > + Err(err) => { > + log::error!("join error when waiting for future: {err}") > + } > + } > + } > + > + results > + } > + > + async fn fetch_remote<F, Ft, T>( > + remote: Remote, > + context: C, > + semaphore: Arc<Semaphore>, > + func: F, > + max_connections_per_remote: usize, > + ) -> (String, Result<RemoteResult<T>, Error>) > + where > + F: Fn(C, Remote, String) -> Ft + Clone + Send + 'static, > + Ft: Future<Output = Result<T, Error>> + Send + 'static, > + T: Send + Debug + 'static, > + { > + let mut per_remote_results = RemoteResult::default(); > + > + let mut permit = Some(Arc::clone(&semaphore).acquire_owned().await.unwrap()); > + let per_remote_semaphore = Arc::new(Semaphore::new(max_connections_per_remote)); > + > + match remote.ty { > + RemoteType::Pve => { > + let remote_clone = remote.clone(); > + > + let nodes = match async move { > + let client = connection::make_pve_client(&remote_clone)?; > + let nodes = client.list_nodes().await?; > + > + Ok::<Vec<ClusterNodeIndexResponse>, Error>(nodes) > + } > + .await > + { > + Ok(nodes) => nodes, > + Err(err) => return (remote.id.clone(), Err(err)), > + }; > + > + let mut nodes_join_set = JoinSet::new(); > + > + for node in nodes { > + let permit = if let Some(permit) = permit.take() { > + permit > + } else { > + Arc::clone(&semaphore).acquire_owned().await.unwrap() > + }; > + > + let per_remote_connections_permit = Arc::clone(&per_remote_semaphore) > + .acquire_owned() > + .await > + .unwrap(); > + > + let func_clone = func.clone(); > + let remote_clone = remote.clone(); > + let node_name = node.node.clone(); > + let context_clone = context.clone(); > + > + nodes_join_set.spawn(Self::fetch_pve_node( > + func_clone, > + context_clone, > + remote_clone, > + node_name, > + permit, > + per_remote_connections_permit, > + )); > + } > + > + while let Some(join_result) = nodes_join_set.join_next().await { > + match join_result { > + Ok((node_name, per_node_result)) => { > + per_remote_results > + .node_results > + .insert(node_name, per_node_result); > + } > + Err(e) => { > + log::error!("join error when waiting for future: {e}") > + } > + } > + } > + } > + RemoteType::Pbs => { > + let node = "localhost".to_string(); > + > + let now = Instant::now(); > + let result = func(context, remote.clone(), node.clone()).await; > + let api_response_time = now.elapsed(); > + > + match result { > + Ok(data) => { > + per_remote_results.node_results.insert( > + node, > + Ok(NodeResults { > + data, > + api_response_time, > + }), > + ); > + } > + 
Err(err) => { > + per_remote_results.node_results.insert(node, Err(err)); > + } > + } > + } > + } > + > + (remote.id, Ok(per_remote_results)) > + } probably a `fn do_for_all_remotes` that only does one thing per remote would also be nice? (i think stefan said he'll need something like this) > + > + async fn fetch_pve_node<F, Ft, T>( > + func: F, > + context: C, > + remote: Remote, > + node: String, > + _permit: OwnedSemaphorePermit, > + _per_remote_connections_permit: OwnedSemaphorePermit, > + ) -> (String, Result<NodeResults<T>, Error>) > + where > + F: Fn(C, Remote, String) -> Ft + Clone + Send + 'static, > + Ft: Future<Output = Result<T, Error>> + Send + 'static, > + T: Send + Debug + 'static, > + { > + let now = Instant::now(); > + let result = func(context, remote.clone(), node.clone()).await; > + let api_response_time = now.elapsed(); > + > + match result { > + Ok(data) => ( > + node, > + Ok(NodeResults { > + data, > + api_response_time, > + }), > + ), > + Err(err) => (node, Err(err)), > + } > + } > +} _______________________________________________ pdm-devel mailing list pdm-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pdm-devel ^ permalink raw reply [flat|nested] 10+ messages in thread
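For the "semaphores into self" idea above, one possible shape (an untested sketch; the per-remote semaphore would still have to be created once per remote, so only the global one moves into the struct):

    use std::sync::Arc;

    use tokio::sync::{OwnedSemaphorePermit, Semaphore};

    #[derive(Clone)]
    pub struct ParallelFetcher<C> {
        max_connections_per_remote: usize,
        total_connections: Arc<Semaphore>,
        context: C,
    }

    impl<C: Clone + Send + 'static> ParallelFetcher<C> {
        async fn acquire_global_permit(&self) -> OwnedSemaphorePermit {
            Arc::clone(&self.total_connections)
                .acquire_owned()
                .await
                .unwrap()
        }

        // fetch_remote() could then be spawned as `self.clone().fetch_remote(...)`
        // and would no longer need the semaphore threaded through as an argument.
    }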
* Re: [pdm-devel] [RFC proxmox-datacenter-manager 1/2] server: add convenience helper to fetch results from many remote nodes in parallel
From: Stefan Hanreich @ 2025-08-29 9:02 UTC
To: Proxmox Datacenter Manager development discussion, Dominik Csapak, Lukas Wagner

On 8/29/25 10:58 AM, Dominik Csapak wrote:

[snip]

> probably a `fn do_for_all_remotes` that only does one thing per remote
> would also be nice? (i think stefan said he'll need something like this)

currently working on implementing this - I'll send a patch soon hopefully

[snip]
* Re: [pdm-devel] [RFC proxmox-datacenter-manager 1/2] server: add convenience helper to fetch results from many remote nodes in parallel
From: Lukas Wagner @ 2025-08-29 9:17 UTC
To: Stefan Hanreich, Proxmox Datacenter Manager development discussion, Dominik Csapak

On Fri Aug 29, 2025 at 11:02 AM CEST, Stefan Hanreich wrote:
> On 8/29/25 10:58 AM, Dominik Csapak wrote:
>
> [snip]
>
>> probably a `fn do_for_all_remotes` that only does one thing per remote
>> would also be nice? (i think stefan said he'll need something like this)
>
> currently working on implementing this - I'll send a patch soon hopefully

Yup, unfortunately this was a misunderstanding between me and Stefan; I
thought that he also needed the 'for all nodes' function that was
implemented here. But anyway, this should be pretty easy to add now as well.
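Since the per-remote variant is only being talked about at this point, here is one plausible shape for it, as a method next to do_for_all_remote_nodes (hypothetical sketch; Stefan's actual patch may look different):

    pub async fn do_for_all_remotes<A, F, T, Ft>(
        self,
        remotes: A,
        func: F,
    ) -> HashMap<String, Result<T, Error>>
    where
        A: Iterator<Item = Remote>,
        F: Fn(C, Remote) -> Ft + Clone + Send + 'static,
        Ft: Future<Output = Result<T, Error>> + Send + 'static,
        T: Send + 'static,
    {
        let semaphore = Arc::new(Semaphore::new(self.max_connections));
        let mut join_set = JoinSet::new();

        for remote in remotes {
            let semaphore = Arc::clone(&semaphore);
            let func = func.clone();
            let context = self.context.clone();

            join_set.spawn(async move {
                // One permit per remote, since `func` runs once per remote here.
                let _permit = semaphore.acquire_owned().await.unwrap();
                let remote_id = remote.id.clone();
                (remote_id, func(context, remote).await)
            });
        }

        let mut results = HashMap::new();
        while let Some(joined) = join_set.join_next().await {
            match joined {
                Ok((remote_id, result)) => {
                    results.insert(remote_id, result);
                }
                Err(err) => log::error!("join error when waiting for future: {err}"),
            }
        }
        results
    }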
* Re: [pdm-devel] [RFC proxmox-datacenter-manager 1/2] server: add convenience helper to fetch results from many remote nodes in parallel 2025-08-29 8:57 ` Dominik Csapak 2025-08-29 9:02 ` Stefan Hanreich @ 2025-08-29 14:05 ` Lukas Wagner 1 sibling, 0 replies; 10+ messages in thread From: Lukas Wagner @ 2025-08-29 14:05 UTC (permalink / raw) To: Dominik Csapak, Proxmox Datacenter Manager development discussion On Fri Aug 29, 2025 at 10:57 AM CEST, Dominik Csapak wrote: > some minor comments inline, but looks good in general to me > > On 8/28/25 4:42 PM, Lukas Wagner wrote: >> Implementing this over and over again is a lot of work; many parts of >> PDM need to fetch some kind of resource from all remotes and all nodes >> of these remotes. >> >> The general approach is the same as for remote task fetching (JoinSet >> and semaphores to control concurrency), it will be refactored in a later >> commit to use this new helper. >> >> Signed-off-by: Lukas Wagner <l.wagner@proxmox.com> >> --- >> server/src/lib.rs | 1 + >> server/src/parallel_fetcher.rs | 236 +++++++++++++++++++++++++++++++++ >> 2 files changed, 237 insertions(+) >> create mode 100644 server/src/parallel_fetcher.rs >> >> diff --git a/server/src/lib.rs b/server/src/lib.rs >> index 485bc792..3f8b7708 100644 >> --- a/server/src/lib.rs >> +++ b/server/src/lib.rs >> @@ -6,6 +6,7 @@ pub mod auth; >> pub mod context; >> pub mod env; >> pub mod metric_collection; >> +pub mod parallel_fetcher; >> pub mod remote_cache; >> pub mod remote_tasks; >> pub mod resource_cache; >> diff --git a/server/src/parallel_fetcher.rs b/server/src/parallel_fetcher.rs >> new file mode 100644 >> index 00000000..3fad68a0 >> --- /dev/null >> +++ b/server/src/parallel_fetcher.rs >> @@ -0,0 +1,236 @@ >> +use std::{ >> + collections::HashMap, >> + fmt::Debug, >> + future::Future, >> + sync::Arc, >> + time::{Duration, Instant}, >> +}; >> + >> +use anyhow::Error; >> +use pdm_api_types::remotes::{Remote, RemoteType}; >> +use pve_api_types::ClusterNodeIndexResponse; >> +use tokio::{ >> + sync::{OwnedSemaphorePermit, Semaphore}, >> + task::JoinSet, >> +}; >> + >> +use crate::connection; >> + >> +pub struct ParallelFetcher<C> { >> + pub max_connections: usize, >> + pub max_connections_per_remote: usize, >> + pub context: C, >> +} >> + >> +pub struct FetchResults<T> { >> + pub remote_results: HashMap<String, Result<RemoteResult<T>, Error>>, >> +} >> + >> +impl<T> Default for FetchResults<T> { >> + fn default() -> Self { >> + Self { >> + remote_results: Default::default(), >> + } >> + } >> +} >> + >> +#[derive(Debug)] >> +pub struct RemoteResult<T> { >> + pub node_results: HashMap<String, Result<NodeResults<T>, Error>>, >> +} >> + >> +impl<T> Default for RemoteResult<T> { >> + fn default() -> Self { >> + Self { >> + node_results: Default::default(), >> + } >> + } >> +} >> + >> +#[derive(Debug)] >> +pub struct NodeResults<T> { >> + pub data: T, >> + pub api_response_time: Duration, >> +} >> + >> +impl<C: Clone + Send + 'static> ParallelFetcher<C> { >> + pub fn new(context: C) -> Self { >> + Self { >> + max_connections: 20, >> + max_connections_per_remote: 5, > > you probably thought of it already, but we probably want to > make this settable from outside in a seperate constructor, or at least > put the values in const values for now I've extracted these into constants for v2, but I think for the future a builder pattern could be nice for this one. 
Stefan and I are playing around with some ideas on how to make the whole thing a bit nicer to use in the call sites, but that's something for a future series; since we both need this helper for our code we've settled for something basic first, which can then be improved/refactored later. > >> + context, >> + } >> + } >> + >> + pub async fn do_for_all_remote_nodes<A, F, T, Ft>(self, remotes: A, func: F) -> FetchResults<T> >> + where >> + A: Iterator<Item = Remote>, >> + F: Fn(C, Remote, String) -> Ft + Clone + Send + 'static, >> + Ft: Future<Output = Result<T, Error>> + Send + 'static, >> + T: Send + Debug + 'static, >> + { >> + let total_connections_semaphore = Arc::new(Semaphore::new(self.max_connections)); >> + >> + let mut remote_join_set = JoinSet::new(); >> + >> + for remote in remotes { >> + let semaphore = Arc::clone(&total_connections_semaphore); >> + >> + let f = func.clone(); >> + >> + remote_join_set.spawn(Self::fetch_remote( >> + remote, >> + self.context.clone(), >> + semaphore, >> + f, >> + self.max_connections_per_remote, >> + )); > > not sure if it would work properly, but couldn't we put the semaphores > into self, and clone that instead? > > then `fetch_remote` could become a normal method on self > and we could implement a helper for getting permits and > we wouldn't have to pass the semaphores around all the time. > > (maybe that does not work as i think though..) Good idea actually, but I'd also revisit that a bit later; I'd prefer to get this applied as soon as possible, so that Stefan and I can build on it. > >> + } >> + >> + let mut results = FetchResults::default(); >> + >> + while let Some(a) = remote_join_set.join_next().await { >> + match a { >> + Ok((remote_name, remote_result)) => { >> + results.remote_results.insert(remote_name, remote_result); >> + } >> + Err(err) => { >> + log::error!("join error when waiting for future: {err}") >> + } >> + } >> + } >> + >> + results >> + } >> + >> + async fn fetch_remote<F, Ft, T>( >> + remote: Remote, >> + context: C, >> + semaphore: Arc<Semaphore>, >> + func: F, >> + max_connections_per_remote: usize, >> + ) -> (String, Result<RemoteResult<T>, Error>) >> + where >> + F: Fn(C, Remote, String) -> Ft + Clone + Send + 'static, >> + Ft: Future<Output = Result<T, Error>> + Send + 'static, >> + T: Send + Debug + 'static, >> + { >> + let mut per_remote_results = RemoteResult::default(); >> + >> + let mut permit = Some(Arc::clone(&semaphore).acquire_owned().await.unwrap()); >> + let per_remote_semaphore = Arc::new(Semaphore::new(max_connections_per_remote)); >> + >> + match remote.ty { >> + RemoteType::Pve => { >> + let remote_clone = remote.clone(); >> + >> + let nodes = match async move { >> + let client = connection::make_pve_client(&remote_clone)?; >> + let nodes = client.list_nodes().await?; >> + >> + Ok::<Vec<ClusterNodeIndexResponse>, Error>(nodes) >> + } >> + .await >> + { >> + Ok(nodes) => nodes, >> + Err(err) => return (remote.id.clone(), Err(err)), >> + }; >> + >> + let mut nodes_join_set = JoinSet::new(); >> + >> + for node in nodes { >> + let permit = if let Some(permit) = permit.take() { >> + permit >> + } else { >> + Arc::clone(&semaphore).acquire_owned().await.unwrap() >> + }; >> + >> + let per_remote_connections_permit = Arc::clone(&per_remote_semaphore) >> + .acquire_owned() >> + .await >> + .unwrap(); >> + >> + let func_clone = func.clone(); >> + let remote_clone = remote.clone(); >> + let node_name = node.node.clone(); >> + let context_clone = context.clone(); >> + >> + 
nodes_join_set.spawn(Self::fetch_pve_node( >> + func_clone, >> + context_clone, >> + remote_clone, >> + node_name, >> + permit, >> + per_remote_connections_permit, >> + )); >> + } >> + >> + while let Some(join_result) = nodes_join_set.join_next().await { >> + match join_result { >> + Ok((node_name, per_node_result)) => { >> + per_remote_results >> + .node_results >> + .insert(node_name, per_node_result); >> + } >> + Err(e) => { >> + log::error!("join error when waiting for future: {e}") >> + } >> + } >> + } >> + } >> + RemoteType::Pbs => { >> + let node = "localhost".to_string(); >> + >> + let now = Instant::now(); >> + let result = func(context, remote.clone(), node.clone()).await; >> + let api_response_time = now.elapsed(); >> + >> + match result { >> + Ok(data) => { >> + per_remote_results.node_results.insert( >> + node, >> + Ok(NodeResults { >> + data, >> + api_response_time, >> + }), >> + ); >> + } >> + Err(err) => { >> + per_remote_results.node_results.insert(node, Err(err)); >> + } >> + } >> + } >> + } >> + >> + (remote.id, Ok(per_remote_results)) >> + } > > probably a `fn do_for_all_remotes` that only does one thing per remote > would also be nice? (i think stefan said he'll need something like this) Will be added for v2, I adopted (and slightly modified, after some discussion) Stefan's patches for that >> + >> + async fn fetch_pve_node<F, Ft, T>( >> + func: F, >> + context: C, >> + remote: Remote, >> + node: String, >> + _permit: OwnedSemaphorePermit, >> + _per_remote_connections_permit: OwnedSemaphorePermit, >> + ) -> (String, Result<NodeResults<T>, Error>) >> + where >> + F: Fn(C, Remote, String) -> Ft + Clone + Send + 'static, >> + Ft: Future<Output = Result<T, Error>> + Send + 'static, >> + T: Send + Debug + 'static, >> + { >> + let now = Instant::now(); >> + let result = func(context, remote.clone(), node.clone()).await; >> + let api_response_time = now.elapsed(); >> + >> + match result { >> + Ok(data) => ( >> + node, >> + Ok(NodeResults { >> + data, >> + api_response_time, >> + }), >> + ), >> + Err(err) => (node, Err(err)), >> + } >> + } >> +} _______________________________________________ pdm-devel mailing list pdm-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pdm-devel ^ permalink raw reply [flat|nested] 10+ messages in thread
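The builder pattern mentioned above could look roughly like this (a sketch, using the defaults from the patch as constants):

    const DEFAULT_MAX_CONNECTIONS: usize = 20;
    const DEFAULT_MAX_CONNECTIONS_PER_REMOTE: usize = 5;

    impl<C: Clone + Send + 'static> ParallelFetcher<C> {
        pub fn new(context: C) -> Self {
            Self {
                max_connections: DEFAULT_MAX_CONNECTIONS,
                max_connections_per_remote: DEFAULT_MAX_CONNECTIONS_PER_REMOTE,
                context,
            }
        }

        // Consuming builder-style setters keep the common case a one-liner.
        pub fn max_connections(mut self, max_connections: usize) -> Self {
            self.max_connections = max_connections;
            self
        }

        pub fn max_connections_per_remote(mut self, max: usize) -> Self {
            self.max_connections_per_remote = max;
            self
        }
    }

    // Usage:
    // let fetcher = ParallelFetcher::new(context).max_connections(10);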
* [pdm-devel] [RFC proxmox-datacenter-manager 2/2] remote task fetching: use ParallelFetcher helper 2025-08-28 14:42 [pdm-devel] [RFC proxmox-datacenter-manager 0/2] add helper to fetch from many remote nodes in parallel Lukas Wagner 2025-08-28 14:42 ` [pdm-devel] [RFC proxmox-datacenter-manager 1/2] server: add convenience helper to fetch results " Lukas Wagner @ 2025-08-28 14:42 ` Lukas Wagner 2025-08-29 9:15 ` Dominik Csapak 2025-08-29 14:11 ` [pdm-devel] superseded: [RFC proxmox-datacenter-manager 0/2] add helper to fetch from many remote nodes in parallel Lukas Wagner 2 siblings, 1 reply; 10+ messages in thread From: Lukas Wagner @ 2025-08-28 14:42 UTC (permalink / raw) To: pdm-devel This allows us to simplify the fetching logic quite a bit. Signed-off-by: Lukas Wagner <l.wagner@proxmox.com> --- .../tasks/remote_tasks.rs | 239 ++++++------------ 1 file changed, 77 insertions(+), 162 deletions(-) diff --git a/server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs b/server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs index 04c51dac..967e633c 100644 --- a/server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs +++ b/server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs @@ -17,6 +17,7 @@ use pve_api_types::{ListTasks, ListTasksResponse, ListTasksSource}; use server::{ api::pve, + parallel_fetcher::{NodeResults, ParallelFetcher}, remote_tasks::{ self, task_cache::{NodeFetchSuccessMap, State, TaskCache, TaskCacheItem}, @@ -185,12 +186,7 @@ async fn do_tick(task_state: &mut TaskState) -> Result<(), Error> { get_remotes_with_finished_tasks(&remote_config, &poll_results) }; - let (all_tasks, update_state_for_remote) = fetch_remotes( - remotes, - Arc::new(cache_state), - Arc::clone(&total_connections_semaphore), - ) - .await; + let (all_tasks, update_state_for_remote) = fetch_remotes(remotes, Arc::new(cache_state)).await; if !all_tasks.is_empty() { update_task_cache(cache, all_tasks, update_state_for_remote, poll_results).await?; @@ -221,42 +217,90 @@ async fn init_cache() -> Result<(), Error> { async fn fetch_remotes( remotes: Vec<Remote>, cache_state: Arc<State>, - total_connections_semaphore: Arc<Semaphore>, ) -> (Vec<TaskCacheItem>, NodeFetchSuccessMap) { - let mut join_set = JoinSet::new(); + let fetcher = ParallelFetcher { + max_connections: MAX_CONNECTIONS, + max_connections_per_remote: CONNECTIONS_PER_PVE_REMOTE, + context: cache_state, + }; - for remote in remotes { - let semaphore = Arc::clone(&total_connections_semaphore); - let state_clone = Arc::clone(&cache_state); - - join_set.spawn(async move { - log::debug!("fetching remote tasks for '{}'", remote.id); - fetch_tasks(&remote, state_clone, semaphore) - .await - .map_err(|err| { - format_err!("could not fetch tasks from remote '{}': {err}", remote.id) - }) - }); - } + let fetch_results = fetcher + .do_for_all_remote_nodes(remotes.into_iter(), fetch_tasks_from_single_node) + .await; let mut all_tasks = Vec::new(); - let mut update_state_for_remote = NodeFetchSuccessMap::default(); + let mut node_success_map = NodeFetchSuccessMap::default(); - while let Some(res) = join_set.join_next().await { - match res { - Ok(Ok(FetchedTasks { - tasks, - node_results, - })) => { - all_tasks.extend(tasks); - update_state_for_remote.merge(node_results); + for (remote_name, result) in fetch_results.remote_results { + match result { + Ok(remote_result) => { + for (node_name, node_result) in remote_result.node_results { + match node_result { + Ok(NodeResults { data, .. 
}) => { + all_tasks.extend(data); + node_success_map.set_node_success(remote_name.clone(), node_name); + } + Err(err) => { + log::error!("could not fetch tasks from remote '{remote_name}', node {node_name}: {err:#}"); + } + } + } + } + Err(err) => { + log::error!("could not fetch tasks from remote '{remote_name}': {err:#}"); } - Ok(Err(err)) => log::error!("{err:#}"), - Err(err) => log::error!("could not join task fetching future: {err:#}"), } } - (all_tasks, update_state_for_remote) + (all_tasks, node_success_map) +} + +async fn fetch_tasks_from_single_node( + context: Arc<State>, + remote: Remote, + node: String, +) -> Result<Vec<TaskCacheItem>, Error> { + match remote.ty { + RemoteType::Pve => { + let since = context + .cutoff_timestamp(&remote.id, &node) + .unwrap_or_else(|| { + proxmox_time::epoch_i64() - (KEEP_OLD_FILES as u64 * ROTATE_AFTER) as i64 + }); + + let params = ListTasks { + source: Some(ListTasksSource::Archive), + since: Some(since), + // If `limit` is not provided, we only receive 50 tasks + limit: Some(MAX_TASKS_TO_FETCH), + ..Default::default() + }; + + let remote_clone = remote.clone(); + let client = pve::connect(&remote_clone)?; + let task_list = client.get_task_list(&node, params).await.map_err(|err| { + format_err!("remote '{}', node '{}': {err}", remote_clone.id, node) + })?; + + let task_list = task_list + .into_iter() + .map(|task| map_pve_task(task, &remote.id)) + .filter_map(|task| match task { + Ok(task) => Some(task), + Err(err) => { + log::error!("could not map PVE task: {err:#}"); + None + } + }) + .collect(); + + Ok(task_list) + } + RemoteType::Pbs => { + // TODO: Support PBS. + Ok(vec![]) + } + } } /// Return all remotes from the given config. @@ -301,135 +345,6 @@ async fn apply_journal(cache: TaskCache) -> Result<(), Error> { tokio::task::spawn_blocking(move || cache.write()?.apply_journal()).await? } -/// Fetched tasks from a single remote. -struct FetchedTasks { - /// List of tasks. - tasks: Vec<TaskCacheItem>, - /// Contains whether a cluster node was fetched successfully. - node_results: NodeFetchSuccessMap, -} - -/// Fetch tasks (active and finished) from a remote. -async fn fetch_tasks( - remote: &Remote, - state: Arc<State>, - total_connections_semaphore: Arc<Semaphore>, -) -> Result<FetchedTasks, Error> { - let mut tasks = Vec::new(); - - let mut node_results = NodeFetchSuccessMap::default(); - - match remote.ty { - RemoteType::Pve => { - let client = pve::connect(remote)?; - - let nodes = { - // This permit *must* be dropped before we acquire the permits for the - // per-node connections - otherwise we risk a deadlock. - let _permit = total_connections_semaphore.acquire().await.unwrap(); - client.list_nodes().await? - }; - - // This second semaphore is used to limit the number of concurrent connections - // *per remote*, not in total. 
- let per_remote_semaphore = Arc::new(Semaphore::new(CONNECTIONS_PER_PVE_REMOTE)); - let mut join_set = JoinSet::new(); - - for node in nodes { - let node_name = node.node.to_string(); - - let since = state - .cutoff_timestamp(&remote.id, &node_name) - .unwrap_or_else(|| { - proxmox_time::epoch_i64() - (KEEP_OLD_FILES as u64 * ROTATE_AFTER) as i64 - }); - - let params = ListTasks { - source: Some(ListTasksSource::Archive), - since: Some(since), - // If `limit` is not provided, we only receive 50 tasks - limit: Some(MAX_TASKS_TO_FETCH), - ..Default::default() - }; - - let per_remote_permit = Arc::clone(&per_remote_semaphore) - .acquire_owned() - .await - .unwrap(); - - let total_connections_permit = Arc::clone(&total_connections_semaphore) - .acquire_owned() - .await - .unwrap(); - - let remote_clone = remote.clone(); - - join_set.spawn(async move { - let res = async { - let client = pve::connect(&remote_clone)?; - let task_list = - client - .get_task_list(&node.node, params) - .await - .map_err(|err| { - format_err!( - "remote '{}', node '{}': {err}", - remote_clone.id, - node.node - ) - })?; - Ok::<Vec<_>, Error>(task_list) - } - .await; - - drop(total_connections_permit); - drop(per_remote_permit); - - (node_name, res) - }); - } - - while let Some(result) = join_set.join_next().await { - match result { - Ok((node_name, result)) => match result { - Ok(task_list) => { - let mapped = - task_list.into_iter().filter_map(|task| { - match map_pve_task(task, &remote.id) { - Ok(task) => Some(task), - Err(err) => { - log::error!( - "could not map task data, skipping: {err:#}" - ); - None - } - } - }); - - tasks.extend(mapped); - node_results.set_node_success(remote.id.clone(), node_name); - } - Err(error) => { - log::error!("could not fetch tasks: {error:#}"); - } - }, - Err(error) => { - log::error!("could not join task fetching task: {error:#}"); - } - } - } - } - RemoteType::Pbs => { - // TODO: Add code for PBS - } - } - - Ok(FetchedTasks { - tasks, - node_results, - }) -} - #[derive(PartialEq, Debug)] /// Outcome from polling a tracked task. enum PollResult { -- 2.47.2 _______________________________________________ pdm-devel mailing list pdm-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pdm-devel ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [pdm-devel] [RFC proxmox-datacenter-manager 2/2] remote task fetching: use ParallelFetcher helper 2025-08-28 14:42 ` [pdm-devel] [RFC proxmox-datacenter-manager 2/2] remote task fetching: use ParallelFetcher helper Lukas Wagner @ 2025-08-29 9:15 ` Dominik Csapak 2025-08-29 9:19 ` Lukas Wagner 0 siblings, 1 reply; 10+ messages in thread From: Dominik Csapak @ 2025-08-29 9:15 UTC (permalink / raw) To: Proxmox Datacenter Manager development discussion, Lukas Wagner some minor comments inline, rest LGTM On 8/28/25 4:42 PM, Lukas Wagner wrote: > This allows us to simplify the fetching logic quite a bit. > > Signed-off-by: Lukas Wagner <l.wagner@proxmox.com> > --- > .../tasks/remote_tasks.rs | 239 ++++++------------ > 1 file changed, 77 insertions(+), 162 deletions(-) > > diff --git a/server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs b/server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs > index 04c51dac..967e633c 100644 > --- a/server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs > +++ b/server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs > @@ -17,6 +17,7 @@ use pve_api_types::{ListTasks, ListTasksResponse, ListTasksSource}; > > use server::{ > api::pve, > + parallel_fetcher::{NodeResults, ParallelFetcher}, > remote_tasks::{ > self, > task_cache::{NodeFetchSuccessMap, State, TaskCache, TaskCacheItem}, > @@ -185,12 +186,7 @@ async fn do_tick(task_state: &mut TaskState) -> Result<(), Error> { > get_remotes_with_finished_tasks(&remote_config, &poll_results) > }; > > - let (all_tasks, update_state_for_remote) = fetch_remotes( > - remotes, > - Arc::new(cache_state), > - Arc::clone(&total_connections_semaphore), > - ) > - .await; > + let (all_tasks, update_state_for_remote) = fetch_remotes(remotes, Arc::new(cache_state)).await; > > if !all_tasks.is_empty() { > update_task_cache(cache, all_tasks, update_state_for_remote, poll_results).await?; > @@ -221,42 +217,90 @@ async fn init_cache() -> Result<(), Error> { > async fn fetch_remotes( > remotes: Vec<Remote>, > cache_state: Arc<State>, > - total_connections_semaphore: Arc<Semaphore>, > ) -> (Vec<TaskCacheItem>, NodeFetchSuccessMap) { > - let mut join_set = JoinSet::new(); > + let fetcher = ParallelFetcher { > + max_connections: MAX_CONNECTIONS, > + max_connections_per_remote: CONNECTIONS_PER_PVE_REMOTE, > + context: cache_state, > + }; > > - for remote in remotes { > - let semaphore = Arc::clone(&total_connections_semaphore); > - let state_clone = Arc::clone(&cache_state); > - > - join_set.spawn(async move { > - log::debug!("fetching remote tasks for '{}'", remote.id); > - fetch_tasks(&remote, state_clone, semaphore) > - .await > - .map_err(|err| { > - format_err!("could not fetch tasks from remote '{}': {err}", remote.id) > - }) > - }); > - } > + let fetch_results = fetcher > + .do_for_all_remote_nodes(remotes.into_iter(), fetch_tasks_from_single_node) > + .await; > > let mut all_tasks = Vec::new(); > - let mut update_state_for_remote = NodeFetchSuccessMap::default(); > + let mut node_success_map = NodeFetchSuccessMap::default(); > > - while let Some(res) = join_set.join_next().await { > - match res { > - Ok(Ok(FetchedTasks { > - tasks, > - node_results, > - })) => { > - all_tasks.extend(tasks); > - update_state_for_remote.merge(node_results); > + for (remote_name, result) in fetch_results.remote_results { > + match result { > + Ok(remote_result) => { > + for (node_name, node_result) in remote_result.node_results { > + match node_result { > + Ok(NodeResults { data, .. 
}) => { > + all_tasks.extend(data); > + node_success_map.set_node_success(remote_name.clone(), node_name); > + } > + Err(err) => { > + log::error!("could not fetch tasks from remote '{remote_name}', node {node_name}: {err:#}"); > + } > + } > + } > + } > + Err(err) => { > + log::error!("could not fetch tasks from remote '{remote_name}': {err:#}"); > } > - Ok(Err(err)) => log::error!("{err:#}"), > - Err(err) => log::error!("could not join task fetching future: {err:#}"), > } > } > > - (all_tasks, update_state_for_remote) > + (all_tasks, node_success_map) > +} > + > +async fn fetch_tasks_from_single_node( > + context: Arc<State>, > + remote: Remote, > + node: String, > +) -> Result<Vec<TaskCacheItem>, Error> { > + match remote.ty { > + RemoteType::Pve => { > + let since = context > + .cutoff_timestamp(&remote.id, &node) > + .unwrap_or_else(|| { > + proxmox_time::epoch_i64() - (KEEP_OLD_FILES as u64 * ROTATE_AFTER) as i64 > + }); > + > + let params = ListTasks { > + source: Some(ListTasksSource::Archive), > + since: Some(since), > + // If `limit` is not provided, we only receive 50 tasks > + limit: Some(MAX_TASKS_TO_FETCH), > + ..Default::default() > + }; > + > + let remote_clone = remote.clone(); > + let client = pve::connect(&remote_clone)?; this clone can simply be omitted by using remote directly > + let task_list = client.get_task_list(&node, params).await.map_err(|err| { > + format_err!("remote '{}', node '{}': {err}", remote_clone.id, node) > + })?; > + > + let task_list = task_list > + .into_iter() > + .map(|task| map_pve_task(task, &remote.id)) > + .filter_map(|task| match task { > + Ok(task) => Some(task), > + Err(err) => { > + log::error!("could not map PVE task: {err:#}"); > + None > + } > + }) > + .collect(); two things here: you could do just one filter_map calling map_pve_task inside and you can simply append `into_iter().filter_map(..)` on the original get_task_list call. no need for the extra `let task_list = task_list....` > + > + Ok(task_list) > + } > + RemoteType::Pbs => { > + // TODO: Support PBS. > + Ok(vec![]) > + } > + } > } > > /// Return all remotes from the given config. > @@ -301,135 +345,6 @@ async fn apply_journal(cache: TaskCache) -> Result<(), Error> { > tokio::task::spawn_blocking(move || cache.write()?.apply_journal()).await? > } > > -/// Fetched tasks from a single remote. > -struct FetchedTasks { > - /// List of tasks. > - tasks: Vec<TaskCacheItem>, > - /// Contains whether a cluster node was fetched successfully. > - node_results: NodeFetchSuccessMap, > -} > - > -/// Fetch tasks (active and finished) from a remote. > -async fn fetch_tasks( > - remote: &Remote, > - state: Arc<State>, > - total_connections_semaphore: Arc<Semaphore>, > -) -> Result<FetchedTasks, Error> { > - let mut tasks = Vec::new(); > - > - let mut node_results = NodeFetchSuccessMap::default(); > - > - match remote.ty { > - RemoteType::Pve => { > - let client = pve::connect(remote)?; > - > - let nodes = { > - // This permit *must* be dropped before we acquire the permits for the > - // per-node connections - otherwise we risk a deadlock. > - let _permit = total_connections_semaphore.acquire().await.unwrap(); > - client.list_nodes().await? > - }; > - > - // This second semaphore is used to limit the number of concurrent connections > - // *per remote*, not in total. 
> - let per_remote_semaphore = Arc::new(Semaphore::new(CONNECTIONS_PER_PVE_REMOTE)); > - let mut join_set = JoinSet::new(); > - > - for node in nodes { > - let node_name = node.node.to_string(); > - > - let since = state > - .cutoff_timestamp(&remote.id, &node_name) > - .unwrap_or_else(|| { > - proxmox_time::epoch_i64() - (KEEP_OLD_FILES as u64 * ROTATE_AFTER) as i64 > - }); > - > - let params = ListTasks { > - source: Some(ListTasksSource::Archive), > - since: Some(since), > - // If `limit` is not provided, we only receive 50 tasks > - limit: Some(MAX_TASKS_TO_FETCH), > - ..Default::default() > - }; > - > - let per_remote_permit = Arc::clone(&per_remote_semaphore) > - .acquire_owned() > - .await > - .unwrap(); > - > - let total_connections_permit = Arc::clone(&total_connections_semaphore) > - .acquire_owned() > - .await > - .unwrap(); > - > - let remote_clone = remote.clone(); > - > - join_set.spawn(async move { > - let res = async { > - let client = pve::connect(&remote_clone)?; > - let task_list = > - client > - .get_task_list(&node.node, params) > - .await > - .map_err(|err| { > - format_err!( > - "remote '{}', node '{}': {err}", > - remote_clone.id, > - node.node > - ) > - })?; > - Ok::<Vec<_>, Error>(task_list) > - } > - .await; > - > - drop(total_connections_permit); > - drop(per_remote_permit); > - > - (node_name, res) > - }); > - } > - > - while let Some(result) = join_set.join_next().await { > - match result { > - Ok((node_name, result)) => match result { > - Ok(task_list) => { > - let mapped = > - task_list.into_iter().filter_map(|task| { > - match map_pve_task(task, &remote.id) { > - Ok(task) => Some(task), > - Err(err) => { > - log::error!( > - "could not map task data, skipping: {err:#}" > - ); > - None > - } > - } > - }); > - > - tasks.extend(mapped); > - node_results.set_node_success(remote.id.clone(), node_name); > - } > - Err(error) => { > - log::error!("could not fetch tasks: {error:#}"); > - } > - }, > - Err(error) => { > - log::error!("could not join task fetching task: {error:#}"); > - } > - } > - } > - } > - RemoteType::Pbs => { > - // TODO: Add code for PBS > - } > - } > - > - Ok(FetchedTasks { > - tasks, > - node_results, > - }) > -} > - > #[derive(PartialEq, Debug)] > /// Outcome from polling a tracked task. > enum PollResult { _______________________________________________ pdm-devel mailing list pdm-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pdm-devel ^ permalink raw reply [flat|nested] 10+ messages in thread
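Taken together, the two suggestions would collapse the hunk into a single chain, roughly (untested sketch, names as in the patch, with the extra clone dropped):

    let task_list = client
        .get_task_list(&node, params)
        .await
        .map_err(|err| format_err!("remote '{}', node '{}': {err}", remote.id, node))?
        .into_iter()
        .filter_map(|task| match map_pve_task(task, &remote.id) {
            Ok(task) => Some(task),
            Err(err) => {
                // skip tasks that cannot be mapped, but log why
                log::error!("could not map PVE task: {err:#}");
                None
            }
        })
        .collect();

    Ok(task_list)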
* Re: [pdm-devel] [RFC proxmox-datacenter-manager 2/2] remote task fetching: use ParallelFetcher helper 2025-08-29 9:15 ` Dominik Csapak @ 2025-08-29 9:19 ` Lukas Wagner 0 siblings, 0 replies; 10+ messages in thread From: Lukas Wagner @ 2025-08-29 9:19 UTC (permalink / raw) To: Dominik Csapak, Proxmox Datacenter Manager development discussion On Fri Aug 29, 2025 at 11:15 AM CEST, Dominik Csapak wrote: > some minor comments inline, rest LGTM > > On 8/28/25 4:42 PM, Lukas Wagner wrote: >> This allows us to simplify the fetching logic quite a bit. >> >> Signed-off-by: Lukas Wagner <l.wagner@proxmox.com> >> --- >> .../tasks/remote_tasks.rs | 239 ++++++------------ >> 1 file changed, 77 insertions(+), 162 deletions(-) >> >> diff --git a/server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs b/server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs >> index 04c51dac..967e633c 100644 >> --- a/server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs >> +++ b/server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs >> @@ -17,6 +17,7 @@ use pve_api_types::{ListTasks, ListTasksResponse, ListTasksSource}; >> >> use server::{ >> api::pve, >> + parallel_fetcher::{NodeResults, ParallelFetcher}, >> remote_tasks::{ >> self, >> task_cache::{NodeFetchSuccessMap, State, TaskCache, TaskCacheItem}, >> @@ -185,12 +186,7 @@ async fn do_tick(task_state: &mut TaskState) -> Result<(), Error> { >> get_remotes_with_finished_tasks(&remote_config, &poll_results) >> }; >> >> - let (all_tasks, update_state_for_remote) = fetch_remotes( >> - remotes, >> - Arc::new(cache_state), >> - Arc::clone(&total_connections_semaphore), >> - ) >> - .await; >> + let (all_tasks, update_state_for_remote) = fetch_remotes(remotes, Arc::new(cache_state)).await; >> >> if !all_tasks.is_empty() { >> update_task_cache(cache, all_tasks, update_state_for_remote, poll_results).await?; >> @@ -221,42 +217,90 @@ async fn init_cache() -> Result<(), Error> { >> async fn fetch_remotes( >> remotes: Vec<Remote>, >> cache_state: Arc<State>, >> - total_connections_semaphore: Arc<Semaphore>, >> ) -> (Vec<TaskCacheItem>, NodeFetchSuccessMap) { >> - let mut join_set = JoinSet::new(); >> + let fetcher = ParallelFetcher { >> + max_connections: MAX_CONNECTIONS, >> + max_connections_per_remote: CONNECTIONS_PER_PVE_REMOTE, >> + context: cache_state, >> + }; >> >> - for remote in remotes { >> - let semaphore = Arc::clone(&total_connections_semaphore); >> - let state_clone = Arc::clone(&cache_state); >> - >> - join_set.spawn(async move { >> - log::debug!("fetching remote tasks for '{}'", remote.id); >> - fetch_tasks(&remote, state_clone, semaphore) >> - .await >> - .map_err(|err| { >> - format_err!("could not fetch tasks from remote '{}': {err}", remote.id) >> - }) >> - }); >> - } >> + let fetch_results = fetcher >> + .do_for_all_remote_nodes(remotes.into_iter(), fetch_tasks_from_single_node) >> + .await; >> >> let mut all_tasks = Vec::new(); >> - let mut update_state_for_remote = NodeFetchSuccessMap::default(); >> + let mut node_success_map = NodeFetchSuccessMap::default(); >> >> - while let Some(res) = join_set.join_next().await { >> - match res { >> - Ok(Ok(FetchedTasks { >> - tasks, >> - node_results, >> - })) => { >> - all_tasks.extend(tasks); >> - update_state_for_remote.merge(node_results); >> + for (remote_name, result) in fetch_results.remote_results { >> + match result { >> + Ok(remote_result) => { >> + for (node_name, node_result) in remote_result.node_results { >> + match node_result { >> + Ok(NodeResults { data, .. 
}) => { >> + all_tasks.extend(data); >> + node_success_map.set_node_success(remote_name.clone(), node_name); >> + } >> + Err(err) => { >> + log::error!("could not fetch tasks from remote '{remote_name}', node {node_name}: {err:#}"); >> + } >> + } >> + } >> + } >> + Err(err) => { >> + log::error!("could not fetch tasks from remote '{remote_name}': {err:#}"); >> } >> - Ok(Err(err)) => log::error!("{err:#}"), >> - Err(err) => log::error!("could not join task fetching future: {err:#}"), >> } >> } >> >> - (all_tasks, update_state_for_remote) >> + (all_tasks, node_success_map) >> +} >> + >> +async fn fetch_tasks_from_single_node( >> + context: Arc<State>, >> + remote: Remote, >> + node: String, >> +) -> Result<Vec<TaskCacheItem>, Error> { >> + match remote.ty { >> + RemoteType::Pve => { >> + let since = context >> + .cutoff_timestamp(&remote.id, &node) >> + .unwrap_or_else(|| { >> + proxmox_time::epoch_i64() - (KEEP_OLD_FILES as u64 * ROTATE_AFTER) as i64 >> + }); >> + >> + let params = ListTasks { >> + source: Some(ListTasksSource::Archive), >> + since: Some(since), >> + // If `limit` is not provided, we only receive 50 tasks >> + limit: Some(MAX_TASKS_TO_FETCH), >> + ..Default::default() >> + }; >> + >> + let remote_clone = remote.clone(); >> + let client = pve::connect(&remote_clone)?; > > this clone can simply be omitted by using remote directly > True, thanks! >> + let task_list = client.get_task_list(&node, params).await.map_err(|err| { >> + format_err!("remote '{}', node '{}': {err}", remote_clone.id, node) >> + })?; >> + >> + let task_list = task_list >> + .into_iter() >> + .map(|task| map_pve_task(task, &remote.id)) >> + .filter_map(|task| match task { >> + Ok(task) => Some(task), >> + Err(err) => { >> + log::error!("could not map PVE task: {err:#}"); >> + None >> + } >> + }) >> + .collect(); > > two things here: > > you could do just one filter_map calling map_pve_task inside > > and you can simply append `into_iter().filter_map(..)` on the original > get_task_list call. no need for the extra `let task_list = task_list....` > Good point! Will make the changes you requested. >> + >> + Ok(task_list) >> + } >> + RemoteType::Pbs => { >> + // TODO: Support PBS. >> + Ok(vec![]) >> + } >> + } >> } >> >> /// Return all remotes from the given config. >> @@ -301,135 +345,6 @@ async fn apply_journal(cache: TaskCache) -> Result<(), Error> { >> tokio::task::spawn_blocking(move || cache.write()?.apply_journal()).await? >> } >> >> -/// Fetched tasks from a single remote. >> -struct FetchedTasks { >> - /// List of tasks. >> - tasks: Vec<TaskCacheItem>, >> - /// Contains whether a cluster node was fetched successfully. >> - node_results: NodeFetchSuccessMap, >> -} >> - >> -/// Fetch tasks (active and finished) from a remote. >> -async fn fetch_tasks( >> - remote: &Remote, >> - state: Arc<State>, >> - total_connections_semaphore: Arc<Semaphore>, >> -) -> Result<FetchedTasks, Error> { >> - let mut tasks = Vec::new(); >> - >> - let mut node_results = NodeFetchSuccessMap::default(); >> - >> - match remote.ty { >> - RemoteType::Pve => { >> - let client = pve::connect(remote)?; >> - >> - let nodes = { >> - // This permit *must* be dropped before we acquire the permits for the >> - // per-node connections - otherwise we risk a deadlock. >> - let _permit = total_connections_semaphore.acquire().await.unwrap(); >> - client.list_nodes().await? >> - }; >> - >> - // This second semaphore is used to limit the number of concurrent connections >> - // *per remote*, not in total. 
>> - let per_remote_semaphore = Arc::new(Semaphore::new(CONNECTIONS_PER_PVE_REMOTE)); >> - let mut join_set = JoinSet::new(); >> - >> - for node in nodes { >> - let node_name = node.node.to_string(); >> - >> - let since = state >> - .cutoff_timestamp(&remote.id, &node_name) >> - .unwrap_or_else(|| { >> - proxmox_time::epoch_i64() - (KEEP_OLD_FILES as u64 * ROTATE_AFTER) as i64 >> - }); >> - >> - let params = ListTasks { >> - source: Some(ListTasksSource::Archive), >> - since: Some(since), >> - // If `limit` is not provided, we only receive 50 tasks >> - limit: Some(MAX_TASKS_TO_FETCH), >> - ..Default::default() >> - }; >> - >> - let per_remote_permit = Arc::clone(&per_remote_semaphore) >> - .acquire_owned() >> - .await >> - .unwrap(); >> - >> - let total_connections_permit = Arc::clone(&total_connections_semaphore) >> - .acquire_owned() >> - .await >> - .unwrap(); >> - >> - let remote_clone = remote.clone(); >> - >> - join_set.spawn(async move { >> - let res = async { >> - let client = pve::connect(&remote_clone)?; >> - let task_list = >> - client >> - .get_task_list(&node.node, params) >> - .await >> - .map_err(|err| { >> - format_err!( >> - "remote '{}', node '{}': {err}", >> - remote_clone.id, >> - node.node >> - ) >> - })?; >> - Ok::<Vec<_>, Error>(task_list) >> - } >> - .await; >> - >> - drop(total_connections_permit); >> - drop(per_remote_permit); >> - >> - (node_name, res) >> - }); >> - } >> - >> - while let Some(result) = join_set.join_next().await { >> - match result { >> - Ok((node_name, result)) => match result { >> - Ok(task_list) => { >> - let mapped = >> - task_list.into_iter().filter_map(|task| { >> - match map_pve_task(task, &remote.id) { >> - Ok(task) => Some(task), >> - Err(err) => { >> - log::error!( >> - "could not map task data, skipping: {err:#}" >> - ); >> - None >> - } >> - } >> - }); >> - >> - tasks.extend(mapped); >> - node_results.set_node_success(remote.id.clone(), node_name); >> - } >> - Err(error) => { >> - log::error!("could not fetch tasks: {error:#}"); >> - } >> - }, >> - Err(error) => { >> - log::error!("could not join task fetching task: {error:#}"); >> - } >> - } >> - } >> - } >> - RemoteType::Pbs => { >> - // TODO: Add code for PBS >> - } >> - } >> - >> - Ok(FetchedTasks { >> - tasks, >> - node_results, >> - }) >> -} >> - >> #[derive(PartialEq, Debug)] >> /// Outcome from polling a tracked task. >> enum PollResult { _______________________________________________ pdm-devel mailing list pdm-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pdm-devel ^ permalink raw reply [flat|nested] 10+ messages in thread
* [pdm-devel] superseded: [RFC proxmox-datacenter-manager 0/2] add helper to fetch from many remote nodes in parallel
From: Lukas Wagner @ 2025-08-29 14:11 UTC
To: Proxmox Datacenter Manager development discussion; +Cc: pdm-devel

On Thu Aug 28, 2025 at 4:42 PM CEST, Lukas Wagner wrote:
> Still work in progress (code style improvements, documentation, naming),
> but sharing early because Stefan might need something similar.
>
> Please do not merge yet.
>
> proxmox-datacenter-manager:
>
> Lukas Wagner (2):
>   server: add convenience helper to fetch results from many remote nodes
>     in parallel
>   remote task fetching: use ParallelFetcher helper
>
>  .../tasks/remote_tasks.rs      | 239 ++++++------------
>  server/src/lib.rs              |   1 +
>  server/src/parallel_fetcher.rs | 236 +++++++++++++++++
>  3 files changed, 314 insertions(+), 162 deletions(-)
>  create mode 100644 server/src/parallel_fetcher.rs
>
> Summary over all repositories:
>   3 files changed, 314 insertions(+), 162 deletions(-)

superseded by:
https://lore.proxmox.com/pdm-devel/20250829141028.309835-1-l.wagner@proxmox.com/T/#t