* [pdm-devel] [RFC proxmox-datacenter-manager 0/2] add helper to fetch from many remote nodes in parallel
@ 2025-08-28 14:42 Lukas Wagner
2025-08-28 14:42 ` [pdm-devel] [RFC proxmox-datacenter-manager 1/2] server: add convenience helper to fetch results " Lukas Wagner
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Lukas Wagner @ 2025-08-28 14:42 UTC (permalink / raw)
To: pdm-devel
Still work in progress (code style improvements, documentation, naming),
but sharing early because Stefan might need something similar.
Please do not merge yet.
proxmox-datacenter-manager:
Lukas Wagner (2):
server: add convenience helper to fetch results from many remote nodes
in parallel
remote task fetching: use ParallelFetcher helper
.../tasks/remote_tasks.rs | 239 ++++++------------
server/src/lib.rs | 1 +
server/src/parallel_fetcher.rs | 236 +++++++++++++++++
3 files changed, 314 insertions(+), 162 deletions(-)
create mode 100644 server/src/parallel_fetcher.rs
Summary over all repositories:
3 files changed, 314 insertions(+), 162 deletions(-)
--
Generated by murpp 0.9.0
* [pdm-devel] [RFC proxmox-datacenter-manager 1/2] server: add convenience helper to fetch results from many remote nodes in parallel
2025-08-28 14:42 [pdm-devel] [RFC proxmox-datacenter-manager 0/2] add helper to fetch from many remote nodes in parallel Lukas Wagner
@ 2025-08-28 14:42 ` Lukas Wagner
2025-08-29 8:57 ` Dominik Csapak
2025-08-28 14:42 ` [pdm-devel] [RFC proxmox-datacenter-manager 2/2] remote task fetching: use ParallelFetcher helper Lukas Wagner
2025-08-29 14:11 ` [pdm-devel] superseded: [RFC proxmox-datacenter-manager 0/2] add helper to fetch from many remote nodes in parallel Lukas Wagner
2 siblings, 1 reply; 10+ messages in thread
From: Lukas Wagner @ 2025-08-28 14:42 UTC (permalink / raw)
To: pdm-devel
Implementing this over and over again is a lot of work; many parts of
PDM need to fetch some kind of resource from all remotes and all nodes
of these remotes.
The general approach is the same as for remote task fetching (a JoinSet
and semaphores to control concurrency); the remote task fetching code will
be refactored in a later commit to use this new helper.
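To illustrate the intended use, a call site is expected to look roughly like
this (sketch only; the context type and the per-node callback below are
placeholders, not part of this patch):

    // `()` serves as a stand-in context type; any
    // async fn(C, Remote, String) -> Result<T, Error> works as per-node callback.
    async fn fetch_node_info(
        _context: (),
        remote: Remote,
        node: String,
    ) -> Result<String, Error> {
        let client = connection::make_pve_client(&remote)?;
        let _nodes = client.list_nodes().await?;
        Ok(node)
    }

    async fn example(remotes: Vec<Remote>) {
        let fetcher = ParallelFetcher::new(());
        let results = fetcher
            .do_for_all_remote_nodes(remotes.into_iter(), fetch_node_info)
            .await;

        for (remote_name, remote_result) in results.remote_results {
            // each entry is a Result<RemoteResult<String>, Error> with per-node results
            log::debug!("results for remote '{remote_name}': {remote_result:?}");
        }
    }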
Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
---
server/src/lib.rs | 1 +
server/src/parallel_fetcher.rs | 236 +++++++++++++++++++++++++++++++++
2 files changed, 237 insertions(+)
create mode 100644 server/src/parallel_fetcher.rs
diff --git a/server/src/lib.rs b/server/src/lib.rs
index 485bc792..3f8b7708 100644
--- a/server/src/lib.rs
+++ b/server/src/lib.rs
@@ -6,6 +6,7 @@ pub mod auth;
pub mod context;
pub mod env;
pub mod metric_collection;
+pub mod parallel_fetcher;
pub mod remote_cache;
pub mod remote_tasks;
pub mod resource_cache;
diff --git a/server/src/parallel_fetcher.rs b/server/src/parallel_fetcher.rs
new file mode 100644
index 00000000..3fad68a0
--- /dev/null
+++ b/server/src/parallel_fetcher.rs
@@ -0,0 +1,236 @@
+use std::{
+ collections::HashMap,
+ fmt::Debug,
+ future::Future,
+ sync::Arc,
+ time::{Duration, Instant},
+};
+
+use anyhow::Error;
+use pdm_api_types::remotes::{Remote, RemoteType};
+use pve_api_types::ClusterNodeIndexResponse;
+use tokio::{
+ sync::{OwnedSemaphorePermit, Semaphore},
+ task::JoinSet,
+};
+
+use crate::connection;
+
+pub struct ParallelFetcher<C> {
+ pub max_connections: usize,
+ pub max_connections_per_remote: usize,
+ pub context: C,
+}
+
+pub struct FetchResults<T> {
+ pub remote_results: HashMap<String, Result<RemoteResult<T>, Error>>,
+}
+
+impl<T> Default for FetchResults<T> {
+ fn default() -> Self {
+ Self {
+ remote_results: Default::default(),
+ }
+ }
+}
+
+#[derive(Debug)]
+pub struct RemoteResult<T> {
+ pub node_results: HashMap<String, Result<NodeResults<T>, Error>>,
+}
+
+impl<T> Default for RemoteResult<T> {
+ fn default() -> Self {
+ Self {
+ node_results: Default::default(),
+ }
+ }
+}
+
+#[derive(Debug)]
+pub struct NodeResults<T> {
+ pub data: T,
+ pub api_response_time: Duration,
+}
+
+impl<C: Clone + Send + 'static> ParallelFetcher<C> {
+ pub fn new(context: C) -> Self {
+ Self {
+ max_connections: 20,
+ max_connections_per_remote: 5,
+ context,
+ }
+ }
+
+ pub async fn do_for_all_remote_nodes<A, F, T, Ft>(self, remotes: A, func: F) -> FetchResults<T>
+ where
+ A: Iterator<Item = Remote>,
+ F: Fn(C, Remote, String) -> Ft + Clone + Send + 'static,
+ Ft: Future<Output = Result<T, Error>> + Send + 'static,
+ T: Send + Debug + 'static,
+ {
+ let total_connections_semaphore = Arc::new(Semaphore::new(self.max_connections));
+
+ let mut remote_join_set = JoinSet::new();
+
+ for remote in remotes {
+ let semaphore = Arc::clone(&total_connections_semaphore);
+
+ let f = func.clone();
+
+ remote_join_set.spawn(Self::fetch_remote(
+ remote,
+ self.context.clone(),
+ semaphore,
+ f,
+ self.max_connections_per_remote,
+ ));
+ }
+
+ let mut results = FetchResults::default();
+
+ while let Some(a) = remote_join_set.join_next().await {
+ match a {
+ Ok((remote_name, remote_result)) => {
+ results.remote_results.insert(remote_name, remote_result);
+ }
+ Err(err) => {
+ log::error!("join error when waiting for future: {err}")
+ }
+ }
+ }
+
+ results
+ }
+
+ async fn fetch_remote<F, Ft, T>(
+ remote: Remote,
+ context: C,
+ semaphore: Arc<Semaphore>,
+ func: F,
+ max_connections_per_remote: usize,
+ ) -> (String, Result<RemoteResult<T>, Error>)
+ where
+ F: Fn(C, Remote, String) -> Ft + Clone + Send + 'static,
+ Ft: Future<Output = Result<T, Error>> + Send + 'static,
+ T: Send + Debug + 'static,
+ {
+ let mut per_remote_results = RemoteResult::default();
+
+ let mut permit = Some(Arc::clone(&semaphore).acquire_owned().await.unwrap());
+ let per_remote_semaphore = Arc::new(Semaphore::new(max_connections_per_remote));
+
+ match remote.ty {
+ RemoteType::Pve => {
+ let remote_clone = remote.clone();
+
+ let nodes = match async move {
+ let client = connection::make_pve_client(&remote_clone)?;
+ let nodes = client.list_nodes().await?;
+
+ Ok::<Vec<ClusterNodeIndexResponse>, Error>(nodes)
+ }
+ .await
+ {
+ Ok(nodes) => nodes,
+ Err(err) => return (remote.id.clone(), Err(err)),
+ };
+
+ let mut nodes_join_set = JoinSet::new();
+
+ for node in nodes {
+ let permit = if let Some(permit) = permit.take() {
+ permit
+ } else {
+ Arc::clone(&semaphore).acquire_owned().await.unwrap()
+ };
+
+ let per_remote_connections_permit = Arc::clone(&per_remote_semaphore)
+ .acquire_owned()
+ .await
+ .unwrap();
+
+ let func_clone = func.clone();
+ let remote_clone = remote.clone();
+ let node_name = node.node.clone();
+ let context_clone = context.clone();
+
+ nodes_join_set.spawn(Self::fetch_pve_node(
+ func_clone,
+ context_clone,
+ remote_clone,
+ node_name,
+ permit,
+ per_remote_connections_permit,
+ ));
+ }
+
+ while let Some(join_result) = nodes_join_set.join_next().await {
+ match join_result {
+ Ok((node_name, per_node_result)) => {
+ per_remote_results
+ .node_results
+ .insert(node_name, per_node_result);
+ }
+ Err(e) => {
+ log::error!("join error when waiting for future: {e}")
+ }
+ }
+ }
+ }
+ RemoteType::Pbs => {
+ let node = "localhost".to_string();
+
+ let now = Instant::now();
+ let result = func(context, remote.clone(), node.clone()).await;
+ let api_response_time = now.elapsed();
+
+ match result {
+ Ok(data) => {
+ per_remote_results.node_results.insert(
+ node,
+ Ok(NodeResults {
+ data,
+ api_response_time,
+ }),
+ );
+ }
+ Err(err) => {
+ per_remote_results.node_results.insert(node, Err(err));
+ }
+ }
+ }
+ }
+
+ (remote.id, Ok(per_remote_results))
+ }
+
+ async fn fetch_pve_node<F, Ft, T>(
+ func: F,
+ context: C,
+ remote: Remote,
+ node: String,
+ _permit: OwnedSemaphorePermit,
+ _per_remote_connections_permit: OwnedSemaphorePermit,
+ ) -> (String, Result<NodeResults<T>, Error>)
+ where
+ F: Fn(C, Remote, String) -> Ft + Clone + Send + 'static,
+ Ft: Future<Output = Result<T, Error>> + Send + 'static,
+ T: Send + Debug + 'static,
+ {
+ let now = Instant::now();
+ let result = func(context, remote.clone(), node.clone()).await;
+ let api_response_time = now.elapsed();
+
+ match result {
+ Ok(data) => (
+ node,
+ Ok(NodeResults {
+ data,
+ api_response_time,
+ }),
+ ),
+ Err(err) => (node, Err(err)),
+ }
+ }
+}
--
2.47.2
* [pdm-devel] [RFC proxmox-datacenter-manager 2/2] remote task fetching: use ParallelFetcher helper
2025-08-28 14:42 [pdm-devel] [RFC proxmox-datacenter-manager 0/2] add helper to fetch from many remote nodes in parallel Lukas Wagner
2025-08-28 14:42 ` [pdm-devel] [RFC proxmox-datacenter-manager 1/2] server: add convenience helper to fetch results " Lukas Wagner
@ 2025-08-28 14:42 ` Lukas Wagner
2025-08-29 9:15 ` Dominik Csapak
2025-08-29 14:11 ` [pdm-devel] superseded: [RFC proxmox-datacenter-manager 0/2] add helper to fetch from many remote nodes in parallel Lukas Wagner
2 siblings, 1 reply; 10+ messages in thread
From: Lukas Wagner @ 2025-08-28 14:42 UTC (permalink / raw)
To: pdm-devel
This allows us to simplify the fetching logic quite a bit.
Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
---
.../tasks/remote_tasks.rs | 239 ++++++------------
1 file changed, 77 insertions(+), 162 deletions(-)
diff --git a/server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs b/server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs
index 04c51dac..967e633c 100644
--- a/server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs
+++ b/server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs
@@ -17,6 +17,7 @@ use pve_api_types::{ListTasks, ListTasksResponse, ListTasksSource};
use server::{
api::pve,
+ parallel_fetcher::{NodeResults, ParallelFetcher},
remote_tasks::{
self,
task_cache::{NodeFetchSuccessMap, State, TaskCache, TaskCacheItem},
@@ -185,12 +186,7 @@ async fn do_tick(task_state: &mut TaskState) -> Result<(), Error> {
get_remotes_with_finished_tasks(&remote_config, &poll_results)
};
- let (all_tasks, update_state_for_remote) = fetch_remotes(
- remotes,
- Arc::new(cache_state),
- Arc::clone(&total_connections_semaphore),
- )
- .await;
+ let (all_tasks, update_state_for_remote) = fetch_remotes(remotes, Arc::new(cache_state)).await;
if !all_tasks.is_empty() {
update_task_cache(cache, all_tasks, update_state_for_remote, poll_results).await?;
@@ -221,42 +217,90 @@ async fn init_cache() -> Result<(), Error> {
async fn fetch_remotes(
remotes: Vec<Remote>,
cache_state: Arc<State>,
- total_connections_semaphore: Arc<Semaphore>,
) -> (Vec<TaskCacheItem>, NodeFetchSuccessMap) {
- let mut join_set = JoinSet::new();
+ let fetcher = ParallelFetcher {
+ max_connections: MAX_CONNECTIONS,
+ max_connections_per_remote: CONNECTIONS_PER_PVE_REMOTE,
+ context: cache_state,
+ };
- for remote in remotes {
- let semaphore = Arc::clone(&total_connections_semaphore);
- let state_clone = Arc::clone(&cache_state);
-
- join_set.spawn(async move {
- log::debug!("fetching remote tasks for '{}'", remote.id);
- fetch_tasks(&remote, state_clone, semaphore)
- .await
- .map_err(|err| {
- format_err!("could not fetch tasks from remote '{}': {err}", remote.id)
- })
- });
- }
+ let fetch_results = fetcher
+ .do_for_all_remote_nodes(remotes.into_iter(), fetch_tasks_from_single_node)
+ .await;
let mut all_tasks = Vec::new();
- let mut update_state_for_remote = NodeFetchSuccessMap::default();
+ let mut node_success_map = NodeFetchSuccessMap::default();
- while let Some(res) = join_set.join_next().await {
- match res {
- Ok(Ok(FetchedTasks {
- tasks,
- node_results,
- })) => {
- all_tasks.extend(tasks);
- update_state_for_remote.merge(node_results);
+ for (remote_name, result) in fetch_results.remote_results {
+ match result {
+ Ok(remote_result) => {
+ for (node_name, node_result) in remote_result.node_results {
+ match node_result {
+ Ok(NodeResults { data, .. }) => {
+ all_tasks.extend(data);
+ node_success_map.set_node_success(remote_name.clone(), node_name);
+ }
+ Err(err) => {
+ log::error!("could not fetch tasks from remote '{remote_name}', node {node_name}: {err:#}");
+ }
+ }
+ }
+ }
+ Err(err) => {
+ log::error!("could not fetch tasks from remote '{remote_name}': {err:#}");
}
- Ok(Err(err)) => log::error!("{err:#}"),
- Err(err) => log::error!("could not join task fetching future: {err:#}"),
}
}
- (all_tasks, update_state_for_remote)
+ (all_tasks, node_success_map)
+}
+
+async fn fetch_tasks_from_single_node(
+ context: Arc<State>,
+ remote: Remote,
+ node: String,
+) -> Result<Vec<TaskCacheItem>, Error> {
+ match remote.ty {
+ RemoteType::Pve => {
+ let since = context
+ .cutoff_timestamp(&remote.id, &node)
+ .unwrap_or_else(|| {
+ proxmox_time::epoch_i64() - (KEEP_OLD_FILES as u64 * ROTATE_AFTER) as i64
+ });
+
+ let params = ListTasks {
+ source: Some(ListTasksSource::Archive),
+ since: Some(since),
+ // If `limit` is not provided, we only receive 50 tasks
+ limit: Some(MAX_TASKS_TO_FETCH),
+ ..Default::default()
+ };
+
+ let remote_clone = remote.clone();
+ let client = pve::connect(&remote_clone)?;
+ let task_list = client.get_task_list(&node, params).await.map_err(|err| {
+ format_err!("remote '{}', node '{}': {err}", remote_clone.id, node)
+ })?;
+
+ let task_list = task_list
+ .into_iter()
+ .map(|task| map_pve_task(task, &remote.id))
+ .filter_map(|task| match task {
+ Ok(task) => Some(task),
+ Err(err) => {
+ log::error!("could not map PVE task: {err:#}");
+ None
+ }
+ })
+ .collect();
+
+ Ok(task_list)
+ }
+ RemoteType::Pbs => {
+ // TODO: Support PBS.
+ Ok(vec![])
+ }
+ }
}
/// Return all remotes from the given config.
@@ -301,135 +345,6 @@ async fn apply_journal(cache: TaskCache) -> Result<(), Error> {
tokio::task::spawn_blocking(move || cache.write()?.apply_journal()).await?
}
-/// Fetched tasks from a single remote.
-struct FetchedTasks {
- /// List of tasks.
- tasks: Vec<TaskCacheItem>,
- /// Contains whether a cluster node was fetched successfully.
- node_results: NodeFetchSuccessMap,
-}
-
-/// Fetch tasks (active and finished) from a remote.
-async fn fetch_tasks(
- remote: &Remote,
- state: Arc<State>,
- total_connections_semaphore: Arc<Semaphore>,
-) -> Result<FetchedTasks, Error> {
- let mut tasks = Vec::new();
-
- let mut node_results = NodeFetchSuccessMap::default();
-
- match remote.ty {
- RemoteType::Pve => {
- let client = pve::connect(remote)?;
-
- let nodes = {
- // This permit *must* be dropped before we acquire the permits for the
- // per-node connections - otherwise we risk a deadlock.
- let _permit = total_connections_semaphore.acquire().await.unwrap();
- client.list_nodes().await?
- };
-
- // This second semaphore is used to limit the number of concurrent connections
- // *per remote*, not in total.
- let per_remote_semaphore = Arc::new(Semaphore::new(CONNECTIONS_PER_PVE_REMOTE));
- let mut join_set = JoinSet::new();
-
- for node in nodes {
- let node_name = node.node.to_string();
-
- let since = state
- .cutoff_timestamp(&remote.id, &node_name)
- .unwrap_or_else(|| {
- proxmox_time::epoch_i64() - (KEEP_OLD_FILES as u64 * ROTATE_AFTER) as i64
- });
-
- let params = ListTasks {
- source: Some(ListTasksSource::Archive),
- since: Some(since),
- // If `limit` is not provided, we only receive 50 tasks
- limit: Some(MAX_TASKS_TO_FETCH),
- ..Default::default()
- };
-
- let per_remote_permit = Arc::clone(&per_remote_semaphore)
- .acquire_owned()
- .await
- .unwrap();
-
- let total_connections_permit = Arc::clone(&total_connections_semaphore)
- .acquire_owned()
- .await
- .unwrap();
-
- let remote_clone = remote.clone();
-
- join_set.spawn(async move {
- let res = async {
- let client = pve::connect(&remote_clone)?;
- let task_list =
- client
- .get_task_list(&node.node, params)
- .await
- .map_err(|err| {
- format_err!(
- "remote '{}', node '{}': {err}",
- remote_clone.id,
- node.node
- )
- })?;
- Ok::<Vec<_>, Error>(task_list)
- }
- .await;
-
- drop(total_connections_permit);
- drop(per_remote_permit);
-
- (node_name, res)
- });
- }
-
- while let Some(result) = join_set.join_next().await {
- match result {
- Ok((node_name, result)) => match result {
- Ok(task_list) => {
- let mapped =
- task_list.into_iter().filter_map(|task| {
- match map_pve_task(task, &remote.id) {
- Ok(task) => Some(task),
- Err(err) => {
- log::error!(
- "could not map task data, skipping: {err:#}"
- );
- None
- }
- }
- });
-
- tasks.extend(mapped);
- node_results.set_node_success(remote.id.clone(), node_name);
- }
- Err(error) => {
- log::error!("could not fetch tasks: {error:#}");
- }
- },
- Err(error) => {
- log::error!("could not join task fetching task: {error:#}");
- }
- }
- }
- }
- RemoteType::Pbs => {
- // TODO: Add code for PBS
- }
- }
-
- Ok(FetchedTasks {
- tasks,
- node_results,
- })
-}
-
#[derive(PartialEq, Debug)]
/// Outcome from polling a tracked task.
enum PollResult {
--
2.47.2
* Re: [pdm-devel] [RFC proxmox-datacenter-manager 1/2] server: add convenience helper to fetch results from many remote nodes in parallel
2025-08-28 14:42 ` [pdm-devel] [RFC proxmox-datacenter-manager 1/2] server: add convenience helper to fetch results " Lukas Wagner
@ 2025-08-29 8:57 ` Dominik Csapak
2025-08-29 9:02 ` Stefan Hanreich
2025-08-29 14:05 ` Lukas Wagner
0 siblings, 2 replies; 10+ messages in thread
From: Dominik Csapak @ 2025-08-29 8:57 UTC (permalink / raw)
To: Proxmox Datacenter Manager development discussion, Lukas Wagner
some minor comments inline, but looks good in general to me
On 8/28/25 4:42 PM, Lukas Wagner wrote:
> Implementing this over and over again is a lot of work; many parts of
> PDM need to fetch some kind of resource from all remotes and all nodes
> of these remotes.
>
> The general approach is the same as for remote task fetching (a JoinSet
> and semaphores to control concurrency); the remote task fetching code will
> be refactored in a later commit to use this new helper.
>
> Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
> ---
> server/src/lib.rs | 1 +
> server/src/parallel_fetcher.rs | 236 +++++++++++++++++++++++++++++++++
> 2 files changed, 237 insertions(+)
> create mode 100644 server/src/parallel_fetcher.rs
>
> diff --git a/server/src/lib.rs b/server/src/lib.rs
> index 485bc792..3f8b7708 100644
> --- a/server/src/lib.rs
> +++ b/server/src/lib.rs
> @@ -6,6 +6,7 @@ pub mod auth;
> pub mod context;
> pub mod env;
> pub mod metric_collection;
> +pub mod parallel_fetcher;
> pub mod remote_cache;
> pub mod remote_tasks;
> pub mod resource_cache;
> diff --git a/server/src/parallel_fetcher.rs b/server/src/parallel_fetcher.rs
> new file mode 100644
> index 00000000..3fad68a0
> --- /dev/null
> +++ b/server/src/parallel_fetcher.rs
> @@ -0,0 +1,236 @@
> +use std::{
> + collections::HashMap,
> + fmt::Debug,
> + future::Future,
> + sync::Arc,
> + time::{Duration, Instant},
> +};
> +
> +use anyhow::Error;
> +use pdm_api_types::remotes::{Remote, RemoteType};
> +use pve_api_types::ClusterNodeIndexResponse;
> +use tokio::{
> + sync::{OwnedSemaphorePermit, Semaphore},
> + task::JoinSet,
> +};
> +
> +use crate::connection;
> +
> +pub struct ParallelFetcher<C> {
> + pub max_connections: usize,
> + pub max_connections_per_remote: usize,
> + pub context: C,
> +}
> +
> +pub struct FetchResults<T> {
> + pub remote_results: HashMap<String, Result<RemoteResult<T>, Error>>,
> +}
> +
> +impl<T> Default for FetchResults<T> {
> + fn default() -> Self {
> + Self {
> + remote_results: Default::default(),
> + }
> + }
> +}
> +
> +#[derive(Debug)]
> +pub struct RemoteResult<T> {
> + pub node_results: HashMap<String, Result<NodeResults<T>, Error>>,
> +}
> +
> +impl<T> Default for RemoteResult<T> {
> + fn default() -> Self {
> + Self {
> + node_results: Default::default(),
> + }
> + }
> +}
> +
> +#[derive(Debug)]
> +pub struct NodeResults<T> {
> + pub data: T,
> + pub api_response_time: Duration,
> +}
> +
> +impl<C: Clone + Send + 'static> ParallelFetcher<C> {
> + pub fn new(context: C) -> Self {
> + Self {
> + max_connections: 20,
> + max_connections_per_remote: 5,
you probably thought of it already, but we probably want to
make this settable from outside in a separate constructor, or at least
put the values into constants for now
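for illustration, something roughly like this (constant and constructor names
are just made up):

    pub const DEFAULT_MAX_CONNECTIONS: usize = 20;
    pub const DEFAULT_MAX_CONNECTIONS_PER_REMOTE: usize = 5;

    impl<C: Clone + Send + 'static> ParallelFetcher<C> {
        /// Like `new`, but with caller-provided connection limits.
        pub fn with_limits(
            context: C,
            max_connections: usize,
            max_connections_per_remote: usize,
        ) -> Self {
            Self {
                max_connections,
                max_connections_per_remote,
                context,
            }
        }
    }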
> + context,
> + }
> + }
> +
> + pub async fn do_for_all_remote_nodes<A, F, T, Ft>(self, remotes: A, func: F) -> FetchResults<T>
> + where
> + A: Iterator<Item = Remote>,
> + F: Fn(C, Remote, String) -> Ft + Clone + Send + 'static,
> + Ft: Future<Output = Result<T, Error>> + Send + 'static,
> + T: Send + Debug + 'static,
> + {
> + let total_connections_semaphore = Arc::new(Semaphore::new(self.max_connections));
> +
> + let mut remote_join_set = JoinSet::new();
> +
> + for remote in remotes {
> + let semaphore = Arc::clone(&total_connections_semaphore);
> +
> + let f = func.clone();
> +
> + remote_join_set.spawn(Self::fetch_remote(
> + remote,
> + self.context.clone(),
> + semaphore,
> + f,
> + self.max_connections_per_remote,
> + ));
not sure if it would work properly, but couldn't we put the semaphores
into self, and clone that instead?
then `fetch_remote` could become a normal method on self
and we could implement a helper for getting permits and
we wouldn't have to pass the semaphores around all the time.
(maybe that does not work as i think though..)
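something vaguely like this, maybe (untested sketch, field/method names
invented):

    #[derive(Clone)]
    pub struct ParallelFetcher<C> {
        pub max_connections_per_remote: usize,
        pub context: C,
        // shared by all clones, so the global limit still applies
        total_connections_semaphore: Arc<Semaphore>,
    }

    impl<C: Clone + Send + 'static> ParallelFetcher<C> {
        async fn acquire_connection_permit(&self) -> OwnedSemaphorePermit {
            Arc::clone(&self.total_connections_semaphore)
                .acquire_owned()
                .await
                .unwrap()
        }
    }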
> + }
> +
> + let mut results = FetchResults::default();
> +
> + while let Some(a) = remote_join_set.join_next().await {
> + match a {
> + Ok((remote_name, remote_result)) => {
> + results.remote_results.insert(remote_name, remote_result);
> + }
> + Err(err) => {
> + log::error!("join error when waiting for future: {err}")
> + }
> + }
> + }
> +
> + results
> + }
> +
> + async fn fetch_remote<F, Ft, T>(
> + remote: Remote,
> + context: C,
> + semaphore: Arc<Semaphore>,
> + func: F,
> + max_connections_per_remote: usize,
> + ) -> (String, Result<RemoteResult<T>, Error>)
> + where
> + F: Fn(C, Remote, String) -> Ft + Clone + Send + 'static,
> + Ft: Future<Output = Result<T, Error>> + Send + 'static,
> + T: Send + Debug + 'static,
> + {
> + let mut per_remote_results = RemoteResult::default();
> +
> + let mut permit = Some(Arc::clone(&semaphore).acquire_owned().await.unwrap());
> + let per_remote_semaphore = Arc::new(Semaphore::new(max_connections_per_remote));
> +
> + match remote.ty {
> + RemoteType::Pve => {
> + let remote_clone = remote.clone();
> +
> + let nodes = match async move {
> + let client = connection::make_pve_client(&remote_clone)?;
> + let nodes = client.list_nodes().await?;
> +
> + Ok::<Vec<ClusterNodeIndexResponse>, Error>(nodes)
> + }
> + .await
> + {
> + Ok(nodes) => nodes,
> + Err(err) => return (remote.id.clone(), Err(err)),
> + };
> +
> + let mut nodes_join_set = JoinSet::new();
> +
> + for node in nodes {
> + let permit = if let Some(permit) = permit.take() {
> + permit
> + } else {
> + Arc::clone(&semaphore).acquire_owned().await.unwrap()
> + };
> +
> + let per_remote_connections_permit = Arc::clone(&per_remote_semaphore)
> + .acquire_owned()
> + .await
> + .unwrap();
> +
> + let func_clone = func.clone();
> + let remote_clone = remote.clone();
> + let node_name = node.node.clone();
> + let context_clone = context.clone();
> +
> + nodes_join_set.spawn(Self::fetch_pve_node(
> + func_clone,
> + context_clone,
> + remote_clone,
> + node_name,
> + permit,
> + per_remote_connections_permit,
> + ));
> + }
> +
> + while let Some(join_result) = nodes_join_set.join_next().await {
> + match join_result {
> + Ok((node_name, per_node_result)) => {
> + per_remote_results
> + .node_results
> + .insert(node_name, per_node_result);
> + }
> + Err(e) => {
> + log::error!("join error when waiting for future: {e}")
> + }
> + }
> + }
> + }
> + RemoteType::Pbs => {
> + let node = "localhost".to_string();
> +
> + let now = Instant::now();
> + let result = func(context, remote.clone(), node.clone()).await;
> + let api_response_time = now.elapsed();
> +
> + match result {
> + Ok(data) => {
> + per_remote_results.node_results.insert(
> + node,
> + Ok(NodeResults {
> + data,
> + api_response_time,
> + }),
> + );
> + }
> + Err(err) => {
> + per_remote_results.node_results.insert(node, Err(err));
> + }
> + }
> + }
> + }
> +
> + (remote.id, Ok(per_remote_results))
> + }
probably a `fn do_for_all_remotes` that only does one thing per remote
would also be nice? (i think stefan said he'll need something like this)
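e.g. roughly (signature sketch only, names and result type are just
placeholders):

    pub async fn do_for_all_remotes<A, F, T, Ft>(
        self,
        remotes: A,
        func: F,
    ) -> HashMap<String, Result<T, Error>>
    where
        A: Iterator<Item = Remote>,
        F: Fn(C, Remote) -> Ft + Clone + Send + 'static,
        Ft: Future<Output = Result<T, Error>> + Send + 'static,
        T: Send + Debug + 'static,
    {
        // same JoinSet + semaphore pattern, but one call per remote
        // instead of one call per node
        todo!()
    }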
> +
> + async fn fetch_pve_node<F, Ft, T>(
> + func: F,
> + context: C,
> + remote: Remote,
> + node: String,
> + _permit: OwnedSemaphorePermit,
> + _per_remote_connections_permit: OwnedSemaphorePermit,
> + ) -> (String, Result<NodeResults<T>, Error>)
> + where
> + F: Fn(C, Remote, String) -> Ft + Clone + Send + 'static,
> + Ft: Future<Output = Result<T, Error>> + Send + 'static,
> + T: Send + Debug + 'static,
> + {
> + let now = Instant::now();
> + let result = func(context, remote.clone(), node.clone()).await;
> + let api_response_time = now.elapsed();
> +
> + match result {
> + Ok(data) => (
> + node,
> + Ok(NodeResults {
> + data,
> + api_response_time,
> + }),
> + ),
> + Err(err) => (node, Err(err)),
> + }
> + }
> +}
* Re: [pdm-devel] [RFC proxmox-datacenter-manager 1/2] server: add convenience helper to fetch results from many remote nodes in parallel
2025-08-29 8:57 ` Dominik Csapak
@ 2025-08-29 9:02 ` Stefan Hanreich
2025-08-29 9:17 ` Lukas Wagner
2025-08-29 14:05 ` Lukas Wagner
1 sibling, 1 reply; 10+ messages in thread
From: Stefan Hanreich @ 2025-08-29 9:02 UTC (permalink / raw)
To: Proxmox Datacenter Manager development discussion,
Dominik Csapak, Lukas Wagner
On 8/29/25 10:58 AM, Dominik Csapak wrote:
[snip]
> probably a `fn do_for_all_remotes` that only does one thing per remote
> would also be nice? (i think stefan said he'll need something like this)
currently working on implementing this - I'll send a patch soon hopefully
[snip]
* Re: [pdm-devel] [RFC proxmox-datacenter-manager 2/2] remote task fetching: use ParallelFetcher helper
2025-08-28 14:42 ` [pdm-devel] [RFC proxmox-datacenter-manager 2/2] remote task fetching: use ParallelFetcher helper Lukas Wagner
@ 2025-08-29 9:15 ` Dominik Csapak
2025-08-29 9:19 ` Lukas Wagner
0 siblings, 1 reply; 10+ messages in thread
From: Dominik Csapak @ 2025-08-29 9:15 UTC (permalink / raw)
To: Proxmox Datacenter Manager development discussion, Lukas Wagner
some minor comments inline, rest LGTM
On 8/28/25 4:42 PM, Lukas Wagner wrote:
> This allows us to simplify the fetching logic quite a bit.
>
> Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
> ---
> .../tasks/remote_tasks.rs | 239 ++++++------------
> 1 file changed, 77 insertions(+), 162 deletions(-)
>
> diff --git a/server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs b/server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs
> index 04c51dac..967e633c 100644
> --- a/server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs
> +++ b/server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs
> @@ -17,6 +17,7 @@ use pve_api_types::{ListTasks, ListTasksResponse, ListTasksSource};
>
> use server::{
> api::pve,
> + parallel_fetcher::{NodeResults, ParallelFetcher},
> remote_tasks::{
> self,
> task_cache::{NodeFetchSuccessMap, State, TaskCache, TaskCacheItem},
> @@ -185,12 +186,7 @@ async fn do_tick(task_state: &mut TaskState) -> Result<(), Error> {
> get_remotes_with_finished_tasks(&remote_config, &poll_results)
> };
>
> - let (all_tasks, update_state_for_remote) = fetch_remotes(
> - remotes,
> - Arc::new(cache_state),
> - Arc::clone(&total_connections_semaphore),
> - )
> - .await;
> + let (all_tasks, update_state_for_remote) = fetch_remotes(remotes, Arc::new(cache_state)).await;
>
> if !all_tasks.is_empty() {
> update_task_cache(cache, all_tasks, update_state_for_remote, poll_results).await?;
> @@ -221,42 +217,90 @@ async fn init_cache() -> Result<(), Error> {
> async fn fetch_remotes(
> remotes: Vec<Remote>,
> cache_state: Arc<State>,
> - total_connections_semaphore: Arc<Semaphore>,
> ) -> (Vec<TaskCacheItem>, NodeFetchSuccessMap) {
> - let mut join_set = JoinSet::new();
> + let fetcher = ParallelFetcher {
> + max_connections: MAX_CONNECTIONS,
> + max_connections_per_remote: CONNECTIONS_PER_PVE_REMOTE,
> + context: cache_state,
> + };
>
> - for remote in remotes {
> - let semaphore = Arc::clone(&total_connections_semaphore);
> - let state_clone = Arc::clone(&cache_state);
> -
> - join_set.spawn(async move {
> - log::debug!("fetching remote tasks for '{}'", remote.id);
> - fetch_tasks(&remote, state_clone, semaphore)
> - .await
> - .map_err(|err| {
> - format_err!("could not fetch tasks from remote '{}': {err}", remote.id)
> - })
> - });
> - }
> + let fetch_results = fetcher
> + .do_for_all_remote_nodes(remotes.into_iter(), fetch_tasks_from_single_node)
> + .await;
>
> let mut all_tasks = Vec::new();
> - let mut update_state_for_remote = NodeFetchSuccessMap::default();
> + let mut node_success_map = NodeFetchSuccessMap::default();
>
> - while let Some(res) = join_set.join_next().await {
> - match res {
> - Ok(Ok(FetchedTasks {
> - tasks,
> - node_results,
> - })) => {
> - all_tasks.extend(tasks);
> - update_state_for_remote.merge(node_results);
> + for (remote_name, result) in fetch_results.remote_results {
> + match result {
> + Ok(remote_result) => {
> + for (node_name, node_result) in remote_result.node_results {
> + match node_result {
> + Ok(NodeResults { data, .. }) => {
> + all_tasks.extend(data);
> + node_success_map.set_node_success(remote_name.clone(), node_name);
> + }
> + Err(err) => {
> + log::error!("could not fetch tasks from remote '{remote_name}', node {node_name}: {err:#}");
> + }
> + }
> + }
> + }
> + Err(err) => {
> + log::error!("could not fetch tasks from remote '{remote_name}': {err:#}");
> }
> - Ok(Err(err)) => log::error!("{err:#}"),
> - Err(err) => log::error!("could not join task fetching future: {err:#}"),
> }
> }
>
> - (all_tasks, update_state_for_remote)
> + (all_tasks, node_success_map)
> +}
> +
> +async fn fetch_tasks_from_single_node(
> + context: Arc<State>,
> + remote: Remote,
> + node: String,
> +) -> Result<Vec<TaskCacheItem>, Error> {
> + match remote.ty {
> + RemoteType::Pve => {
> + let since = context
> + .cutoff_timestamp(&remote.id, &node)
> + .unwrap_or_else(|| {
> + proxmox_time::epoch_i64() - (KEEP_OLD_FILES as u64 * ROTATE_AFTER) as i64
> + });
> +
> + let params = ListTasks {
> + source: Some(ListTasksSource::Archive),
> + since: Some(since),
> + // If `limit` is not provided, we only receive 50 tasks
> + limit: Some(MAX_TASKS_TO_FETCH),
> + ..Default::default()
> + };
> +
> + let remote_clone = remote.clone();
> + let client = pve::connect(&remote_clone)?;
this clone can simply be omitted by using remote directly
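i.e. something like:

    let client = pve::connect(&remote)?;
    let task_list = client.get_task_list(&node, params).await.map_err(|err| {
        format_err!("remote '{}', node '{}': {err}", remote.id, node)
    })?;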
> + let task_list = client.get_task_list(&node, params).await.map_err(|err| {
> + format_err!("remote '{}', node '{}': {err}", remote_clone.id, node)
> + })?;
> +
> + let task_list = task_list
> + .into_iter()
> + .map(|task| map_pve_task(task, &remote.id))
> + .filter_map(|task| match task {
> + Ok(task) => Some(task),
> + Err(err) => {
> + log::error!("could not map PVE task: {err:#}");
> + None
> + }
> + })
> + .collect();
two things here:
you could do just one filter_map calling map_pve_task inside
and you can simply append `into_iter().filter_map(..)` on the original
get_task_list call. no need for the extra `let task_list = task_list....`
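i.e. roughly something like (untested):

    let task_list = client
        .get_task_list(&node, params)
        .await
        .map_err(|err| format_err!("remote '{}', node '{}': {err}", remote.id, node))?
        .into_iter()
        .filter_map(|task| match map_pve_task(task, &remote.id) {
            Ok(task) => Some(task),
            Err(err) => {
                log::error!("could not map PVE task: {err:#}");
                None
            }
        })
        .collect();

    Ok(task_list)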
> +
> + Ok(task_list)
> + }
> + RemoteType::Pbs => {
> + // TODO: Support PBS.
> + Ok(vec![])
> + }
> + }
> }
>
> /// Return all remotes from the given config.
> @@ -301,135 +345,6 @@ async fn apply_journal(cache: TaskCache) -> Result<(), Error> {
> tokio::task::spawn_blocking(move || cache.write()?.apply_journal()).await?
> }
>
> -/// Fetched tasks from a single remote.
> -struct FetchedTasks {
> - /// List of tasks.
> - tasks: Vec<TaskCacheItem>,
> - /// Contains whether a cluster node was fetched successfully.
> - node_results: NodeFetchSuccessMap,
> -}
> -
> -/// Fetch tasks (active and finished) from a remote.
> -async fn fetch_tasks(
> - remote: &Remote,
> - state: Arc<State>,
> - total_connections_semaphore: Arc<Semaphore>,
> -) -> Result<FetchedTasks, Error> {
> - let mut tasks = Vec::new();
> -
> - let mut node_results = NodeFetchSuccessMap::default();
> -
> - match remote.ty {
> - RemoteType::Pve => {
> - let client = pve::connect(remote)?;
> -
> - let nodes = {
> - // This permit *must* be dropped before we acquire the permits for the
> - // per-node connections - otherwise we risk a deadlock.
> - let _permit = total_connections_semaphore.acquire().await.unwrap();
> - client.list_nodes().await?
> - };
> -
> - // This second semaphore is used to limit the number of concurrent connections
> - // *per remote*, not in total.
> - let per_remote_semaphore = Arc::new(Semaphore::new(CONNECTIONS_PER_PVE_REMOTE));
> - let mut join_set = JoinSet::new();
> -
> - for node in nodes {
> - let node_name = node.node.to_string();
> -
> - let since = state
> - .cutoff_timestamp(&remote.id, &node_name)
> - .unwrap_or_else(|| {
> - proxmox_time::epoch_i64() - (KEEP_OLD_FILES as u64 * ROTATE_AFTER) as i64
> - });
> -
> - let params = ListTasks {
> - source: Some(ListTasksSource::Archive),
> - since: Some(since),
> - // If `limit` is not provided, we only receive 50 tasks
> - limit: Some(MAX_TASKS_TO_FETCH),
> - ..Default::default()
> - };
> -
> - let per_remote_permit = Arc::clone(&per_remote_semaphore)
> - .acquire_owned()
> - .await
> - .unwrap();
> -
> - let total_connections_permit = Arc::clone(&total_connections_semaphore)
> - .acquire_owned()
> - .await
> - .unwrap();
> -
> - let remote_clone = remote.clone();
> -
> - join_set.spawn(async move {
> - let res = async {
> - let client = pve::connect(&remote_clone)?;
> - let task_list =
> - client
> - .get_task_list(&node.node, params)
> - .await
> - .map_err(|err| {
> - format_err!(
> - "remote '{}', node '{}': {err}",
> - remote_clone.id,
> - node.node
> - )
> - })?;
> - Ok::<Vec<_>, Error>(task_list)
> - }
> - .await;
> -
> - drop(total_connections_permit);
> - drop(per_remote_permit);
> -
> - (node_name, res)
> - });
> - }
> -
> - while let Some(result) = join_set.join_next().await {
> - match result {
> - Ok((node_name, result)) => match result {
> - Ok(task_list) => {
> - let mapped =
> - task_list.into_iter().filter_map(|task| {
> - match map_pve_task(task, &remote.id) {
> - Ok(task) => Some(task),
> - Err(err) => {
> - log::error!(
> - "could not map task data, skipping: {err:#}"
> - );
> - None
> - }
> - }
> - });
> -
> - tasks.extend(mapped);
> - node_results.set_node_success(remote.id.clone(), node_name);
> - }
> - Err(error) => {
> - log::error!("could not fetch tasks: {error:#}");
> - }
> - },
> - Err(error) => {
> - log::error!("could not join task fetching task: {error:#}");
> - }
> - }
> - }
> - }
> - RemoteType::Pbs => {
> - // TODO: Add code for PBS
> - }
> - }
> -
> - Ok(FetchedTasks {
> - tasks,
> - node_results,
> - })
> -}
> -
> #[derive(PartialEq, Debug)]
> /// Outcome from polling a tracked task.
> enum PollResult {
* Re: [pdm-devel] [RFC proxmox-datacenter-manager 1/2] server: add convenience helper to fetch results from many remote nodes in parallel
2025-08-29 9:02 ` Stefan Hanreich
@ 2025-08-29 9:17 ` Lukas Wagner
0 siblings, 0 replies; 10+ messages in thread
From: Lukas Wagner @ 2025-08-29 9:17 UTC (permalink / raw)
To: Stefan Hanreich,
Proxmox Datacenter Manager development discussion,
Dominik Csapak
On Fri Aug 29, 2025 at 11:02 AM CEST, Stefan Hanreich wrote:
> On 8/29/25 10:58 AM, Dominik Csapak wrote:
>
> [snip]
>
>> probably a `fn do_for_all_remotes` that only does one thing per remote
>> would also be nice? (i think stefan said he'll need something like this)
>
> currently working on implementing this - I'll send a patch soon hopefully
yup, unfortunately this was a misunderstanding between me and Stefan; I
thought that he also needed the 'for all nodes' function that was
implemented here; but anyways, this should be pretty easy to add now as
well.
* Re: [pdm-devel] [RFC proxmox-datacenter-manager 2/2] remote task fetching: use ParallelFetcher helper
2025-08-29 9:15 ` Dominik Csapak
@ 2025-08-29 9:19 ` Lukas Wagner
0 siblings, 0 replies; 10+ messages in thread
From: Lukas Wagner @ 2025-08-29 9:19 UTC (permalink / raw)
To: Dominik Csapak, Proxmox Datacenter Manager development discussion
On Fri Aug 29, 2025 at 11:15 AM CEST, Dominik Csapak wrote:
> some minor comments inline, rest LGTM
>
> On 8/28/25 4:42 PM, Lukas Wagner wrote:
>> This allows us to simplify the fetching logic quite a bit.
>>
>> Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
>> ---
>> .../tasks/remote_tasks.rs | 239 ++++++------------
>> 1 file changed, 77 insertions(+), 162 deletions(-)
>>
>> diff --git a/server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs b/server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs
>> index 04c51dac..967e633c 100644
>> --- a/server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs
>> +++ b/server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs
>> @@ -17,6 +17,7 @@ use pve_api_types::{ListTasks, ListTasksResponse, ListTasksSource};
>>
>> use server::{
>> api::pve,
>> + parallel_fetcher::{NodeResults, ParallelFetcher},
>> remote_tasks::{
>> self,
>> task_cache::{NodeFetchSuccessMap, State, TaskCache, TaskCacheItem},
>> @@ -185,12 +186,7 @@ async fn do_tick(task_state: &mut TaskState) -> Result<(), Error> {
>> get_remotes_with_finished_tasks(&remote_config, &poll_results)
>> };
>>
>> - let (all_tasks, update_state_for_remote) = fetch_remotes(
>> - remotes,
>> - Arc::new(cache_state),
>> - Arc::clone(&total_connections_semaphore),
>> - )
>> - .await;
>> + let (all_tasks, update_state_for_remote) = fetch_remotes(remotes, Arc::new(cache_state)).await;
>>
>> if !all_tasks.is_empty() {
>> update_task_cache(cache, all_tasks, update_state_for_remote, poll_results).await?;
>> @@ -221,42 +217,90 @@ async fn init_cache() -> Result<(), Error> {
>> async fn fetch_remotes(
>> remotes: Vec<Remote>,
>> cache_state: Arc<State>,
>> - total_connections_semaphore: Arc<Semaphore>,
>> ) -> (Vec<TaskCacheItem>, NodeFetchSuccessMap) {
>> - let mut join_set = JoinSet::new();
>> + let fetcher = ParallelFetcher {
>> + max_connections: MAX_CONNECTIONS,
>> + max_connections_per_remote: CONNECTIONS_PER_PVE_REMOTE,
>> + context: cache_state,
>> + };
>>
>> - for remote in remotes {
>> - let semaphore = Arc::clone(&total_connections_semaphore);
>> - let state_clone = Arc::clone(&cache_state);
>> -
>> - join_set.spawn(async move {
>> - log::debug!("fetching remote tasks for '{}'", remote.id);
>> - fetch_tasks(&remote, state_clone, semaphore)
>> - .await
>> - .map_err(|err| {
>> - format_err!("could not fetch tasks from remote '{}': {err}", remote.id)
>> - })
>> - });
>> - }
>> + let fetch_results = fetcher
>> + .do_for_all_remote_nodes(remotes.into_iter(), fetch_tasks_from_single_node)
>> + .await;
>>
>> let mut all_tasks = Vec::new();
>> - let mut update_state_for_remote = NodeFetchSuccessMap::default();
>> + let mut node_success_map = NodeFetchSuccessMap::default();
>>
>> - while let Some(res) = join_set.join_next().await {
>> - match res {
>> - Ok(Ok(FetchedTasks {
>> - tasks,
>> - node_results,
>> - })) => {
>> - all_tasks.extend(tasks);
>> - update_state_for_remote.merge(node_results);
>> + for (remote_name, result) in fetch_results.remote_results {
>> + match result {
>> + Ok(remote_result) => {
>> + for (node_name, node_result) in remote_result.node_results {
>> + match node_result {
>> + Ok(NodeResults { data, .. }) => {
>> + all_tasks.extend(data);
>> + node_success_map.set_node_success(remote_name.clone(), node_name);
>> + }
>> + Err(err) => {
>> + log::error!("could not fetch tasks from remote '{remote_name}', node {node_name}: {err:#}");
>> + }
>> + }
>> + }
>> + }
>> + Err(err) => {
>> + log::error!("could not fetch tasks from remote '{remote_name}': {err:#}");
>> }
>> - Ok(Err(err)) => log::error!("{err:#}"),
>> - Err(err) => log::error!("could not join task fetching future: {err:#}"),
>> }
>> }
>>
>> - (all_tasks, update_state_for_remote)
>> + (all_tasks, node_success_map)
>> +}
>> +
>> +async fn fetch_tasks_from_single_node(
>> + context: Arc<State>,
>> + remote: Remote,
>> + node: String,
>> +) -> Result<Vec<TaskCacheItem>, Error> {
>> + match remote.ty {
>> + RemoteType::Pve => {
>> + let since = context
>> + .cutoff_timestamp(&remote.id, &node)
>> + .unwrap_or_else(|| {
>> + proxmox_time::epoch_i64() - (KEEP_OLD_FILES as u64 * ROTATE_AFTER) as i64
>> + });
>> +
>> + let params = ListTasks {
>> + source: Some(ListTasksSource::Archive),
>> + since: Some(since),
>> + // If `limit` is not provided, we only receive 50 tasks
>> + limit: Some(MAX_TASKS_TO_FETCH),
>> + ..Default::default()
>> + };
>> +
>> + let remote_clone = remote.clone();
>> + let client = pve::connect(&remote_clone)?;
>
> this clone can simply be omitted by using remote directly
>
True, thanks!
>> + let task_list = client.get_task_list(&node, params).await.map_err(|err| {
>> + format_err!("remote '{}', node '{}': {err}", remote_clone.id, node)
>> + })?;
>> +
>> + let task_list = task_list
>> + .into_iter()
>> + .map(|task| map_pve_task(task, &remote.id))
>> + .filter_map(|task| match task {
>> + Ok(task) => Some(task),
>> + Err(err) => {
>> + log::error!("could not map PVE task: {err:#}");
>> + None
>> + }
>> + })
>> + .collect();
>
> two things here:
>
> you could do just one filter_map calling map_pve_task inside
>
> and you can simply append `into_iter().filter_map(..)` on the original
> get_task_list call. no need for the extra `let task_list = task_list....`
>
Good point! Will make the changes you requested.
>> +
>> + Ok(task_list)
>> + }
>> + RemoteType::Pbs => {
>> + // TODO: Support PBS.
>> + Ok(vec![])
>> + }
>> + }
>> }
>>
>> /// Return all remotes from the given config.
>> @@ -301,135 +345,6 @@ async fn apply_journal(cache: TaskCache) -> Result<(), Error> {
>> tokio::task::spawn_blocking(move || cache.write()?.apply_journal()).await?
>> }
>>
>> -/// Fetched tasks from a single remote.
>> -struct FetchedTasks {
>> - /// List of tasks.
>> - tasks: Vec<TaskCacheItem>,
>> - /// Contains whether a cluster node was fetched successfully.
>> - node_results: NodeFetchSuccessMap,
>> -}
>> -
>> -/// Fetch tasks (active and finished) from a remote.
>> -async fn fetch_tasks(
>> - remote: &Remote,
>> - state: Arc<State>,
>> - total_connections_semaphore: Arc<Semaphore>,
>> -) -> Result<FetchedTasks, Error> {
>> - let mut tasks = Vec::new();
>> -
>> - let mut node_results = NodeFetchSuccessMap::default();
>> -
>> - match remote.ty {
>> - RemoteType::Pve => {
>> - let client = pve::connect(remote)?;
>> -
>> - let nodes = {
>> - // This permit *must* be dropped before we acquire the permits for the
>> - // per-node connections - otherwise we risk a deadlock.
>> - let _permit = total_connections_semaphore.acquire().await.unwrap();
>> - client.list_nodes().await?
>> - };
>> -
>> - // This second semaphore is used to limit the number of concurrent connections
>> - // *per remote*, not in total.
>> - let per_remote_semaphore = Arc::new(Semaphore::new(CONNECTIONS_PER_PVE_REMOTE));
>> - let mut join_set = JoinSet::new();
>> -
>> - for node in nodes {
>> - let node_name = node.node.to_string();
>> -
>> - let since = state
>> - .cutoff_timestamp(&remote.id, &node_name)
>> - .unwrap_or_else(|| {
>> - proxmox_time::epoch_i64() - (KEEP_OLD_FILES as u64 * ROTATE_AFTER) as i64
>> - });
>> -
>> - let params = ListTasks {
>> - source: Some(ListTasksSource::Archive),
>> - since: Some(since),
>> - // If `limit` is not provided, we only receive 50 tasks
>> - limit: Some(MAX_TASKS_TO_FETCH),
>> - ..Default::default()
>> - };
>> -
>> - let per_remote_permit = Arc::clone(&per_remote_semaphore)
>> - .acquire_owned()
>> - .await
>> - .unwrap();
>> -
>> - let total_connections_permit = Arc::clone(&total_connections_semaphore)
>> - .acquire_owned()
>> - .await
>> - .unwrap();
>> -
>> - let remote_clone = remote.clone();
>> -
>> - join_set.spawn(async move {
>> - let res = async {
>> - let client = pve::connect(&remote_clone)?;
>> - let task_list =
>> - client
>> - .get_task_list(&node.node, params)
>> - .await
>> - .map_err(|err| {
>> - format_err!(
>> - "remote '{}', node '{}': {err}",
>> - remote_clone.id,
>> - node.node
>> - )
>> - })?;
>> - Ok::<Vec<_>, Error>(task_list)
>> - }
>> - .await;
>> -
>> - drop(total_connections_permit);
>> - drop(per_remote_permit);
>> -
>> - (node_name, res)
>> - });
>> - }
>> -
>> - while let Some(result) = join_set.join_next().await {
>> - match result {
>> - Ok((node_name, result)) => match result {
>> - Ok(task_list) => {
>> - let mapped =
>> - task_list.into_iter().filter_map(|task| {
>> - match map_pve_task(task, &remote.id) {
>> - Ok(task) => Some(task),
>> - Err(err) => {
>> - log::error!(
>> - "could not map task data, skipping: {err:#}"
>> - );
>> - None
>> - }
>> - }
>> - });
>> -
>> - tasks.extend(mapped);
>> - node_results.set_node_success(remote.id.clone(), node_name);
>> - }
>> - Err(error) => {
>> - log::error!("could not fetch tasks: {error:#}");
>> - }
>> - },
>> - Err(error) => {
>> - log::error!("could not join task fetching task: {error:#}");
>> - }
>> - }
>> - }
>> - }
>> - RemoteType::Pbs => {
>> - // TODO: Add code for PBS
>> - }
>> - }
>> -
>> - Ok(FetchedTasks {
>> - tasks,
>> - node_results,
>> - })
>> -}
>> -
>> #[derive(PartialEq, Debug)]
>> /// Outcome from polling a tracked task.
>> enum PollResult {
* Re: [pdm-devel] [RFC proxmox-datacenter-manager 1/2] server: add convenience helper to fetch results from many remote nodes in parallel
2025-08-29 8:57 ` Dominik Csapak
2025-08-29 9:02 ` Stefan Hanreich
@ 2025-08-29 14:05 ` Lukas Wagner
1 sibling, 0 replies; 10+ messages in thread
From: Lukas Wagner @ 2025-08-29 14:05 UTC (permalink / raw)
To: Dominik Csapak, Proxmox Datacenter Manager development discussion
On Fri Aug 29, 2025 at 10:57 AM CEST, Dominik Csapak wrote:
> some minor comments inline, but looks good in general to me
>
> On 8/28/25 4:42 PM, Lukas Wagner wrote:
>> Implementing this over and over again is a lot of work; many parts of
>> PDM need to fetch some kind of resource from all remotes and all nodes
>> of these remotes.
>>
>> The general approach is the same as for remote task fetching (a JoinSet
>> and semaphores to control concurrency); the remote task fetching code will
>> be refactored in a later commit to use this new helper.
>>
>> Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
>> ---
>> server/src/lib.rs | 1 +
>> server/src/parallel_fetcher.rs | 236 +++++++++++++++++++++++++++++++++
>> 2 files changed, 237 insertions(+)
>> create mode 100644 server/src/parallel_fetcher.rs
>>
>> diff --git a/server/src/lib.rs b/server/src/lib.rs
>> index 485bc792..3f8b7708 100644
>> --- a/server/src/lib.rs
>> +++ b/server/src/lib.rs
>> @@ -6,6 +6,7 @@ pub mod auth;
>> pub mod context;
>> pub mod env;
>> pub mod metric_collection;
>> +pub mod parallel_fetcher;
>> pub mod remote_cache;
>> pub mod remote_tasks;
>> pub mod resource_cache;
>> diff --git a/server/src/parallel_fetcher.rs b/server/src/parallel_fetcher.rs
>> new file mode 100644
>> index 00000000..3fad68a0
>> --- /dev/null
>> +++ b/server/src/parallel_fetcher.rs
>> @@ -0,0 +1,236 @@
>> +use std::{
>> + collections::HashMap,
>> + fmt::Debug,
>> + future::Future,
>> + sync::Arc,
>> + time::{Duration, Instant},
>> +};
>> +
>> +use anyhow::Error;
>> +use pdm_api_types::remotes::{Remote, RemoteType};
>> +use pve_api_types::ClusterNodeIndexResponse;
>> +use tokio::{
>> + sync::{OwnedSemaphorePermit, Semaphore},
>> + task::JoinSet,
>> +};
>> +
>> +use crate::connection;
>> +
>> +pub struct ParallelFetcher<C> {
>> + pub max_connections: usize,
>> + pub max_connections_per_remote: usize,
>> + pub context: C,
>> +}
>> +
>> +pub struct FetchResults<T> {
>> + pub remote_results: HashMap<String, Result<RemoteResult<T>, Error>>,
>> +}
>> +
>> +impl<T> Default for FetchResults<T> {
>> + fn default() -> Self {
>> + Self {
>> + remote_results: Default::default(),
>> + }
>> + }
>> +}
>> +
>> +#[derive(Debug)]
>> +pub struct RemoteResult<T> {
>> + pub node_results: HashMap<String, Result<NodeResults<T>, Error>>,
>> +}
>> +
>> +impl<T> Default for RemoteResult<T> {
>> + fn default() -> Self {
>> + Self {
>> + node_results: Default::default(),
>> + }
>> + }
>> +}
>> +
>> +#[derive(Debug)]
>> +pub struct NodeResults<T> {
>> + pub data: T,
>> + pub api_response_time: Duration,
>> +}
>> +
>> +impl<C: Clone + Send + 'static> ParallelFetcher<C> {
>> + pub fn new(context: C) -> Self {
>> + Self {
>> + max_connections: 20,
>> + max_connections_per_remote: 5,
>
> you probably thought of it already, but we probably want to
> make this settable from outside in a separate constructor, or at least
> put the values into constants for now
I've extracted these into constants for v2, but I think for the future a
builder pattern could be nice for this one.
Stefan and I are playing around with some ideas on how to make the
whole thing a bit nicer to use in the call sites, but that's something
for a future series; since we both need this helper for our code we've
settled for something basic first, which can then be improved/refactored
later.
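Something along these lines, maybe (purely hypothetical API, none of these
builder methods exist yet):

    // hypothetical builder-style API
    let results = ParallelFetcher::builder(context)
        .max_connections(MAX_CONNECTIONS)
        .max_connections_per_remote(CONNECTIONS_PER_PVE_REMOTE)
        .build()
        .do_for_all_remote_nodes(remotes.into_iter(), fetch_tasks_from_single_node)
        .await;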
>
>> + context,
>> + }
>> + }
>> +
>> + pub async fn do_for_all_remote_nodes<A, F, T, Ft>(self, remotes: A, func: F) -> FetchResults<T>
>> + where
>> + A: Iterator<Item = Remote>,
>> + F: Fn(C, Remote, String) -> Ft + Clone + Send + 'static,
>> + Ft: Future<Output = Result<T, Error>> + Send + 'static,
>> + T: Send + Debug + 'static,
>> + {
>> + let total_connections_semaphore = Arc::new(Semaphore::new(self.max_connections));
>> +
>> + let mut remote_join_set = JoinSet::new();
>> +
>> + for remote in remotes {
>> + let semaphore = Arc::clone(&total_connections_semaphore);
>> +
>> + let f = func.clone();
>> +
>> + remote_join_set.spawn(Self::fetch_remote(
>> + remote,
>> + self.context.clone(),
>> + semaphore,
>> + f,
>> + self.max_connections_per_remote,
>> + ));
>
> not sure if it would work properly, but couldn't we put the semaphores
> into self, and clone that instead?
>
> then `fetch_remote` could become a normal method on self
> and we could implement a helper for getting permits and
> we wouldn't have to pass the semaphores around all the time.
>
> (maybe that does not work as i think though..)
Good idea actually, but I'd revisit that a bit later as well; I'd prefer to
get this applied as soon as possible, so that Stefan and I can build on
it.
>
>> + }
>> +
>> + let mut results = FetchResults::default();
>> +
>> + while let Some(a) = remote_join_set.join_next().await {
>> + match a {
>> + Ok((remote_name, remote_result)) => {
>> + results.remote_results.insert(remote_name, remote_result);
>> + }
>> + Err(err) => {
>> + log::error!("join error when waiting for future: {err}")
>> + }
>> + }
>> + }
>> +
>> + results
>> + }
>> +
>> + async fn fetch_remote<F, Ft, T>(
>> + remote: Remote,
>> + context: C,
>> + semaphore: Arc<Semaphore>,
>> + func: F,
>> + max_connections_per_remote: usize,
>> + ) -> (String, Result<RemoteResult<T>, Error>)
>> + where
>> + F: Fn(C, Remote, String) -> Ft + Clone + Send + 'static,
>> + Ft: Future<Output = Result<T, Error>> + Send + 'static,
>> + T: Send + Debug + 'static,
>> + {
>> + let mut per_remote_results = RemoteResult::default();
>> +
>> + let mut permit = Some(Arc::clone(&semaphore).acquire_owned().await.unwrap());
>> + let per_remote_semaphore = Arc::new(Semaphore::new(max_connections_per_remote));
>> +
>> + match remote.ty {
>> + RemoteType::Pve => {
>> + let remote_clone = remote.clone();
>> +
>> + let nodes = match async move {
>> + let client = connection::make_pve_client(&remote_clone)?;
>> + let nodes = client.list_nodes().await?;
>> +
>> + Ok::<Vec<ClusterNodeIndexResponse>, Error>(nodes)
>> + }
>> + .await
>> + {
>> + Ok(nodes) => nodes,
>> + Err(err) => return (remote.id.clone(), Err(err)),
>> + };
>> +
>> + let mut nodes_join_set = JoinSet::new();
>> +
>> + for node in nodes {
>> + let permit = if let Some(permit) = permit.take() {
>> + permit
>> + } else {
>> + Arc::clone(&semaphore).acquire_owned().await.unwrap()
>> + };
>> +
>> + let per_remote_connections_permit = Arc::clone(&per_remote_semaphore)
>> + .acquire_owned()
>> + .await
>> + .unwrap();
>> +
>> + let func_clone = func.clone();
>> + let remote_clone = remote.clone();
>> + let node_name = node.node.clone();
>> + let context_clone = context.clone();
>> +
>> + nodes_join_set.spawn(Self::fetch_pve_node(
>> + func_clone,
>> + context_clone,
>> + remote_clone,
>> + node_name,
>> + permit,
>> + per_remote_connections_permit,
>> + ));
>> + }
>> +
>> + while let Some(join_result) = nodes_join_set.join_next().await {
>> + match join_result {
>> + Ok((node_name, per_node_result)) => {
>> + per_remote_results
>> + .node_results
>> + .insert(node_name, per_node_result);
>> + }
>> + Err(e) => {
>> + log::error!("join error when waiting for future: {e}")
>> + }
>> + }
>> + }
>> + }
>> + RemoteType::Pbs => {
>> + let node = "localhost".to_string();
>> +
>> + let now = Instant::now();
>> + let result = func(context, remote.clone(), node.clone()).await;
>> + let api_response_time = now.elapsed();
>> +
>> + match result {
>> + Ok(data) => {
>> + per_remote_results.node_results.insert(
>> + node,
>> + Ok(NodeResults {
>> + data,
>> + api_response_time,
>> + }),
>> + );
>> + }
>> + Err(err) => {
>> + per_remote_results.node_results.insert(node, Err(err));
>> + }
>> + }
>> + }
>> + }
>> +
>> + (remote.id, Ok(per_remote_results))
>> + }
>
> probably a `fn do_for_all_remotes` that only does one thing per remote
> would also be nice? (i think stefan said he'll need something like this)
Will be added for v2; I adopted (and, after some discussion, slightly
modified) Stefan's patches for that.
>> +
>> + async fn fetch_pve_node<F, Ft, T>(
>> + func: F,
>> + context: C,
>> + remote: Remote,
>> + node: String,
>> + _permit: OwnedSemaphorePermit,
>> + _per_remote_connections_permit: OwnedSemaphorePermit,
>> + ) -> (String, Result<NodeResults<T>, Error>)
>> + where
>> + F: Fn(C, Remote, String) -> Ft + Clone + Send + 'static,
>> + Ft: Future<Output = Result<T, Error>> + Send + 'static,
>> + T: Send + Debug + 'static,
>> + {
>> + let now = Instant::now();
>> + let result = func(context, remote.clone(), node.clone()).await;
>> + let api_response_time = now.elapsed();
>> +
>> + match result {
>> + Ok(data) => (
>> + node,
>> + Ok(NodeResults {
>> + data,
>> + api_response_time,
>> + }),
>> + ),
>> + Err(err) => (node, Err(err)),
>> + }
>> + }
>> +}
* [pdm-devel] superseded: [RFC proxmox-datacenter-manager 0/2] add helper to fetch from many remote nodes in parallel
2025-08-28 14:42 [pdm-devel] [RFC proxmox-datacenter-manager 0/2] add helper to fetch from many remote nodes in parallel Lukas Wagner
2025-08-28 14:42 ` [pdm-devel] [RFC proxmox-datacenter-manager 1/2] server: add convenience helper to fetch results " Lukas Wagner
2025-08-28 14:42 ` [pdm-devel] [RFC proxmox-datacenter-manager 2/2] remote task fetching: use ParallelFetcher helper Lukas Wagner
@ 2025-08-29 14:11 ` Lukas Wagner
2 siblings, 0 replies; 10+ messages in thread
From: Lukas Wagner @ 2025-08-29 14:11 UTC (permalink / raw)
To: Proxmox Datacenter Manager development discussion; +Cc: pdm-devel
On Thu Aug 28, 2025 at 4:42 PM CEST, Lukas Wagner wrote:
> Still work in progress (code style improvements, documentation, naming),
> but sharing early because Stefan might need something similar.
>
> Please do not merge yet.
>
> proxmox-datacenter-manager:
>
> Lukas Wagner (2):
> server: add convenience helper to fetch results from many remote nodes
> in parallel
> remote task fetching: use ParallelFetcher helper
>
> .../tasks/remote_tasks.rs | 239 ++++++------------
> server/src/lib.rs | 1 +
> server/src/parallel_fetcher.rs | 236 +++++++++++++++++
> 3 files changed, 314 insertions(+), 162 deletions(-)
> create mode 100644 server/src/parallel_fetcher.rs
>
>
> Summary over all repositories:
> 3 files changed, 314 insertions(+), 162 deletions(-)
superseded by:
https://lore.proxmox.com/pdm-devel/20250829141028.309835-1-l.wagner@proxmox.com/T/#t