From: Dominik Csapak <d.csapak@proxmox.com>
To: Proxmox Datacenter Manager development discussion
<pdm-devel@lists.proxmox.com>
Subject: Re: [pdm-devel] [PATCH proxmox-datacenter-manager v5 2/6] remote tasks: add background task for task polling, use new task cache
Date: Wed, 9 Jul 2025 13:35:33 +0200 [thread overview]
Message-ID: <9bf25aa1-c4cc-4af8-9ae9-15e0124d985f@proxmox.com> (raw)
In-Reply-To: <66712394-ea60-4a3b-88b1-4b5955be4108@proxmox.com>
On 7/9/25 13:22, Lukas Wagner wrote:
>
>
> On 2025-07-03 10:05, Lukas Wagner wrote:
>>>> +}
>>>> +
>>>> +/// Handle a single timer tick.
>>>> +/// Will handle archive file rotation, polling of tracked tasks and fetching or remote tasks.
>>>> +async fn do_tick(cycle: u64) -> Result<(), Error> {
>>>> + let cache = remote_tasks::get_cache()?;
>>>> +
>>>> + if should_check_for_cache_rotation(cycle) {
>>>> + log::debug!("checking if remote task archive should be rotated");
>>>> + if rotate_cache(cache.clone()).await? {
>>>> + log::info!("rotated remote task archive");
>>>> + }
>>>> + }
>>>> +
>>>> + let state = cache.read_state();
>>>> +
>>>> + let mut all_tasks = HashMap::new();
>>>> +
>>>> + let total_connections_semaphore = Arc::new(Semaphore::new(MAX_CONNECTIONS));
>>>> + let mut join_set = JoinSet::new();
>>>> +
>>>> + // Get a list of remotes that we should poll in this cycle.
>>>> + let remotes = remotes_to_check(cycle, &state).await?;
>>>> + for remote in remotes {
>>>> + let since = get_cutoff_timestamp(&remote, &state);
>>>> +
>>>> + let permit = if remote.ty == RemoteType::Pve {
>>>> + // Acquire multiple permits, for PVE remotes we want
>>>> + // to multiple nodes in parallel.
>>>> + //
>>>> + // `.unwrap()` is safe, we never close the semaphore.
>>>> + Arc::clone(&total_connections_semaphore)
>>>> + .acquire_many_owned(CONNECTIONS_PER_PVE_REMOTE as u32)
>>>> + .await
>>>> + .unwrap()
>>>
>>> would it be possible to acquire the connection semaphores dynamicall inside the
>>> `fetch_tasks` call up to the maximum?
>>>
>>> that way, we could e.g. connect to 20 remotes with one host in parallel
>>> instead of always having maximum of 4 ?
>>> (not sure about the tokio semaphore possibilities here)
>>>
>>> I'd still limit it to CONNECTIONS_PER_PVE_REMOTE for each remote,
>>> but in case one remote has less nodes, we could utilize the connection count
>>> for more remotes, doing more work in parallel.
>>
>> IIRC there was some problem with allocating these on demand, I think there was some potential
>> for a deadlock - though I can't come up with the 'why' right now. I'll check again and
>> add some comment if I remember the reason again.
>>
>
>
> Revisiting this again right now.
>
> The problem is that for PVE remotes, fetching the tasks is a 2 step process. First, we have to
> get a list of all nodes, and second, we have to connect to all nodes to get a list of the node's
> tasks. The 'get list of nodes' should be guarded by a semaphore, because if you have a huge amount
> of remotes, you don't want to connect to all of them at the same time.
>
> This means, if we aquire the permits on demand, we'd have to (pseude code):
>
> semaphore = Semaphore::new()
>
> for remote in remotes {
> spawn(|| {
> single_permit = acquire(&semaphore)
> nodes = list_nodes()
> remaining_permits = acquire_many(&semaphore, min(nodes.len(), CONNECTIONS_PER_PVE_REMOTE) - 1)
> for node in nodes {
> // Fetch tasks from node concurrently
> }
> drop(single_permit)
> drop(remaining_permits)
> })
> }
>
> Since the inner part is execute for many remotes at once, it could happen that the
> semaphore's number of permits is already exhausted by the concurrent calls
> to the first acquire, leading to a deadlock when the additional permits are requested.
>
> This is why we need to request all permits in advance for now. I'll add some comment
> to the code to document this.
>
> Once we are based on trixie, we could use SemaphorePermit::split() [1] and then release the
> permits we don't actually need.
>
> [1] https://docs.rs/tokio/1.46.1/tokio/sync/struct.SemaphorePermit.html#method.split
>
>
i get what you mean, but couldn't we release the listing semaphore before acquiring the ones
for the task fetching?
i.e. simply moving the `drop(single_permit)` above the `remaining_permits = ...` ?
that way the listing of nodes is still guarded, but we still can do it like you shown...
(Aside that keeping the permit for the listing for the duration of the remote api calls
seems false anyway...)
in the worst case, all remotes will be queried for the nodes before the querying for tasks starts,
but that seems unlikely...
_______________________________________________
pdm-devel mailing list
pdm-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pdm-devel
next prev parent reply other threads:[~2025-07-09 11:34 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-12 11:41 [pdm-devel] [PATCH proxmox-datacenter-manager v5 0/6] remote task cache fetching task / better cache backend Lukas Wagner
2025-05-12 11:41 ` [pdm-devel] [PATCH proxmox-datacenter-manager v5 1/6] remote tasks: implement improved cache for remote tasks Lukas Wagner
2025-05-14 14:08 ` Dominik Csapak
2025-07-01 10:02 ` Lukas Wagner
2025-07-03 8:05 ` Lukas Wagner
2025-05-12 11:41 ` [pdm-devel] [PATCH proxmox-datacenter-manager v5 2/6] remote tasks: add background task for task polling, use new task cache Lukas Wagner
2025-05-14 15:27 ` Dominik Csapak
2025-07-03 8:05 ` Lukas Wagner
2025-07-03 11:25 ` Dominik Csapak
2025-07-09 11:22 ` Lukas Wagner
2025-07-09 11:35 ` Dominik Csapak [this message]
2025-07-09 12:25 ` Lukas Wagner
2025-05-12 11:41 ` [pdm-devel] [PATCH proxmox-datacenter-manager v5 3/6] remote tasks: improve locking for task archive iterator Lukas Wagner
2025-05-12 11:41 ` [pdm-devel] [PATCH proxmox-datacenter-manager v5 4/6] pdm-api-types: remote tasks: add new_from_str constructor for TaskStateType Lukas Wagner
2025-05-15 6:56 ` Dominik Csapak
2025-05-12 11:41 ` [pdm-devel] [PATCH proxmox-datacenter-manager v5 5/6] fake remote: make the fake_remote feature compile again Lukas Wagner
2025-05-15 6:55 ` Dominik Csapak
2025-05-12 11:41 ` [pdm-devel] [PATCH proxmox-datacenter-manager v5 6/6] fake remote: clippy fixes Lukas Wagner
2025-05-15 7:05 ` Dominik Csapak
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=9bf25aa1-c4cc-4af8-9ae9-15e0124d985f@proxmox.com \
--to=d.csapak@proxmox.com \
--cc=pdm-devel@lists.proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal