From: Lukas Wagner <l.wagner@proxmox.com>
To: Proxmox Datacenter Manager development discussion
<pdm-devel@lists.proxmox.com>,
Dominik Csapak <d.csapak@proxmox.com>
Subject: Re: [pdm-devel] [PATCH proxmox-datacenter-manager v5 2/6] remote tasks: add background task for task polling, use new task cache
Date: Wed, 9 Jul 2025 14:25:23 +0200
Message-ID: <2924f3d6-f178-4586-9372-ed2719623d3b@proxmox.com>
In-Reply-To: <9bf25aa1-c4cc-4af8-9ae9-15e0124d985f@proxmox.com>
On 2025-07-09 13:35, Dominik Csapak wrote:
> On 7/9/25 13:22, Lukas Wagner wrote:
>>
>>
>> On 2025-07-03 10:05, Lukas Wagner wrote:
>>>>> +}
>>>>> +
>>>>> +/// Handle a single timer tick.
>>>>> +/// Will handle archive file rotation, polling of tracked tasks and fetching of remote tasks.
>>>>> +async fn do_tick(cycle: u64) -> Result<(), Error> {
>>>>> +    let cache = remote_tasks::get_cache()?;
>>>>> +
>>>>> +    if should_check_for_cache_rotation(cycle) {
>>>>> +        log::debug!("checking if remote task archive should be rotated");
>>>>> +        if rotate_cache(cache.clone()).await? {
>>>>> +            log::info!("rotated remote task archive");
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>> +    let state = cache.read_state();
>>>>> +
>>>>> +    let mut all_tasks = HashMap::new();
>>>>> +
>>>>> +    let total_connections_semaphore = Arc::new(Semaphore::new(MAX_CONNECTIONS));
>>>>> +    let mut join_set = JoinSet::new();
>>>>> +
>>>>> +    // Get a list of remotes that we should poll in this cycle.
>>>>> +    let remotes = remotes_to_check(cycle, &state).await?;
>>>>> +    for remote in remotes {
>>>>> +        let since = get_cutoff_timestamp(&remote, &state);
>>>>> +
>>>>> +        let permit = if remote.ty == RemoteType::Pve {
>>>>> +            // Acquire multiple permits, since for PVE remotes we want
>>>>> +            // to connect to multiple nodes in parallel.
>>>>> +            //
>>>>> +            // `.unwrap()` is safe, we never close the semaphore.
>>>>> +            Arc::clone(&total_connections_semaphore)
>>>>> +                .acquire_many_owned(CONNECTIONS_PER_PVE_REMOTE as u32)
>>>>> +                .await
>>>>> +                .unwrap()
>>>>
>>>> would it be possible to acquire the connection semaphore permits dynamically inside the
>>>> `fetch_tasks` call, up to the maximum?
>>>>
>>>> that way, we could e.g. connect to 20 remotes with one host each in parallel
>>>> instead of always having a maximum of 4?
>>>> (not sure about the tokio semaphore possibilities here)
>>>>
>>>> I'd still limit it to CONNECTIONS_PER_PVE_REMOTE for each remote,
>>>> but in case one remote has fewer nodes, we could utilize the connection count
>>>> for more remotes, doing more work in parallel.
>>>
>>> IIRC there was some problem with allocating these on demand, I think there was some potential
>>> for a deadlock - though I can't come up with the 'why' right now. I'll check again and
>>> add a comment if I remember the reason.
>>>
>>
>>
>> Revisiting this again right now.
>>
>> The problem is that for PVE remotes, fetching the tasks is a two-step process. First, we have to
>> get a list of all nodes, and second, we have to connect to each node to get a list of that node's
>> tasks. The 'get list of nodes' step should be guarded by a semaphore, because if you have a huge
>> number of remotes, you don't want to connect to all of them at the same time.
>>
>> This means, if we acquire the permits on demand, we'd have to do something like this (pseudo code):
>>
>> semaphore = Semaphore::new()
>>
>> for remote in remotes {
>>     spawn(|| {
>>         single_permit = acquire(&semaphore)
>>         nodes = list_nodes()
>>         remaining_permits = acquire_many(&semaphore, min(nodes.len(), CONNECTIONS_PER_PVE_REMOTE) - 1)
>>         for node in nodes {
>>             // Fetch tasks from node concurrently
>>         }
>>         drop(single_permit)
>>         drop(remaining_permits)
>>     })
>> }
>>
>> Since the inner part is executed for many remotes at once, it could happen that the
>> semaphore's number of permits is already exhausted by the concurrent calls
>> to the first acquire, leading to a deadlock when the additional permits are requested.
>>
>> This is why we need to request all permits in advance for now. I'll add some comment
>> to the code to document this.
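
(For illustration, a minimal self-contained demo of this failure mode, with made-up
numbers rather than the real PDM constants: two tasks and a semaphore with two
permits, where each task takes one permit and then waits for a second one that can
never become free.)

    use std::sync::Arc;
    use tokio::sync::{Barrier, Semaphore};

    #[tokio::main]
    async fn main() {
        let semaphore = Arc::new(Semaphore::new(2));
        let barrier = Arc::new(Barrier::new(2));
        let mut handles = Vec::new();
        for i in 0..2 {
            let semaphore = Arc::clone(&semaphore);
            let barrier = Arc::clone(&barrier);
            handles.push(tokio::spawn(async move {
                // "List nodes" step: every task takes one permit.
                let _first = semaphore.clone().acquire_owned().await.unwrap();
                // Force the state the real code can race into: all tasks
                // hold their first permit before any of them asks for more.
                barrier.wait().await;
                // "Fetch tasks" step: both tasks now wait for a permit
                // that can never be released -> deadlock.
                let _second = semaphore.clone().acquire_owned().await.unwrap();
                println!("task {i} got both permits"); // never reached
            }));
        }
        for handle in handles {
            let _ = handle.await; // hangs forever
        }
    }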
>>
>> Once we are based on trixie, we could use SemaphorePermit::split() [1] and then release the
>> permits we don't actually need.
>>
>> [1] https://docs.rs/tokio/1.46.1/tokio/sync/struct.SemaphorePermit.html#method.split
>>
>>
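(For illustration, a rough sketch of how that could look, assuming
OwnedSemaphorePermit gained the same split() as the SemaphorePermit linked above
(tokio >= 1.38); list_nodes() and the constant are hypothetical stand-ins, not the
real PDM code.)

    use std::sync::Arc;
    use tokio::sync::Semaphore;

    const CONNECTIONS_PER_PVE_REMOTE: usize = 5;

    // Hypothetical stand-in for the real node listing call.
    async fn list_nodes() -> Vec<String> {
        vec!["node1".into(), "node2".into()]
    }

    async fn fetch_remote(semaphore: Arc<Semaphore>) {
        // Still take the full per-remote budget up front, so the
        // "acquire more while already holding permits" deadlock
        // cannot happen ...
        let mut permits = semaphore
            .acquire_many_owned(CONNECTIONS_PER_PVE_REMOTE as u32)
            .await
            .unwrap();

        let nodes = list_nodes().await;

        // ... but hand back whatever this remote turns out not to need.
        let surplus = CONNECTIONS_PER_PVE_REMOTE.saturating_sub(nodes.len());
        if let Some(unneeded) = permits.split(surplus) {
            drop(unneeded); // returns the surplus permits to the semaphore
        }

        // `permits` now covers exactly the node connections we will open.
        for _node in nodes {
            // fetch tasks from each node concurrently
        }
    }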
>
> I get what you mean, but couldn't we release the listing permit before acquiring the ones
> for the task fetching?
>
> I.e. simply moving the `drop(single_permit)` above the `remaining_permits = ...`?
>
> That way the listing of nodes is still guarded, but we can still do it like you showed...
> (Aside from that, keeping the permit for the listing for the duration of the remote API calls
> seems wrong anyway...)
>
> In the worst case, all remotes will be queried for their nodes before the querying for tasks starts, but that seems unlikely...
>
Yeah, that's what I wanted to avoid at first... it feels a bit cleaner to handle one remote in 'one go', without
potentially blocking again. But I guess it's not too bad either, so I'll implement it as you suggested.
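
(A minimal sketch of that reordering; Remote, list_nodes() and the constants are
hypothetical stand-ins rather than the real PDM code, and only the permit handling
is the point.)

    use std::sync::Arc;
    use tokio::sync::Semaphore;
    use tokio::task::JoinSet;

    const MAX_CONNECTIONS: usize = 20;
    const CONNECTIONS_PER_PVE_REMOTE: usize = 5;

    struct Remote;

    // Hypothetical stand-in for the real node listing call.
    async fn list_nodes(_remote: &Remote) -> Vec<String> {
        vec!["node1".into()]
    }

    async fn poll_remotes(remotes: Vec<Remote>) {
        let semaphore = Arc::new(Semaphore::new(MAX_CONNECTIONS));
        let mut join_set = JoinSet::new();

        for remote in remotes {
            let semaphore = Arc::clone(&semaphore);
            join_set.spawn(async move {
                // Guard the node listing with a single permit ...
                let listing_permit =
                    semaphore.clone().acquire_owned().await.unwrap();
                let nodes = list_nodes(&remote).await;
                // ... and release it *before* requesting the per-node
                // permits: a task waiting in acquire_many_owned() then
                // holds no permits itself, so the deadlock cannot occur.
                drop(listing_permit);

                let wanted = nodes.len().min(CONNECTIONS_PER_PVE_REMOTE) as u32;
                let _permits =
                    semaphore.acquire_many_owned(wanted).await.unwrap();
                for _node in nodes {
                    // fetch tasks from each node concurrently
                }
            });
        }
        while join_set.join_next().await.is_some() {}
    }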
Thanks for the input!
--
- Lukas