From: Lukas Wagner <l.wagner@proxmox.com>
To: Proxmox Datacenter Manager development discussion
	<pdm-devel@lists.proxmox.com>,
	Dominik Csapak <d.csapak@proxmox.com>
Subject: Re: [pdm-devel] [PATCH proxmox-datacenter-manager v5 2/6] remote tasks: add background task for task polling, use new task cache
Date: Wed, 9 Jul 2025 14:25:23 +0200	[thread overview]
Message-ID: <2924f3d6-f178-4586-9372-ed2719623d3b@proxmox.com>
In-Reply-To: <9bf25aa1-c4cc-4af8-9ae9-15e0124d985f@proxmox.com>



On  2025-07-09 13:35, Dominik Csapak wrote:
> On 7/9/25 13:22, Lukas Wagner wrote:
>>
>>
>> On  2025-07-03 10:05, Lukas Wagner wrote:
>>>>> +}
>>>>> +
>>>>> +/// Handle a single timer tick.
>>>>> +/// Will handle archive file rotation, polling of tracked tasks and fetching of remote tasks.
>>>>> +async fn do_tick(cycle: u64) -> Result<(), Error> {
>>>>> +    let cache = remote_tasks::get_cache()?;
>>>>> +
>>>>> +    if should_check_for_cache_rotation(cycle) {
>>>>> +        log::debug!("checking if remote task archive should be rotated");
>>>>> +        if rotate_cache(cache.clone()).await? {
>>>>> +            log::info!("rotated remote task archive");
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>> +    let state = cache.read_state();
>>>>> +
>>>>> +    let mut all_tasks = HashMap::new();
>>>>> +
>>>>> +    let total_connections_semaphore = Arc::new(Semaphore::new(MAX_CONNECTIONS));
>>>>> +    let mut join_set = JoinSet::new();
>>>>> +
>>>>> +    // Get a list of remotes that we should poll in this cycle.
>>>>> +    let remotes = remotes_to_check(cycle, &state).await?;
>>>>> +    for remote in remotes {
>>>>> +        let since = get_cutoff_timestamp(&remote, &state);
>>>>> +
>>>>> +        let permit = if remote.ty == RemoteType::Pve {
>>>>> +            // Acquire multiple permits; for PVE remotes we want
>>>>> +            // to connect to multiple nodes in parallel.
>>>>> +            //
>>>>> +            // `.unwrap()` is safe, we never close the semaphore.
>>>>> +            Arc::clone(&total_connections_semaphore)
>>>>> +                .acquire_many_owned(CONNECTIONS_PER_PVE_REMOTE as u32)
>>>>> +                .await
>>>>> +                .unwrap()
>>>>
>>>> would it be possible to acquire the connection semaphore permits dynamically inside the
>>>> `fetch_tasks` call, up to the maximum?
>>>>
>>>> that way, we could e.g. connect to 20 remotes with one host each in parallel
>>>> instead of always having a maximum of 4?
>>>> (not sure about the tokio semaphore possibilities here)
>>>>
>>>> I'd still limit it to CONNECTIONS_PER_PVE_REMOTE for each remote,
>>>> but in case one remote has fewer nodes, we could use the connection count
>>>> for more remotes, doing more work in parallel.
>>>
>>> IIRC there was some problem with allocating these on demand, I think there was some potential
>>> for a deadlock - though I can't come up with the 'why' right now. I'll check again and
>>> add some comment if I remember the reason again.
>>>
>>
>>
>> Revisiting this again right now.
>>
>> The problem is that for PVE remotes, fetching the tasks is a two-step process. First, we have to
>> get a list of all nodes, and second, we have to connect to all nodes to get a list of each node's
>> tasks. The 'get list of nodes' step should be guarded by a semaphore, because if you have a huge number
>> of remotes, you don't want to connect to all of them at the same time.
>>
>> This means, if we acquire the permits on demand, we'd have to do something like (pseudo code):
>>
>> semaphore = Semaphore::new()
>>
>> for remote in remotes {
>>    spawn(|| {
>>      single_permit = acquire(&semaphore)
>>      nodes = list_nodes()
>>      remaining_permits = acquire_many(&semaphore, min(nodes.len(), CONNECTIONS_PER_PVE_REMOTE) - 1)
>>      for node in nodes {
>>        // Fetch tasks from node concurrently
>>      }
>>      drop(single_permit)
>>      drop(remaining_permits)
>>    })
>> }
>>
>> Since the inner part is executed for many remotes at once, it could happen that the
>> semaphore's number of permits is already exhausted by the concurrent calls
>> to the first acquire, leading to a deadlock when the additional permits are requested.
>>
>> This is why we need to request all permits in advance for now. I'll add some comment
>> to the code to document this.
>>
>> Once we are based on trixie, we could use SemaphorePermit::split() [1] and then release the
>> permits we don't actually need.
>>
>> [1] https://docs.rs/tokio/1.46.1/tokio/sync/struct.SemaphorePermit.html#method.split
>>
>>
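(For illustration, roughly how that split() variant could look once it is available - the constant value and the node-listing helper below are made up for the sketch, not the actual PDM code:)

use std::cmp::min;
use tokio::sync::Semaphore;

// made-up value, just for the sketch
const CONNECTIONS_PER_PVE_REMOTE: usize = 5;

async fn fetch_tasks_sketch(semaphore: &Semaphore, remote: &str) {
    // Grab the full per-remote budget in a single acquire, so concurrent
    // remotes can never deadlock each other waiting for extra permits.
    let mut permits = semaphore
        .acquire_many(CONNECTIONS_PER_PVE_REMOTE as u32)
        .await
        .expect("semaphore is never closed");

    let nodes = list_nodes(remote).await;

    // Hand back the permits we don't actually need for this remote.
    let needed = min(nodes.len(), CONNECTIONS_PER_PVE_REMOTE);
    if let Some(unneeded) = permits.split(CONNECTIONS_PER_PVE_REMOTE - needed) {
        // Dropping the split-off permit returns those permits to the semaphore.
        drop(unneeded);
    }

    // ... fetch tasks from up to `needed` nodes in parallel while holding
    // the remaining permits ...
}

async fn list_nodes(_remote: &str) -> Vec<String> {
    // placeholder for the real 'list nodes' API call
    vec!["node1".into(), "node2".into()]
}
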
> 
> i get what you mean, but couldn't we release the listing semaphore permit before acquiring the ones
> for the task fetching?
> 
> i.e. simply moving the `drop(single_permit)` above the `remaining_permits = ...` ?
> 
> that way the listing of nodes is still guarded, but we can still do it like you showed...
> (aside from that, keeping the permit for the listing for the duration of the remote API calls
> seems wrong anyway...)
> 
> in the worst case, all remotes would be queried for their nodes before any task querying starts, but that seems unlikely...
> 

Yeah, that's what I wanted to avoid at first... it feels a bit cleaner to handle one remote in 'one go', without
potentially blocking again. But I guess it's not too bad either, so I'll implement it as you suggested.
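
Roughly what I have in mind now - the constant and the list_nodes/fetch_node_tasks helpers are just placeholders for the sketch, not the final code:

use std::cmp::min;
use std::sync::Arc;
use tokio::sync::Semaphore;
use tokio::task::JoinSet;

// made-up value, for illustration only
const CONNECTIONS_PER_PVE_REMOTE: usize = 5;

// The semaphore would be created once by the caller,
// e.g. Arc::new(Semaphore::new(MAX_CONNECTIONS)).
async fn poll_pve_remote(semaphore: Arc<Semaphore>, remote: String) {
    // Guard the node listing with a single permit...
    let listing_permit = Arc::clone(&semaphore)
        .acquire_owned()
        .await
        .expect("semaphore is never closed");
    let nodes = list_nodes(&remote).await;
    // ...and release it *before* acquiring the fetch permits, so that
    // concurrent listings cannot exhaust the semaphore and deadlock.
    drop(listing_permit);

    // Acquire only as many permits as this remote actually needs.
    let wanted = min(nodes.len(), CONNECTIONS_PER_PVE_REMOTE) as u32;
    let _fetch_permits = Arc::clone(&semaphore)
        .acquire_many_owned(wanted)
        .await
        .expect("semaphore is never closed");

    // Fetch tasks from the nodes concurrently; the permits stay held
    // until all per-node fetches are done, so the global connection
    // budget is respected.
    let mut join_set = JoinSet::new();
    for node in nodes {
        join_set.spawn(fetch_node_tasks(remote.clone(), node));
    }
    while join_set.join_next().await.is_some() {}
}

async fn list_nodes(_remote: &str) -> Vec<String> {
    vec!["node1".into(), "node2".into()] // placeholder for the real API call
}

async fn fetch_node_tasks(_remote: String, _node: String) {
    // placeholder for the real per-node task fetching
}

Holding the fetch permits in the spawning task until all per-node fetches have joined keeps the global budget intact without having to pass permits into each spawned task.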

Thanks for the input!
-- 
- Lukas



