From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pdm-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
	by lore.proxmox.com (Postfix) with ESMTPS id 8F8301FF172
	for <inbox@lore.proxmox.com>; Wed, 16 Apr 2025 15:03:59 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id 025E2375E2;
	Wed, 16 Apr 2025 15:03:58 +0200 (CEST)
Message-ID: <a5c2f0dc-1090-4f4f-aa0e-895b1f877a18@proxmox.com>
Date: Wed, 16 Apr 2025 15:03:53 +0200
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
From: Lukas Wagner <l.wagner@proxmox.com>
To: Proxmox Datacenter Manager development discussion
 <pdm-devel@lists.proxmox.com>
References: <20250214130653.283012-1-l.wagner@proxmox.com>
 <7b3e90c8-6ebb-400f-acf9-cac084cc39fe@proxmox.com>
 <88c03c89-f8e1-4538-94ad-89b829a6c06c@proxmox.com>
Content-Language: de-AT, en-US
In-Reply-To: <88c03c89-f8e1-4538-94ad-89b829a6c06c@proxmox.com>
X-SPAM-LEVEL: Spam detection results:  0
 AWL 0.015 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 DMARC_MISSING             0.1 Missing DMARC policy
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 RCVD_IN_VALIDITY_CERTIFIED_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to
 Validity was blocked. See
 https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more
 information.
 RCVD_IN_VALIDITY_RPBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to
 Validity was blocked. See
 https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more
 information.
 RCVD_IN_VALIDITY_SAFE_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to
 Validity was blocked. See
 https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more
 information.
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
 URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See
 http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more
 information. [proxmox.com]
Subject: [pdm-devel] superseded: [PATCH proxmox-datacenter-manager v2 00/28]
 metric collection improvements (concurrency, config, API, CLI)
X-BeenThere: pdm-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox Datacenter Manager development discussion
 <pdm-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pdm-devel>, 
 <mailto:pdm-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pdm-devel/>
List-Post: <mailto:pdm-devel@lists.proxmox.com>
List-Help: <mailto:pdm-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pdm-devel>, 
 <mailto:pdm-devel-request@lists.proxmox.com?subject=subscribe>
Reply-To: Proxmox Datacenter Manager development discussion
 <pdm-devel@lists.proxmox.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: pdm-devel-bounces@lists.proxmox.com
Sender: "pdm-devel" <pdm-devel-bounces@lists.proxmox.com>

Sorry for the late reply.

On  2025-03-16 22:45, Thomas Lamprecht wrote:
> On 14/03/2025 15:10, Lukas Wagner wrote:
>> On  2025-02-14 14:06, Lukas Wagner wrote:
>>> ## To reviewers / open questions:
>>> - Please review the defaults I've chosen for the settings, especially
>>>   the ones for the default metric collection interval (10 minutes) as
>>>   well as max-concurrency (10).
>>>   I also kindly ask to double-check the naming of the properties.
>>>   See "pdm-api-types: add CollectionSettings type" for details
>>>
>>> - Please review path and params for new API endpoints (anything public
>>>   facing that is hard to change later)
>>>
>>> - I've chosen a section-config config now, even though we only have a
>>>   single section for now. This was done for future-proofing reasons,
>>>   maybe we want to add support for different setting 'groups' or
>>>   something, e.g. to have different settings for distinct sets of
>>>   remotes. Does this make sense?
>>>   Or should I just stick to a simple config for now? (At moments like
>>>   these I wish for TOML configs where we could be a bit more flexible...)
>>>
>>> 	collection-settings: default
>>> 	    max-concurrency 10
>>> 	    collection-interval 180
>>> 	    min-interval-offset 0
>>> 	    max-interval-offset 20
>>> 	    min-connection-delay 10
>>> 	    max-connection-delay 100
>>>
>>
>> Currently thinking about generalizing the `max-concurrency` setting into something global
>> that affects all 'background' polling operations (resource cache/task cache refreshes/
>> metric fetching).
> 
> The usefulness of such a thing depends a bit on what we want to
> limit (amount of total requests to a target remote?), and why (reducing
> network traffic, load on target node, load on PDM, ...?)
> 
I think it helps to look at this from the following perspective:
Limiting concurrency makes sense (we already do it), and if one limits
concurrency, I think that should be done globally, not per task.
Otherwise the total number of connections at any given time depends
on how the polling tasks are scheduled (e.g. whether they happen to run
at the same time or not), which might lead to hard-to-predict load spikes.

Later we could have separate settings for 'groups of remotes', e.g. to limit
the number of concurrent connections through a slow VPN tunnel to a data center
that houses some, but not all, of the remotes.

>>
>> For actually controlling the concurrency we could maybe have a globally available
>> semaphore (potential deadlock potential in some cases).
>> Alternatively, we could think about having a 'background request queue' and
>> a 'background request scheduler', which does the actual requests on behalf of
>> the other tasks.
> 
> semaphores sound nice but IMO often aren't, especially as they lack
> introspection, I'm sure that's better in rust than in C, but  a dedicated
> queueing mechanism _might_ be nicer, especially as one can then also
> use more useful approaches to queue/schedule things. And, e.g., once our
> target APIs support something nice as QUIC we could even batch requests
> to a single target (remote) together.

After playing around with some ideas, I'll probably go with a semaphore at first, but
abstracted in such a way that it can easily be swapped for something more complex later.
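
A minimal sketch of that idea, not the actual PDM code: all names below
(`GlobalLimiter`, `Permit`, `peak_in_flight`) are hypothetical, and it uses
std threads and a Mutex/Condvar-based counting semaphore only to stay
self-contained, whereas the real code would presumably be async. Callers only
see `acquire()`, so whatever sits behind it could later be replaced by a
request queue/scheduler.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

// Shared state of the (hypothetical) limiter: number of free slots.
struct Inner {
    free: Mutex<usize>,
    cond: Condvar,
}

/// Handle shared by all background polling tasks. Callers never touch the
/// semaphore directly, only `acquire()`.
#[derive(Clone)]
pub struct GlobalLimiter {
    inner: Arc<Inner>,
}

/// A slot for one in-flight request; the slot is released on drop.
pub struct Permit {
    inner: Arc<Inner>,
}

impl GlobalLimiter {
    pub fn new(max_concurrency: usize) -> Self {
        let inner = Inner {
            free: Mutex::new(max_concurrency),
            cond: Condvar::new(),
        };
        Self { inner: Arc::new(inner) }
    }

    /// Blocks until one of the `max_concurrency` slots is free.
    pub fn acquire(&self) -> Permit {
        let mut free = self.inner.free.lock().unwrap();
        while *free == 0 {
            free = self.inner.cond.wait(free).unwrap();
        }
        *free -= 1;
        Permit { inner: Arc::clone(&self.inner) }
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        *self.inner.free.lock().unwrap() += 1;
        self.inner.cond.notify_one();
    }
}

/// Demo: start `requests` fake requests at once (as if several polling tasks
/// happened to run at the same time) and return the highest number that were
/// in flight simultaneously.
pub fn peak_in_flight(limit: usize, requests: usize) -> usize {
    let limiter = GlobalLimiter::new(limit);
    let in_flight = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..requests)
        .map(|_| {
            let limiter = limiter.clone();
            let in_flight = Arc::clone(&in_flight);
            let peak = Arc::clone(&peak);
            thread::spawn(move || {
                // Declared first, so it is dropped last (after the counter
                // below has been decremented again).
                let _permit = limiter.acquire();
                let now = in_flight.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                // Stands in for the actual request to a remote.
                thread::sleep(Duration::from_millis(5));
                in_flight.fetch_sub(1, Ordering::SeqCst);
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    peak.load(Ordering::SeqCst)
}
```

With this, `peak_in_flight(3, 12)` never reports more than 3 requests in
flight, regardless of how the 12 requests happen to be scheduled.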


> 
>>
>> I'll give this a bit more thoughts in the next days/weeks, so maybe don't merge
>> this in the meanwhile.
>>
> 
> ack.

I've sent a [v3] that drops the settings which a request queue/scheduler would
potentially change again, namely:
  - max-concurrency
  - *-interval-offset
  - *-connection-delay

I'll be on vacation soon and I don't think I can whip up the request queue thingy
before that, and I didn't want to keep this patch series in limbo until I return :)
Hopefully v3 is ready to be applied and I can then build on that later.

[v3]: https://lore.proxmox.com/pdm-devel/20250416125642.291552-1-l.wagner@proxmox.com/T/#t

-- 
- Lukas


