From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pdm-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
	by lore.proxmox.com (Postfix) with ESMTPS id 5BD0A1FF183
	for <inbox@lore.proxmox.com>; Fri, 21 Feb 2025 14:21:35 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id CEDF91D74;
	Fri, 21 Feb 2025 14:21:34 +0100 (CET)
References: <20250214130653.283012-1-l.wagner@proxmox.com>
User-agent: mu4e 1.10.8; emacs 29.4
From: Maximiliano Sandoval <m.sandoval@proxmox.com>
To: Proxmox Datacenter Manager development discussion
 <pdm-devel@lists.proxmox.com>
Date: Fri, 21 Feb 2025 14:19:36 +0100
In-reply-to: <20250214130653.283012-1-l.wagner@proxmox.com>
Message-ID: <s8o4j0n4en8.fsf@proxmox.com>
MIME-Version: 1.0
Subject: Re: [pdm-devel] [PATCH proxmox-datacenter-manager v2 00/28] metric
 collection improvements (concurrency, config, API, CLI)
X-BeenThere: pdm-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox Datacenter Manager development discussion
 <pdm-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pdm-devel>, 
 <mailto:pdm-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pdm-devel/>
List-Post: <mailto:pdm-devel@lists.proxmox.com>
List-Help: <mailto:pdm-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pdm-devel>, 
 <mailto:pdm-devel-request@lists.proxmox.com?subject=subscribe>
Reply-To: Proxmox Datacenter Manager development discussion
 <pdm-devel@lists.proxmox.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: pdm-devel-bounces@lists.proxmox.com
Sender: "pdm-devel" <pdm-devel-bounces@lists.proxmox.com>


I went through the series and it looks good to me.

Reviewed-by: Maximiliano Sandoval <m.sandoval@proxmox.com>

Lukas Wagner <l.wagner@proxmox.com> writes:

> Key points:
> - fetch metrics concurrently
> - configuration for metric collection
>   - new config /etc/proxmox-datacenter-manager/metric-collection.json
>       - max-concurrency (number of allowed parallel connections)
>       - collection-interval
>       - randomized offset for collection start
>          (min-interval-offset..max-interval-offset)
>       - randomized per-connection delay
>          (min-connection-delay..max-connection-delay)
> - Add some tests for the core logic in the metric collection system
> - Allow triggering metric collection via the API
> - Record metric collection statistics in the RRD
>   - overall collection time for all remotes
>   - per remote response time when fetching metrics
> - Persist metric collection state to disk:
>   /var/lib/proxmox-datacenter-manager/metric-collection-state.json
>   (timestamps of last collection, errors)
> - Trigger metric collection for any new remotes added via the API
>
> - Add new API endpoints
> 	POST     /metric-collection/trigger with optional 'remote' param
> 	GET      /metric-collection/status
> 	GET/PUT  /config/metric-collection/default
> 	GET      /remotes/<remote>/metric-collection-rrddata
> 	GET      /metric-collection/rrddata
>
> - Add CLI tooling
> 	proxmox-datacenter-client metric-collection settings show
> 	proxmox-datacenter-client metric-collection settings update
> 	proxmox-datacenter-client metric-collection trigger [--remote <remote>]
> 	proxmox-datacenter-client metric-collection status
>
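A minimal sketch of the per-remote bookkeeping described above: the last collection timestamp and the last error are stored per remote, and newly added remotes are collected right away. It assumes Unix timestamps in seconds. `RemoteState`, `record_result` and `remotes_due` are hypothetical names, not the series' actual types, and writing the state to the JSON file is left out.

```rust
use std::collections::HashMap;

/// Hypothetical per-remote entry of the persisted collection state.
#[derive(Debug, Default, Clone)]
struct RemoteState {
    /// Unix timestamp (seconds) of the last successful collection.
    last_collection: Option<i64>,
    /// Error message of the most recent failed attempt, if any.
    last_error: Option<String>,
}

/// Hypothetical helper: record the outcome of one fetch. A failure keeps
/// the old timestamp, so the remote stays due and is retried next tick.
fn record_result(
    state: &mut HashMap<String, RemoteState>,
    remote: &str,
    now: i64,
    result: Result<(), String>,
) {
    let entry = state.entry(remote.to_string()).or_default();
    match result {
        Ok(()) => {
            entry.last_collection = Some(now);
            entry.last_error = None;
        }
        Err(err) => entry.last_error = Some(err),
    }
}

/// Hypothetical helper: remotes that need a fetch at `now`, i.e. never
/// collected (e.g. just added via the API) or collected at least
/// `interval` seconds ago. The result is sorted by name.
fn remotes_due(
    remotes: &[&str],
    state: &HashMap<String, RemoteState>,
    now: i64,
    interval: i64,
) -> Vec<String> {
    let mut due: Vec<String> = remotes
        .iter()
        .filter(|name| match state.get(**name).and_then(|s| s.last_collection) {
            None => true,
            Some(last) => now - last >= interval,
        })
        .map(|name| name.to_string())
        .collect();
    due.sort();
    due
}

fn main() {
    let mut state = HashMap::new();
    record_result(&mut state, "a", 1000, Ok(()));
    record_result(&mut state, "b", 1500, Ok(()));
    record_result(&mut state, "c", 1500, Err("connection refused".to_string()));
    // "d" has just been added and has no state entry yet.
    let due = remotes_due(&["a", "b", "c", "d"], &state, 1700, 600);
    println!("due: {due:?}, c failed with: {:?}", state["c"].last_error);
}
```

With a 600 s interval at t=1700, "a" is due because it is 700 s old. "c" is due because it never succeeded, and "d" is due because it is new. "b" is skipped.
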
>
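For orientation, here is a sketch of how the new CLI subcommands presumably map onto the endpoints listed above. The mapping is inferred from the names only and is an assumption, as are the hypothetical `Method` type and `api_call_for` helper. Parameters are returned as name/value pairs instead of being encoded into a URL, since the sketch makes no claim about how the client sends them.

```rust
/// HTTP methods used by the new endpoints (hypothetical helper type).
#[derive(Debug, PartialEq)]
enum Method {
    Get,
    Post,
    Put,
}

/// Hypothetical mapping of a `metric-collection` subcommand (split into
/// words) to the API call it would issue. Unknown subcommands yield `None`.
fn api_call_for<'a>(
    subcommand: &[&str],
    remote: Option<&'a str>,
) -> Option<(Method, &'static str, Vec<(&'static str, &'a str)>)> {
    let call = match subcommand {
        ["settings", "show"] => (Method::Get, "/config/metric-collection/default", vec![]),
        ["settings", "update"] => (Method::Put, "/config/metric-collection/default", vec![]),
        ["status"] => (Method::Get, "/metric-collection/status", vec![]),
        // `--remote` is optional; omitting it presumably targets all remotes.
        ["trigger"] => (
            Method::Post,
            "/metric-collection/trigger",
            remote.map(|r| vec![("remote", r)]).unwrap_or_default(),
        ),
        _ => return None,
    };
    Some(call)
}

fn main() {
    for (words, remote) in [(vec!["trigger"], Some("pve-cluster")), (vec!["status"], None)] {
        println!("{words:?} -> {:?}", api_call_for(&words, remote));
    }
}
```
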
> ## To reviewers / open questions:
> - Please review the defaults I've chosen for the settings, especially
>   the ones for the default metric collection interval (10 minutes) as
>   well as max-concurrency (10).
>   I would also kindly ask you to double-check the naming of the
>   properties. See "pdm-api-types: add CollectionSettings type" for
>   details.
>
> - Please review the paths and parameters of the new API endpoints
>   (anything public-facing that is hard to change later).
>
> - I've chosen a section-config format for now, even though we only have
>   a single section. This was done for future-proofing reasons: maybe we
>   want to add support for different settings 'groups' at some point,
>   e.g. to have different settings for distinct sets of remotes. Does
>   this make sense?
>   Or should I just stick to a simple config for now? (At moments like
>   these I wish for TOML configs where we could be a bit more flexible...)
>
> 	collection-settings: default
> 	    max-concurrency 10
> 	    collection-interval 180
> 	    min-interval-offset 0
> 	    max-interval-offset 20
> 	    min-connection-delay 10
> 	    max-connection-delay 100
>
>
> - Should `GET /remotes/<remote>/metric-collection-rrddata` be
>   just `rrddata`?
>   I'm not sure whether we are going to add any other PDM-native
>   per-remote metrics, and whether we would want to return those from
>   the same API call as these...
>
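To make the section-config question above more concrete, here is a minimal std-only sketch that reads the example block into a typed struct. It is a simplified stand-in for the series' `CollectionSettings` type, not its actual definition, and it does not use the section-config crate the series presumably relies on. Field names mirror the property names from the example, and all values are treated as plain integers. A missing `collection-interval` falls back to the 10-minute default proposed above. `parse_settings` and `effective_interval` are hypothetical helpers.

```rust
/// Simplified stand-in for the series' `CollectionSettings` type; every
/// property is optional so unset values can fall back to defaults.
#[derive(Debug, Default, PartialEq)]
struct CollectionSettings {
    id: String,
    max_concurrency: Option<u64>,
    collection_interval: Option<u64>,
    min_interval_offset: Option<u64>,
    max_interval_offset: Option<u64>,
    min_connection_delay: Option<u64>,
    max_connection_delay: Option<u64>,
}

impl CollectionSettings {
    /// Hypothetical helper: interval in seconds, defaulting to 10 minutes.
    fn effective_interval(&self) -> u64 {
        self.collection_interval.unwrap_or(600)
    }
}

/// Hypothetical parser for `<type>: <id>` headers followed by indented
/// `<key> <value>` lines. Only `collection-settings` sections and numeric
/// values are understood.
fn parse_settings(text: &str) -> Result<Vec<CollectionSettings>, String> {
    let mut sections: Vec<CollectionSettings> = Vec::new();
    for line in text.lines() {
        if line.trim().is_empty() {
            continue;
        }
        if !line.starts_with(char::is_whitespace) {
            // Section header, e.g. "collection-settings: default".
            let (ty, id) = line
                .split_once(':')
                .ok_or_else(|| format!("bad section header: {line}"))?;
            if ty.trim() != "collection-settings" {
                return Err(format!("unknown section type: {}", ty.trim()));
            }
            sections.push(CollectionSettings {
                id: id.trim().to_string(),
                ..Default::default()
            });
            continue;
        }
        // Indented property line; it belongs to the most recent section.
        let current = sections.last_mut().ok_or("property before first section")?;
        let (key, value) = line
            .trim()
            .split_once(char::is_whitespace)
            .ok_or_else(|| format!("bad property line: {line}"))?;
        let value: u64 = value.trim().parse().map_err(|e| format!("{key}: {e}"))?;
        let slot = match key {
            "max-concurrency" => &mut current.max_concurrency,
            "collection-interval" => &mut current.collection_interval,
            "min-interval-offset" => &mut current.min_interval_offset,
            "max-interval-offset" => &mut current.max_interval_offset,
            "min-connection-delay" => &mut current.min_connection_delay,
            "max-connection-delay" => &mut current.max_connection_delay,
            _ => return Err(format!("unknown property: {key}")),
        };
        *slot = Some(value);
    }
    Ok(sections)
}

fn main() {
    let text = "collection-settings: default\n    max-concurrency 10\n    collection-interval 180\n";
    let settings = parse_settings(text).expect("valid config");
    println!("{:?}, interval {}s", settings[0], settings[0].effective_interval());
}
```

Returning a `Vec` of sections is what the future-proofing argument buys: a second settings group would just be another header line.
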
> ## Potential future work
> - UI button for triggering metric collection
> - UI for metric collection settings
> - Show RRD graphs for metric collection stats somewhere
>
> ## Random offset/delay examples
> Example with 'max-concurrency' = 3 and 6 remotes.
>
>     X ... timer triggered
>     [ A ] .... fetching remote 'A'
>     **** .... interval-offset     (usually a couple of seconds)
>     #### .... random worker delay (usually in millisecond range)
>
>                          /--########[  B    ] ### [  C  ]--\
>                         /---####[  A  ] ###### [ D ]--------\
> ----X ************* ---/ ---###### [  E  ] #########[  F  ]--\----
>
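The schedule in the diagram can be approximated with a small worker pool. After a random interval offset, `max-concurrency` workers pull remotes from a shared queue and wait a random per-connection delay before each fetch. This is a std-only sketch using threads, not the series' tokio-based implementation. `collect_metrics` and the xorshift generator (standing in for the `rand` crate) are hypothetical, and all durations are in milliseconds so the demo finishes quickly.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Tiny xorshift64 PRNG standing in for the `rand` crate (non-zero seed).
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }

    /// Value in `min..=max` (requires `min <= max`; modulo bias ignored).
    fn range(&mut self, min: u64, max: u64) -> u64 {
        min + self.next_u64() % (max - min + 1)
    }
}

/// Hypothetical: fetch every remote once with at most `max_concurrency`
/// fetches in flight. Returns the fetched remotes and the peak number of
/// simultaneous fetches that was observed.
fn collect_metrics(
    remotes: Vec<String>,
    max_concurrency: usize,
    interval_offset: (u64, u64),
    connection_delay: (u64, u64),
) -> (Vec<String>, usize) {
    // Random offset after the timer fired ("****" in the diagram).
    let mut rng = XorShift(0x9E37_79B9_7F4A_7C15);
    thread::sleep(Duration::from_millis(rng.range(interval_offset.0, interval_offset.1)));

    let queue = Arc::new(Mutex::new(remotes));
    let done: Arc<Mutex<Vec<String>>> = Arc::new(Mutex::new(Vec::new()));
    let in_flight = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));

    let workers: Vec<_> = (0..max_concurrency)
        .map(|i| {
            let (queue, done) = (Arc::clone(&queue), Arc::clone(&done));
            let (in_flight, peak) = (Arc::clone(&in_flight), Arc::clone(&peak));
            thread::spawn(move || {
                let mut rng = XorShift(0x2545_F491_4F6C_DD1D ^ (i as u64 + 1));
                loop {
                    // The queue lock is released at the end of this statement.
                    let next = queue.lock().unwrap().pop();
                    let Some(remote) = next else { break };
                    // Random per-connection delay ("####" in the diagram).
                    let delay = rng.range(connection_delay.0, connection_delay.1);
                    thread::sleep(Duration::from_millis(delay));
                    let now = in_flight.fetch_add(1, Ordering::SeqCst) + 1;
                    peak.fetch_max(now, Ordering::SeqCst);
                    thread::sleep(Duration::from_millis(5)); // stand-in for the HTTP request
                    in_flight.fetch_sub(1, Ordering::SeqCst);
                    done.lock().unwrap().push(remote);
                }
            })
        })
        .collect();

    for worker in workers {
        worker.join().unwrap();
    }
    let fetched = std::mem::take(&mut *done.lock().unwrap());
    (fetched, peak.load(Ordering::SeqCst))
}

fn main() {
    let remotes: Vec<String> = ["A", "B", "C", "D", "E", "F"].iter().map(|s| s.to_string()).collect();
    let (fetched, peak) = collect_metrics(remotes, 3, (0, 20), (1, 10));
    println!("fetched {fetched:?} with at most {peak} concurrent fetches");
}
```

Spawning exactly `max_concurrency` workers bounds the number of open connections without needing a separate semaphore. In an async implementation, a semaphore around each fetch would serve the same purpose.
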
> Changes since [v1]:
>   - Add missing dependency on librust-rand-dev to d/control
>   - Fix a couple of minor spelling/punctuation issues (thx maximiliano)
>   - Some minor code style improvements, e.g. using unwrap_or_else instead of doing
>     a manual match
>   - Document return values of 'setup_timer' function
>   - Factor out handle_tick/handle_control_message
>   - Minor refactoring/code style improvements
>   - CLI: Change 'update-settings' to 'settings update'
>   - CLI: Change 'show-settings' to 'settings show'
>   - Change the missed-tick behavior of tokio::time::Interval to 'skip'
>     instead of 'burst'.
>
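The difference between the two missed-tick behaviors matters when a collection run takes longer than the interval. Below is a simplified arithmetic model of the documented semantics of tokio's `MissedTickBehavior::Burst` and `MissedTickBehavior::Skip`. Times are in seconds, and `missed` is the deadline that was overrun. This is not tokio's code, and the function names are made up for illustration.

```rust
/// Hypothetical model of `Burst`: keep the original schedule, so any
/// overdue ticks fire immediately, one after another.
fn next_deadline_burst(missed: u64, _now: u64, period: u64) -> u64 {
    missed + period
}

/// Hypothetical model of `Skip`: jump to the first point on the original
/// grid (missed + k * period) that lies strictly after `now`, dropping the
/// ticks in between. Requires `now >= missed`.
fn next_deadline_skip(missed: u64, now: u64, period: u64) -> u64 {
    now + period - (now - missed) % period
}

fn main() {
    // 10-minute interval; the tick due at t=600 is only handled at t=1450
    // because the previous collection run took unusually long.
    let (missed, now, period) = (600, 1450, 600);
    println!("burst: next tick at {}", next_deadline_burst(missed, now, period)); // 1200, already overdue
    println!("skip:  next tick at {}", next_deadline_skip(missed, now, period)); // 1800
}
```

With 'burst', the tick for t=1200 would fire right after the late one, starting a second collection run back to back. With 'skip', the next run waits for t=1800.
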
> The last three commits are new in v2.
>
> [v1]: https://lore.proxmox.com/pdm-devel/20250211120541.163621-1-l.wagner@proxmox.com/T/#t
>

