From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: <pdm-devel-bounces@lists.proxmox.com> Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) by lore.proxmox.com (Postfix) with ESMTPS id 690C91FF172 for <inbox@lore.proxmox.com>; Wed, 16 Apr 2025 14:56:53 +0200 (CEST) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 9748536C4A; Wed, 16 Apr 2025 14:56:51 +0200 (CEST) From: Lukas Wagner <l.wagner@proxmox.com> To: pdm-devel@lists.proxmox.com Date: Wed, 16 Apr 2025 14:56:16 +0200 Message-Id: <20250416125642.291552-1-l.wagner@proxmox.com> X-Mailer: git-send-email 2.39.5 MIME-Version: 1.0 X-SPAM-LEVEL: Spam detection results: 0 AWL -0.136 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment POISEN_SPAM_PILL 0.1 Meta: its spam POISEN_SPAM_PILL_1 0.1 random spam to be learned in bayes POISEN_SPAM_PILL_3 0.1 random spam to be learned in bayes SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Subject: [pdm-devel] [PATCH proxmox-datacenter-manager v3 00/26] metric collection improvements (concurrency, config, API, CLI) X-BeenThere: pdm-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox Datacenter Manager development discussion <pdm-devel.lists.proxmox.com> List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pdm-devel>, <mailto:pdm-devel-request@lists.proxmox.com?subject=unsubscribe> List-Archive: <http://lists.proxmox.com/pipermail/pdm-devel/> List-Post: <mailto:pdm-devel@lists.proxmox.com> List-Help: <mailto:pdm-devel-request@lists.proxmox.com?subject=help> List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pdm-devel>, <mailto:pdm-devel-request@lists.proxmox.com?subject=subscribe> Reply-To: Proxmox Datacenter Manager development discussion <pdm-devel@lists.proxmox.com> Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: pdm-devel-bounces@lists.proxmox.com Sender: "pdm-devel" <pdm-devel-bounces@lists.proxmox.com> Key points: - fetch metrics concurrently - configuration for metric collection - new config /etc/proxmox-datacenter-manager/metric-collection.cfg, allowing configuration of the collection interval - Add some tests for the core logic in the metric collection system - Allow to trigger metric collection via the API - Record metric collection statistics in the RRD - overall collection time for all remotes - per remote response time when fetching metrics - Persist metric collection state to disk: /var/lib/proxmox-datacenter-manager/metric-collection-state.json (timestamps of last collection, errors) - Trigger metric collection for any new remotes added via the API - Add new API endpoints POST /metric-collection/trigger with optional 'remote' param GET /metric-collection/status GET/PUT /config/metric-collection/default GET /remotes/<remote>/metric-collection-rrddata GET /metric-collection/rrddata - Add CLI tooling proxmox-datacenter-client metric-collection settings show proxmox-datacenter-client metric-collection settings update proxmox-datacenter-client metric-collection trigger [--remote <remote>] proxmox-datacenter-client metric-collection status ## To reviewers / open questions: - Please review the defaults I've chosen for the settings, e.g. the ones for the default metric collection interval (10 minutes) I also kindly ask to double-check the naming of the properties. See "pdm-api-types: add CollectionSettings type" for details - Please review path and params for new API endpoints (anything public facing that is hard to change later) - I've chosen a section-config config now, even though we only have a single section for now. This was done for future-proofing reasons, maybe we want to add support for different setting 'groups' or something, e.g. to have different settings for distinct sets of remotes. Does this make sense? Or should I just stick to a simple config for now? (At moments like these I wish for TOML configs where we could be a bit more flexible...) collection-settings: default collection-interval 180 # These have been removed in v3, but might be readded # in some other form in the future: # max-concurrent-connections 10 # min-interval-offset 0 # max-interval-offset 20 # min-connection-delay 10 # max-connection-delay 100 - Should `GET /remotes/<remote>/metric-collection-rrddata` be just `rrddata`? not sure if we are going to add any other PDM-native per-remote metrics and whether we want to return that from the same API call as this... ## Potential future work - UI button for triggering metric collection - UI for metric collection settings - Show RRD graphs for metric collection stats somewhere - Have some global concurrency control knob for background requests [request scheduling]. Changes since [v2]: - For now, drop settings that might change any way with a global background request scheduling system [request scheduling]: - max-concurrency - {min,max}-interval-offset - {min,max}-connection-delay Changes since [v1]: - add missing dependency to librust-rand-dev to d/control - Fix a couple of minor spelling/punctuation issues (thx maximiliano) - Some minor code style improvments, e.g. using unwrap_or_else instead of doing a manual match - Document return values of 'setup_timer' function - Factor out handle_tick/handle_control_message - Minor refatoring/code style improvments - CLI: Change 'update-settings' to 'settings update' - CLI: Change 'show-settings' to 'settings show' - change missed tick behavior for tokio::time::Interval to 'skip' instead of burst. The last three commits are new in v2. [v1]: https://lore.proxmox.com/pdm-devel/20250211120541.163621-1-l.wagner@proxmox.com/T/#t [v2]: https://lore.proxmox.com/pdm-devel/20250214130653.283012-1-l.wagner@proxmox.com/ [request scheduling]: https://lore.proxmox.com/pdm-devel/7b3e90c8-6ebb-400f-acf9-cac084cc39fe@proxmox.com/ proxmox-datacenter-manager: Lukas Wagner (26): pdm-api-types: add CollectionSettings type pdm-config: add functions for reading/writing metric collection settings metric collection: split top_entities split into separate module metric collection: save metric data to RRD in separate task metric collection: rework metric poll task metric collection: persist state after metric collection metric collection: skip if last_collection < MIN_COLLECTION_INTERVAL metric collection: collect overdue metrics on startup/timer change metric collection: add tests for the fetch_remotes function metric collection: add test for fetch_overdue metric collection: pass rrd cache instance as function parameter metric collection: add test for rrd task metric collection: wrap rrd_cache::Cache in a struct metric collection: record remote response time in metric database metric collection: save time needed for collection run to RRD metric collection: periodically clean removed remotes from statefile api: add endpoint for updating metric collection settings api: add endpoint to trigger metric collection api: remotes: trigger immediate metric collection for newly added nodes api: add api for querying metric collection RRD data api: metric-collection: add status endpoint pdm-client: add metric collection API methods cli: add commands for metric-collection settings, trigger, status metric collection: factor out handle_tick and handle_control_message fns metric collection: skip missed timer ticks metric collection: use JoinSet instead of joining from handles in a Vec cli/client/Cargo.toml | 1 + cli/client/src/main.rs | 2 + cli/client/src/metric_collection.rs | 145 ++++ lib/pdm-api-types/src/lib.rs | 3 + lib/pdm-api-types/src/metric_collection.rs | 83 +++ lib/pdm-api-types/src/rrddata.rs | 26 + lib/pdm-client/src/lib.rs | 87 +++ lib/pdm-config/src/lib.rs | 1 + lib/pdm-config/src/metric_collection.rs | 69 ++ server/src/api/config/metric_collection.rs | 97 +++ server/src/api/config/mod.rs | 2 + server/src/api/metric_collection.rs | 99 +++ server/src/api/mod.rs | 2 + server/src/api/remotes.rs | 59 ++ server/src/api/resources.rs | 3 +- server/src/api/rrd_common.rs | 11 +- server/src/bin/proxmox-datacenter-api/main.rs | 2 +- .../src/metric_collection/collection_task.rs | 702 ++++++++++++++++++ server/src/metric_collection/mod.rs | 346 ++------- server/src/metric_collection/rrd_cache.rs | 204 ++--- server/src/metric_collection/rrd_task.rs | 289 +++++++ server/src/metric_collection/state.rs | 150 ++++ server/src/metric_collection/top_entities.rs | 150 ++++ 23 files changed, 2165 insertions(+), 368 deletions(-) create mode 100644 cli/client/src/metric_collection.rs create mode 100644 lib/pdm-api-types/src/metric_collection.rs create mode 100644 lib/pdm-config/src/metric_collection.rs create mode 100644 server/src/api/config/metric_collection.rs create mode 100644 server/src/api/metric_collection.rs create mode 100644 server/src/metric_collection/collection_task.rs create mode 100644 server/src/metric_collection/rrd_task.rs create mode 100644 server/src/metric_collection/state.rs create mode 100644 server/src/metric_collection/top_entities.rs Summary over all repositories: 23 files changed, 2165 insertions(+), 368 deletions(-) -- Generated by git-murpp 0.8.1 _______________________________________________ pdm-devel mailing list pdm-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pdm-devel