From: Lukas Wagner
To: pdm-devel@lists.proxmox.com
Date: Thu, 21 Aug 2025 11:52:56 +0200
Message-ID: <20250821095319.134215-1-l.wagner@proxmox.com>
Subject: [pdm-devel] [PATCH proxmox-datacenter-manager v6 00/23] metric collection improvements (concurrency, API, CLI)

Key points:
- Fetch metrics concurrently
- Add some tests for the core logic in the metric collection system
- Allow triggering metric collection via the API
- Record metric collection statistics in the RRD
    - overall collection time for all remotes
    - per-remote response time when fetching metrics
- Persist metric collection state to disk:
    /var/lib/proxmox-datacenter-manager/metric-collection-state.json
    (timestamps of last collection, errors)
- Trigger metric collection for any new remotes added via the API
- Add new API endpoints
    POST /metric-collection/trigger with optional 'remote' param
    GET  /metric-collection/status
    GET  /remotes//metric-collection-rrddata
    GET  /metric-collection/rrddata
- Add CLI tooling
    proxmox-datacenter-client metric-collection trigger [--remote ]
    proxmox-datacenter-client metric-collection status

## To reviewers / open questions:
- Please review the paths and params of the new API endpoints (anything
  public facing that is hard to change later)
- Should `GET /remotes//metric-collection-rrddata` be just `rrddata`?
  Not sure if we are going to add any other PDM-native per-remote metrics
  and whether we want to return those from the same API call as this...
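
To illustrate the first and fourth key points (concurrent fetching, plus recording the per-remote response time and the overall collection time), here is a minimal sketch. It is not code from this series: the series joins async tasks via a JoinSet, while the sketch swaps in scoped threads from the Rust standard library so that it has no dependencies. `FetchResult`, `fetch_metrics` and `collect_all` are hypothetical names, and the "fetch" is a simulated request.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Result of one simulated metric fetch (hypothetical type, not from the series).
#[derive(Debug)]
struct FetchResult {
    remote: String,
    /// Time this single remote needed to answer.
    response_time: Duration,
    /// Number of datapoints on success, error message on failure.
    outcome: Result<usize, String>,
}

/// Stand-in for the real network request (assumption: the real code talks
/// to the remote's API; here we only sleep and return a fake count).
fn fetch_metrics(remote: &str) -> Result<usize, String> {
    if remote.is_empty() {
        return Err("empty remote name".to_string());
    }
    thread::sleep(Duration::from_millis(10));
    Ok(remote.len())
}

/// Fetch all remotes concurrently, timing each request and the whole run.
fn collect_all(remotes: &[&str]) -> (Vec<FetchResult>, Duration) {
    let run_start = Instant::now();
    let results = thread::scope(|scope| {
        // Spawn one worker per remote first, so that all requests overlap.
        let handles: Vec<_> = remotes
            .iter()
            .map(|&remote| {
                scope.spawn(move || {
                    let start = Instant::now();
                    let outcome = fetch_metrics(remote);
                    FetchResult {
                        remote: remote.to_string(),
                        response_time: start.elapsed(),
                        outcome,
                    }
                })
            })
            .collect();
        // Then join them; results come back in the order of `remotes`.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("fetch thread panicked"))
            .collect::<Vec<_>>()
    });
    (results, run_start.elapsed())
}

fn main() {
    let (results, total) = collect_all(&["pve-a", "pbs-b", ""]);
    for r in &results {
        println!("{}: {:?} in {:?}", r.remote, r.outcome, r.response_time);
    }
    println!("collection run took {:?}", total);
}
```

A failing remote only produces an `Err` in its own `FetchResult`; it does not abort the run for the other remotes.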

## Potential future work
- UI button for triggering metric collection
- UI for metric collection settings
- Show RRD graphs for metric collection stats somewhere
- Have some global concurrency control knob for background requests
  [request scheduling].

Changes since [v5]:
- Rebased onto latest master

Changes since [v4]:
- Drop the metric collection config file for now; these settings might
  be better stored together with the config for other background tasks

Changes since [v3]:
- Rebase onto master
- Fix a couple of clippy warnings (CreateOptions is now Copy!)

Changes since [v2]:
- For now, drop settings that might change anyway with a global
  background request scheduling system [request scheduling]:
    - max-concurrency
    - {min,max}-interval-offset
    - {min,max}-connection-delay

Changes since [v1]:
- Add missing dependency on librust-rand-dev to d/control
- Fix a couple of minor spelling/punctuation issues (thx maximiliano)
- Some minor code style improvements, e.g. using unwrap_or_else instead
  of doing a manual match
- Document return values of the 'setup_timer' function
- Factor out handle_tick/handle_control_message
- Minor refactoring/code style improvements
- CLI: Change 'update-settings' to 'settings update'
- CLI: Change 'show-settings' to 'settings show'
- Change missed tick behavior for tokio::time::Interval to 'skip'
  instead of burst.

The last three commits are new in v2.
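
For readers unfamiliar with the two missed-tick policies mentioned in the v1 changelog, the following is a simplified model of how the next timer deadline is chosen after a late tick. It is an illustration based on the documented semantics of tokio's `MissedTickBehavior`, not code from this series; the function names and the plain-integer millisecond timestamps are assumptions made to keep the sketch dependency-free.

```rust
/// "Burst": the schedule is not adjusted, so the next deadline is simply
/// one period after the missed one. If that is already in the past, the
/// timer fires again right away until it has caught up.
fn next_deadline_burst(missed_deadline: u64, period: u64) -> u64 {
    missed_deadline + period
}

/// "Skip": ticks that were missed are dropped, and the next deadline is
/// the next multiple of `period` (counted from the original schedule)
/// that lies strictly after `now`.
fn next_deadline_skip(missed_deadline: u64, period: u64, now: u64) -> u64 {
    assert!(period > 0 && now >= missed_deadline);
    now + period - ((now - missed_deadline) % period)
}

fn main() {
    // A tick was due at t=100ms with a 50ms period, but the task was
    // only polled again at t=230ms.
    let (missed, period, now) = (100, 50, 230);
    println!("burst: next deadline at {}", next_deadline_burst(missed, period));
    println!("skip:  next deadline at {}", next_deadline_skip(missed, period, now));
}
```

With 'burst', a long stall is followed by a rapid series of catch-up ticks, which for metric collection would mean several back-to-back collection runs; 'skip' resumes on the original schedule instead.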

[v1]: https://lore.proxmox.com/pdm-devel/20250211120541.163621-1-l.wagner@proxmox.com/T/#t
[v2]: https://lore.proxmox.com/pdm-devel/20250214130653.283012-1-l.wagner@proxmox.com/
[v3]: https://lore.proxmox.com/pdm-devel/20250416125642.291552-1-l.wagner@proxmox.com/T/#t
[v4]: https://lore.proxmox.com/pdm-devel/20250512133725.262263-1-l.wagner@proxmox.com/T/#t
[v5]: https://lore.proxmox.com/pdm-devel/c1a8deae-9590-471c-8505-d3e799bc7125@proxmox.com/T/#t
[request scheduling]: https://lore.proxmox.com/pdm-devel/7b3e90c8-6ebb-400f-acf9-cac084cc39fe@proxmox.com/

proxmox-datacenter-manager:

Lukas Wagner (23):
  metric collection: split top_entities split into separate module
  metric collection: save metric data to RRD in separate task
  metric collection: rework metric poll task
  metric collection: persist state after metric collection
  metric collection: skip if last_collection < MIN_COLLECTION_INTERVAL
  metric collection: collect overdue metrics on startup/timer change
  metric collection: add tests for the fetch_remotes function
  metric collection: add test for fetch_overdue
  metric collection: pass rrd cache instance as function parameter
  metric collection: add test for rrd task
  metric collection: wrap rrd_cache::Cache in a struct
  metric collection: record remote response time in metric database
  metric collection: save time needed for collection run to RRD
  metric collection: periodically clean removed remotes from statefile
  api: add endpoint to trigger metric collection
  api: remotes: trigger immediate metric collection for newly added nodes
  api: add api for querying metric collection RRD data
  api: metric-collection: add status endpoint
  pdm-client: add metric collection API methods
  cli: add commands for metric-collection trigger and status
  metric collection: factor out handle_tick and handle_control_message fns
  metric collection: skip missed timer ticks
  metric collection: use JoinSet instead of joining from handles in a Vec

 cli/client/Cargo.toml                         |   1 +
 cli/client/src/main.rs                        |   2 +
 cli/client/src/metric_collection.rs           |  70 ++
 lib/pdm-api-types/src/lib.rs                  |   3 +
 lib/pdm-api-types/src/metric_collection.rs    |  20 +
 lib/pdm-api-types/src/rrddata.rs              |  26 +
 lib/pdm-client/src/lib.rs                     |  58 ++
 server/src/api/metric_collection.rs           |  99 +++
 server/src/api/mod.rs                         |   2 +
 server/src/api/remotes.rs                     |  59 ++
 server/src/api/resources.rs                   |   3 +-
 server/src/api/rrd_common.rs                  |  11 +-
 server/src/bin/proxmox-datacenter-api/main.rs |   2 +-
 .../src/metric_collection/collection_task.rs  | 656 ++++++++++++++++++
 server/src/metric_collection/mod.rs           | 346 +++------
 server/src/metric_collection/rrd_cache.rs     | 204 +++---
 server/src/metric_collection/rrd_task.rs      | 289 ++++++++
 server/src/metric_collection/state.rs         | 150 ++++
 server/src/metric_collection/top_entities.rs  | 150 ++++
 19 files changed, 1783 insertions(+), 368 deletions(-)
 create mode 100644 cli/client/src/metric_collection.rs
 create mode 100644 lib/pdm-api-types/src/metric_collection.rs
 create mode 100644 server/src/api/metric_collection.rs
 create mode 100644 server/src/metric_collection/collection_task.rs
 create mode 100644 server/src/metric_collection/rrd_task.rs
 create mode 100644 server/src/metric_collection/state.rs
 create mode 100644 server/src/metric_collection/top_entities.rs

Summary over all repositories:
  19 files changed, 1783 insertions(+), 368 deletions(-)

--
Generated by murpp 0.9.0