From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: <pdm-devel-bounces@lists.proxmox.com> Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) by lore.proxmox.com (Postfix) with ESMTPS id 71CB61FF1B7 for <inbox@lore.proxmox.com>; Fri, 14 Feb 2025 14:07:43 +0100 (CET) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 12C8319D26; Fri, 14 Feb 2025 14:07:44 +0100 (CET) From: Lukas Wagner <l.wagner@proxmox.com> To: pdm-devel@lists.proxmox.com Date: Fri, 14 Feb 2025 14:06:25 +0100 Message-Id: <20250214130653.283012-1-l.wagner@proxmox.com> X-Mailer: git-send-email 2.39.5 MIME-Version: 1.0 X-SPAM-LEVEL: Spam detection results: 0 AWL -0.140 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment POISEN_SPAM_PILL 0.1 Meta: its spam POISEN_SPAM_PILL_1 0.1 random spam to be learned in bayes POISEN_SPAM_PILL_3 0.1 random spam to be learned in bayes SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Subject: [pdm-devel] [PATCH proxmox-datacenter-manager v2 00/28] metric collection improvements (concurrency, config, API, CLI) X-BeenThere: pdm-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox Datacenter Manager development discussion <pdm-devel.lists.proxmox.com> List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pdm-devel>, <mailto:pdm-devel-request@lists.proxmox.com?subject=unsubscribe> List-Archive: <http://lists.proxmox.com/pipermail/pdm-devel/> List-Post: <mailto:pdm-devel@lists.proxmox.com> List-Help: <mailto:pdm-devel-request@lists.proxmox.com?subject=help> List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pdm-devel>, <mailto:pdm-devel-request@lists.proxmox.com?subject=subscribe> Reply-To: Proxmox Datacenter Manager development discussion <pdm-devel@lists.proxmox.com> Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: pdm-devel-bounces@lists.proxmox.com Sender: "pdm-devel" <pdm-devel-bounces@lists.proxmox.com> Key points: - fetch metrics concurrently - configuration for metric collection - new config /etc/proxmox-datacenter-manager/metric-collection.json - max-concurrency (number of allowed parallel connections) - collection-interval - randomized offset for collection start (min-interval-offset..max-interval-offset) - randomized per-connection delay (max-connection-delay..max-connection-delay) - Add some tests for the core logic in the metric collection system - Allow to trigger metric collection via the API - Record metric collection statistics in the RRD - overall collection time for all remotes - per remote response time when fetching metrics - Persist metric collection state to disk: /var/lib/proxmox-datacenter-manager/metric-collection-state.json (timestamps of last collection, errors) - Trigger metric collection for any new remotes added via the API - Add new API endpoints POST /metric-collection/trigger with optional 'remote' param GET /metric-collection/status GET/PUT /config/metric-collection/default GET /remotes/<remote>/metric-collection-rrddata GET /metric-collection/rrddata - Add CLI tooling proxmox-datacenter-client metric-collection settings show proxmox-datacenter-client metric-collection settings update proxmox-datacenter-client metric-collection trigger [--remote <remote>] proxmox-datacenter-client metric-collection status ## To reviewers / open questions: - Please review the defaults I've chosen for the settings, especially the ones for the default metric collection interval (10 minutes) as well as max-concurrency (10). I also kindly ask to double-check the naming of the properties. See "pdm-api-types: add CollectionSettings type" for details - Please review path and params for new API endpoints (anything public facing that is hard to change later) - I've chosen a section-config config now, even though we only have a single section for now. This was done for future-proofing reasons, maybe we want to add support for different setting 'groups' or something, e.g. to have different settings for distinct sets of remotes. Does this make sense? Or should I just stick to a simple config for now? (At moments like these I wish for TOML configs where we could be a bit more flexible...) collection-settings: default max-concurrency 10 collection-interval 180 min-interval-offset 0 max-interval-offset 20 min-connection-delay 10 max-connection-delay 100 - Should `GET /remotes/<remote>/metric-collection-rrddata` be just `rrddata`? not sure if we are going to add any other PDM-native per-remote metrics and whether we want to return that from the same API call as this... ## Potential future work - UI button for triggering metric collection - UI for metric collection settings - Show RRD graphs for metric collection stats somewhere ## Random offset/delay examples Example with 'max-concurrency' = 3 and 6 remotes. X ... timer triggered [ A ] .... fetching remote 'A' **** .... interval-offset (usually a couple of seconds) #### .... random worker delay (usually in millisecond range) /--########[ B ] ### [ C ]--\ /---####[ A ] ###### [ D ]--------\ ----X ************* ---/ ---###### [ E ] #########[ F ]--\---- Changes since [v1]: - add missing dependency to librust-rand-dev to d/control - Fix a couple of minor spelling/punctuation issues (thx maximiliano) - Some minor code style improvments, e.g. using unwrap_or_else instead of doing a manual match - Document return values of 'setup_timer' function - Factor out handle_tick/handle_control_message - Minor refatoring/code style improvments - CLI: Change 'update-settings' to 'settings update' - CLI: Change 'show-settings' to 'settings show' - change missed tick behavior for tokio::time::Interval to 'skip' instead of burst. The last three commits are new in v2. [v1]: https://lore.proxmox.com/pdm-devel/20250211120541.163621-1-l.wagner@proxmox.com/T/#t proxmox-datacenter-manager: Lukas Wagner (28): test support: add NamedTempFile helper test support: add NamedTempDir helper pdm-api-types: add CollectionSettings type pdm-config: add functions for reading/writing metric collection settings metric collection: split top_entities split into separate module metric collection: save metric data to RRD in separate task metric collection: rework metric poll task metric collection: persist state after metric collection metric collection: skip if last_collection < MIN_COLLECTION_INTERVAL metric collection: collect overdue metrics on startup/timer change metric collection: add tests for the fetch_remotes function metric collection: add test for fetch_overdue metric collection: pass rrd cache instance as function parameter metric collection: add test for rrd task metric collection: wrap rrd_cache::Cache in a struct metric collection: record remote response time in metric database metric collection: save time needed for collection run to RRD metric collection: periodically clean removed remotes from statefile api: add endpoint for updating metric collection settings api: add endpoint to trigger metric collection api: remotes: trigger immediate metric collection for newly added nodes api: add api for querying metric collection RRD data api: metric-collection: add status endpoint pdm-client: add metric collection API methods cli: add commands for metric-collection settings, trigger, status metric collection: factor out handle_tick and handle_control_message fns metric collection: skip missed timer ticks metric collection: use JoinSet instead of joining from handles in a Vec Cargo.toml | 1 + cli/client/Cargo.toml | 1 + cli/client/src/main.rs | 2 + cli/client/src/metric_collection.rs | 170 ++++ debian/control | 1 + lib/pdm-api-types/src/lib.rs | 3 + lib/pdm-api-types/src/metric_collection.rs | 203 +++++ lib/pdm-api-types/src/rrddata.rs | 26 + lib/pdm-client/src/lib.rs | 87 +++ lib/pdm-config/src/lib.rs | 1 + lib/pdm-config/src/metric_collection.rs | 69 ++ server/Cargo.toml | 1 + server/src/api/config/metric_collection.rs | 156 ++++ server/src/api/config/mod.rs | 2 + server/src/api/metric_collection.rs | 99 +++ server/src/api/mod.rs | 2 + server/src/api/remotes.rs | 59 ++ server/src/api/resources.rs | 3 +- server/src/api/rrd_common.rs | 11 +- server/src/bin/proxmox-datacenter-api.rs | 2 +- server/src/lib.rs | 2 +- .../src/metric_collection/collection_task.rs | 739 ++++++++++++++++++ server/src/metric_collection/mod.rs | 332 ++------ server/src/metric_collection/rrd_cache.rs | 204 ++--- server/src/metric_collection/rrd_task.rs | 289 +++++++ server/src/metric_collection/state.rs | 150 ++++ server/src/metric_collection/top_entities.rs | 150 ++++ server/src/test_support/mod.rs | 4 + server/src/test_support/temp.rs | 60 ++ 29 files changed, 2467 insertions(+), 362 deletions(-) create mode 100644 cli/client/src/metric_collection.rs create mode 100644 lib/pdm-api-types/src/metric_collection.rs create mode 100644 lib/pdm-config/src/metric_collection.rs create mode 100644 server/src/api/config/metric_collection.rs create mode 100644 server/src/api/metric_collection.rs create mode 100644 server/src/metric_collection/collection_task.rs create mode 100644 server/src/metric_collection/rrd_task.rs create mode 100644 server/src/metric_collection/state.rs create mode 100644 server/src/metric_collection/top_entities.rs create mode 100644 server/src/test_support/temp.rs Summary over all repositories: 29 files changed, 2467 insertions(+), 362 deletions(-) -- Generated by git-murpp 0.8.0 _______________________________________________ pdm-devel mailing list pdm-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pdm-devel