From: Lukas Wagner <l.wagner@proxmox.com>
To: pdm-devel@lists.proxmox.com
Subject: [pdm-devel] [PATCH proxmox-datacenter-manager v7 00/24] metric collection improvements (concurrency, API, CLI)
Date: Tue, 26 Aug 2025 15:50:55 +0200
Message-ID: <20250826135119.336510-1-l.wagner@proxmox.com>

Key points:
- Fetch metrics concurrently
- Add some tests for the core logic in the metric collection system
- Allow triggering metric collection via the API
- Record metric collection statistics in the RRD
  - overall collection time for all remotes
  - per-remote response time when fetching metrics
- Persist metric collection state to disk:
  /var/lib/proxmox-datacenter-manager/metric-collection-state.json
  (timestamps of last collection, errors)
- Trigger metric collection for any new remotes added via the API

- Add new API endpoints
	POST     /metric-collection/trigger with optional 'remote' param
	GET      /metric-collection/status
	GET      /remotes/<remote>/rrddata
	GET      /nodes/localhost/rrddata

- Add CLI tooling
	proxmox-datacenter-client metric-collection trigger [--remote <remote>]
	proxmox-datacenter-client metric-collection status
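
To illustrate the first few key points (concurrent fetching, plus recording
the per-remote response time and the overall collection time), here is a
minimal sketch. It is not the code from this series: the series builds on
tokio tasks (see the JoinSet patch), while this sketch uses std scoped threads
to stay dependency-free. The names `FetchOutcome`, `collect_all` and the
`fetch` closure are assumptions made up for this example.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Outcome of fetching one remote (hypothetical type, not from the series).
#[derive(Debug)]
pub struct FetchOutcome {
    pub remote: String,
    /// How long this remote took to answer.
    pub response_time: Duration,
    /// Fetched metric values, or an error message.
    pub result: Result<Vec<f64>, String>,
}

/// Fetch all remotes concurrently, one scoped thread per remote.
/// `fetch` stands in for the real API call to a remote.
/// Returns the per-remote outcomes (in input order) and the total time
/// needed for the whole collection run.
pub fn collect_all<F>(remotes: &[String], fetch: F) -> (Vec<FetchOutcome>, Duration)
where
    F: Fn(&str) -> Result<Vec<f64>, String> + Sync,
{
    let started = Instant::now();
    let outcomes = thread::scope(|scope| {
        let handles: Vec<_> = remotes
            .iter()
            .map(|remote| {
                let fetch = &fetch;
                scope.spawn(move || {
                    let t0 = Instant::now();
                    let result = fetch(remote.as_str());
                    FetchOutcome {
                        remote: remote.clone(),
                        response_time: t0.elapsed(),
                        result,
                    }
                })
            })
            .collect();
        // Join in spawn order; a failing remote does not affect the others.
        handles
            .into_iter()
            .map(|h| h.join().expect("fetch thread panicked"))
            .collect::<Vec<_>>()
    });
    (outcomes, started.elapsed())
}
```

A caller would pass the list of remote names and a closure performing the
request, then store each `response_time` and the returned total duration as
separate data points.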


## Potential future work
- UI button for triggering metric collection
- Make collection interval configurable
- Show RRD graphs for metric collection stats somewhere
- Have some global concurrency control knob for background
  requests [request scheduling].

Changes since [v6]:
  - Changed API paths for RRD data
     /nodes/localhost/rrddata includes the total time needed for metric collection
     /remotes/{remote}/rrddata includes the API response time for collecting metrics from a single remote
  - Request latest metric for a single remote when requesting 'hourly' RRD stats
    for remote nodes, VMs, CTs
  - Folded in some fixup patches at the end (not all, some led to too many conflicts)

Changes since [v5]:
  - Rebased onto latest master

Changes since [v4]:
  - Drop the metric collection config file for now;
    these settings might be better stored together with the config for
    other background tasks

Changes since [v3]:
  - Rebase onto master
  - Fix a couple clippy warnings (CreateOptions is now Copy!)

Changes since [v2]:
  - For now, drop settings that might change anyway with a
    global background request scheduling system [request scheduling]:
       - max-concurrency
       - {min,max}-interval-offset
       - {min,max}-connection-delay

Changes since [v1]:
  - Add missing dependency on librust-rand-dev to d/control
  - Fix a couple of minor spelling/punctuation issues (thx maximiliano)
  - Some minor code style improvements, e.g. using unwrap_or_else instead
    of doing a manual match
  - Document return values of 'setup_timer' function
  - Factor out handle_tick/handle_control_message
  - Minor refactoring/code style improvements
  - CLI: Change 'update-settings' to 'settings update'
  - CLI: Change 'show-settings' to 'settings show'
  - change missed tick behavior for tokio::time::Interval to 'skip'
    instead of burst.
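
For readers unfamiliar with the two missed-tick policies, the following is a
simplified model of how the next deadline is chosen after a tick fired late.
It follows the behavior documented for tokio's MissedTickBehavior, but it is
only an illustration: the enum, the function name and the use of plain
millisecond integers are assumptions, and this is not the code from the series.

```rust
/// Simplified stand-in for the two policies discussed above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MissedTick {
    /// Keep the original schedule; late ticks fire back to back until caught up.
    Burst,
    /// Drop the missed ticks and resume on the next multiple of the period.
    Skip,
}

/// Compute the next deadline (all values in milliseconds).
/// `scheduled` is when the tick should have fired, `now` is when it
/// actually fired (so `now >= scheduled`).
pub fn next_deadline(behavior: MissedTick, scheduled: u64, now: u64, period: u64) -> u64 {
    assert!(period > 0 && now >= scheduled);
    match behavior {
        // May lie in the past, which makes the next tick fire immediately.
        MissedTick::Burst => scheduled + period,
        // Always in the future and aligned to the original schedule.
        MissedTick::Skip => now + period - ((now - scheduled) % period),
    }
}
```

With a period of 60 and a tick scheduled at 60 that only fires at 200, the
burst policy yields 120 (already overdue, so ticks pile up), while the skip
policy yields 240. For metric collection, skipping avoids several collection
runs in quick succession after a stall.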

The last three commits are new in v2.

[v1]: https://lore.proxmox.com/pdm-devel/20250211120541.163621-1-l.wagner@proxmox.com/T/#t
[v2]: https://lore.proxmox.com/pdm-devel/20250214130653.283012-1-l.wagner@proxmox.com/
[v3]: https://lore.proxmox.com/pdm-devel/20250416125642.291552-1-l.wagner@proxmox.com/T/#t
[v4]: https://lore.proxmox.com/pdm-devel/20250512133725.262263-1-l.wagner@proxmox.com/T/#t
[v5]: https://lore.proxmox.com/pdm-devel/c1a8deae-9590-471c-8505-d3e799bc7125@proxmox.com/T/#t
[request scheduling]: https://lore.proxmox.com/pdm-devel/7b3e90c8-6ebb-400f-acf9-cac084cc39fe@proxmox.com/


proxmox-datacenter-manager:

Lukas Wagner (24):
  metric collection: split top_entities split into separate module
  metric collection: save metric data to RRD in separate task
  metric collection: rework metric poll task
  metric collection: persist state after metric collection
  metric collection: skip if last_collection < MIN_COLLECTION_INTERVAL
  metric collection: collect overdue metrics on startup/timer change
  metric collection: add tests for the fetch_remotes function
  metric collection: add test for fetch_overdue
  metric collection: pass rrd cache instance as function parameter
  metric collection: add test for rrd task
  metric collection: wrap rrd_cache::Cache in a struct
  metric collection: record remote response time in metric database
  metric collection: save time needed for collection run to RRD
  metric collection: periodically clean removed remotes from statefile
  api: add endpoint to trigger metric collection
  api: remotes: trigger immediate metric collection for newly added
    nodes
  api: add api for querying metric collection RRD data
  api: metric-collection: add status endpoint
  pdm-client: add metric collection API methods
  cli: add commands for metric-collection trigger and status
  metric collection: skip missed timer ticks
  metric collection: use JoinSet instead of joining from handles in a
    Vec
  metric collection: allow to wait until completion when triggering
    collection manually
  api: pve: rrd: trigger and wait for metric collection when requesting
    RRD data
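
Two of the state-related patches above (skipping a remote whose last
collection is too recent, and cleaning removed remotes from the state file)
boil down to simple checks on a per-remote map. A minimal sketch under stated
assumptions: the struct layout, the function names and the value of
MIN_COLLECTION_INTERVAL are placeholders chosen for this example, not taken
from the series.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical per-remote state entry; field names are assumptions.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct RemoteStatus {
    /// Unix timestamp (seconds) of the last collection, if any.
    pub last_collection: Option<i64>,
    /// Error message of the last failed collection, if any.
    pub error: Option<String>,
}

/// Placeholder value in seconds; the real constant is not stated here.
pub const MIN_COLLECTION_INTERVAL: i64 = 10;

/// Collect only if the remote was never collected before, or if at least
/// MIN_COLLECTION_INTERVAL seconds have passed since the last collection.
pub fn should_collect(status: Option<&RemoteStatus>, now: i64) -> bool {
    match status.and_then(|s| s.last_collection) {
        Some(last) => now - last >= MIN_COLLECTION_INTERVAL,
        None => true,
    }
}

/// Drop state entries for remotes that are no longer configured.
pub fn retain_configured(
    state: &mut HashMap<String, RemoteStatus>,
    configured: &HashSet<String>,
) {
    state.retain(|name, _| configured.contains(name));
}
```

The first check keeps manual triggers and timer ticks from hitting a remote
twice within a short window; the second keeps the persisted state from
growing with entries for remotes that no longer exist.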

 cli/client/Cargo.toml                         |   1 +
 cli/client/src/main.rs                        |   2 +
 cli/client/src/metric_collection.rs           |  70 ++
 lib/pdm-api-types/src/lib.rs                  |   3 +
 lib/pdm-api-types/src/metric_collection.rs    |  20 +
 lib/pdm-api-types/src/rrddata.rs              |  26 +
 lib/pdm-client/src/lib.rs                     |  56 ++
 server/src/api/metric_collection.rs           |  46 ++
 server/src/api/mod.rs                         |   2 +
 server/src/api/nodes/mod.rs                   |   2 +
 server/src/api/nodes/rrddata.rs               |  48 ++
 server/src/api/pve/rrddata.rs                 |  43 +-
 server/src/api/remotes.rs                     |  59 ++
 server/src/api/resources.rs                   |   3 +-
 server/src/api/rrd_common.rs                  |  11 +-
 server/src/bin/proxmox-datacenter-api/main.rs |   2 +-
 .../src/metric_collection/collection_task.rs  | 661 ++++++++++++++++++
 server/src/metric_collection/mod.rs           | 346 +++------
 server/src/metric_collection/rrd_cache.rs     | 206 +++---
 server/src/metric_collection/rrd_task.rs      | 289 ++++++++
 server/src/metric_collection/state.rs         | 150 ++++
 server/src/metric_collection/top_entities.rs  | 150 ++++
 22 files changed, 1817 insertions(+), 379 deletions(-)
 create mode 100644 cli/client/src/metric_collection.rs
 create mode 100644 lib/pdm-api-types/src/metric_collection.rs
 create mode 100644 server/src/api/metric_collection.rs
 create mode 100644 server/src/api/nodes/rrddata.rs
 create mode 100644 server/src/metric_collection/collection_task.rs
 create mode 100644 server/src/metric_collection/rrd_task.rs
 create mode 100644 server/src/metric_collection/state.rs
 create mode 100644 server/src/metric_collection/top_entities.rs


Summary over all repositories:
  22 files changed, 1817 insertions(+), 379 deletions(-)

-- 
Generated by murpp 0.9.0




2025-08-28 19:37 ` [pdm-devel] applied: [PATCH proxmox-datacenter-manager v7 00/24] metric collection improvements (concurrency, API, CLI) Thomas Lamprecht
