From: Lukas Wagner <l.wagner@proxmox.com>
To: pdm-devel@lists.proxmox.com
Subject: [pdm-devel] [PATCH proxmox-datacenter-manager v7 00/24] metric collection improvements (concurrency, API, CLI)
Date: Tue, 26 Aug 2025 15:50:55 +0200
Message-ID: <20250826135119.336510-1-l.wagner@proxmox.com>

Key points:
- Fetch metrics concurrently
- Add some tests for the core logic in the metric collection system
- Allow triggering metric collection via the API
- Record metric collection statistics in the RRD
  - overall collection time for all remotes
  - per-remote response time when fetching metrics
- Persist metric collection state to disk:
  /var/lib/proxmox-datacenter-manager/metric-collection-state.json
  (timestamps of last collection, errors)
- Trigger metric collection for any new remotes added via the API

- Add new API endpoints
	POST     /metric-collection/trigger with optional 'remote' param
	GET      /metric-collection/status
	GET      /remotes/<remote>/rrddata
	GET      /nodes/localhost/rrddata

- Add CLI tooling
	proxmox-datacenter-client metric-collection trigger [--remote <remote>]
	proxmox-datacenter-client metric-collection status
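
The concurrent fetch and the two recorded timings (overall collection time, per-remote response time) can be illustrated roughly as follows. This is only a sketch under assumptions: it uses scoped threads from the standard library to stay dependency-free, whereas the series itself uses tokio tasks, and `FetchOutcome`, `collect_all` and the `fetch` callback are hypothetical names that do not appear in the patches.

```rust
use std::collections::HashMap;
use std::thread;
use std::time::{Duration, Instant};

/// Outcome of fetching metrics from one remote (hypothetical type).
#[derive(Debug)]
pub struct FetchOutcome {
    /// How long the remote took to answer.
    pub response_time: Duration,
    /// `Ok(number of data points)` or an error message.
    pub result: Result<usize, String>,
}

/// Fetch from all remotes concurrently. Returns the per-remote outcomes
/// and the overall wall-clock time of the collection run.
pub fn collect_all<F>(remotes: &[String], fetch: F) -> (HashMap<String, FetchOutcome>, Duration)
where
    F: Fn(&str) -> Result<usize, String> + Sync,
{
    let run_start = Instant::now();
    let fetch = &fetch;

    let outcomes = thread::scope(|scope| {
        // One scoped thread per remote, so a slow remote does not delay the others.
        let handles: Vec<_> = remotes
            .iter()
            .map(|remote| {
                scope.spawn(move || {
                    let start = Instant::now();
                    let result = fetch(remote);
                    let outcome = FetchOutcome {
                        response_time: start.elapsed(),
                        result,
                    };
                    (remote.clone(), outcome)
                })
            })
            .collect();

        handles
            .into_iter()
            .map(|handle| handle.join().expect("fetch thread panicked"))
            .collect::<HashMap<_, _>>()
    });

    (outcomes, run_start.elapsed())
}
```

A failing remote only affects its own entry in the returned map, which matches the idea of persisting errors per remote.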


## Potential future work
- UI button for triggering metric collection
- Make collection interval configurable
- Show RRD graphs for metric collection stats somewhere
- Have some global concurrency control knob for background
  requests [request scheduling].

Changes since [v6]:
  - Changed API paths for RRD data
     /nodes/localhost/rrddata includes the total time needed for metric collection
     /remotes/{remote}/rrddata includes the API response time for collecting metrics from a single remote
  - Request latest metric for a single remote when requesting 'hourly' RRD stats
    for remote nodes, VMs, CTs
  - Folded in some fixup patches at the end (not all, some led to too many conflicts)
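
Requesting the latest metrics on demand means the API handler has to trigger a collection and wait until it has finished. A rough sketch of that trigger-and-wait pattern is shown here. It assumes a control-message channel with an optional completion sender, and it uses `std::sync::mpsc` and a plain thread as stand-ins for the async runtime. `ControlMessage`, `spawn_collection_task` and `trigger_and_wait` are hypothetical names.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread;

/// Control message sent to the collection task (hypothetical shape).
pub enum ControlMessage {
    /// Collect from one remote, or from all remotes if `remote` is `None`.
    /// If `done` is set, the task signals completion through it.
    Trigger {
        remote: Option<String>,
        done: Option<Sender<()>>,
    },
}

/// Spawn a stand-in collection task and return the sender for control messages.
pub fn spawn_collection_task() -> Sender<ControlMessage> {
    let (tx, rx) = mpsc::channel::<ControlMessage>();
    thread::spawn(move || {
        // The loop ends once every sender has been dropped.
        for msg in rx {
            match msg {
                ControlMessage::Trigger { remote, done } => {
                    // Placeholder for the actual metric fetch.
                    let _ = remote;
                    if let Some(done) = done {
                        // The caller may have stopped waiting; ignore that case.
                        let _ = done.send(());
                    }
                }
            }
        }
    });
    tx
}

/// Trigger a collection and block until the task reports completion.
pub fn trigger_and_wait(tx: &Sender<ControlMessage>, remote: Option<String>) -> Result<(), String> {
    let (done_tx, done_rx) = mpsc::channel();
    tx.send(ControlMessage::Trigger {
        remote,
        done: Some(done_tx),
    })
    .map_err(|_| "collection task is not running".to_string())?;
    done_rx
        .recv()
        .map_err(|_| "collection task dropped the request".to_string())
}
```

Keeping the completion sender optional lets the same message serve both the fire-and-forget trigger and the waiting variant.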

Changes since [v5]:
  - Rebased onto latest master

Changes since [v4]:
  - Drop metric collection config file for now;
    these settings might be better stored together with the config for
    other background tasks

Changes since [v3]:
  - Rebase onto master
  - Fix a couple clippy warnings (CreateOptions is now Copy!)

Changes since [v2]:
  - For now, drop settings that might change anyway with a
    global background request scheduling system [request scheduling]:
       - max-concurrency
       - {min,max}-interval-offset
       - {min,max}-connection-delay

Changes since [v1]:
  - Add missing dependency on librust-rand-dev to d/control
  - Fix a couple of minor spelling/punctuation issues (thx maximiliano)
  - Some minor code style improvements, e.g. using unwrap_or_else instead
    of doing a manual match
  - Document return values of 'setup_timer' function
  - Factor out handle_tick/handle_control_message
  - Minor refactoring/code style improvements
  - CLI: Change 'update-settings' to 'settings update'
  - CLI: Change 'show-settings' to 'settings show'
  - Change missed tick behavior for tokio::time::Interval to 'skip'
    instead of 'burst'.
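
To illustrate the difference between the two missed-tick behaviors, here is a small dependency-free model of how the next deadline is chosen after a tick was handled late. It is a simplified sketch and not tokio's code. In tokio the behaviors correspond to `MissedTickBehavior::Burst` and `MissedTickBehavior::Skip`; the `MissedTick` enum and `next_deadline` function below are hypothetical, and the numbers in the usage note are example values only.

```rust
/// How a periodic timer reacts when a tick was handled late.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MissedTick {
    /// Fire the missed ticks back to back until caught up.
    Burst,
    /// Drop missed ticks and resume on the original schedule.
    Skip,
}

/// Compute the next deadline after a tick scheduled for `scheduled_ms`
/// was actually handled at `now_ms` (both in milliseconds since start).
pub fn next_deadline(behavior: MissedTick, scheduled_ms: u64, now_ms: u64, period_ms: u64) -> u64 {
    assert!(period_ms > 0, "period must be non-zero");
    assert!(now_ms >= scheduled_ms, "a tick cannot fire early");
    match behavior {
        // The next tick stays one period after the missed one, even if
        // that point is already in the past (so it fires immediately).
        MissedTick::Burst => scheduled_ms + period_ms,
        // The next tick lands on the next multiple of the period,
        // counted from the original schedule, that lies after `now_ms`.
        MissedTick::Skip => {
            let late_by = now_ms - scheduled_ms;
            now_ms + period_ms - late_by % period_ms
        }
    }
}
```

With a period of 60000 ms, a tick scheduled at 60000 ms but handled at 200000 ms yields a next deadline of 120000 ms under `Burst` (already in the past, so collection would run again right away) and 240000 ms under `Skip`.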

The last three commits are new in v2.

[v1]: https://lore.proxmox.com/pdm-devel/20250211120541.163621-1-l.wagner@proxmox.com/T/#t
[v2]: https://lore.proxmox.com/pdm-devel/20250214130653.283012-1-l.wagner@proxmox.com/
[v3]: https://lore.proxmox.com/pdm-devel/20250416125642.291552-1-l.wagner@proxmox.com/T/#t
[v4]: https://lore.proxmox.com/pdm-devel/20250512133725.262263-1-l.wagner@proxmox.com/T/#t
[v5]: https://lore.proxmox.com/pdm-devel/c1a8deae-9590-471c-8505-d3e799bc7125@proxmox.com/T/#t
[request scheduling]: https://lore.proxmox.com/pdm-devel/7b3e90c8-6ebb-400f-acf9-cac084cc39fe@proxmox.com/


proxmox-datacenter-manager:

Lukas Wagner (24):
  metric collection: split top_entities split into separate module
  metric collection: save metric data to RRD in separate task
  metric collection: rework metric poll task
  metric collection: persist state after metric collection
  metric collection: skip if last_collection < MIN_COLLECTION_INTERVAL
  metric collection: collect overdue metrics on startup/timer change
  metric collection: add tests for the fetch_remotes function
  metric collection: add test for fetch_overdue
  metric collection: pass rrd cache instance as function parameter
  metric collection: add test for rrd task
  metric collection: wrap rrd_cache::Cache in a struct
  metric collection: record remote response time in metric database
  metric collection: save time needed for collection run to RRD
  metric collection: periodically clean removed remotes from statefile
  api: add endpoint to trigger metric collection
  api: remotes: trigger immediate metric collection for newly added
    nodes
  api: add api for querying metric collection RRD data
  api: metric-collection: add status endpoint
  pdm-client: add metric collection API methods
  cli: add commands for metric-collection trigger and status
  metric collection: skip missed timer ticks
  metric collection: use JoinSet instead of joining from handles in a
    Vec
  metric collection: allow to wait until completion when triggering
    collection manually
  api: pve: rrd: trigger and wait for metric collection when requesting
    RRD data
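
Several of the patches above revolve around the persisted collection state: skipping remotes that were collected very recently, collecting overdue remotes on startup, and cleaning removed remotes from the state file. The sketch below models that logic in memory only. The struct layout, the method names and the value of `MIN_COLLECTION_INTERVAL` are assumptions, and reading the skip condition as "time since the last collection" is an interpretation of the patch title. JSON (de)serialization of the state file is omitted.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical value in seconds; the real constant is not stated in this cover letter.
pub const MIN_COLLECTION_INTERVAL: u64 = 10;

/// Per-remote collection status (assumed layout).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct RemoteStatus {
    /// Unix timestamp of the last collection, if any.
    pub last_collection: Option<u64>,
    /// Error message of the last failed attempt, if any.
    pub error: Option<String>,
}

/// In-memory view of the persisted state (assumed layout).
#[derive(Debug, Default)]
pub struct CollectionState {
    pub remotes: HashMap<String, RemoteStatus>,
}

impl CollectionState {
    /// True if the remote was collected less than MIN_COLLECTION_INTERVAL seconds ago.
    pub fn should_skip(&self, remote: &str, now: u64) -> bool {
        match self.remotes.get(remote).and_then(|s| s.last_collection) {
            Some(last) => now.saturating_sub(last) < MIN_COLLECTION_INTERVAL,
            None => false,
        }
    }

    /// Remotes that were never collected, or whose last collection is at
    /// least `interval` seconds old, in the order given by `configured`.
    pub fn overdue(&self, configured: &[&str], now: u64, interval: u64) -> Vec<String> {
        configured
            .iter()
            .filter(|name| match self.remotes.get(**name).and_then(|s| s.last_collection) {
                Some(last) => now.saturating_sub(last) >= interval,
                None => true,
            })
            .map(|name| name.to_string())
            .collect()
    }

    /// Drop entries for remotes that no longer exist in the configuration.
    pub fn retain_configured(&mut self, configured: &[&str]) {
        let keep: HashSet<&str> = configured.iter().copied().collect();
        self.remotes.retain(|name, _| keep.contains(name.as_str()));
    }
}
```

Treating a remote without a recorded timestamp as overdue also covers remotes that were newly added and have never been collected.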

 cli/client/Cargo.toml                         |   1 +
 cli/client/src/main.rs                        |   2 +
 cli/client/src/metric_collection.rs           |  70 ++
 lib/pdm-api-types/src/lib.rs                  |   3 +
 lib/pdm-api-types/src/metric_collection.rs    |  20 +
 lib/pdm-api-types/src/rrddata.rs              |  26 +
 lib/pdm-client/src/lib.rs                     |  56 ++
 server/src/api/metric_collection.rs           |  46 ++
 server/src/api/mod.rs                         |   2 +
 server/src/api/nodes/mod.rs                   |   2 +
 server/src/api/nodes/rrddata.rs               |  48 ++
 server/src/api/pve/rrddata.rs                 |  43 +-
 server/src/api/remotes.rs                     |  59 ++
 server/src/api/resources.rs                   |   3 +-
 server/src/api/rrd_common.rs                  |  11 +-
 server/src/bin/proxmox-datacenter-api/main.rs |   2 +-
 .../src/metric_collection/collection_task.rs  | 661 ++++++++++++++++++
 server/src/metric_collection/mod.rs           | 346 +++------
 server/src/metric_collection/rrd_cache.rs     | 206 +++---
 server/src/metric_collection/rrd_task.rs      | 289 ++++++++
 server/src/metric_collection/state.rs         | 150 ++++
 server/src/metric_collection/top_entities.rs  | 150 ++++
 22 files changed, 1817 insertions(+), 379 deletions(-)
 create mode 100644 cli/client/src/metric_collection.rs
 create mode 100644 lib/pdm-api-types/src/metric_collection.rs
 create mode 100644 server/src/api/metric_collection.rs
 create mode 100644 server/src/api/nodes/rrddata.rs
 create mode 100644 server/src/metric_collection/collection_task.rs
 create mode 100644 server/src/metric_collection/rrd_task.rs
 create mode 100644 server/src/metric_collection/state.rs
 create mode 100644 server/src/metric_collection/top_entities.rs


Summary over all repositories:
  22 files changed, 1817 insertions(+), 379 deletions(-)

-- 
Generated by murpp 0.9.0


_______________________________________________
pdm-devel mailing list
pdm-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pdm-devel



Thread overview: 26+ messages
2025-08-26 13:50 Lukas Wagner [this message]
2025-08-28 19:37 ` [pdm-devel] applied: [PATCH proxmox-datacenter-manager v7 00/24] metric collection improvements (concurrency, API, CLI) Thomas Lamprecht
