From: Lukas Wagner <l.wagner@proxmox.com>
To: pdm-devel@lists.proxmox.com
Subject: [pdm-devel] [PATCH proxmox-datacenter-manager 00/25] metric collection improvements (concurrency, config, API, CLI)
Date: Tue, 11 Feb 2025 13:05:16 +0100
Message-ID: <20250211120541.163621-1-l.wagner@proxmox.com>
Key points:
- fetch metrics concurrently
- configuration for metric collection
  - new config /etc/proxmox-datacenter-manager/metric-collection.json
    - max-concurrency (number of allowed parallel connections)
    - collection-interval
    - randomized offset for collection start
      (min-interval-offset..max-interval-offset)
    - randomized per-connection delay
      (min-connection-delay..max-connection-delay)
- Add some tests for the core logic in the metric collection system
- Allow to trigger metric collection via the API
- Record metric collection statistics in the RRD
  - overall collection time for all remotes
  - per remote response time when fetching metrics
- Persist metric collection state to disk:
  /var/lib/proxmox-datacenter-manager/metric-collection-state.json
  (timestamps of last collection, errors)
- Trigger metric collection for any new remotes added via the API
- Add new API endpoints
    POST /metric-collection/trigger with optional 'remote' param
    GET /metric-collection/status
    GET/PUT /config/metric-collection/default
    GET /remotes/<remote>/metric-collection-rrddata
    GET /metric-collection/rrddata
- Add CLI tooling
    proxmox-datacenter-client metric-collection show-settings
    proxmox-datacenter-client metric-collection update-settings
    proxmox-datacenter-client metric-collection trigger [--remote <remote>]
    proxmox-datacenter-client metric-collection status
## To reviewers / open questions:
- Please review the defaults I've chosen for the settings, especially
  the ones for the default metric collection interval (10 minutes) as
  well as max-concurrency (10).
  I also kindly ask to double-check the naming of the properties.
  See "pdm-api-types: add CollectionSettings type" for details
- Please review path and params for new API endpoints (anything public
  facing that is hard to change later)
- I've chosen a section-config config now, even though we only have a
  single section for now. This was done for future-proofing reasons,
  maybe we want to add support for different setting 'groups' or
  something, e.g. to have different settings for distinct sets of
  remotes. Does this make sense?
  Or should I just stick to a simple config for now? (At moments like
  these I wish for TOML configs where we could be a bit more flexible...)

    collection-settings: default
        max-concurrency 10
        collection-interval 180
        min-interval-offset 0
        max-interval-offset 20
        min-connection-delay 10
        max-connection-delay 100

- Should `GET /remotes/<remote>/metric-collection-rrddata` be
  just `rrddata`?
  not sure if we are going to add any other PDM-native per-remote
  metrics and whether we want to return that from the same API call
  as this...
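
For illustration only, the settings from the example config above could be modelled roughly as follows. This is a minimal sketch, not the type added in "pdm-api-types: add CollectionSettings type": the field types, the units (seconds for the interval and offsets, milliseconds for the connection delay) and the `validate` helper are assumptions. The defaults for max-concurrency (10) and the collection interval (10 minutes) are the ones named above; the remaining values are simply copied from the example config.

```rust
/// Hypothetical in-memory form of the settings. Field names mirror the
/// property names from the example config; types and units are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct CollectionSettings {
    /// Number of allowed parallel connections.
    pub max_concurrency: usize,
    /// Collection interval, assumed to be in seconds.
    pub collection_interval: u64,
    /// Bounds of the random offset for collection start, assumed seconds.
    pub min_interval_offset: u64,
    pub max_interval_offset: u64,
    /// Bounds of the random per-connection delay, assumed milliseconds.
    pub min_connection_delay: u64,
    pub max_connection_delay: u64,
}

impl Default for CollectionSettings {
    fn default() -> Self {
        Self {
            max_concurrency: 10,      // default named in the cover letter
            collection_interval: 600, // 10 minutes, as named in the cover letter
            // The values below are copied from the example config.
            min_interval_offset: 0,
            max_interval_offset: 20,
            min_connection_delay: 10,
            max_connection_delay: 100,
        }
    }
}

impl CollectionSettings {
    /// Hypothetical sanity check: every min..max range must be ordered
    /// and the concurrency and interval must be non-zero.
    pub fn validate(&self) -> Result<(), String> {
        if self.max_concurrency == 0 {
            return Err("max-concurrency must be at least 1".into());
        }
        if self.collection_interval == 0 {
            return Err("collection-interval must be greater than 0".into());
        }
        if self.min_interval_offset > self.max_interval_offset {
            return Err("min-interval-offset exceeds max-interval-offset".into());
        }
        if self.min_connection_delay > self.max_connection_delay {
            return Err("min-connection-delay exceeds max-connection-delay".into());
        }
        Ok(())
    }
}
```

A check like this is one way to reject a swapped min/max pair when settings are updated, before it reaches the collection task.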
## Random offset/delay examples
Example with 'max-concurrency' = 3 and 6 remotes.
X ... timer triggered
[ A ] .... fetching remote 'A'
**** .... interval-offset (usually a couple of seconds)
#### .... random worker delay (usually in millisecond range)
                         /--########[ B ] ### [ C ]--\
                        /---####[ A ] ###### [ D ]--------\
----X ************* ---/ ---###### [ E ] #########[ F ]--\----
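
The worker part of the diagram above can be sketched with plain threads. This is only an approximation under stated assumptions: the series itself most likely uses async tasks rather than OS threads, and `collect_all`, `XorShift` and the 5 ms placeholder fetch are hypothetical names and values chosen so the example needs no external crates. Each of `max_concurrency` workers pulls the next remote from a shared queue, sleeps for a random per-connection delay, then "fetches".

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Tiny xorshift64 generator so the sketch has no dependencies.
/// The seed must be non-zero.
struct XorShift(u64);

impl XorShift {
    /// Returns a value in `min..=max`; assumes `min <= max`.
    fn next_in_range(&mut self, min: u64, max: u64) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        min + self.0 % (max - min + 1)
    }
}

/// Fetches all remotes with at most `max_concurrency` parallel workers.
/// Returns the processed remotes and the peak number of parallel fetches.
pub fn collect_all(
    remotes: Vec<String>,
    max_concurrency: usize,
    min_delay_ms: u64,
    max_delay_ms: u64,
) -> (Vec<String>, usize) {
    let queue = Arc::new(Mutex::new(VecDeque::from(remotes)));
    let done = Arc::new(Mutex::new(Vec::new()));
    let in_flight = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));

    let workers: Vec<_> = (0..max_concurrency)
        .map(|id| {
            let queue = Arc::clone(&queue);
            let done = Arc::clone(&done);
            let in_flight = Arc::clone(&in_flight);
            let peak = Arc::clone(&peak);
            thread::spawn(move || {
                let mut rng = XorShift(0x9E37_79B9_7F4A_7C15 ^ (id as u64 + 1));
                loop {
                    // The queue lock is released at the end of this statement.
                    let next = queue.lock().unwrap().pop_front();
                    let Some(remote) = next else { break };
                    // Random per-connection delay ('####' in the diagram).
                    let delay = rng.next_in_range(min_delay_ms, max_delay_ms);
                    thread::sleep(Duration::from_millis(delay));
                    let now = in_flight.fetch_add(1, Ordering::SeqCst) + 1;
                    peak.fetch_max(now, Ordering::SeqCst);
                    // Placeholder for the actual metric fetch.
                    thread::sleep(Duration::from_millis(5));
                    in_flight.fetch_sub(1, Ordering::SeqCst);
                    done.lock().unwrap().push(remote);
                }
            })
        })
        .collect();

    for worker in workers {
        worker.join().unwrap();
    }
    let fetched = done.lock().unwrap().clone();
    (fetched, peak.load(Ordering::SeqCst))
}
```

Called with six remotes and `max_concurrency` = 3, no more than three fetches are ever in flight, matching the three lanes in the diagram. The interval-offset ('****') would be a single additional sleep before the workers are started.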
proxmox-datacenter-manager:
Lukas Wagner (25):
test support: add NamedTempFile helper
test support: add NamedTempDir helper
pdm-api-types: add CollectionSettings type
pdm-config: add functions for reading/writing metric collection settings
metric collection: split top_entities split into separate module
metric collection: save metric data to RRD in separate task
metric collection: rework metric poll task
metric collection: persist state after metric collection
metric collection: skip if last_collection < MIN_COLLECTION_INTERVAL
metric collection: collect overdue metrics on startup/timer change
metric collection: add tests for the fetch_remotes function
metric collection: add test for fetch_overdue
metric collection: pass rrd cache instance as function parameter
metric collection: add test for rrd task
metric collection: wrap rrd_cache::Cache in a struct
metric collection: record remote response time in metric database
metric collection: save time needed for collection run to RRD
metric collection: periodically clean removed remotes from statefile
api: add endpoint for updating metric collection settings
api: add endpoint to trigger metric collection
api: remotes: trigger immediate metric collection for newly added nodes
api: add api for querying metric collection RRD data
api: metric-collection: add status endpoint
pdm-client: add metric collection API methods
cli: add commands for metric-collection settings, trigger, status
Cargo.toml | 1 +
cli/client/Cargo.toml | 1 +
cli/client/src/main.rs | 2 +
cli/client/src/metric_collection.rs | 164 ++++
lib/pdm-api-types/src/lib.rs | 3 +
lib/pdm-api-types/src/metric_collection.rs | 195 +++++
lib/pdm-api-types/src/rrddata.rs | 26 +
lib/pdm-client/src/lib.rs | 87 ++
lib/pdm-config/src/lib.rs | 1 +
lib/pdm-config/src/metric_collection.rs | 69 ++
server/Cargo.toml | 1 +
server/src/api/config/metric_collection.rs | 166 ++++
server/src/api/config/mod.rs | 2 +
server/src/api/metric_collection.rs | 99 +++
server/src/api/mod.rs | 2 +
server/src/api/remotes.rs | 59 ++
server/src/api/resources.rs | 3 +-
server/src/api/rrd_common.rs | 11 +-
server/src/bin/proxmox-datacenter-api.rs | 2 +-
server/src/lib.rs | 2 +-
.../src/metric_collection/collection_task.rs | 756 ++++++++++++++++++
server/src/metric_collection/mod.rs | 333 ++------
server/src/metric_collection/rrd_cache.rs | 204 ++---
server/src/metric_collection/rrd_task.rs | 286 +++++++
server/src/metric_collection/state.rs | 152 ++++
server/src/metric_collection/top_entities.rs | 150 ++++
server/src/test_support/mod.rs | 4 +
server/src/test_support/temp.rs | 60 ++
28 files changed, 2479 insertions(+), 362 deletions(-)
create mode 100644 cli/client/src/metric_collection.rs
create mode 100644 lib/pdm-api-types/src/metric_collection.rs
create mode 100644 lib/pdm-config/src/metric_collection.rs
create mode 100644 server/src/api/config/metric_collection.rs
create mode 100644 server/src/api/metric_collection.rs
create mode 100644 server/src/metric_collection/collection_task.rs
create mode 100644 server/src/metric_collection/rrd_task.rs
create mode 100644 server/src/metric_collection/state.rs
create mode 100644 server/src/metric_collection/top_entities.rs
create mode 100644 server/src/test_support/temp.rs
Summary over all repositories:
28 files changed, 2479 insertions(+), 362 deletions(-)
--
Generated by git-murpp 0.8.0
_______________________________________________
pdm-devel mailing list
pdm-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pdm-devel
Thread overview: 49+ messages
2025-02-11 12:05 Lukas Wagner [this message]
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 01/25] test support: add NamedTempFile helper Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 02/25] test support: add NamedTempDir helper Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 03/25] pdm-api-types: add CollectionSettings type Lukas Wagner
2025-02-11 14:18 ` Maximiliano Sandoval
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 04/25] pdm-config: add functions for reading/writing metric collection settings Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 05/25] metric collection: split top_entities split into separate module Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 06/25] metric collection: save metric data to RRD in separate task Lukas Wagner
2025-02-12 13:59 ` Wolfgang Bumiller
2025-02-12 14:32 ` Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 07/25] metric collection: rework metric poll task Lukas Wagner
2025-02-11 12:58 ` Lukas Wagner
2025-02-12 15:57 ` Wolfgang Bumiller
2025-02-13 12:31 ` Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 08/25] metric collection: persist state after metric collection Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 09/25] metric collection: skip if last_collection < MIN_COLLECTION_INTERVAL Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 10/25] metric collection: collect overdue metrics on startup/timer change Lukas Wagner
2025-02-13 8:55 ` Wolfgang Bumiller
2025-02-13 13:50 ` Lukas Wagner
2025-02-13 14:19 ` Wolfgang Bumiller
2025-02-13 15:21 ` Lukas Wagner
2025-02-13 15:34 ` Wolfgang Bumiller
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 11/25] metric collection: add tests for the fetch_remotes function Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 12/25] metric collection: add test for fetch_overdue Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 13/25] metric collection: pass rrd cache instance as function parameter Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 14/25] metric collection: add test for rrd task Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 15/25] metric collection: wrap rrd_cache::Cache in a struct Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 16/25] metric collection: record remote response time in metric database Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 17/25] metric collection: save time needed for collection run to RRD Lukas Wagner
2025-02-13 11:53 ` Wolfgang Bumiller
2025-02-13 12:12 ` Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 18/25] metric collection: periodically clean removed remotes from statefile Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 19/25] api: add endpoint for updating metric collection settings Lukas Wagner
2025-02-13 12:09 ` Wolfgang Bumiller
2025-02-13 12:15 ` Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 20/25] api: add endpoint to trigger metric collection Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 21/25] api: remotes: trigger immediate metric collection for newly added nodes Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 22/25] api: add api for querying metric collection RRD data Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 23/25] api: metric-collection: add status endpoint Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 24/25] pdm-client: add metric collection API methods Lukas Wagner
2025-02-13 12:10 ` Wolfgang Bumiller
2025-02-13 13:52 ` Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 25/25] cli: add commands for metric-collection settings, trigger, status Lukas Wagner
2025-02-13 12:14 ` Wolfgang Bumiller
2025-02-13 14:17 ` Lukas Wagner
2025-02-13 14:56 ` Wolfgang Bumiller
2025-02-13 14:58 ` Lukas Wagner
2025-02-13 15:11 ` Lukas Wagner
2025-02-14 13:08 ` [pdm-devel] [PATCH proxmox-datacenter-manager 00/25] metric collection improvements (concurrency, config, API, CLI) Lukas Wagner