public inbox for pdm-devel@lists.proxmox.com
From: Lukas Wagner <l.wagner@proxmox.com>
To: pdm-devel@lists.proxmox.com
Subject: [pdm-devel] [PATCH proxmox-datacenter-manager 00/25] metric collection improvements (concurrency, config, API, CLI)
Date: Tue, 11 Feb 2025 13:05:16 +0100
Message-ID: <20250211120541.163621-1-l.wagner@proxmox.com>

Key points:
- Fetch metrics concurrently
- Add configuration for metric collection
  - new config /etc/proxmox-datacenter-manager/metric-collection.cfg
      - max-concurrency (number of allowed parallel connections)
      - collection-interval
      - randomized offset for collection start
         (min-interval-offset..max-interval-offset)
      - randomized per-connection delay
         (min-connection-delay..max-connection-delay)
- Add some tests for the core logic in the metric collection system
- Allow triggering metric collection via the API
- Record metric collection statistics in the RRD
  - overall collection time for all remotes
  - per remote response time when fetching metrics
- Persist metric collection state to disk:
  /var/lib/proxmox-datacenter-manager/metric-collection-state.json
  (timestamps of last collection, errors)
- Trigger metric collection for any new remotes added via the API

- Add new API endpoints
	POST     /metric-collection/trigger with optional 'remote' param
	GET      /metric-collection/status
	GET/PUT  /config/metric-collection/default
	GET      /remotes/<remote>/metric-collection-rrddata
	GET      /metric-collection/rrddata

- Add CLI tooling
	proxmox-datacenter-client metric-collection show-settings
	proxmox-datacenter-client metric-collection update-settings
	proxmox-datacenter-client metric-collection trigger [--remote <remote>]
	proxmox-datacenter-client metric-collection status


## To reviewers / open questions:
- Please review the defaults I've chosen for the settings, especially
  the ones for the default metric collection interval (10 minutes) as
  well as max-concurrency (10).
  I also kindly ask to double-check the naming of the properties.
  See "pdm-api-types: add CollectionSettings type" for details

- Please review path and params for new API endpoints (anything public
  facing that is hard to change later)

- I've chosen a section-config config now, even though we only have a
  single section for now. This was done for future-proofing reasons,
  maybe we want to add support for different setting 'groups' or
  something, e.g. to have different settings for distinct sets of
  remotes. Does this make sense?
  Or should I just stick to a simple config for now? (At moments like
  these I wish for TOML configs where we could be a bit more flexible...)

	collection-settings: default
	    max-concurrency 10
	    collection-interval 180
	    min-interval-offset 0
	    max-interval-offset 20
	    min-connection-delay 10
	    max-connection-delay 100


- Should `GET /remotes/<remote>/metric-collection-rrddata` be
  just `rrddata`?
  I am not sure whether we are going to add any other PDM-native
  per-remote metrics, or whether we would want to return those from
  the same API call as this one...
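
To make the property naming question easier to discuss, here is a rough
sketch of how the example section above could map onto a Rust type. The
struct layout, `parse_section_body` and `check_range` are hypothetical
and exist only for illustration; the real type is the one from
"pdm-api-types: add CollectionSettings type", and the hand-rolled
parser is merely a stand-in for the section-config machinery, not its
API.

```rust
/// Hypothetical mirror of the properties in the example section.
/// The actual CollectionSettings type in the series may differ.
#[derive(Debug, Default, PartialEq)]
struct CollectionSettings {
    max_concurrency: Option<u64>,
    collection_interval: Option<u64>,
    min_interval_offset: Option<u64>,
    max_interval_offset: Option<u64>,
    min_connection_delay: Option<u64>,
    max_connection_delay: Option<u64>,
}

/// Reject a min/max pair where the lower bound exceeds the upper bound.
fn check_range(name: &str, min: Option<u64>, max: Option<u64>) -> Result<(), String> {
    match (min, max) {
        (Some(min), Some(max)) if min > max => {
            Err(format!("min-{name} ({min}) exceeds max-{name} ({max})"))
        }
        _ => Ok(()),
    }
}

/// Parse the indented `key value` lines of one section body.
/// Stand-in only: this is NOT the section-config parser.
fn parse_section_body(body: &str) -> Result<CollectionSettings, String> {
    let mut s = CollectionSettings::default();
    for line in body.lines().map(str::trim).filter(|l| !l.is_empty()) {
        let (key, value) = line
            .split_once(char::is_whitespace)
            .ok_or_else(|| format!("missing value in line '{line}'"))?;
        let value: u64 = value
            .trim()
            .parse()
            .map_err(|e| format!("invalid value for '{key}': {e}"))?;
        // Pick the field that belongs to this property name.
        let slot = match key {
            "max-concurrency" => &mut s.max_concurrency,
            "collection-interval" => &mut s.collection_interval,
            "min-interval-offset" => &mut s.min_interval_offset,
            "max-interval-offset" => &mut s.max_interval_offset,
            "min-connection-delay" => &mut s.min_connection_delay,
            "max-connection-delay" => &mut s.max_connection_delay,
            other => return Err(format!("unknown property '{other}'")),
        };
        *slot = Some(value);
    }
    check_range("interval-offset", s.min_interval_offset, s.max_interval_offset)?;
    check_range("connection-delay", s.min_connection_delay, s.max_connection_delay)?;
    Ok(s)
}
```

Keeping every property optional would let an empty `default` section fall
back to built-in defaults. Whether the real type does this is an
assumption on my side.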

## Random offset/delay examples
Example with 'max-concurrency' = 3 and 6 remotes.

    X ... timer triggered
    [ A ] .... fetching remote 'A'
    **** .... interval-offset     (usually a couple of seconds)
    #### .... random worker delay (usually in millisecond range)

                         /--########[  B    ] ### [  C  ]--\
                        /---####[  A  ] ###### [ D ]--------\
----X ************* ---/ ---###### [  E  ] #########[  F  ]--\----
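
The scheduling in the diagram can be approximated with a small
standalone sketch. It is not the code from this series: it uses plain
threads instead of the async tasks in the server, `fetch_remote` is a
hypothetical placeholder for the real request, and the tiny xorshift
generator only stands in for a proper random number source so that the
example needs no external crates. Both the interval offset and the
connection delay are given in milliseconds here to keep it quick to run.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Minimal xorshift64 generator, a stand-in for a real RNG.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Value in the inclusive range min..=max (assumes min <= max).
    fn in_range(&mut self, (min, max): (u64, u64)) -> u64 {
        min + self.next() % (max - min + 1)
    }
}

/// Hypothetical placeholder for fetching metrics from one remote.
fn fetch_remote(name: &str) -> String {
    format!("metrics from {name}")
}

/// One collection run: sleep a random interval offset ('****'), then let
/// `max_concurrency` workers drain the queue of remotes. Each worker
/// sleeps a random delay ('####') before every fetch.
fn collect(
    remotes: Vec<String>,
    max_concurrency: usize,
    offset_ms: (u64, u64),
    delay_ms: (u64, u64),
    seed: u64,
) -> Vec<String> {
    let mut rng = XorShift(seed);
    thread::sleep(Duration::from_millis(rng.in_range(offset_ms)));

    let queue = Arc::new(Mutex::new(VecDeque::from(remotes)));
    let workers: Vec<_> = (0..max_concurrency)
        .map(|i| {
            let queue = Arc::clone(&queue);
            // Give every worker its own generator state.
            let mut rng = XorShift(seed.wrapping_add(i as u64 + 1));
            thread::spawn(move || {
                let mut fetched = Vec::new();
                loop {
                    // The lock is released at the end of this statement.
                    let next = queue.lock().unwrap().pop_front();
                    let Some(remote) = next else { break };
                    thread::sleep(Duration::from_millis(rng.in_range(delay_ms)));
                    fetched.push(fetch_remote(&remote));
                }
                fetched
            })
        })
        .collect();

    // Wait for all workers and merge their results.
    workers
        .into_iter()
        .flat_map(|w| w.join().expect("worker panicked"))
        .collect()
}
```

Calling `collect` with the six remotes 'A' to 'F' and a `max_concurrency`
of 3 gives the situation from the diagram: at most three fetches are in
flight at any time, and the order in which remotes finish varies from
run to run.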

proxmox-datacenter-manager:

Lukas Wagner (25):
  test support: add NamedTempFile helper
  test support: add NamedTempDir helper
  pdm-api-types: add CollectionSettings type
  pdm-config: add functions for reading/writing metric collection
    settings
  metric collection: split top_entities split into separate module
  metric collection: save metric data to RRD in separate task
  metric collection: rework metric poll task
  metric collection: persist state after metric collection
  metric collection: skip if last_collection < MIN_COLLECTION_INTERVAL
  metric collection: collect overdue metrics on startup/timer change
  metric collection: add tests for the fetch_remotes function
  metric collection: add test for fetch_overdue
  metric collection: pass rrd cache instance as function parameter
  metric collection: add test for rrd task
  metric collection: wrap rrd_cache::Cache in a struct
  metric collection: record remote response time in metric database
  metric collection: save time needed for collection run to RRD
  metric collection: periodically clean removed remotes from statefile
  api: add endpoint for updating metric collection settings
  api: add endpoint to trigger metric collection
  api: remotes: trigger immediate metric collection for newly added
    nodes
  api: add api for querying metric collection RRD data
  api: metric-collection: add status endpoint
  pdm-client: add metric collection API methods
  cli: add commands for metric-collection settings, trigger, status

 Cargo.toml                                    |   1 +
 cli/client/Cargo.toml                         |   1 +
 cli/client/src/main.rs                        |   2 +
 cli/client/src/metric_collection.rs           | 164 ++++
 lib/pdm-api-types/src/lib.rs                  |   3 +
 lib/pdm-api-types/src/metric_collection.rs    | 195 +++++
 lib/pdm-api-types/src/rrddata.rs              |  26 +
 lib/pdm-client/src/lib.rs                     |  87 ++
 lib/pdm-config/src/lib.rs                     |   1 +
 lib/pdm-config/src/metric_collection.rs       |  69 ++
 server/Cargo.toml                             |   1 +
 server/src/api/config/metric_collection.rs    | 166 ++++
 server/src/api/config/mod.rs                  |   2 +
 server/src/api/metric_collection.rs           |  99 +++
 server/src/api/mod.rs                         |   2 +
 server/src/api/remotes.rs                     |  59 ++
 server/src/api/resources.rs                   |   3 +-
 server/src/api/rrd_common.rs                  |  11 +-
 server/src/bin/proxmox-datacenter-api.rs      |   2 +-
 server/src/lib.rs                             |   2 +-
 .../src/metric_collection/collection_task.rs  | 756 ++++++++++++++++++
 server/src/metric_collection/mod.rs           | 333 ++------
 server/src/metric_collection/rrd_cache.rs     | 204 ++---
 server/src/metric_collection/rrd_task.rs      | 286 +++++++
 server/src/metric_collection/state.rs         | 152 ++++
 server/src/metric_collection/top_entities.rs  | 150 ++++
 server/src/test_support/mod.rs                |   4 +
 server/src/test_support/temp.rs               |  60 ++
 28 files changed, 2479 insertions(+), 362 deletions(-)
 create mode 100644 cli/client/src/metric_collection.rs
 create mode 100644 lib/pdm-api-types/src/metric_collection.rs
 create mode 100644 lib/pdm-config/src/metric_collection.rs
 create mode 100644 server/src/api/config/metric_collection.rs
 create mode 100644 server/src/api/metric_collection.rs
 create mode 100644 server/src/metric_collection/collection_task.rs
 create mode 100644 server/src/metric_collection/rrd_task.rs
 create mode 100644 server/src/metric_collection/state.rs
 create mode 100644 server/src/metric_collection/top_entities.rs
 create mode 100644 server/src/test_support/temp.rs


Summary over all repositories:
  28 files changed, 2479 insertions(+), 362 deletions(-)

-- 
Generated by git-murpp 0.8.0




