From: Wolfgang Bumiller <w.bumiller@proxmox.com>
To: Lukas Wagner <l.wagner@proxmox.com>
Cc: pdm-devel@lists.proxmox.com
Subject: Re: [pdm-devel] [PATCH proxmox-datacenter-manager 10/25] metric collection: collect overdue metrics on startup/timer change
Date: Thu, 13 Feb 2025 09:55:28 +0100
Message-ID: <kpzd2j2hmms6nkugixdagifikc3mii5hyqj66ofic7wsxztv7h@losm6xjj4cqi>
In-Reply-To: <20250211120541.163621-11-l.wagner@proxmox.com>

On Tue, Feb 11, 2025 at 01:05:26PM +0100, Lukas Wagner wrote:
> Due to the fact that the timer fires at aligned points in time and might
> now fire right away after being set up, it could happen that we get gaps
> in the data if we change the timer interval or at daemon startup.
>
> To mitigate this, on daemon startup and also if the collection interval
> changes, we
> - check if the time until the next scheduled regular collection
> plus the time *since* the last successful collection exceeds
> the configured collection interval
> - if yes, we collect immediately
> - if no, we do nothing and let the remote be collected at the
> next timer tick
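To make sure I read the check above correctly, here is a minimal
sketch of it as a pure function. `is_overdue` and its parameter names
are hypothetical and not part of the patch. Treating a remote that was
never collected as overdue is my assumption, based on the
`unwrap_or(0)` further down:

```rust
use std::time::Duration;

/// Hypothetical helper (not in the patch): decide whether a remote should be
/// collected right away instead of waiting for the next aligned timer tick.
///
/// `since_last` is `None` if the remote was never collected successfully.
fn is_overdue(since_last: Option<Duration>, until_next: Duration, interval: Duration) -> bool {
    match since_last {
        // If we wait for the timer, the gap between two samples would be
        // `since_last + until_next`; collect now if that exceeds the interval.
        Some(elapsed) => elapsed.saturating_add(until_next) > interval,
        // Assumption: a remote without any previous collection is always overdue.
        None => true,
    }
}
```

With a 60 second interval, a remote last collected 50 seconds ago and a
timer that fires in 20 seconds would leave a 70 second gap, so it is
collected immediately. At 30 + 30 seconds the gap is exactly the
interval and nothing happens until the timer tick.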
>
> Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
> ---
> .../src/metric_collection/collection_task.rs | 72 +++++++++++++++++--
> 1 file changed, 65 insertions(+), 7 deletions(-)
>
> diff --git a/server/src/metric_collection/collection_task.rs b/server/src/metric_collection/collection_task.rs
> index b4e3207..f0742ea 100644
> --- a/server/src/metric_collection/collection_task.rs
> +++ b/server/src/metric_collection/collection_task.rs
> @@ -1,4 +1,7 @@
> -use std::{sync::Arc, time::Duration};
> +use std::{
> + sync::Arc,
> + time::{Duration, Instant},
> +};
>
> use anyhow::Error;
> use rand::Rng;
> @@ -69,13 +72,18 @@ impl MetricCollectionTask {
> /// This function does never return.
> #[tracing::instrument(skip_all, name = "metric_collection_task")]
> pub(super) async fn run(&mut self) {
> - let mut timer = Self::setup_timer(self.settings.collection_interval_or_default());
> + let (mut timer, mut next_run) =
> + Self::setup_timer(self.settings.collection_interval_or_default());
>
> log::debug!(
> "metric collection starting up. collection interval set to {} seconds",
> self.settings.collection_interval_or_default()
> );
>
> + // Check and fetch any remote which would be overdue by the time the
> + // timer first fires.
> + self.fetch_overdue_and_save_state(next_run).await;
> +
> loop {
> let old_settings = self.settings.clone();
> tokio::select! {
> @@ -124,7 +132,12 @@ impl MetricCollectionTask {
> "metric collection interval changed to {} seconds, reloading timer",
> interval
> );
> - timer = Self::setup_timer(interval);
> + (timer, next_run) = Self::setup_timer(interval);
> + // If change (and therefore reset) our timer right before it fires,
> + // we could potentially miss one collection event.
Couldn't we instead just pass `next_run` through to `setup_timer` and
call `reset_at(next_run)` on it? (`first_run` would only be used in the
initial setup, so `next_run` could either be an `Option`, or the setup
code does the `next_aligned_instant` call itself.)

This should be much less code, since it makes the new
`fetch_overdue{,_and_save_state}()` functions unnecessary. Or am I
missing something?
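
Roughly what I have in mind for the `Option` variant, reduced to the
part that picks the instant the timer gets reset to. This is an
untested sketch: `first_tick` is a made-up name, and the closure only
stands in for the `task_utils::next_aligned_instant(interval)` call
from the patch:

```rust
use std::time::Instant;

/// Hypothetical helper: choose the instant a freshly created timer should be
/// reset to.
///
/// - `Some(next_run)`: the interval changed while a tick was already
///   scheduled, so keep that tick.
/// - `None`: initial setup, so compute the next aligned instant.
fn first_tick(next_run: Option<Instant>, next_aligned: impl FnOnce() -> Instant) -> Instant {
    // The closure is only evaluated when no tick was scheduled yet.
    next_run.unwrap_or_else(next_aligned)
}
```

`setup_timer` would then pass the result to `timer.reset_at(...)`, the
same way it does with `first_run` now.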
> + // Therefore fetch all remotes which would be due for metric collection before
> + // the new timer fires.
> + self.fetch_overdue_and_save_state(next_run).await;
> }
> }
> }
> @@ -208,12 +221,12 @@ impl MetricCollectionTask {
> /// Set up a [`tokio::time::Interval`] instance with the provided interval.
> /// The timer will be aligned, e.g. an interval of `60` will let the timer
> /// fire at minute boundaries.
> - fn setup_timer(interval: u64) -> Interval {
> + fn setup_timer(interval: u64) -> (Interval, Instant) {
> let mut timer = tokio::time::interval(Duration::from_secs(interval));
> - let first_run = task_utils::next_aligned_instant(interval).into();
> - timer.reset_at(first_run);
> + let first_run = task_utils::next_aligned_instant(interval);
> + timer.reset_at(first_run.into());
>
> - timer
> + (timer, first_run)
> }
>
> /// Convenience helper to load `remote.cfg`, logging the error
> @@ -292,6 +305,51 @@ impl MetricCollectionTask {
> }
> }
>
> + /// Fetch metric data from remotes which are overdue for collection and save
> + /// collection state.
> + async fn fetch_overdue_and_save_state(&mut self, next_run: Instant) {
> + if let Some(remotes) = Self::load_remote_config() {
> + self.fetch_overdue(&remotes, next_run).await;
> + if let Err(e) = self.state.save() {
> + log::error!("could not update metric collection state: {e}");
> + }
> + }
> + }
> +
> + /// Fetch metric data from remotes which are overdue for collection.
> + ///
> + /// Use this on startup of the metric collection loop as well as
> + /// when the collection interval changes.
> + ///
> + /// Does nothing if the remote config could not be read, in this case an
> + /// error is logged.
> + async fn fetch_overdue(&mut self, remotes: &SectionConfigData<Remote>, next_run: Instant) {
> + let left_until_scheduled = next_run - Instant::now();
> + let now = proxmox_time::epoch_i64();
> +
> + let mut overdue = Vec::new();
> +
> + for remote in &remotes.order {
> + let last_collection = self
> + .state
> + .get_status(remote)
> + .and_then(|s| s.last_collection)
> + .unwrap_or(0);
> +
> + let diff = now - last_collection;
> +
> + if diff + left_until_scheduled.as_secs() as i64
> + > self.settings.collection_interval_or_default() as i64
> + {
> + log::debug!(
> + "starting metric collection for remote '{remote}' - triggered because collection is overdue"
> + );
> + overdue.push(remote.clone());
> + }
> + }
> + self.fetch_remotes(remotes, &overdue).await;
> + }
> +
> /// Fetch a single remote.
> #[tracing::instrument(skip_all, fields(remote = remote.id), name = "metric_collection_task")]
> async fn fetch_single_remote(
> --
> 2.39.5
>
>
>
> _______________________________________________
> pdm-devel mailing list
> pdm-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pdm-devel
>
>