From: Wolfgang Bumiller <w.bumiller@proxmox.com>
To: Lukas Wagner <l.wagner@proxmox.com>
Cc: pdm-devel@lists.proxmox.com
Subject: Re: [pdm-devel] [PATCH proxmox-datacenter-manager 10/25] metric collection: collect overdue metrics on startup/timer change
Date: Thu, 13 Feb 2025 09:55:28 +0100
Message-ID: <kpzd2j2hmms6nkugixdagifikc3mii5hyqj66ofic7wsxztv7h@losm6xjj4cqi>
In-Reply-To: <20250211120541.163621-11-l.wagner@proxmox.com>
On Tue, Feb 11, 2025 at 01:05:26PM +0100, Lukas Wagner wrote:
> Due to the fact that the timer fires at aligned points in time and might
> now fire right away after being set up, it could happen that we get gaps
> in the data if we change the timer interval or at daemon startup.
>
> To mitigate this, on daemon startup and also if the collection interval
> changes, we
> - check if the time until the next scheduled regular collection
> plus the time *since* the last successful collection exceeds
> the configured collection interval
> - if yes, we collect immediately
> - if no, we do nothing and let the remote be collected at the
> next timer tick
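The check described above can be sketched as a pure function. This is only an illustration: `is_overdue` and its parameter names are made up here, all values are plain seconds, and a remote that was never collected is treated as last collected at epoch 0, mirroring the `unwrap_or(0)` in the patch below.

```rust
/// Decide whether a remote should be collected immediately.
///
/// `now` and `last_collection` are epoch seconds, `until_next_tick` is the
/// time left until the next regular timer tick, and `interval` is the
/// configured collection interval (both in seconds).
fn is_overdue(
    now: i64,
    last_collection: Option<i64>,
    until_next_tick: u64,
    interval: u64,
) -> bool {
    // A remote without a recorded collection counts as collected at epoch 0.
    let since_last = now - last_collection.unwrap_or(0);

    // Overdue if waiting for the next tick would stretch the gap between
    // two collections beyond the configured interval.
    since_last + until_next_tick as i64 > interval as i64
}
```

With a 60 second interval, a remote collected 50 seconds ago with 20 seconds left until the next tick would be fetched right away (70 > 60), while one collected 10 seconds ago would wait for the tick (30 <= 60).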
>
> Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
> ---
> .../src/metric_collection/collection_task.rs | 72 +++++++++++++++++--
> 1 file changed, 65 insertions(+), 7 deletions(-)
>
> diff --git a/server/src/metric_collection/collection_task.rs b/server/src/metric_collection/collection_task.rs
> index b4e3207..f0742ea 100644
> --- a/server/src/metric_collection/collection_task.rs
> +++ b/server/src/metric_collection/collection_task.rs
> @@ -1,4 +1,7 @@
> -use std::{sync::Arc, time::Duration};
> +use std::{
> + sync::Arc,
> + time::{Duration, Instant},
> +};
>
> use anyhow::Error;
> use rand::Rng;
> @@ -69,13 +72,18 @@ impl MetricCollectionTask {
> /// This function does never return.
> #[tracing::instrument(skip_all, name = "metric_collection_task")]
> pub(super) async fn run(&mut self) {
> - let mut timer = Self::setup_timer(self.settings.collection_interval_or_default());
> + let (mut timer, mut next_run) =
> + Self::setup_timer(self.settings.collection_interval_or_default());
>
> log::debug!(
> "metric collection starting up. collection interval set to {} seconds",
> self.settings.collection_interval_or_default()
> );
>
> + // Check and fetch any remote which would be overdue by the time the
> + // timer first fires.
> + self.fetch_overdue_and_save_state(next_run).await;
> +
> loop {
> let old_settings = self.settings.clone();
> tokio::select! {
> @@ -124,7 +132,12 @@ impl MetricCollectionTask {
> "metric collection interval changed to {} seconds, reloading timer",
> interval
> );
> - timer = Self::setup_timer(interval);
> + (timer, next_run) = Self::setup_timer(interval);
> + // If change (and therefore reset) our timer right before it fires,
> + // we could potentially miss one collection event.
Couldn't we instead just pass `next_run` through to `setup_timer` and
call `reset_at(next_run)` on it? (`first_run` would only be used in the
initial setup, so `next_run` could either be an `Option`, or the setup
code does the `next_aligned_instant` call.)
This should be much less code, since it would make the new
`fetch_overdue{,_and_save_state}()` functions unnecessary, or am I
missing something?
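A minimal sketch of one reading of this idea, modeled without tokio so that it stands alone. `TimerPlan` and `plan_timer` are hypothetical names; in the real code `first_run` would be handed to `reset_at()` on the `Interval`, and `next_aligned` stands in for the result of `task_utils::next_aligned_instant(interval)`.

```rust
use std::time::{Duration, Instant};

/// Stand-in for the timer state: when it fires first and how often after
/// that. The real code would hold a `tokio::time::Interval` instead.
#[derive(Debug, Clone, Copy, PartialEq)]
struct TimerPlan {
    first_run: Instant,
    period: Duration,
}

/// Hypothetical variant of `setup_timer`.
///
/// On initial setup `next_run` is `None` and the aligned instant is used.
/// On an interval change the already scheduled `next_run` is passed in and
/// kept, so the pending tick is not pushed further into the future.
fn plan_timer(interval: u64, next_run: Option<Instant>, next_aligned: Instant) -> TimerPlan {
    TimerPlan {
        first_run: next_run.unwrap_or(next_aligned),
        period: Duration::from_secs(interval),
    }
}
```

Whether later ticks should be re-aligned to the new interval after that first carried-over tick is left open in this sketch.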
> + // Therefore fetch all remotes which would be due for metric collection before
> + // the new timer fires.
> + self.fetch_overdue_and_save_state(next_run).await;
> }
> }
> }
> @@ -208,12 +221,12 @@ impl MetricCollectionTask {
> /// Set up a [`tokio::time::Interval`] instance with the provided interval.
> /// The timer will be aligned, e.g. an interval of `60` will let the timer
> /// fire at minute boundaries.
> - fn setup_timer(interval: u64) -> Interval {
> + fn setup_timer(interval: u64) -> (Interval, Instant) {
> let mut timer = tokio::time::interval(Duration::from_secs(interval));
> - let first_run = task_utils::next_aligned_instant(interval).into();
> - timer.reset_at(first_run);
> + let first_run = task_utils::next_aligned_instant(interval);
> + timer.reset_at(first_run.into());
>
> - timer
> + (timer, first_run)
> }
>
> /// Convenience helper to load `remote.cfg`, logging the error
> @@ -292,6 +305,51 @@ impl MetricCollectionTask {
> }
> }
>
> + /// Fetch metric data from remotes which are overdue for collection and save
> + /// collection state.
> + async fn fetch_overdue_and_save_state(&mut self, next_run: Instant) {
> + if let Some(remotes) = Self::load_remote_config() {
> + self.fetch_overdue(&remotes, next_run).await;
> + if let Err(e) = self.state.save() {
> + log::error!("could not update metric collection state: {e}");
> + }
> + }
> + }
> +
> + /// Fetch metric data from remotes which are overdue for collection.
> + ///
> + /// Use this on startup of the metric collection loop as well as
> + /// when the collection interval changes.
> + ///
> + /// Does nothing if the remote config could not be read, in this case an
> + /// error is logged.
> + async fn fetch_overdue(&mut self, remotes: &SectionConfigData<Remote>, next_run: Instant) {
> + let left_until_scheduled = next_run - Instant::now();
> + let now = proxmox_time::epoch_i64();
> +
> + let mut overdue = Vec::new();
> +
> + for remote in &remotes.order {
> + let last_collection = self
> + .state
> + .get_status(remote)
> + .and_then(|s| s.last_collection)
> + .unwrap_or(0);
> +
> + let diff = now - last_collection;
> +
> + if diff + left_until_scheduled.as_secs() as i64
> + > self.settings.collection_interval_or_default() as i64
> + {
> + log::debug!(
> + "starting metric collection for remote '{remote}' - triggered because collection is overdue"
> + );
> + overdue.push(remote.clone());
> + }
> + }
> + self.fetch_remotes(remotes, &overdue).await;
> + }
> +
> /// Fetch a single remote.
> #[tracing::instrument(skip_all, fields(remote = remote.id), name = "metric_collection_task")]
> async fn fetch_single_remote(
> --
> 2.39.5
>
>
>
> _______________________________________________
> pdm-devel mailing list
> pdm-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pdm-devel
>
>
Thread overview: 49+ messages
2025-02-11 12:05 [pdm-devel] [PATCH proxmox-datacenter-manager 00/25] metric collection improvements (concurrency, config, API, CLI) Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 01/25] test support: add NamedTempFile helper Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 02/25] test support: add NamedTempDir helper Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 03/25] pdm-api-types: add CollectionSettings type Lukas Wagner
2025-02-11 14:18 ` Maximiliano Sandoval
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 04/25] pdm-config: add functions for reading/writing metric collection settings Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 05/25] metric collection: split top_entities split into separate module Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 06/25] metric collection: save metric data to RRD in separate task Lukas Wagner
2025-02-12 13:59 ` Wolfgang Bumiller
2025-02-12 14:32 ` Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 07/25] metric collection: rework metric poll task Lukas Wagner
2025-02-11 12:58 ` Lukas Wagner
2025-02-12 15:57 ` Wolfgang Bumiller
2025-02-13 12:31 ` Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 08/25] metric collection: persist state after metric collection Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 09/25] metric collection: skip if last_collection < MIN_COLLECTION_INTERVAL Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 10/25] metric collection: collect overdue metrics on startup/timer change Lukas Wagner
2025-02-13 8:55 ` Wolfgang Bumiller [this message]
2025-02-13 13:50 ` Lukas Wagner
2025-02-13 14:19 ` Wolfgang Bumiller
2025-02-13 15:21 ` Lukas Wagner
2025-02-13 15:34 ` Wolfgang Bumiller
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 11/25] metric collection: add tests for the fetch_remotes function Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 12/25] metric collection: add test for fetch_overdue Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 13/25] metric collection: pass rrd cache instance as function parameter Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 14/25] metric collection: add test for rrd task Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 15/25] metric collection: wrap rrd_cache::Cache in a struct Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 16/25] metric collection: record remote response time in metric database Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 17/25] metric collection: save time needed for collection run to RRD Lukas Wagner
2025-02-13 11:53 ` Wolfgang Bumiller
2025-02-13 12:12 ` Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 18/25] metric collection: periodically clean removed remotes from statefile Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 19/25] api: add endpoint for updating metric collection settings Lukas Wagner
2025-02-13 12:09 ` Wolfgang Bumiller
2025-02-13 12:15 ` Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 20/25] api: add endpoint to trigger metric collection Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 21/25] api: remotes: trigger immediate metric collection for newly added nodes Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 22/25] api: add api for querying metric collection RRD data Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 23/25] api: metric-collection: add status endpoint Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 24/25] pdm-client: add metric collection API methods Lukas Wagner
2025-02-13 12:10 ` Wolfgang Bumiller
2025-02-13 13:52 ` Lukas Wagner
2025-02-11 12:05 ` [pdm-devel] [PATCH proxmox-datacenter-manager 25/25] cli: add commands for metric-collection settings, trigger, status Lukas Wagner
2025-02-13 12:14 ` Wolfgang Bumiller
2025-02-13 14:17 ` Lukas Wagner
2025-02-13 14:56 ` Wolfgang Bumiller
2025-02-13 14:58 ` Lukas Wagner
2025-02-13 15:11 ` Lukas Wagner
2025-02-14 13:08 ` [pdm-devel] [PATCH proxmox-datacenter-manager 00/25] metric collection improvements (concurrency, config, API, CLI) Lukas Wagner