public inbox for pdm-devel@lists.proxmox.com
From: Lukas Wagner <l.wagner@proxmox.com>
To: pdm-devel@lists.proxmox.com
Subject: [pdm-devel] [PATCH proxmox-datacenter-manager v7 06/24] metric collection: collect overdue metrics on startup/timer change
Date: Tue, 26 Aug 2025 15:51:01 +0200
Message-ID: <20250826135119.336510-7-l.wagner@proxmox.com>
In-Reply-To: <20250826135119.336510-1-l.wagner@proxmox.com>

Because the timer fires at aligned points in time and might not fire
right away after being set up, gaps in the data can occur at daemon
startup or when the timer interval changes.
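To illustrate where the gap comes from, here is a minimal sketch that
assumes alignment means rounding UNIX epoch seconds up to the next
multiple of the interval. The helper `seconds_until_next_aligned` and
the example timestamps are hypothetical; this is not the actual
`task_utils::next_aligned_instant` implementation.

```rust
/// Hypothetical helper: seconds from `now` (UNIX epoch seconds) until the
/// next multiple of `interval`. Assumes alignment on epoch-second
/// multiples; the real `task_utils::next_aligned_instant` may differ.
fn seconds_until_next_aligned(now: u64, interval: u64) -> u64 {
    let next = (now / interval + 1) * interval;
    next - now
}

fn main() {
    // With a 600 s interval, a daemon started at epoch second 1_000
    // only sees its first tick at 1_200, i.e. 200 s later.
    let now = 1_000;
    let wait = seconds_until_next_aligned(now, 600);
    println!("first tick in {wait} s");

    // If the last successful collection happened at epoch second 500,
    // the next one would only happen at 1_200: a 700 s gap, which is
    // longer than the 600 s interval.
    let last_collection = 500;
    let gap = now + wait - last_collection;
    println!("gap without catch-up: {gap} s");
}
```

Running this prints `first tick in 200 s` and `gap without catch-up: 700 s`.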

To mitigate this, at daemon startup and whenever the collection interval
changes, we
  - check if the time until the next scheduled regular collection
    plus the time *since* the last successful collection exceeds
    the collection interval
  - if yes, we collect immediately
  - if no, we do nothing and let the remote be collected at the
    next timer tick
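The decision above boils down to one comparison per remote. The
following is a minimal, self-contained sketch of that check. It assumes
epoch-second timestamps as `i64` (like `proxmox_time::epoch_i64()` in
the patch) and uses a plain `HashMap` as a stand-in for the collection
state. `is_overdue`, `overdue_remotes`, the remote names and the
600-second interval are hypothetical and chosen only for illustration.

```rust
use std::collections::HashMap;

/// Assumed collection interval in seconds for this sketch; PDM uses its
/// own `DEFAULT_COLLECTION_INTERVAL`.
const COLLECTION_INTERVAL: i64 = 600;

/// True if the time since the last successful collection plus the time
/// left until the next scheduled tick exceeds the collection interval.
fn is_overdue(now: i64, last_collection: Option<i64>, left_until_scheduled: i64) -> bool {
    // A remote that was never collected is treated as collected at epoch 0,
    // which makes it overdue (mirroring `unwrap_or(0)` in the patch).
    let since_last = now - last_collection.unwrap_or(0);
    since_last + left_until_scheduled > COLLECTION_INTERVAL
}

/// Hypothetical helper: select the remotes that should be collected now.
fn overdue_remotes(
    now: i64,
    left_until_scheduled: i64,
    last_collections: &HashMap<String, Option<i64>>,
) -> Vec<String> {
    let mut overdue: Vec<String> = last_collections
        .iter()
        .filter(|(_, last)| is_overdue(now, **last, left_until_scheduled))
        .map(|(name, _)| name.clone())
        .collect();
    // HashMap iteration order is unspecified; sort for stable output.
    overdue.sort();
    overdue
}

fn main() {
    let now = 10_000;
    let left_until_scheduled = 200;

    let mut state = HashMap::new();
    state.insert("pve-a".to_string(), Some(9_500)); // 500 + 200 > 600: overdue
    state.insert("pve-b".to_string(), Some(9_900)); // 100 + 200 <= 600: wait for tick
    state.insert("pve-c".to_string(), None); // never collected: overdue

    println!("{:?}", overdue_remotes(now, left_until_scheduled, &state));
}
```

This prints `["pve-a", "pve-c"]`. A remote whose sum lands exactly on
the interval is left for the regular tick, since the comparison is a
strict `>` as in the patch.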

Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
Reviewed-by: Maximiliano Sandoval <m.sandoval@proxmox.com>
Reviewed-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-by: Dominik Csapak <d.csapak@proxmox.com>
---

Notes:
    Changes since v1:
      - Document return values of `setup_timer`

 .../src/metric_collection/collection_task.rs  | 57 +++++++++++++++++--
 1 file changed, 51 insertions(+), 6 deletions(-)

diff --git a/server/src/metric_collection/collection_task.rs b/server/src/metric_collection/collection_task.rs
index f6052b73..5d19e1c5 100644
--- a/server/src/metric_collection/collection_task.rs
+++ b/server/src/metric_collection/collection_task.rs
@@ -1,4 +1,7 @@
-use std::{sync::Arc, time::Duration};
+use std::{
+    sync::Arc,
+    time::{Duration, Instant},
+};
 
 use anyhow::Error;
 use tokio::{
@@ -67,12 +70,17 @@ impl MetricCollectionTask {
     /// This function never returns.
     #[tracing::instrument(skip_all, name = "metric_collection_task")]
     pub(super) async fn run(&mut self) {
-        let mut timer = Self::setup_timer(DEFAULT_COLLECTION_INTERVAL);
+        let (mut timer, first_tick) = Self::setup_timer(DEFAULT_COLLECTION_INTERVAL);
 
         log::debug!(
             "metric collection starting up. Collection interval set to {} seconds.",
             DEFAULT_COLLECTION_INTERVAL,
         );
+        // Check and fetch any remote which would be overdue by the time the
+        // timer first fires.
+        if let Some(remote_config) = Self::load_remote_config() {
+            self.fetch_overdue(&remote_config, first_tick).await;
+        }
 
         loop {
             tokio::select! {
@@ -127,12 +135,16 @@ impl MetricCollectionTask {
     /// Set up a [`tokio::time::Interval`] instance with the provided interval.
     /// The timer will be aligned, e.g. an interval of `60` will let the timer
     /// fire at minute boundaries.
-    fn setup_timer(interval: u64) -> Interval {
+    ///
+    /// The return values are a tuple of the [`tokio::time::Interval`] timer instance
+    /// and the [`std::time::Instant`] at which the timer first fires.
+    fn setup_timer(interval: u64) -> (Interval, Instant) {
+        log::debug!("setting metric collection interval timer to {interval} seconds.",);
         let mut timer = tokio::time::interval(Duration::from_secs(interval));
-        let first_run = task_utils::next_aligned_instant(interval).into();
-        timer.reset_at(first_run);
+        let first_run = task_utils::next_aligned_instant(interval);
+        timer.reset_at(first_run.into());
 
-        timer
+        (timer, first_run)
     }
 
     /// Convenience helper to load `remote.cfg`, logging the error
@@ -208,6 +220,39 @@ impl MetricCollectionTask {
         }
     }
 
+    /// Fetch metric data from remotes which are overdue for collection.
+    ///
+    /// Use this on startup of the metric collection loop as well as
+    /// when the collection interval changes.
+    async fn fetch_overdue(
+        &mut self,
+        remote_config: &SectionConfigData<Remote>,
+        next_run: Instant,
+    ) {
+        let left_until_scheduled = next_run - Instant::now();
+        let now = proxmox_time::epoch_i64();
+
+        let mut overdue = Vec::new();
+
+        for (remote, _) in remote_config.iter() {
+            let last_collection = self
+                .state
+                .get_status(remote)
+                .and_then(|s| s.last_collection)
+                .unwrap_or(0);
+
+            let diff = now - last_collection;
+
+            if diff + left_until_scheduled.as_secs() as i64 > DEFAULT_COLLECTION_INTERVAL as i64 {
+                log::debug!(
+                    "starting metric collection for remote '{remote}' - triggered because collection is overdue"
+                );
+                overdue.push(remote.into());
+            }
+        }
+        self.fetch_remotes(remote_config, &overdue).await;
+    }
+
     /// Fetch a single remote.
     #[tracing::instrument(skip_all, fields(remote = remote.id), name = "metric_collection_task")]
     async fn fetch_single_remote(
-- 
2.47.2



_______________________________________________
pdm-devel mailing list
pdm-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pdm-devel


Thread overview: 26+ messages
2025-08-26 13:50 [pdm-devel] [PATCH proxmox-datacenter-manager v7 00/24] metric collection improvements (concurrency, API, CLI) Lukas Wagner
2025-08-26 13:50 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 01/24] metric collection: split top_entities split into separate module Lukas Wagner
2025-08-26 13:50 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 02/24] metric collection: save metric data to RRD in separate task Lukas Wagner
2025-08-26 13:50 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 03/24] metric collection: rework metric poll task Lukas Wagner
2025-08-26 13:50 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 04/24] metric collection: persist state after metric collection Lukas Wagner
2025-08-26 13:51 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 05/24] metric collection: skip if last_collection < MIN_COLLECTION_INTERVAL Lukas Wagner
2025-08-26 13:51 ` Lukas Wagner [this message]
2025-08-26 13:51 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 07/24] metric collection: add tests for the fetch_remotes function Lukas Wagner
2025-08-26 13:51 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 08/24] metric collection: add test for fetch_overdue Lukas Wagner
2025-08-26 13:51 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 09/24] metric collection: pass rrd cache instance as function parameter Lukas Wagner
2025-08-26 13:51 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 10/24] metric collection: add test for rrd task Lukas Wagner
2025-08-26 13:51 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 11/24] metric collection: wrap rrd_cache::Cache in a struct Lukas Wagner
2025-08-26 13:51 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 12/24] metric collection: record remote response time in metric database Lukas Wagner
2025-08-26 13:51 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 13/24] metric collection: save time needed for collection run to RRD Lukas Wagner
2025-08-26 13:51 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 14/24] metric collection: periodically clean removed remotes from statefile Lukas Wagner
2025-08-26 13:51 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 15/24] api: add endpoint to trigger metric collection Lukas Wagner
2025-08-26 13:51 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 16/24] api: remotes: trigger immediate metric collection for newly added nodes Lukas Wagner
2025-08-26 13:51 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 17/24] api: add api for querying metric collection RRD data Lukas Wagner
2025-08-26 13:51 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 18/24] api: metric-collection: add status endpoint Lukas Wagner
2025-08-26 13:51 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 19/24] pdm-client: add metric collection API methods Lukas Wagner
2025-08-26 13:51 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 20/24] cli: add commands for metric-collection trigger and status Lukas Wagner
2025-08-26 13:51 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 21/24] metric collection: skip missed timer ticks Lukas Wagner
2025-08-26 13:51 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 22/24] metric collection: use JoinSet instead of joining from handles in a Vec Lukas Wagner
2025-08-26 13:51 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 23/24] metric collection: allow to wait until completion when triggering collection manually Lukas Wagner
2025-08-26 13:51 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 24/24] api: pve: rrd: trigger and wait for metric collection when requesting RRD data Lukas Wagner
2025-08-28 19:37 ` [pdm-devel] applied: [PATCH proxmox-datacenter-manager v7 00/24] metric collection improvements (concurrency, API, CLI) Thomas Lamprecht
