From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pdm-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
	by lore.proxmox.com (Postfix) with ESMTPS id C56181FF16F
	for <inbox@lore.proxmox.com>; Thu, 13 Feb 2025 16:34:36 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id D11448EA5;
	Thu, 13 Feb 2025 16:34:33 +0100 (CET)
Date: Thu, 13 Feb 2025 16:34:29 +0100
From: Wolfgang Bumiller <w.bumiller@proxmox.com>
To: Lukas Wagner <l.wagner@proxmox.com>
Message-ID: <siuxzgzefcm6kye63hkpjmmz4ibqrxhjo233egh545dqddpcxh@ezwzvaqhjihi>
References: <20250211120541.163621-1-l.wagner@proxmox.com>
 <20250211120541.163621-11-l.wagner@proxmox.com>
 <kpzd2j2hmms6nkugixdagifikc3mii5hyqj66ofic7wsxztv7h@losm6xjj4cqi>
 <f617ae45-f205-4dca-a3b3-7febb924e67b@proxmox.com>
 <tioielukadqvyajbzi5444is2eycwapzrn6d6pl5pndbxono6y@utdkmguj6v6l>
 <ff8f6b15-b3aa-4229-ae50-3218a919c5d4@proxmox.com>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <ff8f6b15-b3aa-4229-ae50-3218a919c5d4@proxmox.com>
Subject: Re: [pdm-devel] [PATCH proxmox-datacenter-manager 10/25] metric
 collection: collect overdue metrics on startup/timer change
X-BeenThere: pdm-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox Datacenter Manager development discussion
 <pdm-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pdm-devel>, 
 <mailto:pdm-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pdm-devel/>
List-Post: <mailto:pdm-devel@lists.proxmox.com>
List-Help: <mailto:pdm-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pdm-devel>, 
 <mailto:pdm-devel-request@lists.proxmox.com?subject=subscribe>
Reply-To: Proxmox Datacenter Manager development discussion
 <pdm-devel@lists.proxmox.com>
Cc: pdm-devel@lists.proxmox.com
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: pdm-devel-bounces@lists.proxmox.com
Sender: "pdm-devel" <pdm-devel-bounces@lists.proxmox.com>

On Thu, Feb 13, 2025 at 04:21:33PM +0100, Lukas Wagner wrote:
> 
> 
> On  2025-02-13 15:19, Wolfgang Bumiller wrote:
> > On Thu, Feb 13, 2025 at 02:50:32PM +0100, Lukas Wagner wrote:
> >>
> >>
> >> On  2025-02-13 09:55, Wolfgang Bumiller wrote:
> >>>>          loop {
> >>>>              let old_settings = self.settings.clone();
> >>>>              tokio::select! {
> >>>> @@ -124,7 +132,12 @@ impl MetricCollectionTask {
> >>>>                      "metric collection interval changed to {} seconds, reloading timer",
> >>>>                      interval
> >>>>                  );
> >>>> -                timer = Self::setup_timer(interval);
> >>>> +                (timer, next_run) = Self::setup_timer(interval);
> >>>> +                // If we change (and therefore reset) our timer right before it fires,
> >>>> +                // we could potentially miss one collection event.
> >>>
> >>> Couldn't we instead just pass `next_run` through to `setup_timer` and
> >>> call `reset_at(next_run)` on it? (`first_run` would only be used in the
> >>> initial setup, so `next_run` could either be an `Option`, or the setup
> >>> code does the `next_aligned_instant` call...)
> >>>
> >>> This should be much less code by making the new
> >>> `fetch_overdue{,_and_save_state}()` functions unnecessary, or am I
> >>> missing something?
> >>>
> >>
> >> I guess the question is, do we want nicely aligned timer ticks?
> >>
> >> e.g. 14:01:00, 14:02:00, 14:03:00 ... for a 60 second interval
> >> or   14:00:00, 14:05:00, 14:10:00 ... for a 5 minute interval?
> >>
> >> Because that was the main intention behind using the 'collection-interval' as
> >> a base for calculating the aligned instant for the first timer reset.
> >> If we reuse the 'old' `next_run` when the interval is changed, we
> >> also reuse the old alignment. 
> >>
> >> For instance, when changing from initially 1 minute to 5 minutes, the
> >> timer ticks might come at 
> >>   14:01:00, 14:06:00, 14:11:00
> >>
> >> Technically, the naming for the `next_run` variable is not the best,
> >> since it just contains the Instant when the timer *first* fires, but
> >> this is then never updated to the *next* time the timer will fire...
> >> So that means that when changing the interval with your suggested change,
> >> you'd pass an Instant to `reset_at` that is already in the past,
> >> causing the timer to fire immediately.
> >>
> >> If we *don't* care about the aligned ticks as described above, we could
> >> just use a static alignment boundary, e.g. 60 seconds.
> >> In this case we can also get rid of the fetch_overdue stuff, since
> >> in the worst case we have 60 seconds until the next tick on startup or timer change,
> >> which should be good enough to prevent any significant gaps in the data.
> > 
> > What about setting a flag - if the current next tick was earlier than
> > the new next tick - to tell tick() to re-align the timer when it is next
> > triggered?
> 
> The problem is that tokio::time::Interval doesn't give you a way to query when
> the next expected tick will be. We can only approximate it by recalculating
> a new aligned instant with the same interval, but I guess this might behave 
> unpredictably in edge cases.
> 
> > 
> > So when going from 1 to 5 minutes at 14:01:50, we `.reset_at(14:02:00)`
> > and also set `realign = true`, and at 14:02, tick() should
> > `.reset_at(14:05:00)`.
> > 
> > I just feel like the logic in the "fetch_overdue" code should not be
> > necessary to have, but if it's too awkward to handle via the tick timer,
> > it's fine to keep it in v2.
> 
> fetch_overdue is also called when the daemon starts up. If we align to the collection interval
> and the daemon was down for a while, we otherwise might end up with gaps in the data.
> 
> Remotes keep metric history for 30 minutes. If PDM is down for, say, 29 minutes and
> we are aligning to 15min boundaries, in the worst case we might have to wait for
> another 15min to fetch metrics, resulting in a gap.
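
To put numbers on the example above, here is a rough sketch of the
arithmetic. The function names are made up for illustration and are not
taken from the patch; all values are plain seconds.

```rust
/// Upper bound (in seconds) on the time without a fetch if collection only
/// resumes at the next aligned tick after a restart: the downtime itself
/// plus, at worst, almost one full interval of waiting.
fn worst_case_fetch_gap(downtime: u64, interval: u64) -> u64 {
    downtime + interval
}

/// True if that bound exceeds the history the remote still keeps, i.e. a
/// gap in the collected data becomes possible.
fn may_lose_data(downtime: u64, interval: u64, history: u64) -> bool {
    worst_case_fetch_gap(downtime, interval) > history
}
```

With 29 minutes of downtime, 15 minute alignment and 30 minutes of history
this gives a bound of 44 minutes, so a gap is possible. With a static
60 second alignment the bound is 30 minutes, which still fits.
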
> 
> Of course we could just unconditionally force collection after startup, but I think the
> fetch_overdue solution solves this and the timer change issue reasonably well.
> 
> In v2 I got rid of the fetch_overdue_and_save_state wrapper by putting the state.save()
> that we already had in the main loop at the end of the loop, so it's a bit less code now.
> The remaining code is not really that complex, I think I'd prefer to keep it
> for now.

Okay.
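
For the archives, a minimal sketch of the realign idea from above. It works
on plain seconds since midnight instead of a `tokio::time::Interval`, and
`next_aligned`, `Schedule` and `hms` are hypothetical names, not the
functions from the patch. It only illustrates the 1 minute to 5 minute
change at 14:01:50.

```rust
/// Seconds since midnight, only used to make the example readable.
fn hms(h: u64, m: u64, s: u64) -> u64 {
    h * 3600 + m * 60 + s
}

/// Next multiple of `interval` strictly after `now` (both in seconds).
fn next_aligned(now: u64, interval: u64) -> u64 {
    assert!(interval > 0);
    (now / interval + 1) * interval
}

/// Hypothetical stand-in for the timer state: when the timer fires next and
/// whether it has to be re-aligned to the new interval after that tick.
struct Schedule {
    interval: u64,
    next_tick: u64,
    realign: bool,
}

impl Schedule {
    fn new(now: u64, interval: u64) -> Self {
        Schedule {
            interval,
            next_tick: next_aligned(now, interval),
            realign: false,
        }
    }

    /// Change the interval without dropping a tick that was due sooner.
    fn change_interval(&mut self, now: u64, interval: u64) {
        let new_next = next_aligned(now, interval);
        self.interval = interval;
        if self.next_tick < new_next {
            // Keep the old, earlier tick and re-align once it has fired.
            self.realign = true;
        } else {
            self.next_tick = new_next;
            self.realign = false;
        }
    }

    /// Called when the timer fires; returns the time of the following tick.
    fn tick(&mut self) -> u64 {
        let fired = self.next_tick;
        self.next_tick = if self.realign {
            // The tick that just fired still had the old alignment.
            next_aligned(fired, self.interval)
        } else {
            fired + self.interval
        };
        self.realign = false;
        self.next_tick
    }
}
```

Starting with a 60 second interval at 14:01:50 and switching to 300 seconds
right away, the pending tick stays at 14:02:00 and the ticks after it land on
14:05:00 and 14:10:00. The sketch can do this only because it tracks
`next_tick` itself, which is exactly the information the tokio interval does
not expose.
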

