From: Lukas Wagner <l.wagner@proxmox.com>
To: Wolfgang Bumiller <w.bumiller@proxmox.com>
Cc: pdm-devel@lists.proxmox.com
Subject: Re: [pdm-devel] [PATCH proxmox-datacenter-manager 10/25] metric collection: collect overdue metrics on startup/timer change
Date: Thu, 13 Feb 2025 16:21:33 +0100
Message-ID: <ff8f6b15-b3aa-4229-ae50-3218a919c5d4@proxmox.com>
In-Reply-To: <tioielukadqvyajbzi5444is2eycwapzrn6d6pl5pndbxono6y@utdkmguj6v6l>
On 2025-02-13 15:19, Wolfgang Bumiller wrote:
> On Thu, Feb 13, 2025 at 02:50:32PM +0100, Lukas Wagner wrote:
>>
>>
>> On 2025-02-13 09:55, Wolfgang Bumiller wrote:
>>>> loop {
>>>> let old_settings = self.settings.clone();
>>>> tokio::select! {
>>>> @@ -124,7 +132,12 @@ impl MetricCollectionTask {
>>>> "metric collection interval changed to {} seconds, reloading timer",
>>>> interval
>>>> );
>>>> - timer = Self::setup_timer(interval);
>>>> + (timer, next_run) = Self::setup_timer(interval);
>>>> + // If we change (and therefore reset) our timer right before it fires,
>>>> + // we could potentially miss one collection event.
>>>
>>> Couldn't we instead just pass `next_run` through to `setup_timer` and
>>> call `reset_at(next_run)` on it? (`first_run` would only be used in the
>>> initial setup, so `next_run` could either be an `Option`, or the setup
>>> code does the `next_aligned_instant` call...
>>>
>>> This should be much less code by making the new
>>> `fetch_overdue{,_and_save_state}()` functions unnecessary, or am I
>>> missing something?
>>>
>>
>> I guess the question is, do we want nicely aligned timer ticks?
>>
>> e.g. 14:01:00, 14:02:00, 14:03:00 ... for 60 second interval
>> or 14:00:00, 14:05:00, 14:10:00 ... for a 5 minute interval?
>>
>> Because that was the main intention behind using the 'collection-interval' as
>> a base for calculating the aligned instant for the first timer reset.
>> If we reuse the 'old' `next_run` when the interval is changed, we
>> also reuse the old alignment.
>>
>> For instance, when changing from initially 1 minute to 5 minutes, the
>> timer ticks might come at
>> 14:01:00, 14:06:00, 14:11:00
>>
>> Technically, the naming for the `next_run` variable is not the best,
>> since it just contains the Instant when the timer *first* fires, but
>> this is then never updated to the *next* time the timer will fire...
>> So that means that when changing the interval with your suggested change,
>> you'd pass an Instant to `reset_at` that is already in the past,
>> causing the timer to fire immediately.
>>
>> If we *don't* care about the aligned ticks as described above, we could
>> just use a static alignment boundary, e.g. 60 seconds.
>> In this case we can also get rid of the fetch_overdue stuff, since
>> in the worst case we have 60 seconds until the next tick on startup or timer change,
>> which should be good enough to prevent any significant gaps in the data.
>
> What about setting a flag - if the current next tick was earlier than
> the new next tick - to tell tick() to re-align the timer when it is next
> triggered?

The problem is that `tokio::time::Interval` doesn't give you a way to query when
the next expected tick will be. We can only approximate it by recalculating
a new aligned instant with the same interval, but I guess this might behave
unpredictably in edge cases.

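To illustrate what "recalculating an aligned instant" means here, a minimal sketch using plain seconds since midnight instead of a tokio `Instant`. The names `next_aligned`, `hms` and `fmt_hms` are made up for this example and are not the helpers from the patch:

```rust
/// Hypothetical helper: the next point in time (in seconds) that is a
/// multiple of `interval_secs` and strictly after `now_secs`.
fn next_aligned(now_secs: u64, interval_secs: u64) -> u64 {
    assert!(interval_secs > 0, "interval must be non-zero");
    // Round down to the last boundary, then step one interval forward.
    // If `now_secs` sits exactly on a boundary, the *following* boundary
    // is returned.
    (now_secs / interval_secs + 1) * interval_secs
}

/// Seconds since midnight for a wall-clock time.
fn hms(h: u64, m: u64, s: u64) -> u64 {
    h * 3600 + m * 60 + s
}

/// Format seconds since midnight as HH:MM:SS.
fn fmt_hms(secs: u64) -> String {
    format!("{:02}:{:02}:{:02}", secs / 3600, secs % 3600 / 60, secs % 60)
}

fn main() {
    let now = hms(14, 1, 50);
    // Old 60 second interval: the pending tick would have been at 14:02:00.
    println!("60s:  {}", fmt_hms(next_aligned(now, 60)));
    // New 5 minute interval: the first aligned tick is at 14:05:00.
    println!("300s: {}", fmt_hms(next_aligned(now, 300)));
}
```

The approximation is only as good as the assumption that the running timer was itself aligned this way. If a tick is delivered slightly late, or the calculation happens exactly on a boundary, the recalculated value can already be one interval ahead of the tick that is actually pending.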
>
> So when going from 1 to 5 minutes at 14:01:50, we `.reset_at(14:02:00)`
> and also set `realign = true`, and at 14:02, tick() should
> `.reset_at(14:05:00)`.
>
> I just feel like the logic in the "fetch_overdue" code should not be
> necessary to have, but if it's too awkward to handle via the tick timer,
> it's fine to keep it in v2.

`fetch_overdue` is also called when the daemon starts up. If we align to the collection interval
and the daemon was down for a while, we might otherwise end up with gaps in the data.
Remotes keep metric history for 30 minutes. If PDM is down for, say, 29 minutes and
we are aligning to 15 minute boundaries, in the worst case we might have to wait
another 15 minutes to fetch metrics, resulting in a gap.

Of course we could just unconditionally force collection after startup, but I think the
`fetch_overdue` solution handles both this and the timer change issue reasonably well.

In v2 I got rid of the `fetch_overdue_and_save_state` wrapper by moving the state.save()
call that we already had in the main loop to the end of the loop, so it's a bit less code now.
The remaining code is not really that complex, so I think I'd prefer to keep it
for now.

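For reference, the startup case boils down to a check along these lines. This is only a rough sketch, not the actual `fetch_overdue` code from the patch: `is_overdue`, `gap_secs` and their parameters are hypothetical, and timestamps are plain seconds to keep it self-contained:

```rust
use std::time::Duration;

/// Hypothetical stand-in for the overdue check: a remote is overdue if it
/// was never collected, or if at least one full interval has passed since
/// the last collection.
fn is_overdue(last_collection: Option<u64>, now: u64, interval: Duration) -> bool {
    match last_collection {
        // No recorded collection yet: collect right away.
        None => true,
        // saturating_sub guards against a clock that jumped backwards.
        Some(last) => now.saturating_sub(last) >= interval.as_secs(),
    }
}

/// How many seconds of data are lost if we first collect `age` seconds after
/// the last collection and the remote only keeps `history` worth of metrics.
fn gap_secs(age: Duration, history: Duration) -> u64 {
    age.as_secs().saturating_sub(history.as_secs())
}

fn main() {
    let minute = 60;
    let interval = Duration::from_secs(15 * minute);
    let history = Duration::from_secs(30 * minute);

    // PDM was down for 29 minutes; the last collection happened at t = 0.
    let now = 29 * minute;
    println!("overdue at startup: {}", is_overdue(Some(0), now, interval));

    // Collecting immediately stays inside the 30 minute history window ...
    println!("gap if collected now: {}s", gap_secs(Duration::from_secs(now), history));
    // ... while waiting up to 15 more minutes for the next aligned tick does not.
    let waited = Duration::from_secs(now + 15 * minute);
    println!("gap if we wait for the tick: {}s", gap_secs(waited, history));
}
```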
--
- Lukas
_______________________________________________
pdm-devel mailing list
pdm-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pdm-devel