From: Wolfgang Bumiller <w.bumiller@proxmox.com>
To: Lukas Wagner <l.wagner@proxmox.com>
Cc: pdm-devel@lists.proxmox.com
Date: Thu, 13 Feb 2025 16:34:29 +0100
Subject: Re: [pdm-devel] [PATCH proxmox-datacenter-manager 10/25] metric collection: collect overdue metrics on startup/timer change
Message-ID: <siuxzgzefcm6kye63hkpjmmz4ibqrxhjo233egh545dqddpcxh@ezwzvaqhjihi>
In-Reply-To: <ff8f6b15-b3aa-4229-ae50-3218a919c5d4@proxmox.com>
References: <20250211120541.163621-1-l.wagner@proxmox.com>
 <20250211120541.163621-11-l.wagner@proxmox.com>
 <kpzd2j2hmms6nkugixdagifikc3mii5hyqj66ofic7wsxztv7h@losm6xjj4cqi>
 <f617ae45-f205-4dca-a3b3-7febb924e67b@proxmox.com>
 <tioielukadqvyajbzi5444is2eycwapzrn6d6pl5pndbxono6y@utdkmguj6v6l>
 <ff8f6b15-b3aa-4229-ae50-3218a919c5d4@proxmox.com>
List-Id: Proxmox Datacenter Manager development discussion <pdm-devel.lists.proxmox.com>

On Thu, Feb 13, 2025 at 04:21:33PM +0100, Lukas Wagner wrote:
>
>
> On 2025-02-13 15:19, Wolfgang Bumiller wrote:
> > On Thu, Feb 13, 2025 at 02:50:32PM +0100, Lukas Wagner wrote:
> >>
> >>
> >> On 2025-02-13 09:55, Wolfgang Bumiller wrote:
> >>>>          loop {
> >>>>              let old_settings = self.settings.clone();
> >>>>              tokio::select! {
> >>>> @@ -124,7 +132,12 @@ impl MetricCollectionTask {
> >>>>                              "metric collection interval changed to {} seconds, reloading timer",
> >>>>                              interval
> >>>>                          );
> >>>> -                        timer = Self::setup_timer(interval);
> >>>> +                        (timer, next_run) = Self::setup_timer(interval);
> >>>> +                        // If we change (and therefore reset) our timer right before it fires,
> >>>> +                        // we could potentially miss one collection event.
> >>>
> >>> Couldn't we instead just pass `next_run` through to `setup_timer` and
> >>> call `reset_at(next_run)` on it? (`first_run` would only be used in the
> >>> initial setup, so `next_run` could either be an `Option`, or the setup
> >>> code does the `next_aligned_instant` call...
> >>>
> >>> This should be much less code by making the new
> >>> `fetch_overdue{,_and_save_state}()` functions unnecessary, or am I
> >>> missing something?
> >>>
> >>
> >> I guess the question is, do we want nicely aligned timer ticks?
> >>
> >> e.g. 14:01:00, 14:02:00, 14:03:00 ... for a 60 second interval
> >> or   14:00:00, 14:05:00, 14:10:00 ... for a 5 minute interval?
> >>
> >> Because that was the main intention behind using the 'collection-interval' as
> >> a base for calculating the aligned instant for the first timer reset.
> >> If we reuse the 'old' `next_run` when the interval is changed, we
> >> also reuse the old alignment.
> >>
> >> For instance, when changing from initially 1 minute to 5 minutes, the
> >> timer ticks might come at
> >> 14:01:00, 14:06:00, 14:11:00
> >>
> >> Technically, the naming for the `next_run` variable is not the best,
> >> since it just contains the Instant when the timer *first* fires, but
> >> this is then never updated to the *next* time the timer will fire...
> >> So that means that when changing the interval with your suggested change,
> >> you'd pass an Instant to `reset_at` that is already in the past,
> >> causing the timer to fire immediately.
> >>
> >> If we *don't* care about the aligned ticks as described above, we could
> >> just use a static alignment boundary, e.g. 60 seconds.
> >> In this case we can also get rid of the fetch_overdue stuff, since
> >> in the worst case we have 60 seconds until the next tick on startup or timer change,
> >> which should be good enough to prevent any significant gaps in the data.
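
The alignment arithmetic in the quoted text above can be sketched as follows.
This is only an illustration under stated assumptions, not the code from the
patch: `next_aligned`, `hms` and `ticks_from` are hypothetical helpers, and
times are plain seconds since midnight rather than `Instant`s.

```rust
/// Hypothetical helper (not from the patch): the first multiple of
/// `interval` seconds that lies strictly after `now`.
fn next_aligned(now: u64, interval: u64) -> u64 {
    assert!(interval > 0, "interval must be non-zero");
    (now / interval + 1) * interval
}

/// Seconds since midnight for a wall-clock time, for readable examples.
fn hms(h: u64, m: u64, s: u64) -> u64 {
    h * 3600 + m * 60 + s
}

/// The first `n` ticks of a periodic timer that first fires at `first`
/// and then fires every `interval` seconds.
fn ticks_from(first: u64, interval: u64, n: u64) -> Vec<u64> {
    (0..n).map(|i| first + i * interval).collect()
}
```

With these helpers, `next_aligned(hms(14, 1, 50), 300)` gives 14:05:00, while
reusing an old first tick, `ticks_from(hms(14, 1, 0), 300, 3)`, gives
14:01:00, 14:06:00, 14:11:00, i.e. the old alignment carried over to the new
interval.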
> >
> > What about setting a flag - if the current next tick was earlier than
> > the new next tick - to tell tick() to re-align the timer when it is next
> > triggered?
>
> The problem is that tokio::time::Interval doesn't give you a way to query when
> the next expected tick will be. We can only approximate it by recalculating
> a new aligned instant with the same interval, but I guess this might behave
> unpredictably in edge cases.
>
>
> >
> > So when going from 1 to 5 minutes at 14:01:50, we `.reset_at(14:02:00)`
> > and also set `realign = true`, and at 14:02, tick() should
> > `.reset_at(14:05:00)`.
> >
> > I just feel like the logic in the "fetch_overdue" code should not be
> > necessary to have, but if it's too awkward to handle via the tick timer,
> > it's fine to keep it in v2.
>
> fetch_overdue is also called when the daemon starts up. If we align to the collection interval
> and the daemon was down for a while, we otherwise might end up with gaps in the data.
>
> Remotes keep metric history for 30 minutes. If PDM is down for, say, 29 minutes and
> we are aligning to 15min boundaries, in the worst case we might have to wait for
> another 15min to fetch metrics, resulting in a gap.
>
> Of course we could just unconditionally force collection after startup, but I think the
> fetch_overdue solution solves this and the timer change issue quite okayish.
>
> In v2 I got rid of the fetch_overdue_and_save_state wrapper by putting the state.save()
> that we already had in the main loop at the end of the loop, so it's a bit less code now.
> The remaining code is not really that complex, I think I'd prefer to keep it
> for now.

Okay.

_______________________________________________
pdm-devel mailing list
pdm-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pdm-devel
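
The startup scenario from the thread (30 minutes of history on the remotes,
15 minute alignment) can be worked through with a small sketch. It assumes
one plausible reading of "overdue", namely that at least one full interval
has passed since the last collection. The names `is_overdue`,
`wait_until_aligned` and `would_leave_gap` are hypothetical and not taken
from the patch; timestamps are plain seconds.

```rust
/// Retention of metric history on remotes, per the thread: 30 minutes.
const REMOTE_HISTORY_SECS: u64 = 30 * 60;

/// Assumed definition of "overdue": at least one full interval has
/// elapsed since the last successful collection.
fn is_overdue(last_collection: u64, now: u64, interval: u64) -> bool {
    now.saturating_sub(last_collection) >= interval
}

/// Seconds until the next aligned tick, i.e. the next multiple of
/// `interval` strictly after `now`.
fn wait_until_aligned(now: u64, interval: u64) -> u64 {
    assert!(interval > 0, "interval must be non-zero");
    interval - now % interval
}

/// Would waiting for the next aligned tick (instead of collecting right
/// away) exceed the remote retention window and thus leave a gap?
fn would_leave_gap(last_collection: u64, now: u64, interval: u64) -> bool {
    let next_tick = now + wait_until_aligned(now, interval);
    next_tick.saturating_sub(last_collection) > REMOTE_HISTORY_SECS
}
```

For a last collection at t = 0 and a restart at t = 1801 (just past the
30 minute boundary) with a 900 second interval, the next aligned tick is at
t = 2700, so `would_leave_gap(0, 1801, 900)` is true, while
`is_overdue(0, 1801, 900)` is also true and would trigger an immediate
collection instead.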