Date: Tue, 21 Apr 2026 08:49:42 +0200
From: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Subject: Re: [PATCH proxmox-backup v6 03/15] tools: implement buffered logger for concurrent log messages
To: Christian Ebner, pbs-devel@lists.proxmox.com
Message-Id: <1776753720.36yeg85o88.astroid@yuna.none>
References: <20260417092621.455374-1-c.ebner@proxmox.com> <20260417092621.455374-4-c.ebner@proxmox.com> <1776677069.eizi09d274.astroid@yuna.none>
List-Id: Proxmox Backup Server development discussion

On April 20, 2026 7:15 pm, Christian Ebner wrote:
> On 4/20/26 12:56 PM, Fabian Grünbichler wrote:
>> On April 17, 2026 11:26 am, Christian Ebner wrote:
>>> Implements a buffered logger instance which collects messages sent
>>> from different sender instances via an async tokio channel and
>>> buffers them. Senders identify themselves by label and provide a log
>>> level for each log line to be buffered and flushed.
>>>
>>> On collection, log lines are grouped by label and buffered in
>>> sequence of arrival per label, up to the configured maximum number of
>>> lines per group, or periodically at the configured interval. The
>>> interval timeout is reset when contents are flushed. In addition,
>>> senders can request flushing at any given point.
>>>
>>> When the timeout set based on the interval is reached, all labels'
>>> log buffers are flushed. There is no guarantee on the order of labels
>>> when flushing.
>>>
>>> Log output is written based on the provided log line level and
>>> prefixed with the label.
>>>
>>> Signed-off-by: Christian Ebner
> [..]
>>> +    /// Starts the collection loop spawned on a new tokio task.
>>> +    /// Finishes when all senders belonging to the channel have been dropped.
>>> +    pub fn run_log_collection(mut self) {
>>> +        let future = async move {
>>> +            loop {
>>> +                let deadline = Instant::now() + self.max_aggregation_time;
>>> +                match time::timeout_at(deadline, self.receive_log_line()).await {
>>
>> why manually calculate the deadline, wouldn't using `time::timeout` work
>> as well? the only difference from a quick glance is that that one does a
>> checked_add for now + delay..
>
> No specific reason for using timeout_at() here, was primed by having
> based this on the s3 client timeout

I guess it depends on how we tackle the issue below; for some approaches,
timeout_at with a deadline calculation that is not reset each iteration
would also be appropriate..
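
to make the difference concrete, a rough (untested) sketch of both
variants, with a hypothetical receive_message() standing in for
receive_log_line() - everything here is just for illustration:

    use std::time::Duration;
    use tokio::time::{self, Instant};

    // stand-in for receive_log_line(); always pretends the channel was
    // closed, just so the example terminates
    async fn receive_message() -> bool {
        time::sleep(Duration::from_millis(10)).await;
        true
    }

    #[tokio::main]
    async fn main() {
        let max_aggregation_time = Duration::from_secs(5);

        // variant 1: relative timeout, re-armed on every received message -
        // same effect as recomputing the deadline from Instant::now() each
        // iteration, just without the manual arithmetic:
        loop {
            match time::timeout(max_aggregation_time, receive_message()).await {
                Ok(true) => break,     // all senders dropped
                Ok(false) => continue, // message handled, timer re-armed
                Err(_elapsed) => {
                    // flush_all_buffered() would run here
                    break;
                }
            }
        }

        // variant 2: absolute deadline that is only advanced after a flush,
        // so heavy channel traffic cannot postpone the periodic flush:
        let mut deadline = Instant::now() + max_aggregation_time;
        loop {
            match time::timeout_at(deadline, receive_message()).await {
                Ok(true) => break,
                Ok(false) => continue, // deadline deliberately left untouched
                Err(_elapsed) => {
                    // flush_all_buffered() would run here
                    deadline = Instant::now() + max_aggregation_time;
                }
            }
        }
    }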
>> but also, isn't this kind of broken in any case? let's say I have two
>> labels A and B:
>>
>> 0.99 A1
>> 1.98 A2
>> 2.97 A3
>> 3.96 A4
>> 4.95 A5 (now A is at capacity)
>> 5.94 B1
>> 9.90 B5 (now B is at capacity as well)
>>
>> either
>>
>> 10.90 timeout elapses, everything is flushed
>>
>> or
>>
>> 10.89 A6 (A gets flushed and can start over - but B hasn't been flushed)
>> 11.88 A7
>> 12.87 A8
>> 13.86 A9
>> 14.85 A10 (A has 5 buffered messages again)
>> ..
>>
>> this means that any label that doesn't log a 6th message can stall for
>> quite a long time, as long as other labels make progress (and it isn't
>> flushed explicitly)?
>
> Yes, this is true, but that is not really avoidable unless there is a
> timeout per label. Or would you suggest to simply flush all buffered
> lines at periodic intervals, without resetting at all?

yeah, I think we want to have a hard limit for the delay per label as
well, not just one for the delay-if-no-activity-at-all. because the log
timestamp is only added when emitting the log line to the real logger,
and if we delay too long, the log doesn't reflect reality anymore..

basically, what we want to achieve here is that
- a single no-change sync should be emitted as one block of log lines,
  unless it takes unusually long
- a small burst of back-to-back log messages of the same group (e.g.,
  two warning lines) is emitted as one block, unless timing was really
  bad
- no individual log line should be delayed for more than N, where N is
  rather small

that does mean we need to do some extra checking when handling the
messages coming in over the channel, because otherwise the channel
traffic could overwhelm the flushing logic and violate the third
property.
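
the kind of bookkeeping I have in mind would be something like the
following (rough sketch only, all names made up):

    use std::collections::HashMap;
    use std::time::{Duration, Instant};

    struct LabelBuffer {
        lines: Vec<String>,
        first_buffered: Instant, // age of the oldest still-buffered line
    }

    struct Collector {
        buffer_map: HashMap<String, LabelBuffer>,
        max_buffered_lines: usize,
        max_line_delay: Duration, // the hard per-line limit "N" from above
    }

    impl Collector {
        fn buffer_line(&mut self, label: &str, line: String) {
            let buf = self.buffer_map.entry(label.to_string()).or_insert(LabelBuffer {
                lines: Vec::new(),
                first_buffered: Instant::now(),
            });
            if buf.lines.is_empty() {
                buf.first_buffered = Instant::now();
            }
            buf.lines.push(line);
        }

        // called for *every* incoming channel message as well as on timer
        // expiry, so heavy channel traffic cannot starve overdue labels
        fn flush_full_or_overdue(&mut self) {
            let now = Instant::now();
            for (label, buf) in self.buffer_map.iter_mut() {
                if buf.lines.is_empty() {
                    continue;
                }
                let full = buf.lines.len() >= self.max_buffered_lines;
                let overdue = now.duration_since(buf.first_buffered) >= self.max_line_delay;
                if full || overdue {
                    // the real code would call Self::log_with_label(label, &buf.lines)
                    println!("[{label}] flushing {} line(s)", buf.lines.len());
                    buf.lines.clear();
                }
            }
        }
    }

    fn main() {
        let mut collector = Collector {
            buffer_map: HashMap::new(),
            max_buffered_lines: 5,
            max_line_delay: Duration::from_secs(3),
        };
        collector.buffer_line("A", "first line".to_string());
        collector.flush_full_or_overdue(); // nothing flushed yet: not full, not overdue
    }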
>>> +                    Ok(finished) => {
>>> +                        if finished {
>>> +                            break;
>>> +                        }
>>> +                    }
>>> +                    Err(_timeout) => self.flush_all_buffered(),
>>> +                }
>>> +            }
>>> +        };
>>> +        match LogContext::current() {
>>> +            None => tokio::spawn(future),
>>> +            Some(context) => tokio::spawn(context.scope(future)),
>>> +        };
>>> +    }
>>> +
>>> +    /// Collects new log lines, buffers and flushes them if the max lines limit is exceeded.
>>> +    ///
>>> +    /// Returns `true` if all the senders have been dropped and the task should no
>>> +    /// longer wait for new messages and finish.
>>> +    async fn receive_log_line(&mut self) -> bool {
>>> +        if let Some(request) = self.receiver.recv().await {
>>> +            match request {
>>> +                SenderRequest::Flush(label) => {
>>> +                    if let Some(log_lines) = self.buffer_map.get_mut(&label) {
>>> +                        Self::log_with_label(&label, log_lines);
>>> +                        log_lines.clear();
>>> +                    }
>>> +                }
>>> +                SenderRequest::Message(log_line) => {
>>
>> if this were Message((label, level, line)) or Message((label,
>> level_and_line)), the label would not need to be stored in the buffer
>> keys and values..
>
> Yes, adapted based on the above mention already
>
>>
>>> +                    if self.max_buffered_lines == 0
>>> +                        || self.max_aggregation_time < Duration::from_secs(0)
>>
>> the timeout can never be below zero, as that is the minimum duration
>> (duration is unsigned)?
>
> This is a typo, the intention was to check for durations below 1 second,
> but since the granularity is seconds, this should check for 0 instead.

in that case, you can just call self.max_aggregation_time.is_zero() :)

though if we go with the trait/non-buffering implementation, this whole
check could go, since a buffered logger would never be instantiated with
buffering disabled?

>>> +                    {
>>> +                        // shortcut if no buffering should happen
>>> +                        Self::log_by_level(&log_line.label, &log_line);
>>
>> shouldn't we rather handle this by not using the buffered logger in the
>> first place? e.g., have this and a simple not-buffering logger implement
>> a shared logging trait, or something similar?
>
> Hmm, that might be better, yes. Will add a trait with 2 implementations
> based on which logger is required.
>
>> one simple approach would be to just make the LogLineSender log directly
>> in this case, and not send anything at all?
>>
>> because if we don't want buffering, sending all log messages through a
>> channel and setting up the timeout machinery can be avoided completely..
>>
> [..]
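
and just so we are talking about the same thing, a very rough sketch of
what such a shared trait could look like (all names made up, not meant as
the actual interface):

    use std::collections::HashMap;

    trait LogMessageWriter {
        fn log(&mut self, label: &str, line: &str);
        fn flush(&mut self, label: &str);
    }

    // logs every line immediately; no channel, no buffers, no timers
    struct DirectLogger;

    impl LogMessageWriter for DirectLogger {
        fn log(&mut self, label: &str, line: &str) {
            // the real code would dispatch via log_by_level()
            println!("{label}: {line}");
        }
        fn flush(&mut self, _label: &str) {
            // nothing buffered, nothing to do
        }
    }

    // buffers lines per label and defers the actual output
    struct BufferedLogger {
        buffer_map: HashMap<String, Vec<String>>,
    }

    impl LogMessageWriter for BufferedLogger {
        fn log(&mut self, label: &str, line: &str) {
            self.buffer_map
                .entry(label.to_string())
                .or_default()
                .push(line.to_string());
        }
        fn flush(&mut self, label: &str) {
            if let Some(lines) = self.buffer_map.get_mut(label) {
                for line in lines.drain(..) {
                    println!("{label}: {line}");
                }
            }
        }
    }

    fn main() {
        // pick the implementation once, up front, instead of branching on
        // every message:
        let buffering = false;
        let mut logger: Box<dyn LogMessageWriter> = if buffering {
            Box::new(BufferedLogger { buffer_map: HashMap::new() })
        } else {
            Box::new(DirectLogger)
        };
        logger.log("sync-job", "starting");
        logger.flush("sync-job");
    }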