From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) by lore.proxmox.com (Postfix) with ESMTPS id 9ECA41FF163 for ; Thu, 21 Nov 2024 20:32:51 +0100 (CET) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id D42EF4F4F; Thu, 21 Nov 2024 20:32:59 +0100 (CET) Date: Thu, 21 Nov 2024 20:32:56 +0100 (CET) From: =?UTF-8?Q?Fabian_Gr=C3=BCnbichler?= To: Proxmox Backup Server development discussion , Gabriel Goller , Christian Ebner Message-ID: <1840736894.8111.1732217576163@webmail.proxmox.com> In-Reply-To: References: <20241121154337.471425-1-c.ebner@proxmox.com> MIME-Version: 1.0 X-Priority: 3 Importance: Normal X-Mailer: Open-Xchange Mailer v7.10.6-Rev69 X-Originating-Client: open-xchange-appsuite X-SPAM-LEVEL: Spam detection results: 0 AWL 0.048 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment RCVD_IN_VALIDITY_CERTIFIED_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. RCVD_IN_VALIDITY_RPBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. RCVD_IN_VALIDITY_SAFE_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Subject: Re: [pbs-devel] [PATCH proxmox-backup 0/4] improve push sync job log messages X-BeenThere: pbs-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox Backup Server development discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: Proxmox Backup Server development discussion Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: pbs-devel-bounces@lists.proxmox.com Sender: "pbs-devel" > Gabriel Goller hat am 21.11.2024 18:04 CET geschrieben: > > > On 21.11.2024 17:26, Christian Ebner wrote: > >On 11/21/24 17:06, Gabriel Goller wrote: > >>I think I'd be nice if we use `.context()` instead of `format_err` and > >>debug print ("{err:?}") instead of default ("{}") here. As these > >>messages are all errors they shouldn't appear too often in the log and > >>if an error happens, you get much more information. > >> > >>What do you think? > > > >Hmm, can check that out, but wouldn't that disrupt the single line > >character of the log entries, prefixed by the time stamp? > > We could use "{err:#}", which will print everything in one line? > > Although on the other hand, I'd understand if you want to have strict > control over what is displayed to the users, so no hard feelings on > this one. big tasks/job like the sync ones are a bit special w.r.t. error handling because they "catch" errors often at intermediate levels, log them, record that an error occur, but then proceed with a sensible next unit of work. so it's not possible to just add a lot of context to an error like with simple(r) API calls, where many/most errors can be treated as fatal, maybe requiring a bit of cleanup before bailing completely. I think a good compromise is to treat those units of work as a sort of error scope, and - add context where it is missing (e.g., low level file or network access, where just bubbling up errors might be meaningless) - and then when logging the error, think about formatting e.g., for syncs we have groups as lowest unit of work - if something fails within a group, we abort that group, but proceed with the next one. but if we just log "syncing group X failed - permission denied" that doesn't help whatsoever. it might be better to have three lines of warnings if something goes wrong, if those three lines contain appropriate information like - request X / file access Y /.. failed with error Z (root cause) - while processing snapshot A in group B (important context, because the request or path of the root cause might not tell us this) - syncing group B failed (result of the error chain) achieving that requires carefully analyzing all error sources/chains though. when in doubt, I'd rather have a bit too much context initially (that usually means too much is added at lower levels and we can optimize there) than too little, but within reason ;) we can't travel back in time and add needed information to debug issues to our or users' logs, a stray duplicate context can easily be filtered out though. of course that doesn't mean every error is allowed to grow 10 lines long by default, balance is still important even when doing a first pass/implementation of improved error messages! and of course, the general rule applies - if we use errors with proper context, it should be done fairly consistently, so that the next level up can actually use it. it might be easier to use format_err or explicit additional `warn!` invocations to improve the situation immediately, and postpone adding proper contexts to do it for all of sync without time pressure. _______________________________________________ pbs-devel mailing list pbs-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel