From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 21 Apr 2026 10:00:26 +0200
From: Fabian Grünbichler <f.gruenbichler@proxmox.com>
To: Christian Ebner, pbs-devel@lists.proxmox.com
Subject: Re: [PATCH proxmox-backup v6 11/15] server: pull: prefix log messages and add error context
Message-Id: <1776757625.a5d7yahnh6.astroid@yuna.none>
In-Reply-To: <7f8852e8-c092-4e2d-a031-192c087fb364@proxmox.com>
References: <20260417092621.455374-1-c.ebner@proxmox.com>
 <20260417092621.455374-12-c.ebner@proxmox.com>
 <1776684473.p5hg4a1vym.astroid@yuna.none>
 <295ee353-2731-474b-aad0-d1f8a2bcb443@proxmox.com>
 <7f8852e8-c092-4e2d-a031-192c087fb364@proxmox.com>

On April 21, 2026 9:42 am, Christian Ebner wrote:
> On 4/21/26 9:21 AM, Christian Ebner wrote:
>> On 4/20/26 1:54 PM, Fabian Grünbichler wrote:
>>> On April 17, 2026 11:26 am, Christian Ebner wrote:
>>>> Pulling groups and therefore also snapshots in parallel leads to
>>>> unordered log output, making it mostly impossible to relate a log
>>>> message to a backup snapshot/group.
>>>>
>>>> Therefore, prefix pull job log messages with the corresponding group or
>>>> snapshot and set the error context accordingly.
>>>>
>>>> Also, reword some messages, inline variables in format strings and
>>>> start log lines with capital letters to get consistent output.
>>>>
>>>> By using the buffered logger implementation and buffering up to 5 lines
>>>> with a timeout of 1 second, subsequent log lines arriving in fast
>>>> succession are kept together, reducing the mixing of lines.
>>>>
>>>> Example output for a sequential pull job:
>>>> ```
>>>> ...
>>>> [ct/100]: 2025-11-17T10:11:42Z: start sync
>>>> [ct/100]: 2025-11-17T10:11:42Z: pct.conf.blob: sync archive
>>>> [ct/100]: 2025-11-17T10:11:42Z: root.ppxar.didx: sync archive
>>>> [ct/100]: 2025-11-17T10:11:42Z: root.ppxar.didx: downloaded 16.785 MiB (280.791 MiB/s)
>>>> [ct/100]: 2025-11-17T10:11:42Z: root.mpxar.didx: sync archive
>>>> [ct/100]: 2025-11-17T10:11:42Z: root.mpxar.didx: downloaded 65.703 KiB (29.1 MiB/s)
>>>> [ct/100]: 2025-11-17T10:11:42Z: sync done
>>>> [ct/100]: percentage done: 9.09% (1/11 groups)
>>>> [ct/101]: 2026-03-31T12:20:16Z: start sync
>>>> [ct/101]: 2026-03-31T12:20:16Z: pct.conf.blob: sync archive
>>>> [ct/101]: 2026-03-31T12:20:16Z: root.pxar.didx: sync archive
>>>> [ct/101]: 2026-03-31T12:20:16Z: root.pxar.didx: downloaded 199.806 MiB (311.91 MiB/s)
>>>> [ct/101]: 2026-03-31T12:20:16Z: catalog.pcat1.didx: sync archive
>>>> [ct/101]: 2026-03-31T12:20:16Z: catalog.pcat1.didx: downloaded 180.379 KiB (22.748 MiB/s)
>>>> [ct/101]: 2026-03-31T12:20:16Z: sync done
>>>
>>> this is probably the wrong patch for this comment, but since you have
>>> the sample output here ;)
>>>
>>> should we pad the labels? the buffered logger knows all currently active
>>> labels and could adapt it? otherwise especially for host backups the
>>> logs are again not really scannable by humans, because there will be
>>> weird jumps in alignment.. or (.. continued below)
>>
>> That will not work though? While the logger knows about the labels when
>> receiving them, it does not know at all about future ones that might
>> still come in. So it can happen that the padding does not fit anymore,
>> and one gets even uglier formatting issues.
>>
>> I would prefer to keep them as is for the time being. For vm/ct it is
>> not that bad, one could maybe foresee the ID to be a 4-digit number and
>> define a minimum label length to be padded if not reached.
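
for reference, the minimum-width variant could be as simple as a
pad-on-format helper. this is only a sketch (the width constant and
function name are made up, this is not code from the series):

```rust
// Sketch only: pad sync log labels to an assumed minimum width so the
// short vm/ct labels line up, without needing to know future labels.
// MIN_LABEL_WIDTH is a made-up constant, not taken from the patch.
const MIN_LABEL_WIDTH: usize = 7; // room for a 4-digit ID, e.g. "ct/1000"

fn prefixed(label: &str, msg: &str) -> String {
    // `{:<width$}` left-aligns and space-pads up to `width`; longer
    // labels (e.g. host backups) are passed through unpadded.
    format!("[{:<width$}]: {}", label, msg, width = MIN_LABEL_WIDTH)
}
```

so anything longer than the minimum (host backup labels in particular)
would simply not be padded, and alignment stays best-effort.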
ack, let's leave this as it is for now, it can always be bikeshedded
later if needed!

>>
>> Host backups or backups with explicit label
>>

cut off?

>>
>>>> ...
>>>> ```
>>>>
>>>> Example output for a parallel pull job:
>>>> ```
>>>> ...
>>>> [ct/107]: 2025-07-16T09:14:01Z: start sync
>>>> [ct/107]: 2025-07-16T09:14:01Z: pct.conf.blob: sync archive
>>>> [ct/107]: 2025-07-16T09:14:01Z: root.ppxar.didx: sync archive
>>>> [vm/108]: 2025-09-19T07:37:19Z: start sync
>>>> [vm/108]: 2025-09-19T07:37:19Z: qemu-server.conf.blob: sync archive
>>>> [vm/108]: 2025-09-19T07:37:19Z: drive-scsi0.img.fidx: sync archive
>>>> [ct/107]: 2025-07-16T09:14:01Z: root.ppxar.didx: downloaded 609.233 MiB (112.628 MiB/s)
>>>> [ct/107]: 2025-07-16T09:14:01Z: root.mpxar.didx: sync archive
>>>> [ct/107]: 2025-07-16T09:14:01Z: root.mpxar.didx: downloaded 1.172 MiB (17.838 MiB/s)
>>>> [ct/107]: 2025-07-16T09:14:01Z: sync done
>>>> [ct/107]: percentage done: 72.73% (8/11 groups)
>>>
>>> the way the prefix and snapshot are formatted could also be interpreted
>>> at first glance as a timestamp of the log line.. why not just prepend
>>> the prefix on the logger side, and leave it up to the caller to do the
>>> formatting? then we could use "{type}/{id}/" as prefix here? or add
>>> 'snapshot' in those lines to make it clear? granted, this is more of an
>>> issue when viewing the log via `proxmox-backup-manager`, as in the UI we
>>> have the log timestamps up front..
>>
>> I simply followed along the line of your suggestion here. In prior
>> versions I had exactly that, but you rejected it as too verbose?
:)

>>
>> https://lore.proxmox.com/pbs-devel/1774263381.bngcrer2th.astroid@yuna.none/
>>

my suggestions (for just the ct/107 lines above) would have been:

ct/107: 2025-07-16T09:14:01Z: start sync
ct/107: 2025-07-16T09:14:01Z pct.conf.blob: sync archive
ct/107: 2025-07-16T09:14:01Z root.ppxar.didx: sync archive
ct/107: 2025-07-16T09:14:01Z root.ppxar.didx: downloaded 609.233 MiB (112.628 MiB/s)
ct/107: 2025-07-16T09:14:01Z root.mpxar.didx: sync archive
ct/107: 2025-07-16T09:14:01Z root.mpxar.didx: downloaded 1.172 MiB (17.838 MiB/s)
ct/107: 2025-07-16T09:14:01Z: sync done
ct/107: percentage done: 72.73% (8/11 groups)

though it might make sense to group that even more visually

ct/107: 2025-07-16T09:14:01Z: start sync
ct/107: 2025-07-16T09:14:01Z/pct.conf.blob: sync archive
ct/107: 2025-07-16T09:14:01Z/root.ppxar.didx: sync archive
ct/107: 2025-07-16T09:14:01Z/root.ppxar.didx: downloaded 609.233 MiB (112.628 MiB/s)
ct/107: 2025-07-16T09:14:01Z/root.mpxar.didx: sync archive
ct/107: 2025-07-16T09:14:01Z/root.mpxar.didx: downloaded 1.172 MiB (17.838 MiB/s)
ct/107: 2025-07-16T09:14:01Z: sync done
ct/107: percentage done: 72.73% (8/11 groups)

or

[ct/107 2025-07-16T09:14:01Z] start sync
[ct/107 2025-07-16T09:14:01Z pct.conf.blob] sync archive
[ct/107 2025-07-16T09:14:01Z root.ppxar.didx] sync archive
[ct/107 2025-07-16T09:14:01Z root.ppxar.didx] downloaded 609.233 MiB (112.628 MiB/s)
[ct/107 2025-07-16T09:14:01Z root.mpxar.didx] sync archive
[ct/107 2025-07-16T09:14:01Z root.mpxar.didx] downloaded 1.172 MiB (17.838 MiB/s)
[ct/107 2025-07-16T09:14:01Z] sync done
[ct/107] percentage done: 72.73% (8/11 groups)

but that is probably a matter of taste, so let's postpone this aspect
for now.

>>>
>>> and maybe(?) log the progress line using a different prefix? because
>>> right now the information that the group [ct/107] is finished is not
>>> really clear from the output, IMHO.
>>
>> That I can add, yes.

thanks!
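
as an aside, for anyone following along: the buffering described in the
commit message (flush after 5 lines or after 1 second) boils down to
roughly the following. names are invented for this sketch, and a real
logger additionally needs a timer task to flush an idle buffer, which
this push-driven sketch omits:

```rust
use std::mem;
use std::time::{Duration, Instant};

// Assumed limits from the commit message: flush once 5 lines are
// buffered, or once the oldest buffered line is 1 second old.
const MAX_LINES: usize = 5;
const MAX_AGE: Duration = Duration::from_secs(1);

#[derive(Default)]
struct LineBuffer {
    lines: Vec<String>,
    first_line_at: Option<Instant>,
}

impl LineBuffer {
    // Returns Some(batch) when the buffer flushes, None otherwise.
    fn push(&mut self, line: String) -> Option<Vec<String>> {
        // remember when the oldest still-buffered line arrived
        self.first_line_at.get_or_insert_with(Instant::now);
        self.lines.push(line);
        let expired = self
            .first_line_at
            .map_or(false, |t| t.elapsed() >= MAX_AGE);
        if self.lines.len() >= MAX_LINES || expired {
            self.first_line_at = None;
            Some(mem::take(&mut self.lines))
        } else {
            None
        }
    }
}
```

keeping one such buffer per group label is what keeps lines of the same
group together in the parallel case.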

>>
>>>
>>> the progress logging is also still broken (this is for a sync that takes
>>> a while, this is not log messages being buffered and re-ordered!):
>>>
>>> $ proxmox-backup-manager task log 'UPID:yuna:00070656:001E420F:00000002:69E610EB:syncjob:local\x3atest\x3atank\x3a\x3as\x2dbc01cba6\x2d805a:root@pam:' | grep -e namespace -e 'percentage done'
>>> Syncing datastore 'test', root namespace into datastore 'tank', root namespace
>>> Finished syncing root namespace, current progress: 0 groups, 0 snapshots
>>> Syncing datastore 'test', namespace 'test' into datastore 'tank', namespace 'test'
>>> [host/exclusion-test]: percentage done: 5.26% (1/19 groups)
>>> [host/acltest]: percentage done: 5.26% (1/19 groups)
>>> [host/logtest]: percentage done: 5.26% (1/19 groups)
>>> [host/onemeg]: percentage done: 5.26% (1/19 groups)
>>> [host/fourmeg]: percentage done: 2.63% (0/19 groups, 1/2 snapshots in group #1)
>>> [host/symlink]: percentage done: 2.63% (0/19 groups, 1/2 snapshots in group #1)
>>> [host/symlink]: percentage done: 5.26% (1/19 groups)
>>> [host/fourmeg]: percentage done: 5.26% (1/19 groups)
>>> [host/format-v2-test]: percentage done: 1.75% (0/19 groups, 1/3 snapshots in group #1)
>>> [host/format-v2-test]: percentage done: 3.51% (0/19 groups, 2/3 snapshots in group #1)
>>> [host/format-v2-test]: percentage done: 5.26% (1/19 groups)
>>> [host/incrementaltest2]: percentage done: 2.63% (0/19 groups, 1/2 snapshots in group #1)
>>> [host/incrementaltest2]: percentage done: 5.26% (1/19 groups)
>>
>> There will always be cases where it does not work I guess, not sure what
>> I can do to make you happy here.
>
> Oh, sorry, might have been too quick on this one. You are saying this
> happens for a non-parallel sync?

no, this was a parallel sync, but the progress should be correct for
parallel syncs as well, shouldn't it? here's the same log but less
truncated:

[..]
2026-04-20T13:41:31+02:00: ----
2026-04-20T13:41:31+02:00: Syncing datastore 'test', namespace 'test' into datastore 'tank', namespace 'test'
2026-04-20T13:41:31+02:00: Found 19 groups to sync (out of 19 total)
2026-04-20T13:41:31+02:00: [ct/106]: sync group ct/106 failed - owner check failed (test@pbs != root@pam)
2026-04-20T13:41:31+02:00: [ct/999]: sync group ct/999 failed - owner check failed (test@pbs != root@pam)
2026-04-20T13:41:31+02:00: skipping snapshot host/test-another-mail/2024-03-14T08:44:25Z - in-progress backup
2026-04-20T13:41:31+02:00: [host/exclusion-test]: host/exclusion-test: skipped: 1 snapshot(s) (2024-04-23T06:53:44Z) - older than the newest snapshot present on sync target
2026-04-20T13:41:31+02:00: [host/exclusion-test]: re-sync snapshot 2024-04-23T06:54:30Z
2026-04-20T13:41:31+02:00: [host/exclusion-test]: 2024-04-23T06:54:30Z: no data changes
2026-04-20T13:41:31+02:00: [host/exclusion-test]: percentage done: 5.26% (1/19 groups)
2026-04-20T13:41:31+02:00: [host/acltest]: re-sync snapshot 2024-10-07T11:17:58Z
2026-04-20T13:41:31+02:00: [host/acltest]: 2024-10-07T11:17:58Z: no data changes
2026-04-20T13:41:31+02:00: [host/acltest]: percentage done: 5.26% (1/19 groups)
2026-04-20T13:41:32+02:00: [host/logtest]: 2024-01-30T10:52:34Z: start sync
2026-04-20T13:41:32+02:00: [host/logtest]: 2024-01-30T10:52:34Z: test.pxar.didx: sync archive
2026-04-20T13:41:32+02:00: [host/logtest]: 2024-01-30T10:52:34Z: test.pxar.didx: downloaded 13.169 KiB (168.332 KiB/s)
2026-04-20T13:41:32+02:00: [host/logtest]: 2024-01-30T10:52:34Z: catalog.pcat1.didx: sync archive
2026-04-20T13:41:32+02:00: [host/logtest]: 2024-01-30T10:52:34Z: catalog.pcat1.didx: downloaded 2.157 KiB (10.2 KiB/s)
2026-04-20T13:41:32+02:00: [host/logtest]: 2024-01-30T10:52:34Z: rsa-encrypted.key.blob: sync archive
2026-04-20T13:41:33+02:00: [host/logtest]: 2024-01-30T10:52:34Z: sync done
2026-04-20T13:41:33+02:00: [host/logtest]: percentage done: 5.26% (1/19 groups)
2026-04-20T13:41:33+02:00: [host/onemeg]: 2023-06-28T11:13:51Z: start sync
2026-04-20T13:41:33+02:00: [host/onemeg]: 2023-06-28T11:13:51Z: test.img.fidx: sync archive
2026-04-20T13:41:33+02:00: [host/onemeg]: 2023-06-28T11:13:51Z: test.img.fidx: downloaded 2.012 KiB (5.704 KiB/s)
2026-04-20T13:41:33+02:00: [host/onemeg]: 2023-06-28T11:13:51Z: sync done
2026-04-20T13:41:33+02:00: [host/onemeg]: percentage done: 5.26% (1/19 groups)
2026-04-20T13:41:33+02:00: [host/fourmeg]: 2023-06-28T11:14:35Z: start sync
2026-04-20T13:41:33+02:00: [host/fourmeg]: 2023-06-28T11:14:35Z: test.img.fidx: sync archive
2026-04-20T13:41:33+02:00: [host/fourmeg]: 2023-06-28T11:14:35Z: test.img.fidx: downloaded 2.012 KiB (5.885 KiB/s)
2026-04-20T13:41:33+02:00: [host/fourmeg]: 2023-06-28T11:14:35Z: sync done
2026-04-20T13:41:33+02:00: [host/fourmeg]: percentage done: 2.63% (0/19 groups, 1/2 snapshots in group #1)
2026-04-20T13:41:33+02:00: [host/fourmeg]: 2023-06-28T12:37:16Z: start sync
2026-04-20T13:41:34+02:00: [host/symlink]: 2023-06-07T06:48:26Z: start sync
2026-04-20T13:41:34+02:00: [host/symlink]: 2023-06-07T06:48:26Z: symlink.pxar.didx: sync archive
2026-04-20T13:41:34+02:00: [host/symlink]: 2023-06-07T06:48:26Z: symlink.pxar.didx: downloaded 145 B (802.587 B/s)
2026-04-20T13:41:34+02:00: [host/symlink]: 2023-06-07T06:48:26Z: catalog.pcat1.didx: sync archive
2026-04-20T13:41:34+02:00: [host/symlink]: 2023-06-07T06:48:26Z: catalog.pcat1.didx: downloaded 57 B (279.566 B/s)
2026-04-20T13:41:34+02:00: [host/symlink]: 2023-06-07T06:48:26Z: sync done
2026-04-20T13:41:37+02:00: [host/incrementaltest2]: 2023-09-25T13:43:25Z: start sync
2026-04-20T13:41:37+02:00: [host/incrementaltest2]: 2023-09-25T13:43:25Z: test.pxar.didx: sync archive
2026-04-20T13:41:37+02:00: [host/format-v3-test]: 2024-06-03T10:34:11Z: start sync
2026-04-20T13:41:37+02:00: [host/format-v3-test]: 2024-06-03T10:34:11Z: linux.ppxar.didx: sync archive
2026-04-20T13:41:37+02:00: [host/fourmeg]: 2023-06-28T12:37:16Z: test.img.fidx: sync archive
2026-04-20T13:41:37+02:00: [host/linuxtesttest]: 2024-07-17T13:38:31Z: start sync
2026-04-20T13:41:37+02:00: [host/linuxtesttest]: 2024-07-17T13:38:31Z: linux.pxar.didx: sync archive
2026-04-20T13:41:37+02:00: [host/inctest]: 2023-11-13T13:22:57Z: start sync
2026-04-20T13:41:37+02:00: [host/inctest]: 2023-11-13T13:22:57Z: test.pxar.didx: sync archive
2026-04-20T13:41:37+02:00: [host/inctest2]: 2023-11-13T13:53:27Z: start sync
2026-04-20T13:41:37+02:00: [host/inctest2]: 2023-11-13T13:53:27Z: test.pxar.didx: sync archive
2026-04-20T13:41:37+02:00: [host/format-v2-test]: 2024-06-07T12:08:37Z: start sync
2026-04-20T13:41:37+02:00: [host/format-v2-test]: 2024-06-07T12:08:37Z: linux.ppxar.didx: sync archive
2026-04-20T13:41:37+02:00: [host/foobar]: re-sync snapshot 2024-06-10T07:48:51Z due to corruption
2026-04-20T13:41:37+02:00: [host/foobar]: 2024-06-10T07:48:51Z: foobar.ppxar.didx: sync archive
2026-04-20T13:41:37+02:00: [host/symlink]: percentage done: 2.63% (0/19 groups, 1/2 snapshots in group #1)
2026-04-20T13:41:37+02:00: [host/symlink]: 2023-06-07T06:49:43Z: start sync
2026-04-20T13:41:37+02:00: [host/symlink]: 2023-06-07T06:49:43Z: foo.pxar.didx: sync archive
2026-04-20T13:41:37+02:00: [host/symlink]: 2023-06-07T06:49:43Z: foo.pxar.didx: downloaded 0 B (0 B/s)
2026-04-20T13:41:37+02:00: [host/symlink]: 2023-06-07T06:49:43Z: catalog.pcat1.didx: sync archive
2026-04-20T13:41:37+02:00: [host/bookworm]: re-sync snapshot 2023-01-18T11:10:34Z due to corruption
2026-04-20T13:41:37+02:00: [host/bookworm]: 2023-01-18T11:10:34Z: repo.pxar.didx: sync archive
2026-04-20T13:41:37+02:00: [host/test]: 2024-07-10T08:18:51Z: start sync
2026-04-20T13:41:37+02:00: [host/test]: 2024-07-10T08:18:51Z: test.img.fidx: sync archive
2026-04-20T13:41:37+02:00: [host/symlink]: 2023-06-07T06:49:43Z: catalog.pcat1.didx: downloaded 53 B (216.705 B/s)
2026-04-20T13:41:37+02:00: [host/symlink]: 2023-06-07T06:49:43Z: sync done
2026-04-20T13:41:37+02:00: [host/symlink]: percentage done: 5.26% (1/19 groups)
2026-04-20T13:42:55+02:00: [host/fourmeg]: 2023-06-28T12:37:16Z: test.img.fidx: downloaded 4 MiB (51.191 KiB/s)
2026-04-20T13:42:55+02:00: [host/fourmeg]: 2023-06-28T12:37:16Z: sync done
2026-04-20T13:42:55+02:00: [host/fourmeg]: percentage done: 5.26% (1/19 groups)

those percentage lines are just completely bogus?? and there are at
least two "breaks" of over a second where all buffered log lines must
have been flushed, so it's also not that the lines with lower
percentage were re-ordered somehow..

it goes on and on and the percentage always stays at at most 5.26%,
since the finished group count never goes higher than 1.. though I have
to admit I stopped it before it finished, so maybe it would have gone
higher later? I can re-run it to completion and check what it says
right before it's done?
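
FWIW, every worker reporting "1/19 groups" after finishing its own
group looks like what you'd get if each parallel task counts only its
own finished groups, i.e. works on a per-task copy of the progress
state instead of a shared one. just speculation on my side, but a
shared counter along these lines would not show the symptom (names are
made up here, this is not the actual StoreProgress code):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Invented names for illustration only. The point: the finished-group
// count must live in state shared by all group workers (e.g. behind an
// Arc), not in a per-worker copy, otherwise every worker reports
// "1/19 groups" after finishing its own group.
struct SharedProgress {
    done_groups: AtomicUsize,
    total_groups: usize,
}

impl SharedProgress {
    fn finish_group(&self) -> String {
        // fetch_add returns the value *before* the increment
        let done = self.done_groups.fetch_add(1, Ordering::SeqCst) + 1;
        let pct = 100.0 * done as f64 / self.total_groups as f64;
        format!("percentage done: {pct:.2}% ({done}/{} groups)", self.total_groups)
    }
}
```

with that shape, concurrent workers each see a monotonically growing
count instead of all reporting the same "first group done" line.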