Message-ID: <76777076-e4c3-4d14-9430-2ffe2cf73b37@proxmox.com>
Date: Tue, 21 Apr 2026 10:04:57 +0200
Subject: Re: [PATCH proxmox-backup v6 11/15] server: pull: prefix log messages and add error context
From: Christian Ebner <c.ebner@proxmox.com>
To: Fabian Grünbichler, pbs-devel@lists.proxmox.com
References: <20260417092621.455374-1-c.ebner@proxmox.com> <20260417092621.455374-12-c.ebner@proxmox.com> <1776684473.p5hg4a1vym.astroid@yuna.none> <295ee353-2731-474b-aad0-d1f8a2bcb443@proxmox.com> <7f8852e8-c092-4e2d-a031-192c087fb364@proxmox.com> <1776757625.a5d7yahnh6.astroid@yuna.none>
In-Reply-To: <1776757625.a5d7yahnh6.astroid@yuna.none>
List-Id: Proxmox Backup Server development discussion
On 4/21/26 9:59 AM, Fabian Grünbichler wrote:
> On April 21, 2026 9:42 am, Christian Ebner wrote:
>> On 4/21/26 9:21 AM, Christian Ebner wrote:
>>> On 4/20/26 1:54 PM, Fabian Grünbichler wrote:
>>>> On April 17, 2026 11:26 am, Christian Ebner wrote:
>>>>> Pulling groups and therefore also snapshots in parallel leads to
>>>>> unordered log outputs, making it mostly impossible to relate a log
>>>>> message to a backup snapshot/group.
>>>>>
>>>>> Therefore, prefix pull job log messages by the corresponding group or
>>>>> snapshot and set the error context accordingly.
>>>>>
>>>>> Also, reword some messages, inline variables in format strings and
>>>>> start log lines with capital letters to get consistent output.
>>>>>
>>>>> By using the buffered logger implementation, buffering up to 5 lines
>>>>> with a timeout of 1 second, subsequent log lines arriving in fast
>>>>> succession are kept together, reducing the mixing of lines.
>>>>>
>>>>> Example output for a sequential pull job:
>>>>> ```
>>>>> ...
>>>>> [ct/100]: 2025-11-17T10:11:42Z: start sync
>>>>> [ct/100]: 2025-11-17T10:11:42Z: pct.conf.blob: sync archive
>>>>> [ct/100]: 2025-11-17T10:11:42Z: root.ppxar.didx: sync archive
>>>>> [ct/100]: 2025-11-17T10:11:42Z: root.ppxar.didx: downloaded 16.785 MiB (280.791 MiB/s)
>>>>> [ct/100]: 2025-11-17T10:11:42Z: root.mpxar.didx: sync archive
>>>>> [ct/100]: 2025-11-17T10:11:42Z: root.mpxar.didx: downloaded 65.703 KiB (29.1 MiB/s)
>>>>> [ct/100]: 2025-11-17T10:11:42Z: sync done
>>>>> [ct/100]: percentage done: 9.09% (1/11 groups)
>>>>> [ct/101]: 2026-03-31T12:20:16Z: start sync
>>>>> [ct/101]: 2026-03-31T12:20:16Z: pct.conf.blob: sync archive
>>>>> [ct/101]: 2026-03-31T12:20:16Z: root.pxar.didx: sync archive
>>>>> [ct/101]: 2026-03-31T12:20:16Z: root.pxar.didx: downloaded 199.806 MiB (311.91 MiB/s)
>>>>> [ct/101]: 2026-03-31T12:20:16Z: catalog.pcat1.didx: sync archive
>>>>> [ct/101]: 2026-03-31T12:20:16Z: catalog.pcat1.didx: downloaded 180.379 KiB (22.748 MiB/s)
>>>>> [ct/101]: 2026-03-31T12:20:16Z: sync done
>>>>
>>>> this is probably the wrong patch for this comment, but since you have
>>>> the sample output here ;)
>>>>
>>>> should we pad the labels? the buffered logger knows all currently active
>>>> labels and could adapt it? otherwise especially for host backups the
>>>> logs are again not really scannable by humans, because there will be
>>>> weird jumps in alignment.. or (.. continued below)
>>>
>>> That will not work though? While the logger knows about the labels when
>>> receiving them, it does not know at all about future ones that might
>>> still come in. So it can happen that the padding does not fit anymore,
>>> and one gets even uglier formatting issues.
>>>
>>> I would prefer to keep them as is for the time being. For vm/ct it is
>>> not that bad, one could maybe foresee the ID to be a 4-digit number and
>>> define a minimum label length to be padded if not reached.
> 
> ack, let's leave this as it is for now, can always be bikeshedded if
> needed later on!
> 
>>>
>>> Host backups or backups with explicit label
>>>
> 
> cut off?
> 
>>>
>>>>> ...
>>>>> ```
>>>>>
>>>>> Example output for a parallel pull job:
>>>>> ```
>>>>> ...
>>>>> [ct/107]: 2025-07-16T09:14:01Z: start sync
>>>>> [ct/107]: 2025-07-16T09:14:01Z: pct.conf.blob: sync archive
>>>>> [ct/107]: 2025-07-16T09:14:01Z: root.ppxar.didx: sync archive
>>>>> [vm/108]: 2025-09-19T07:37:19Z: start sync
>>>>> [vm/108]: 2025-09-19T07:37:19Z: qemu-server.conf.blob: sync archive
>>>>> [vm/108]: 2025-09-19T07:37:19Z: drive-scsi0.img.fidx: sync archive
>>>>> [ct/107]: 2025-07-16T09:14:01Z: root.ppxar.didx: downloaded 609.233 MiB (112.628 MiB/s)
>>>>> [ct/107]: 2025-07-16T09:14:01Z: root.mpxar.didx: sync archive
>>>>> [ct/107]: 2025-07-16T09:14:01Z: root.mpxar.didx: downloaded 1.172 MiB (17.838 MiB/s)
>>>>> [ct/107]: 2025-07-16T09:14:01Z: sync done
>>>>> [ct/107]: percentage done: 72.73% (8/11 groups)
>>>>
>>>> the way the prefix and snapshot are formatted could also be interpreted
>>>> at first glance as a timestamp of the log line.. why not just prepend
>>>> the prefix on the logger side, and leave it up to the caller to do the
>>>> formatting? then we could use "{type}/{id}/" as prefix here? or add
>>>> 'snapshot' in those lines to make it clear? granted, this is more of an
>>>> issue when viewing the log via `proxmox-backup-manager`, as in the UI we
>>>> have the log timestamps up front..
>>>
>>> I simply followed along the line of your suggestion here. In prior
>>> versions I had exactly that, but you rejected it as too verbose? :)
>>>
>>> https://lore.proxmox.com/pbs-devel/1774263381.bngcrer2th.astroid@yuna.none/
>>>
> 
> my suggestions (for just the ct/107 lines above) would have been:
> 
> ct/107: 2025-07-16T09:14:01Z: start sync
> ct/107: 2025-07-16T09:14:01Z pct.conf.blob: sync archive
> ct/107: 2025-07-16T09:14:01Z root.ppxar.didx: sync archive
> ct/107: 2025-07-16T09:14:01Z root.ppxar.didx: downloaded 609.233 MiB (112.628 MiB/s)
> ct/107: 2025-07-16T09:14:01Z root.mpxar.didx: sync archive
> ct/107: 2025-07-16T09:14:01Z root.mpxar.didx: downloaded 1.172 MiB (17.838 MiB/s)
> ct/107: 2025-07-16T09:14:01Z: sync done
> ct/107: percentage done: 72.73% (8/11 groups)
> 
> though it might make sense to group that even more visually
> 
> ct/107: 2025-07-16T09:14:01Z: start sync
> ct/107: 2025-07-16T09:14:01Z/pct.conf.blob: sync archive
> ct/107: 2025-07-16T09:14:01Z/root.ppxar.didx: sync archive
> ct/107: 2025-07-16T09:14:01Z/root.ppxar.didx: downloaded 609.233 MiB (112.628 MiB/s)
> ct/107: 2025-07-16T09:14:01Z/root.mpxar.didx: sync archive
> ct/107: 2025-07-16T09:14:01Z/root.mpxar.didx: downloaded 1.172 MiB (17.838 MiB/s)
> ct/107: 2025-07-16T09:14:01Z: sync done
> ct/107: percentage done: 72.73% (8/11 groups)
> 
> or
> 
> [ct/107 2025-07-16T09:14:01Z] start sync
> [ct/107 2025-07-16T09:14:01Z pct.conf.blob] sync archive
> [ct/107 2025-07-16T09:14:01Z root.ppxar.didx] sync archive
> [ct/107 2025-07-16T09:14:01Z root.ppxar.didx] downloaded 609.233 MiB (112.628 MiB/s)
> [ct/107 2025-07-16T09:14:01Z root.mpxar.didx] sync archive
> [ct/107 2025-07-16T09:14:01Z root.mpxar.didx] downloaded 1.172 MiB (17.838 MiB/s)
> [ct/107 2025-07-16T09:14:01Z] sync done
> [ct/107] percentage done: 72.73% (8/11 groups)
> 
> but that is probably a matter of taste, so let's postpone this aspect
> for now.
> 
>>>>
>>>> and maybe(?) log the progress line using a different prefix? because
>>>> right now the information that the group [ct/107] is finished is not
>>>> really clear from the output, IMHO.
>>>
>>> That I can add, yes.
> 
> thanks!
> 
>>>
>>>>
>>>> the progress logging is also still broken (this is for a sync that takes
>>>> a while, this is not log messages being buffered and re-ordered!):
>>>>
>>>> $ proxmox-backup-manager task log 'UPID:yuna:00070656:001E420F:00000002:69E610EB:syncjob:local\x3atest\x3atank\x3a\x3as\x2dbc01cba6\x2d805a:root@pam:' | grep -e namespace -e 'percentage done'
>>>> Syncing datastore 'test', root namespace into datastore 'tank', root namespace
>>>> Finished syncing root namespace, current progress: 0 groups, 0 snapshots
>>>> Syncing datastore 'test', namespace 'test' into datastore 'tank', namespace 'test'
>>>> [host/exclusion-test]: percentage done: 5.26% (1/19 groups)
>>>> [host/acltest]: percentage done: 5.26% (1/19 groups)
>>>> [host/logtest]: percentage done: 5.26% (1/19 groups)
>>>> [host/onemeg]: percentage done: 5.26% (1/19 groups)
>>>> [host/fourmeg]: percentage done: 2.63% (0/19 groups, 1/2 snapshots in group #1)
>>>> [host/symlink]: percentage done: 2.63% (0/19 groups, 1/2 snapshots in group #1)
>>>> [host/symlink]: percentage done: 5.26% (1/19 groups)
>>>> [host/fourmeg]: percentage done: 5.26% (1/19 groups)
>>>> [host/format-v2-test]: percentage done: 1.75% (0/19 groups, 1/3 snapshots in group #1)
>>>> [host/format-v2-test]: percentage done: 3.51% (0/19 groups, 2/3 snapshots in group #1)
>>>> [host/format-v2-test]: percentage done: 5.26% (1/19 groups)
>>>> [host/incrementaltest2]: percentage done: 2.63% (0/19 groups, 1/2 snapshots in group #1)
>>>> [host/incrementaltest2]: percentage done: 5.26% (1/19 groups)
>>>
>>> There will always be cases where it does not work I guess, not sure what
>>> I can do to make you happy here.
>>
>> Oh, sorry, might have been too quick on this one. You are saying this
>> happens for a non-parallel sync?
> 
> no, this was a parallel sync, but the progress should be correct for
> parallel syncs as well, shouldn't it?
> 
> here's the same log but less truncated:
> 
> [..]
> 2026-04-20T13:41:31+02:00: ----
> 2026-04-20T13:41:31+02:00: Syncing datastore 'test', namespace 'test' into datastore 'tank', namespace 'test'
> 2026-04-20T13:41:31+02:00: Found 19 groups to sync (out of 19 total)
> 2026-04-20T13:41:31+02:00: [ct/106]: sync group ct/106 failed - owner check failed (test@pbs != root@pam)
> 2026-04-20T13:41:31+02:00: [ct/999]: sync group ct/999 failed - owner check failed (test@pbs != root@pam)
> 2026-04-20T13:41:31+02:00: skipping snapshot host/test-another-mail/2024-03-14T08:44:25Z - in-progress backup
> 2026-04-20T13:41:31+02:00: [host/exclusion-test]: host/exclusion-test: skipped: 1 snapshot(s) (2024-04-23T06:53:44Z) - older than the newest snapshot present on sync target
> 2026-04-20T13:41:31+02:00: [host/exclusion-test]: re-sync snapshot 2024-04-23T06:54:30Z
> 2026-04-20T13:41:31+02:00: [host/exclusion-test]: 2024-04-23T06:54:30Z: no data changes
> 2026-04-20T13:41:31+02:00: [host/exclusion-test]: percentage done: 5.26% (1/19 groups)
> 2026-04-20T13:41:31+02:00: [host/acltest]: re-sync snapshot 2024-10-07T11:17:58Z
> 2026-04-20T13:41:31+02:00: [host/acltest]: 2024-10-07T11:17:58Z: no data changes
> 2026-04-20T13:41:31+02:00: [host/acltest]: percentage done: 5.26% (1/19 groups)
> 2026-04-20T13:41:32+02:00: [host/logtest]: 2024-01-30T10:52:34Z: start sync
> 2026-04-20T13:41:32+02:00: [host/logtest]: 2024-01-30T10:52:34Z: test.pxar.didx: sync archive
> 2026-04-20T13:41:32+02:00: [host/logtest]: 2024-01-30T10:52:34Z: test.pxar.didx: downloaded 13.169 KiB (168.332 KiB/s)
> 2026-04-20T13:41:32+02:00: [host/logtest]: 2024-01-30T10:52:34Z: catalog.pcat1.didx: sync archive
> 2026-04-20T13:41:32+02:00: [host/logtest]: 2024-01-30T10:52:34Z: catalog.pcat1.didx: downloaded 2.157 KiB (10.2 KiB/s)
> 2026-04-20T13:41:32+02:00: [host/logtest]: 2024-01-30T10:52:34Z: rsa-encrypted.key.blob: sync archive
> 2026-04-20T13:41:33+02:00: [host/logtest]: 2024-01-30T10:52:34Z: sync done
> 2026-04-20T13:41:33+02:00: [host/logtest]: percentage done: 5.26% (1/19 groups)
> 2026-04-20T13:41:33+02:00: [host/onemeg]: 2023-06-28T11:13:51Z: start sync
> 2026-04-20T13:41:33+02:00: [host/onemeg]: 2023-06-28T11:13:51Z: test.img.fidx: sync archive
> 2026-04-20T13:41:33+02:00: [host/onemeg]: 2023-06-28T11:13:51Z: test.img.fidx: downloaded 2.012 KiB (5.704 KiB/s)
> 2026-04-20T13:41:33+02:00: [host/onemeg]: 2023-06-28T11:13:51Z: sync done
> 2026-04-20T13:41:33+02:00: [host/onemeg]: percentage done: 5.26% (1/19 groups)
> 2026-04-20T13:41:33+02:00: [host/fourmeg]: 2023-06-28T11:14:35Z: start sync
> 2026-04-20T13:41:33+02:00: [host/fourmeg]: 2023-06-28T11:14:35Z: test.img.fidx: sync archive
> 2026-04-20T13:41:33+02:00: [host/fourmeg]: 2023-06-28T11:14:35Z: test.img.fidx: downloaded 2.012 KiB (5.885 KiB/s)
> 2026-04-20T13:41:33+02:00: [host/fourmeg]: 2023-06-28T11:14:35Z: sync done
> 2026-04-20T13:41:33+02:00: [host/fourmeg]: percentage done: 2.63% (0/19 groups, 1/2 snapshots in group #1)
> 2026-04-20T13:41:33+02:00: [host/fourmeg]: 2023-06-28T12:37:16Z: start sync
> 2026-04-20T13:41:34+02:00: [host/symlink]: 2023-06-07T06:48:26Z: start sync
> 2026-04-20T13:41:34+02:00: [host/symlink]: 2023-06-07T06:48:26Z: symlink.pxar.didx: sync archive
> 2026-04-20T13:41:34+02:00: [host/symlink]: 2023-06-07T06:48:26Z: symlink.pxar.didx: downloaded 145 B (802.587 B/s)
> 2026-04-20T13:41:34+02:00: [host/symlink]: 2023-06-07T06:48:26Z: catalog.pcat1.didx: sync archive
> 2026-04-20T13:41:34+02:00: [host/symlink]: 2023-06-07T06:48:26Z: catalog.pcat1.didx: downloaded 57 B (279.566 B/s)
> 2026-04-20T13:41:34+02:00: [host/symlink]: 2023-06-07T06:48:26Z: sync done
> 2026-04-20T13:41:37+02:00: [host/incrementaltest2]: 2023-09-25T13:43:25Z: start sync
> 2026-04-20T13:41:37+02:00: [host/incrementaltest2]: 2023-09-25T13:43:25Z: test.pxar.didx: sync archive
> 2026-04-20T13:41:37+02:00: [host/format-v3-test]: 2024-06-03T10:34:11Z: start sync
> 2026-04-20T13:41:37+02:00: [host/format-v3-test]: 2024-06-03T10:34:11Z: linux.ppxar.didx: sync archive
> 2026-04-20T13:41:37+02:00: [host/fourmeg]: 2023-06-28T12:37:16Z: test.img.fidx: sync archive
> 2026-04-20T13:41:37+02:00: [host/linuxtesttest]: 2024-07-17T13:38:31Z: start sync
> 2026-04-20T13:41:37+02:00: [host/linuxtesttest]: 2024-07-17T13:38:31Z: linux.pxar.didx: sync archive
> 2026-04-20T13:41:37+02:00: [host/inctest]: 2023-11-13T13:22:57Z: start sync
> 2026-04-20T13:41:37+02:00: [host/inctest]: 2023-11-13T13:22:57Z: test.pxar.didx: sync archive
> 2026-04-20T13:41:37+02:00: [host/inctest2]: 2023-11-13T13:53:27Z: start sync
> 2026-04-20T13:41:37+02:00: [host/inctest2]: 2023-11-13T13:53:27Z: test.pxar.didx: sync archive
> 2026-04-20T13:41:37+02:00: [host/format-v2-test]: 2024-06-07T12:08:37Z: start sync
> 2026-04-20T13:41:37+02:00: [host/format-v2-test]: 2024-06-07T12:08:37Z: linux.ppxar.didx: sync archive
> 2026-04-20T13:41:37+02:00: [host/foobar]: re-sync snapshot 2024-06-10T07:48:51Z due to corruption
> 2026-04-20T13:41:37+02:00: [host/foobar]: 2024-06-10T07:48:51Z: foobar.ppxar.didx: sync archive
> 2026-04-20T13:41:37+02:00: [host/symlink]: percentage done: 2.63% (0/19 groups, 1/2 snapshots in group #1)
> 2026-04-20T13:41:37+02:00: [host/symlink]: 2023-06-07T06:49:43Z: start sync
> 2026-04-20T13:41:37+02:00: [host/symlink]: 2023-06-07T06:49:43Z: foo.pxar.didx: sync archive
> 2026-04-20T13:41:37+02:00: [host/symlink]: 2023-06-07T06:49:43Z: foo.pxar.didx: downloaded 0 B (0 B/s)
> 2026-04-20T13:41:37+02:00: [host/symlink]: 2023-06-07T06:49:43Z: catalog.pcat1.didx: sync archive
> 2026-04-20T13:41:37+02:00: [host/bookworm]: re-sync snapshot 2023-01-18T11:10:34Z due to corruption
> 2026-04-20T13:41:37+02:00: [host/bookworm]: 2023-01-18T11:10:34Z: repo.pxar.didx: sync archive
> 2026-04-20T13:41:37+02:00: [host/test]: 2024-07-10T08:18:51Z: start sync
> 2026-04-20T13:41:37+02:00: [host/test]: 2024-07-10T08:18:51Z: test.img.fidx: sync archive
> 2026-04-20T13:41:37+02:00: [host/symlink]: 2023-06-07T06:49:43Z: catalog.pcat1.didx: downloaded 53 B (216.705 B/s)
> 2026-04-20T13:41:37+02:00: [host/symlink]: 2023-06-07T06:49:43Z: sync done
> 2026-04-20T13:41:37+02:00: [host/symlink]: percentage done: 5.26% (1/19 groups)
> 2026-04-20T13:42:55+02:00: [host/fourmeg]: 2023-06-28T12:37:16Z: test.img.fidx: downloaded 4 MiB (51.191 KiB/s)
> 2026-04-20T13:42:55+02:00: [host/fourmeg]: 2023-06-28T12:37:16Z: sync done
> 2026-04-20T13:42:55+02:00: [host/fourmeg]: percentage done: 5.26% (1/19 groups)
> 
> those percentage lines are just completely bogus?? and there are at least
> two "breaks" of over a second where all buffered log lines must have
> been flushed, so it's also not that the lines with lower percentage
> were re-ordered somehow..
> 
> it goes on and on and the percentage never goes above 5.26% since
> the finished group count never goes higher than 1.. though I have to
> admit I stopped it before it finished, so maybe it would have later gone
> higher? I can re-run it to completion and check what it says before it's
> done?

That is because the store progress calculates the percentage based on the
finished groups and the snapshots within the group currently being pulled.
But since groups are now being pulled in parallel, each task only sees the
progress of its own group, so the percentage is effectively per group...
Not sure if and how to fix that.
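
To illustrate the idea: the sketch below is hypothetical (the `SyncProgress`
type and `finish_group` method are made up, not the actual proxmox-backup
`StoreProgress` API), but it shows how sharing one atomic counter across the
parallel pull workers would keep the finished-group count, and therefore the
reported percentage, global rather than per task:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical shared progress state; one instance for the whole sync job.
struct SyncProgress {
    total_groups: usize,
    done_groups: AtomicUsize,
}

impl SyncProgress {
    fn new(total_groups: usize) -> Self {
        Self { total_groups, done_groups: AtomicUsize::new(0) }
    }

    /// Mark one group as finished and return the global percentage.
    /// `fetch_add` returns the previous value, so add 1 for the new count.
    fn finish_group(&self) -> f64 {
        let done = self.done_groups.fetch_add(1, Ordering::SeqCst) + 1;
        done as f64 / self.total_groups as f64 * 100.0
    }
}

fn main() {
    // 19 groups, matching the sync job in the log above.
    let progress = Arc::new(SyncProgress::new(19));
    let mut handles = Vec::new();
    // Simulate parallel group pulls, each reporting against the shared state.
    for _ in 0..19 {
        let p = Arc::clone(&progress);
        handles.push(thread::spawn(move || p.finish_group()));
    }
    let mut max_pct: f64 = 0.0;
    for h in handles {
        max_pct = max_pct.max(h.join().unwrap());
    }
    // With a shared counter the highest reported percentage reaches 100%,
    // instead of every worker reporting 1/19 = 5.26% from a private copy.
    println!("final: {max_pct:.2}%");
    assert!((max_pct - 100.0).abs() < 1e-9);
}
```

The per-snapshot fraction within a group could be layered on top the same way,
but the key point is that the group counter has to live in one place shared by
all workers, not in each worker's own progress value.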