From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 21 Apr 2026 10:00:26 +0200
From: Fabian Grünbichler <f.gruenbichler@proxmox.com>
To: Christian Ebner, pbs-devel@lists.proxmox.com
Subject: Re: [PATCH proxmox-backup v6 11/15] server: pull: prefix log messages and add error context
Message-Id: <1776757625.a5d7yahnh6.astroid@yuna.none>
In-Reply-To: <7f8852e8-c092-4e2d-a031-192c087fb364@proxmox.com>
References: <20260417092621.455374-1-c.ebner@proxmox.com>
 <20260417092621.455374-12-c.ebner@proxmox.com>
 <1776684473.p5hg4a1vym.astroid@yuna.none>
 <295ee353-2731-474b-aad0-d1f8a2bcb443@proxmox.com>
 <7f8852e8-c092-4e2d-a031-192c087fb364@proxmox.com>

On April 21, 2026 9:42 am, Christian Ebner wrote:
> On 4/21/26 9:21 AM, Christian Ebner wrote:
>> On 4/20/26 1:54 PM, Fabian Grünbichler wrote:
>>> On April 17, 2026 11:26 am, Christian Ebner wrote:
>>>> Pulling groups and therefore also snapshots in parallel leads to
>>>> unordered log output, making it mostly impossible to relate a log
>>>> message to a backup snapshot/group.
>>>>
>>>> Therefore, prefix pull job log messages with the corresponding group or
>>>> snapshot and set the error context accordingly.
>>>>
>>>> Also, reword some messages, inline variables in format strings and
>>>> start log lines with capital letters to get consistent output.
>>>>
>>>> By using the buffered logger implementation and buffering up to 5 lines
>>>> with a timeout of 1 second, subsequent log lines arriving in fast
>>>> succession are kept together, reducing the mixing of lines.
>>>>
>>>> Example output for a sequential pull job:
>>>> ```
>>>> ...
>>>> [ct/100]: 2025-11-17T10:11:42Z: start sync
>>>> [ct/100]: 2025-11-17T10:11:42Z: pct.conf.blob: sync archive
>>>> [ct/100]: 2025-11-17T10:11:42Z: root.ppxar.didx: sync archive
>>>> [ct/100]: 2025-11-17T10:11:42Z: root.ppxar.didx: downloaded 16.785 MiB (280.791 MiB/s)
>>>> [ct/100]: 2025-11-17T10:11:42Z: root.mpxar.didx: sync archive
>>>> [ct/100]: 2025-11-17T10:11:42Z: root.mpxar.didx: downloaded 65.703 KiB (29.1 MiB/s)
>>>> [ct/100]: 2025-11-17T10:11:42Z: sync done
>>>> [ct/100]: percentage done: 9.09% (1/11 groups)
>>>> [ct/101]: 2026-03-31T12:20:16Z: start sync
>>>> [ct/101]: 2026-03-31T12:20:16Z: pct.conf.blob: sync archive
>>>> [ct/101]: 2026-03-31T12:20:16Z: root.pxar.didx: sync archive
>>>> [ct/101]: 2026-03-31T12:20:16Z: root.pxar.didx: downloaded 199.806 MiB (311.91 MiB/s)
>>>> [ct/101]: 2026-03-31T12:20:16Z: catalog.pcat1.didx: sync archive
>>>> [ct/101]: 2026-03-31T12:20:16Z: catalog.pcat1.didx: downloaded 180.379 KiB (22.748 MiB/s)
>>>> [ct/101]: 2026-03-31T12:20:16Z: sync done
>>>
>>> this is probably the wrong patch for this comment, but since you have
>>> the sample output here ;)
>>>
>>> should we pad the labels? the buffered logger knows all currently active
>>> labels and could adapt it? otherwise especially for host backups the
>>> logs are again not really scannable by humans, because there will be
>>> weird jumps in alignment.. or (.. continued below)
>>
>> That will not work though? While the logger knows about the labels when
>> receiving them, it does not know at all about future ones that might
>> still come in. So it can happen that the padding does not fit anymore,
>> and one gets even uglier formatting issues.
>>
>> I would prefer to keep them as is for the time being. For vm/ct it is
>> not that bad, one could maybe foresee the ID to be a 4-digit number and
>> define a minimum label length to be padded if not reached.
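
for reference, the minimum-width variant could be as simple as a
pad-on-format helper. this is only a sketch (the width constant and
function name are made up, this is not code from the series):

```rust
// Sketch only: pad sync log labels to an assumed minimum width so the
// short vm/ct labels line up, without needing to know future labels.
// MIN_LABEL_WIDTH is a made-up constant, not taken from the patch.
const MIN_LABEL_WIDTH: usize = 7; // room for a 4-digit ID, e.g. "ct/1000"

fn prefixed(label: &str, msg: &str) -> String {
    // `{:<width$}` left-aligns and space-pads up to `width`; longer
    // labels (e.g. host backups) are passed through unpadded.
    format!("[{:<width$}]: {}", label, msg, width = MIN_LABEL_WIDTH)
}
```

so anything longer than the minimum (host backup labels in particular)
would simply not be padded, and alignment stays best-effort.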
ack, let's leave this as it is for now, it can always be bikeshedded
later if needed!

>>
>> Host backups or backups with explicit label
>>

cut off?

>>
>>>> ...
>>>> ```
>>>>
>>>> Example output for a parallel pull job:
>>>> ```
>>>> ...
>>>> [ct/107]: 2025-07-16T09:14:01Z: start sync
>>>> [ct/107]: 2025-07-16T09:14:01Z: pct.conf.blob: sync archive
>>>> [ct/107]: 2025-07-16T09:14:01Z: root.ppxar.didx: sync archive
>>>> [vm/108]: 2025-09-19T07:37:19Z: start sync
>>>> [vm/108]: 2025-09-19T07:37:19Z: qemu-server.conf.blob: sync archive
>>>> [vm/108]: 2025-09-19T07:37:19Z: drive-scsi0.img.fidx: sync archive
>>>> [ct/107]: 2025-07-16T09:14:01Z: root.ppxar.didx: downloaded 609.233 MiB (112.628 MiB/s)
>>>> [ct/107]: 2025-07-16T09:14:01Z: root.mpxar.didx: sync archive
>>>> [ct/107]: 2025-07-16T09:14:01Z: root.mpxar.didx: downloaded 1.172 MiB (17.838 MiB/s)
>>>> [ct/107]: 2025-07-16T09:14:01Z: sync done
>>>> [ct/107]: percentage done: 72.73% (8/11 groups)
>>>
>>> the way the prefix and snapshot are formatted could also be interpreted
>>> at first glance as a timestamp of the log line.. why not just prepend
>>> the prefix on the logger side, and leave it up to the caller to do the
>>> formatting? then we could use "{type}/{id}/" as prefix here? or add
>>> 'snapshot' in those lines to make it clear? granted, this is more of an
>>> issue when viewing the log via `proxmox-backup-manager`, as in the UI we
>>> have the log timestamps up front..
>>
>> I simply followed along the line of your suggestion here. In prior
>> versions I had exactly that, but you rejected it as too verbose?
:)

>>
>> https://lore.proxmox.com/pbs-devel/1774263381.bngcrer2th.astroid@yuna.none/
>>

my suggestions (for just the ct/107 lines above) would have been:

ct/107: 2025-07-16T09:14:01Z: start sync
ct/107: 2025-07-16T09:14:01Z pct.conf.blob: sync archive
ct/107: 2025-07-16T09:14:01Z root.ppxar.didx: sync archive
ct/107: 2025-07-16T09:14:01Z root.ppxar.didx: downloaded 609.233 MiB (112.628 MiB/s)
ct/107: 2025-07-16T09:14:01Z root.mpxar.didx: sync archive
ct/107: 2025-07-16T09:14:01Z root.mpxar.didx: downloaded 1.172 MiB (17.838 MiB/s)
ct/107: 2025-07-16T09:14:01Z: sync done
ct/107: percentage done: 72.73% (8/11 groups)

though it might make sense to group that even more visually

ct/107: 2025-07-16T09:14:01Z: start sync
ct/107: 2025-07-16T09:14:01Z/pct.conf.blob: sync archive
ct/107: 2025-07-16T09:14:01Z/root.ppxar.didx: sync archive
ct/107: 2025-07-16T09:14:01Z/root.ppxar.didx: downloaded 609.233 MiB (112.628 MiB/s)
ct/107: 2025-07-16T09:14:01Z/root.mpxar.didx: sync archive
ct/107: 2025-07-16T09:14:01Z/root.mpxar.didx: downloaded 1.172 MiB (17.838 MiB/s)
ct/107: 2025-07-16T09:14:01Z: sync done
ct/107: percentage done: 72.73% (8/11 groups)

or

[ct/107 2025-07-16T09:14:01Z] start sync
[ct/107 2025-07-16T09:14:01Z pct.conf.blob] sync archive
[ct/107 2025-07-16T09:14:01Z root.ppxar.didx] sync archive
[ct/107 2025-07-16T09:14:01Z root.ppxar.didx] downloaded 609.233 MiB (112.628 MiB/s)
[ct/107 2025-07-16T09:14:01Z root.mpxar.didx] sync archive
[ct/107 2025-07-16T09:14:01Z root.mpxar.didx] downloaded 1.172 MiB (17.838 MiB/s)
[ct/107 2025-07-16T09:14:01Z] sync done
[ct/107] percentage done: 72.73% (8/11 groups)

but that is probably a matter of taste, so let's postpone this aspect
for now.

>>>
>>> and maybe(?) log the progress line using a different prefix? because
>>> right now the information that the group [ct/107] is finished is not
>>> really clear from the output, IMHO.
>>
>> That I can add, yes.

thanks!
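
as an aside, for anyone following along: the buffering described in the
commit message (flush after 5 lines or after 1 second) boils down to
roughly the following. names are invented for this sketch, and a real
logger additionally needs a timer task to flush an idle buffer, which
this push-driven sketch omits:

```rust
use std::mem;
use std::time::{Duration, Instant};

// Assumed limits from the commit message: flush once 5 lines are
// buffered, or once the oldest buffered line is 1 second old.
const MAX_LINES: usize = 5;
const MAX_AGE: Duration = Duration::from_secs(1);

#[derive(Default)]
struct LineBuffer {
    lines: Vec<String>,
    first_line_at: Option<Instant>,
}

impl LineBuffer {
    // Returns Some(batch) when the buffer flushes, None otherwise.
    fn push(&mut self, line: String) -> Option<Vec<String>> {
        // remember when the oldest still-buffered line arrived
        self.first_line_at.get_or_insert_with(Instant::now);
        self.lines.push(line);
        let expired = self
            .first_line_at
            .map_or(false, |t| t.elapsed() >= MAX_AGE);
        if self.lines.len() >= MAX_LINES || expired {
            self.first_line_at = None;
            Some(mem::take(&mut self.lines))
        } else {
            None
        }
    }
}
```

keeping one such buffer per group label is what keeps lines of the same
group together in the parallel case.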

>>
>>>
>>> the progress logging is also still broken (this is for a sync that takes
>>> a while, this is not log messages being buffered and re-ordered!):
>>>
>>> $ proxmox-backup-manager task log 'UPID:yuna:00070656:001E420F:00000002:69E610EB:syncjob:local\x3atest\x3atank\x3a\x3as\x2dbc01cba6\x2d805a:root@pam:' | grep -e namespace -e 'percentage done'
>>> Syncing datastore 'test', root namespace into datastore 'tank', root namespace
>>> Finished syncing root namespace, current progress: 0 groups, 0 snapshots
>>> Syncing datastore 'test', namespace 'test' into datastore 'tank', namespace 'test'
>>> [host/exclusion-test]: percentage done: 5.26% (1/19 groups)
>>> [host/acltest]: percentage done: 5.26% (1/19 groups)
>>> [host/logtest]: percentage done: 5.26% (1/19 groups)
>>> [host/onemeg]: percentage done: 5.26% (1/19 groups)
>>> [host/fourmeg]: percentage done: 2.63% (0/19 groups, 1/2 snapshots in group #1)
>>> [host/symlink]: percentage done: 2.63% (0/19 groups, 1/2 snapshots in group #1)
>>> [host/symlink]: percentage done: 5.26% (1/19 groups)
>>> [host/fourmeg]: percentage done: 5.26% (1/19 groups)
>>> [host/format-v2-test]: percentage done: 1.75% (0/19 groups, 1/3 snapshots in group #1)
>>> [host/format-v2-test]: percentage done: 3.51% (0/19 groups, 2/3 snapshots in group #1)
>>> [host/format-v2-test]: percentage done: 5.26% (1/19 groups)
>>> [host/incrementaltest2]: percentage done: 2.63% (0/19 groups, 1/2 snapshots in group #1)
>>> [host/incrementaltest2]: percentage done: 5.26% (1/19 groups)
>>
>> There will always be cases where it does not work I guess, not sure what
>> I can do to make you happy here.
>
> Oh, sorry, might have been too quick on this one. You are saying this
> happens for a non-parallel sync?

no, this was a parallel sync, but the progress should be correct for
parallel syncs as well, shouldn't it? here's the same log but less
truncated:

[..]
2026-04-20T13:41:31+02:00: ----
2026-04-20T13:41:31+02:00: Syncing datastore 'test', namespace 'test' into datastore 'tank', namespace 'test'
2026-04-20T13:41:31+02:00: Found 19 groups to sync (out of 19 total)
2026-04-20T13:41:31+02:00: [ct/106]: sync group ct/106 failed - owner check failed (test@pbs != root@pam)
2026-04-20T13:41:31+02:00: [ct/999]: sync group ct/999 failed - owner check failed (test@pbs != root@pam)
2026-04-20T13:41:31+02:00: skipping snapshot host/test-another-mail/2024-03-14T08:44:25Z - in-progress backup
2026-04-20T13:41:31+02:00: [host/exclusion-test]: host/exclusion-test: skipped: 1 snapshot(s) (2024-04-23T06:53:44Z) - older than the newest snapshot present on sync target
2026-04-20T13:41:31+02:00: [host/exclusion-test]: re-sync snapshot 2024-04-23T06:54:30Z
2026-04-20T13:41:31+02:00: [host/exclusion-test]: 2024-04-23T06:54:30Z: no data changes
2026-04-20T13:41:31+02:00: [host/exclusion-test]: percentage done: 5.26% (1/19 groups)
2026-04-20T13:41:31+02:00: [host/acltest]: re-sync snapshot 2024-10-07T11:17:58Z
2026-04-20T13:41:31+02:00: [host/acltest]: 2024-10-07T11:17:58Z: no data changes
2026-04-20T13:41:31+02:00: [host/acltest]: percentage done: 5.26% (1/19 groups)
2026-04-20T13:41:32+02:00: [host/logtest]: 2024-01-30T10:52:34Z: start sync
2026-04-20T13:41:32+02:00: [host/logtest]: 2024-01-30T10:52:34Z: test.pxar.didx: sync archive
2026-04-20T13:41:32+02:00: [host/logtest]: 2024-01-30T10:52:34Z: test.pxar.didx: downloaded 13.169 KiB (168.332 KiB/s)
2026-04-20T13:41:32+02:00: [host/logtest]: 2024-01-30T10:52:34Z: catalog.pcat1.didx: sync archive
2026-04-20T13:41:32+02:00: [host/logtest]: 2024-01-30T10:52:34Z: catalog.pcat1.didx: downloaded 2.157 KiB (10.2 KiB/s)
2026-04-20T13:41:32+02:00: [host/logtest]: 2024-01-30T10:52:34Z: rsa-encrypted.key.blob: sync archive
2026-04-20T13:41:33+02:00: [host/logtest]: 2024-01-30T10:52:34Z: sync done
2026-04-20T13:41:33+02:00: [host/logtest]: percentage done: 5.26% (1/19 groups)
2026-04-20T13:41:33+02:00: [host/onemeg]: 2023-06-28T11:13:51Z: start sync
2026-04-20T13:41:33+02:00: [host/onemeg]: 2023-06-28T11:13:51Z: test.img.fidx: sync archive
2026-04-20T13:41:33+02:00: [host/onemeg]: 2023-06-28T11:13:51Z: test.img.fidx: downloaded 2.012 KiB (5.704 KiB/s)
2026-04-20T13:41:33+02:00: [host/onemeg]: 2023-06-28T11:13:51Z: sync done
2026-04-20T13:41:33+02:00: [host/onemeg]: percentage done: 5.26% (1/19 groups)
2026-04-20T13:41:33+02:00: [host/fourmeg]: 2023-06-28T11:14:35Z: start sync
2026-04-20T13:41:33+02:00: [host/fourmeg]: 2023-06-28T11:14:35Z: test.img.fidx: sync archive
2026-04-20T13:41:33+02:00: [host/fourmeg]: 2023-06-28T11:14:35Z: test.img.fidx: downloaded 2.012 KiB (5.885 KiB/s)
2026-04-20T13:41:33+02:00: [host/fourmeg]: 2023-06-28T11:14:35Z: sync done
2026-04-20T13:41:33+02:00: [host/fourmeg]: percentage done: 2.63% (0/19 groups, 1/2 snapshots in group #1)
2026-04-20T13:41:33+02:00: [host/fourmeg]: 2023-06-28T12:37:16Z: start sync
2026-04-20T13:41:34+02:00: [host/symlink]: 2023-06-07T06:48:26Z: start sync
2026-04-20T13:41:34+02:00: [host/symlink]: 2023-06-07T06:48:26Z: symlink.pxar.didx: sync archive
2026-04-20T13:41:34+02:00: [host/symlink]: 2023-06-07T06:48:26Z: symlink.pxar.didx: downloaded 145 B (802.587 B/s)
2026-04-20T13:41:34+02:00: [host/symlink]: 2023-06-07T06:48:26Z: catalog.pcat1.didx: sync archive
2026-04-20T13:41:34+02:00: [host/symlink]: 2023-06-07T06:48:26Z: catalog.pcat1.didx: downloaded 57 B (279.566 B/s)
2026-04-20T13:41:34+02:00: [host/symlink]: 2023-06-07T06:48:26Z: sync done
2026-04-20T13:41:37+02:00: [host/incrementaltest2]: 2023-09-25T13:43:25Z: start sync
2026-04-20T13:41:37+02:00: [host/incrementaltest2]: 2023-09-25T13:43:25Z: test.pxar.didx: sync archive
2026-04-20T13:41:37+02:00: [host/format-v3-test]: 2024-06-03T10:34:11Z: start sync
2026-04-20T13:41:37+02:00: [host/format-v3-test]: 2024-06-03T10:34:11Z: linux.ppxar.didx: sync archive
2026-04-20T13:41:37+02:00: [host/fourmeg]: 2023-06-28T12:37:16Z: test.img.fidx: sync archive
2026-04-20T13:41:37+02:00: [host/linuxtesttest]: 2024-07-17T13:38:31Z: start sync
2026-04-20T13:41:37+02:00: [host/linuxtesttest]: 2024-07-17T13:38:31Z: linux.pxar.didx: sync archive
2026-04-20T13:41:37+02:00: [host/inctest]: 2023-11-13T13:22:57Z: start sync
2026-04-20T13:41:37+02:00: [host/inctest]: 2023-11-13T13:22:57Z: test.pxar.didx: sync archive
2026-04-20T13:41:37+02:00: [host/inctest2]: 2023-11-13T13:53:27Z: start sync
2026-04-20T13:41:37+02:00: [host/inctest2]: 2023-11-13T13:53:27Z: test.pxar.didx: sync archive
2026-04-20T13:41:37+02:00: [host/format-v2-test]: 2024-06-07T12:08:37Z: start sync
2026-04-20T13:41:37+02:00: [host/format-v2-test]: 2024-06-07T12:08:37Z: linux.ppxar.didx: sync archive
2026-04-20T13:41:37+02:00: [host/foobar]: re-sync snapshot 2024-06-10T07:48:51Z due to corruption
2026-04-20T13:41:37+02:00: [host/foobar]: 2024-06-10T07:48:51Z: foobar.ppxar.didx: sync archive
2026-04-20T13:41:37+02:00: [host/symlink]: percentage done: 2.63% (0/19 groups, 1/2 snapshots in group #1)
2026-04-20T13:41:37+02:00: [host/symlink]: 2023-06-07T06:49:43Z: start sync
2026-04-20T13:41:37+02:00: [host/symlink]: 2023-06-07T06:49:43Z: foo.pxar.didx: sync archive
2026-04-20T13:41:37+02:00: [host/symlink]: 2023-06-07T06:49:43Z: foo.pxar.didx: downloaded 0 B (0 B/s)
2026-04-20T13:41:37+02:00: [host/symlink]: 2023-06-07T06:49:43Z: catalog.pcat1.didx: sync archive
2026-04-20T13:41:37+02:00: [host/bookworm]: re-sync snapshot 2023-01-18T11:10:34Z due to corruption
2026-04-20T13:41:37+02:00: [host/bookworm]: 2023-01-18T11:10:34Z: repo.pxar.didx: sync archive
2026-04-20T13:41:37+02:00: [host/test]: 2024-07-10T08:18:51Z: start sync
2026-04-20T13:41:37+02:00: [host/test]: 2024-07-10T08:18:51Z: test.img.fidx: sync archive
2026-04-20T13:41:37+02:00: [host/symlink]: 2023-06-07T06:49:43Z: catalog.pcat1.didx: downloaded 53 B (216.705 B/s)
2026-04-20T13:41:37+02:00: [host/symlink]: 2023-06-07T06:49:43Z: sync done
2026-04-20T13:41:37+02:00: [host/symlink]: percentage done: 5.26% (1/19 groups)
2026-04-20T13:42:55+02:00: [host/fourmeg]: 2023-06-28T12:37:16Z: test.img.fidx: downloaded 4 MiB (51.191 KiB/s)
2026-04-20T13:42:55+02:00: [host/fourmeg]: 2023-06-28T12:37:16Z: sync done
2026-04-20T13:42:55+02:00: [host/fourmeg]: percentage done: 5.26% (1/19 groups)

those percentage lines are just completely bogus?? and there are at
least two "breaks" of over a second where all buffered log lines must
have been flushed, so it's also not that the lines with lower
percentage were re-ordered somehow..

it goes on and on and the percentage always stays at at most 5.26%,
since the finished group count never goes higher than 1.. though I have
to admit I stopped it before it finished, so maybe it would have gone
higher later? I can re-run it to completion and check what it says
right before it's done?
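
FWIW, every worker reporting "1/19 groups" after finishing its own
group looks like what you'd get if each parallel task counts only its
own finished groups, i.e. works on a per-task copy of the progress
state instead of a shared one. just speculation on my side, but a
shared counter along these lines would not show the symptom (names are
made up here, this is not the actual StoreProgress code):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Invented names for illustration only. The point: the finished-group
// count must live in state shared by all group workers (e.g. behind an
// Arc), not in a per-worker copy, otherwise every worker reports
// "1/19 groups" after finishing its own group.
struct SharedProgress {
    done_groups: AtomicUsize,
    total_groups: usize,
}

impl SharedProgress {
    fn finish_group(&self) -> String {
        // fetch_add returns the value *before* the increment
        let done = self.done_groups.fetch_add(1, Ordering::SeqCst) + 1;
        let pct = 100.0 * done as f64 / self.total_groups as f64;
        format!("percentage done: {pct:.2}% ({done}/{} groups)", self.total_groups)
    }
}
```

with that shape, concurrent workers each see a monotonically growing
count instead of all reporting the same "first group done" line.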