Date: Mon, 23 Mar 2026 13:37:25 +0100
From: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Subject: Re: [PATCH proxmox{,-backup} v5 00/12] fix #4182: concurrent group pull/push support for sync jobs
To: Christian Ebner, pbs-devel@lists.proxmox.com
References: <20260309162050.1047341-1-c.ebner@proxmox.com>
In-Reply-To: <20260309162050.1047341-1-c.ebner@proxmox.com>
Message-Id: <1774263381.bngcrer2th.astroid@yuna.none>
List-Id: Proxmox Backup Server development discussion
On March 9, 2026 5:20 pm, Christian Ebner wrote:
> Syncing contents from/to a remote source via a sync job suffers from
> low throughput on high latency networks because of limitations of the
> HTTP/2 connection, as described in [0]. To improve this, syncing
> multiple groups in parallel by establishing multiple reader instances
> has been suggested.
>
> This patch series implements the functionality by adding the sync job
> configuration property `worker-threads`, allowing to define the
> number of group pull/push tokio tasks to be executed in parallel on
> the runtime during each job.
>
> Exemplary configuration:
> ```
> sync: s-8764c440-3a6c
> ...
> store datastore
> sync-direction push
> worker-threads 4
> ```
>
> Since log messages are now also written concurrently, prefix logs
> related to groups, snapshots and archives with their respective
> context prefix and add context to error messages.
>
> Further, improve logging especially for sync jobs in push direction,
> which only displayed limited information so far.
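(As a rough illustration of the fan-out pattern the cover letter describes: up to `worker-threads` workers draining a shared queue of groups, with atomic progress counters. This sketch uses plain std threads instead of tokio tasks purely to stay self-contained; the function and names are illustrative, not the series' actual code.)

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Sync groups with up to `worker_threads` workers pulling from a shared
/// work queue (an atomic index), tracking progress with an atomic counter
/// instead of a mutex. Returns the number of groups processed.
fn sync_groups_parallel(groups: &[&str], worker_threads: usize) -> usize {
    let next = AtomicUsize::new(0);
    let done = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..worker_threads.max(1) {
            s.spawn(|| loop {
                // claim the next unprocessed group, or stop if none are left
                let i = next.fetch_add(1, Ordering::Relaxed);
                let Some(group) = groups.get(i) else { break };
                // placeholder for the actual per-group pull/push work
                let _ = group;
                done.fetch_add(1, Ordering::Relaxed);
            });
        }
    });
    done.into_inner()
}
```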
I think we need to re-think this part here, because right now I get log
output like this:

2026-03-23T11:54:43+01:00: Starting datastore sync job 'local:test:tank:foobar:s-bc01cba6-805a'
2026-03-23T11:54:43+01:00: sync datastore 'tank' from 'local/test'
2026-03-23T11:54:43+01:00: ----
2026-03-23T11:54:43+01:00: Syncing datastore 'test', root namespace into datastore 'tank', namespace 'foobar'
2026-03-23T11:54:43+01:00: Found 0 groups to sync (out of 0 total)
2026-03-23T11:54:43+01:00: Finished syncing root namespace, current progress: 0 groups, 0 snapshots
2026-03-23T11:54:43+01:00: ----
2026-03-23T11:54:43+01:00: Syncing datastore 'test', namespace 'test' into datastore 'tank', namespace 'foobar/test'
2026-03-23T11:54:43+01:00: Found 19 groups to sync (out of 19 total)
2026-03-23T11:54:43+01:00: Group host/fourmeg: skipped: 1 snapshot(s) (2023-06-28T11:14:35Z) - older than the newest snapshot present on sync target
2026-03-23T11:54:43+01:00: re-sync snapshot host/onemeg/2023-06-28T11:13:51Z
2026-03-23T11:54:43+01:00: Group host/format-v2-test: skipped: 2 snapshot(s) (2024-06-07T12:08:37Z .. 2024-06-07T12:08:47Z) - older than the newest snapshot present on sync target
2026-03-23T11:54:43+01:00: Group host/symlink: skipped: 1 snapshot(s) (2023-06-07T06:48:26Z) - older than the newest snapshot present on sync target
2026-03-23T11:54:43+01:00: Snapshot host/onemeg/2023-06-28T11:13:51Z: no data changes
2026-03-23T11:54:43+01:00: Group host/onemeg: percentage done: 10.53% (2/19 groups)
2026-03-23T11:54:43+01:00: Group host/foobar: skipped: 1 snapshot(s) (2024-06-10T07:48:51Z) - older than the newest snapshot present on sync target
2026-03-23T11:54:43+01:00: Group host/incrementaltest2: skipped: 1 snapshot(s) (2023-09-25T13:43:25Z) - older than the newest snapshot present on sync target
2026-03-23T11:54:43+01:00: Group host/exclusion-test: skipped: 1 snapshot(s) (2024-04-23T06:53:44Z) - older than the newest snapshot present on sync target
2026-03-23T11:54:43+01:00: re-sync snapshot host/fourmeg/2023-06-28T12:37:16Z
2026-03-23T11:54:43+01:00: Group host/inctest: skipped: 2 snapshot(s) (2023-11-13T13:22:57Z .. 2023-11-13T13:24:17Z) - older than the newest snapshot present on sync target
2026-03-23T11:54:43+01:00: re-sync snapshot host/format-v2-test/2024-06-10T11:12:56Z
2026-03-23T11:54:43+01:00: re-sync snapshot host/symlink/2023-06-07T06:49:43Z
2026-03-23T11:54:43+01:00: skipping snapshot host/test-another-mail/2024-03-14T08:44:25Z - in-progress backup
2026-03-23T11:54:43+01:00: re-sync snapshot host/logtest/2024-01-30T10:52:34Z
2026-03-23T11:54:43+01:00: Group host/format-v3-test: skipped: 3 snapshot(s) (2024-06-03T10:34:11Z .. 2024-06-04T11:27:03Z) - older than the newest snapshot present on sync target
2026-03-23T11:54:43+01:00: re-sync snapshot host/foobar/2024-06-10T07:48:57Z
2026-03-23T11:54:43+01:00: Snapshot host/format-v2-test/2024-06-10T11:12:56Z: no data changes
2026-03-23T11:54:43+01:00: Snapshot host/fourmeg/2023-06-28T12:37:16Z: no data changes
2026-03-23T11:54:43+01:00: re-sync snapshot host/inctest2/2023-11-13T13:53:27Z
2026-03-23T11:54:43+01:00: Group host/format-v2-test: percentage done: 21.05% (4/19 groups)
2026-03-23T11:54:43+01:00: Group host/fourmeg: percentage done: 21.05% (4/19 groups)
2026-03-23T11:54:43+01:00: re-sync snapshot host/exclusion-test/2024-04-23T06:54:30Z
2026-03-23T11:54:43+01:00: Snapshot host/symlink/2023-06-07T06:49:43Z: no data changes
2026-03-23T11:54:43+01:00: Snapshot host/logtest/2024-01-30T10:52:34Z: no data changes
2026-03-23T11:54:43+01:00: Group host/test: skipped: 10 snapshot(s) (2024-07-10T08:18:51Z .. 2025-10-02T12:58:20Z) - older than the newest snapshot present on sync target
2026-03-23T11:54:43+01:00: Group host/symlink: percentage done: 21.05% (4/19 groups)
2026-03-23T11:54:43+01:00: re-sync snapshot host/bookworm/2023-01-18T11:10:34Z
2026-03-23T11:54:43+01:00: Group host/logtest: percentage done: 21.05% (4/19 groups)
2026-03-23T11:54:43+01:00: Snapshot host/foobar/2024-06-10T07:48:57Z: no data changes
2026-03-23T11:54:43+01:00: Group host/foobar: percentage done: 21.05% (4/19 groups)
2026-03-23T11:54:43+01:00: re-sync snapshot host/incrementaltest2/2023-09-25T13:43:35Z
2026-03-23T11:54:43+01:00: Snapshot host/inctest2/2023-11-13T13:53:27Z: no data changes
2026-03-23T11:54:43+01:00: re-sync snapshot ct/999/2023-03-15T08:00:13Z
2026-03-23T11:54:43+01:00: Group host/inctest2: percentage done: 21.05% (4/19 groups)
2026-03-23T11:54:43+01:00: re-sync snapshot host/inctest/2023-11-13T13:24:26Z
2026-03-23T11:54:43+01:00: Snapshot host/exclusion-test/2024-04-23T06:54:30Z: no data changes
2026-03-23T11:54:43+01:00: Snapshot host/bookworm/2023-01-18T11:10:34Z: no data changes
2026-03-23T11:54:43+01:00: re-sync snapshot host/linuxtesttest/2024-07-17T13:38:31Z
2026-03-23T11:54:43+01:00: Group host/exclusion-test: percentage done: 21.05% (4/19 groups)
2026-03-23T11:54:43+01:00: Group host/bookworm: percentage done: 21.05% (4/19 groups)
2026-03-23T11:54:43+01:00: Snapshot ct/999/2023-03-15T08:00:13Z: no data changes
2026-03-23T11:54:43+01:00: Group ct/999: percentage done: 21.05% (4/19 groups)
2026-03-23T11:54:43+01:00: Snapshot host/incrementaltest2/2023-09-25T13:43:35Z: no data changes
2026-03-23T11:54:43+01:00: Group host/incrementaltest2: percentage done: 21.05% (4/19 groups)
2026-03-23T11:54:43+01:00: Snapshot host/linuxtesttest/2024-07-17T13:38:31Z: no data changes
2026-03-23T11:54:43+01:00: Group host/linuxtesttest: percentage done: 21.05% (4/19 groups)
2026-03-23T11:54:43+01:00: Snapshot host/inctest/2023-11-13T13:24:26Z: no data changes
2026-03-23T11:54:43+01:00: Group host/inctest: percentage done: 21.05% (4/19 groups)
2026-03-23T11:54:43+01:00: re-sync snapshot host/format-v3-test/2024-06-04T09:13:52Z due to corruption
2026-03-23T11:54:43+01:00: re-sync snapshot host/acltest/2024-10-07T11:17:58Z
2026-03-23T11:54:43+01:00: Snapshot host/format-v3-test/2024-06-04T09:13:52Z: sync archive linux.ppxar.didx
2026-03-23T11:54:43+01:00: Snapshot host/acltest/2024-10-07T11:17:58Z: no data changes
2026-03-23T11:54:43+01:00: Group host/acltest: percentage done: 21.05% (4/19 groups)
2026-03-23T11:54:43+01:00: Group host/format-v3-test: percentage done: 18.42% (3/19 groups, 1/2 snapshots in group #4)
2026-03-23T11:54:43+01:00: sync group host/format-v3-test failed - Index and chunk CryptMode don't match.
2026-03-23T11:54:43+01:00: Finished syncing namespace test, current progress: 18 groups, 0 snapshots
2026-03-23T11:54:43+01:00: TASK ERROR: sync failed with some errors.
for a no-change sync, which is completely broken?

but even for longer-running transfers, it might be beneficial to buffer
log lines which arrive in short succession and group those together by
group?

and also maybe try a different prefix? it's very noisy at the moment,
something like

{group}: [{snapshot timestamp}: [{archive name}:]]

might work better flow-wise, since it allows scanning for the group
right at the start of the line..

I also think a lot of the error paths are lacking the prefix and would
need to be adapted as well.

>
> [0] https://bugzilla.proxmox.com/show_bug.cgi?id=4182
>
> Changes since version 4 (thanks @Max for review):
> - Use dedicated tokio tasks to run in parallel on different runtime
>   threads, not just multiple concurrent futures on the same thread.
> - Rework store progress accounting logic to avoid mutex locks when
>   possible, use atomic counters instead.
> - Expose setting also in the sync job edit window, not just the config.
>
>
> proxmox:
>
> Christian Ebner (1):
>   pbs api types: add `worker-threads` to sync job config
>
>  pbs-api-types/src/jobs.rs | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
>
> proxmox-backup:
>
> Christian Ebner (11):
>   client: backup writer: fix upload stats size and rate for push sync
>   api: config/sync: add optional `worker-threads` property
>   sync: pull: revert avoiding reinstantiation for encountered chunks map
>   sync: pull: factor out backup group locking and owner check
>   sync: pull: prepare pull parameters to be shared across parallel tasks
>   fix #4182: server: sync: allow pulling backup groups in parallel
>   server: pull: prefix log messages and add error context
>   sync: push: prepare push parameters to be shared across parallel tasks
>   server: sync: allow pushing groups concurrently
>   server: push: prefix log messages and add additional logging
>   ui: expose group worker setting in sync job edit window
>
>  pbs-client/src/backup_stats.rs  |  20 +--
>  pbs-client/src/backup_writer.rs |   4 +-
>  src/api2/config/sync.rs         |  10 ++
>  src/api2/pull.rs                |   9 +-
>  src/api2/push.rs                |   8 +-
>  src/server/pull.rs              | 246 +++++++++++++++++++-------------
>  src/server/push.rs              | 178 +++++++++++++++++------
>  src/server/sync.rs              |  90 +++++++++++-
>  www/window/SyncJobEdit.js       |  11 ++
>  9 files changed, 411 insertions(+), 165 deletions(-)
>
>
> Summary over all repositories:
>  10 files changed, 422 insertions(+), 165 deletions(-)
>
> --
> Generated by murpp 0.9.0
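To make the prefix suggestion above concrete, a minimal sketch of what I
have in mind (function name and signature are made up, not taken from the
series; archive context only applies within a snapshot):

```rust
/// Build a log prefix of the form
/// `{group}: [{snapshot timestamp}: [{archive name}:]]`
/// so every line starts with the group and can be scanned (or buffered
/// and regrouped) by group.
fn log_prefix(group: &str, snapshot: Option<&str>, archive: Option<&str>) -> String {
    let mut prefix = format!("{group}: ");
    if let Some(snapshot) = snapshot {
        prefix.push_str(snapshot);
        prefix.push_str(": ");
        // an archive name is only meaningful inside a snapshot context
        if let Some(archive) = archive {
            prefix.push_str(archive);
            prefix.push_str(": ");
        }
    }
    prefix
}
```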