* [PATCH proxmox{,-backup} v6 00/15] fix #4182: concurrent group pull/push support for sync jobs
@ 2026-04-17 9:26 Christian Ebner
2026-04-17 9:26 ` [PATCH proxmox v6 01/15] pbs api types: add `worker-threads` to sync job config Christian Ebner
` (14 more replies)
0 siblings, 15 replies; 16+ messages in thread
From: Christian Ebner @ 2026-04-17 9:26 UTC (permalink / raw)
To: pbs-devel
Syncing contents from/to a remote source via a sync job suffers from
low throughput on high latency networks because of limitations by the
HTTP/2 connection, as described in [0]. To improve, syncing multiple
groups in parallel by establishing multiple reader instances has been
suggested.
This patch series implements the functionality by adding the sync job
configuration property `worker-threads`, which defines the number of
group pull/push tokio tasks to be executed in parallel on the runtime
during each job run.
Example configuration:
```
sync: s-8764c440-3a6c
ns
owner root@pam
remote local
remote-ns
remote-store push-target-store
remove-vanished false
store datastore
sync-direction push
worker-threads 4
```
Since log messages are now also written concurrently, prefix logs
related to groups, snapshots and archives with their respective
context and add context to error messages.
To reduce interleaving of log lines arriving in fast succession from
different group workers, implement buffering logic that keeps up to 5
lines buffered per group with a timeout of 1 second. This makes the
log output easier to follow.
Further, improve logging especially for sync jobs in push direction,
which only displayed limited information so far.
[0] https://bugzilla.proxmox.com/show_bug.cgi?id=4182
Changes since version 5 (thanks @Fabian):
- Implement buffered logger for better grouping of fast succession log lines
- Refactor group worker into standalone BoundedJoinSet implementation.
- Improve log output by using better prefixes
- Add missing error contexts
Changes since version 4:
- Use dedicated tokio tasks to run in parallel on different runtime threads,
not just multiple concurrent futures on the same thread.
- Rework store progress accounting logic to avoid mutex locks when possible,
use atomic counters instead.
- Expose the setting also in the sync job edit window, not just the config.
proxmox:
Christian Ebner (1):
pbs api types: add `worker-threads` to sync job config
pbs-api-types/src/jobs.rs | 11 +++++++++++
1 file changed, 11 insertions(+)
proxmox-backup:
Christian Ebner (14):
tools: group and sort module imports
tools: implement buffered logger for concurrent log messages
tools: add bounded join set to run concurrent tasks bound by limit
client: backup writer: fix upload stats size and rate for push sync
api: config/sync: add optional `worker-threads` property
sync: pull: revert avoiding reinstantiation for encountered chunks map
sync: pull: factor out backup group locking and owner check
sync: pull: prepare pull parameters to be shared across parallel tasks
fix #4182: server: sync: allow pulling backup groups in parallel
server: pull: prefix log messages and add error context
sync: push: prepare push parameters to be shared across parallel tasks
server: sync: allow pushing groups concurrently
server: push: prefix log messages and add additional logging
ui: expose group worker setting in sync job edit window
pbs-client/src/backup_stats.rs | 20 +-
pbs-client/src/backup_writer.rs | 4 +-
pbs-tools/Cargo.toml | 2 +
pbs-tools/src/bounded_join_set.rs | 69 +++++
pbs-tools/src/buffered_logger.rs | 216 ++++++++++++++
pbs-tools/src/lib.rs | 5 +-
src/api2/config/sync.rs | 10 +
src/api2/pull.rs | 9 +-
src/api2/push.rs | 8 +-
src/server/pull.rs | 453 +++++++++++++++++++++---------
src/server/push.rs | 349 ++++++++++++++++++-----
src/server/sync.rs | 40 ++-
www/window/SyncJobEdit.js | 11 +
13 files changed, 978 insertions(+), 218 deletions(-)
create mode 100644 pbs-tools/src/bounded_join_set.rs
create mode 100644 pbs-tools/src/buffered_logger.rs
Summary over all repositories:
14 files changed, 989 insertions(+), 218 deletions(-)
--
Generated by murpp 0.11.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH proxmox v6 01/15] pbs api types: add `worker-threads` to sync job config
2026-04-17 9:26 [PATCH proxmox{,-backup} v6 00/15] fix #4182: concurrent group pull/push support for sync jobs Christian Ebner
@ 2026-04-17 9:26 ` Christian Ebner
2026-04-17 9:26 ` [PATCH proxmox-backup v6 02/15] tools: group and sort module imports Christian Ebner
` (13 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Christian Ebner @ 2026-04-17 9:26 UTC (permalink / raw)
To: pbs-devel
Allow specifying the number of concurrent worker threads used to
sync groups for sync jobs. Values can range from the current default
of 1 up to 32, although performance improvements saturate as the
number of threads grows.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 5:
- no changes
pbs-api-types/src/jobs.rs | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/pbs-api-types/src/jobs.rs b/pbs-api-types/src/jobs.rs
index 7e6dfb94..c4e6dda6 100644
--- a/pbs-api-types/src/jobs.rs
+++ b/pbs-api-types/src/jobs.rs
@@ -88,6 +88,11 @@ pub const VERIFY_JOB_VERIFY_THREADS_SCHEMA: Schema = threads_schema(
4,
);
+pub const SYNC_WORKER_THREADS_SCHEMA: Schema = threads_schema(
+ "The number of worker threads to process groups in parallel.",
+ 1,
+);
+
#[api(
properties: {
"next-run": {
@@ -664,6 +669,10 @@ pub const UNMOUNT_ON_SYNC_DONE_SCHEMA: Schema =
type: SyncDirection,
optional: true,
},
+ "worker-threads": {
+ schema: SYNC_WORKER_THREADS_SCHEMA,
+ optional: true,
+ },
}
)]
#[derive(Serialize, Deserialize, Clone, Updater, PartialEq)]
@@ -709,6 +718,8 @@ pub struct SyncJobConfig {
pub unmount_on_done: Option<bool>,
#[serde(skip_serializing_if = "Option::is_none")]
pub sync_direction: Option<SyncDirection>,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ pub worker_threads: Option<usize>,
}
impl SyncJobConfig {
--
2.47.3
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH proxmox-backup v6 02/15] tools: group and sort module imports
2026-04-17 9:26 [PATCH proxmox{,-backup} v6 00/15] fix #4182: concurrent group pull/push support for sync jobs Christian Ebner
2026-04-17 9:26 ` [PATCH proxmox v6 01/15] pbs api types: add `worker-threads` to sync job config Christian Ebner
@ 2026-04-17 9:26 ` Christian Ebner
2026-04-17 9:26 ` [PATCH proxmox-backup v6 03/15] tools: implement buffered logger for concurrent log messages Christian Ebner
` (12 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Christian Ebner @ 2026-04-17 9:26 UTC (permalink / raw)
To: pbs-devel
Makes it easier to find and insert new modules with some logical
consistency.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 5:
- not present in previous version
pbs-tools/src/lib.rs | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/pbs-tools/src/lib.rs b/pbs-tools/src/lib.rs
index af900c925..f41aef6df 100644
--- a/pbs-tools/src/lib.rs
+++ b/pbs-tools/src/lib.rs
@@ -1,3 +1,4 @@
+pub mod async_lru_cache;
pub mod cert;
pub mod crypt_config;
pub mod format;
@@ -6,8 +7,6 @@ pub mod lru_cache;
pub mod nom;
pub mod sha;
-pub mod async_lru_cache;
-
/// Set MMAP_THRESHOLD to a fixed value (128 KiB)
///
/// This avoids the "dynamic" mmap-threshold logic from glibc's malloc, which seems misguided and
--
2.47.3
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH proxmox-backup v6 03/15] tools: implement buffered logger for concurrent log messages
2026-04-17 9:26 [PATCH proxmox{,-backup} v6 00/15] fix #4182: concurrent group pull/push support for sync jobs Christian Ebner
2026-04-17 9:26 ` [PATCH proxmox v6 01/15] pbs api types: add `worker-threads` to sync job config Christian Ebner
2026-04-17 9:26 ` [PATCH proxmox-backup v6 02/15] tools: group and sort module imports Christian Ebner
@ 2026-04-17 9:26 ` Christian Ebner
2026-04-17 9:26 ` [PATCH proxmox-backup v6 04/15] tools: add bounded join set to run concurrent tasks bound by limit Christian Ebner
` (11 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Christian Ebner @ 2026-04-17 9:26 UTC (permalink / raw)
To: pbs-devel
Implements a buffered logger instance which collects messages sent
from different sender instances via an async tokio channel and
buffers them. Senders identify themselves by a label and provide a
log level for each log line to be buffered and flushed.
On collection, log lines are grouped by label and buffered in
order of arrival per label, up to the configured maximum number of
lines per group or until the configured interval elapses. The
interval timeout is reset when contents are flushed. In addition,
senders can request flushing at any given point.
When the interval timeout is reached, the log buffers of all labels
are flushed. There is no guarantee on the order of labels when
flushing.
Log output is written with the provided log line level and prefixed
by the label.
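For illustration, a minimal usage sketch of the API introduced below
(assuming a running tokio runtime; names match the code added by this
patch, but the snippet itself is not part of it):
```
use std::time::Duration;

use anyhow::Error;
use tracing::Level;

use pbs_tools::buffered_logger::BufferedLogger;

async fn example() -> Result<(), Error> {
    // buffer at most 5 lines per label, flush at least once per second
    let (logger, builder) = BufferedLogger::new(5, Duration::from_secs(1));
    // spawn the collection loop on a dedicated tokio task
    logger.run_log_collection();

    // one sender per backup group, identified by its label
    let sender = builder.sender_with_label("ct/100".to_string());
    sender.log(Level::INFO, "start sync".to_string()).await?;
    sender.log(Level::INFO, "sync done".to_string()).await?;
    // force out the currently buffered lines for this label
    sender.flush().await?;
    Ok(())
}
```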
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 5:
- not present in previous version
pbs-tools/Cargo.toml | 2 +
pbs-tools/src/buffered_logger.rs | 216 +++++++++++++++++++++++++++++++
pbs-tools/src/lib.rs | 1 +
3 files changed, 219 insertions(+)
create mode 100644 pbs-tools/src/buffered_logger.rs
diff --git a/pbs-tools/Cargo.toml b/pbs-tools/Cargo.toml
index 998e3077e..6b1d92fa6 100644
--- a/pbs-tools/Cargo.toml
+++ b/pbs-tools/Cargo.toml
@@ -17,10 +17,12 @@ openssl.workspace = true
serde_json.workspace = true
# rt-multi-thread is required for block_in_place
tokio = { workspace = true, features = [ "fs", "io-util", "rt", "rt-multi-thread", "sync" ] }
+tracing.workspace = true
proxmox-async.workspace = true
proxmox-io = { workspace = true, features = [ "tokio" ] }
proxmox-human-byte.workspace = true
+proxmox-log.workspace = true
proxmox-sys.workspace = true
proxmox-time.workspace = true
diff --git a/pbs-tools/src/buffered_logger.rs b/pbs-tools/src/buffered_logger.rs
new file mode 100644
index 000000000..39cf068cd
--- /dev/null
+++ b/pbs-tools/src/buffered_logger.rs
@@ -0,0 +1,216 @@
+//! Log aggregator to collect and group messages sent from concurrent tasks via
+//! a tokio channel.
+
+use std::collections::hash_map::Entry;
+use std::collections::HashMap;
+use std::time::Duration;
+
+use anyhow::Error;
+use tokio::sync::mpsc;
+use tokio::time::{self, Instant};
+use tracing::{debug, error, info, trace, warn, Level};
+
+use proxmox_log::LogContext;
+
+/// Label to be used to group currently buffered messages when flushing.
+pub type SenderLabel = String;
+
+/// Requested action for the log collection task
+enum SenderRequest {
+ // new log line to be buffered
+ Message(LogLine),
+ // flush currently buffered log lines associated by sender label
+ Flush(SenderLabel),
+}
+
+/// Logger instance to buffer and group log output to keep concurrent logs readable
+///
+/// Receives the logs from an async input channel, buffers them grouped by input
+/// channel and flushes them after either reaching a timeout or capacity limit.
+pub struct BufferedLogger {
+ // buffer to aggregate log lines based on sender label
+ buffer_map: HashMap<SenderLabel, Vec<LogLine>>,
+ // maximum number of received lines for an individual sender instance before
+ // flushing
+ max_buffered_lines: usize,
+ // maximum aggregation duration of received lines for an individual sender
+ // instance before flushing
+ max_aggregation_time: Duration,
+ // channel to receive log messages
+ receiver: mpsc::Receiver<SenderRequest>,
+}
+
+/// Instance to create new sender instances by cloning the channel sender
+pub struct LogLineSenderBuilder {
+ // to clone new senders if requested
+ _sender: mpsc::Sender<SenderRequest>,
+}
+
+impl LogLineSenderBuilder {
+ /// Create new sender instance to send log messages, to be grouped by given label
+ ///
+ /// Label is not checked to be unique (no other instance with same label exists),
+ /// it is the caller's responsibility to check this if required.
+ pub fn sender_with_label(&self, label: SenderLabel) -> LogLineSender {
+ LogLineSender {
+ label,
+ sender: self._sender.clone(),
+ }
+ }
+}
+
+/// Sender to publish new log messages to buffered log aggregator
+pub struct LogLineSender {
+ // label used to group log lines
+ label: SenderLabel,
+ // sender to publish new log lines to buffered log aggregator task
+ sender: mpsc::Sender<SenderRequest>,
+}
+
+impl LogLineSender {
+ /// Send a new log message with given level to the buffered logger task
+ pub async fn log(&self, level: Level, message: String) -> Result<(), Error> {
+ let line = LogLine {
+ label: self.label.clone(),
+ level,
+ message,
+ };
+ self.sender.send(SenderRequest::Message(line)).await?;
+ Ok(())
+ }
+
+ /// Flush all messages with sender's label
+ pub async fn flush(&self) -> Result<(), Error> {
+ self.sender
+ .send(SenderRequest::Flush(self.label.clone()))
+ .await?;
+ Ok(())
+ }
+}
+
+/// Log message entity
+struct LogLine {
+ /// label identifying the sender
+ label: SenderLabel,
+ /// Log level to use during flushing
+ level: Level,
+ /// log line to be buffered and flushed
+ message: String,
+}
+
+impl BufferedLogger {
+ /// New instance of a buffered logger
+ pub fn new(
+ max_buffered_lines: usize,
+ max_aggregation_time: Duration,
+ ) -> (Self, LogLineSenderBuilder) {
+ let (_sender, receiver) = mpsc::channel(100);
+
+ (
+ Self {
+ buffer_map: HashMap::new(),
+ max_buffered_lines,
+ max_aggregation_time,
+ receiver,
+ },
+ LogLineSenderBuilder { _sender },
+ )
+ }
+
+ /// Starts the collection loop spawned on a new tokio task
+ /// Finishes when all senders belonging to the channel have been dropped.
+ pub fn run_log_collection(mut self) {
+ let future = async move {
+ loop {
+ let deadline = Instant::now() + self.max_aggregation_time;
+ match time::timeout_at(deadline, self.receive_log_line()).await {
+ Ok(finished) => {
+ if finished {
+ break;
+ }
+ }
+ Err(_timeout) => self.flush_all_buffered(),
+ }
+ }
+ };
+ match LogContext::current() {
+ None => tokio::spawn(future),
+ Some(context) => tokio::spawn(context.scope(future)),
+ };
+ }
+
+ /// Collects new log lines, buffers and flushes them if max lines limit exceeded.
+ ///
+ /// Returns `true` if all the senders have been dropped and the task should no
+ /// longer wait for new messages and finish.
+ async fn receive_log_line(&mut self) -> bool {
+ if let Some(request) = self.receiver.recv().await {
+ match request {
+ SenderRequest::Flush(label) => {
+ if let Some(log_lines) = self.buffer_map.get_mut(&label) {
+ Self::log_with_label(&label, log_lines);
+ log_lines.clear();
+ }
+ }
+ SenderRequest::Message(log_line) => {
+ if self.max_buffered_lines == 0
+ || self.max_aggregation_time < Duration::from_secs(0)
+ {
+ // shortcut if no buffering should happen
+ Self::log_by_level(&log_line.label, &log_line);
+ }
+
+ match self.buffer_map.entry(log_line.label.clone()) {
+ Entry::Occupied(mut occupied) => {
+ let log_lines = occupied.get_mut();
+ if log_lines.len() + 1 > self.max_buffered_lines {
+ // reached limit for this label,
+ // flush all buffered and new log line
+ Self::log_with_label(&log_line.label, log_lines);
+ log_lines.clear();
+ Self::log_by_level(&log_line.label, &log_line);
+ } else {
+ // below limit, push to buffer to flush later
+ log_lines.push(log_line);
+ }
+ }
+ Entry::Vacant(vacant) => {
+ vacant.insert(vec![log_line]);
+ }
+ }
+ }
+ }
+ return false;
+ }
+
+ // no more senders, all LogLineSender's and LogLineSenderBuilder have been dropped
+ self.flush_all_buffered();
+ true
+ }
+
+ /// Flush all currently buffered contents without ordering, but grouped by label
+ fn flush_all_buffered(&mut self) {
+ for (label, log_lines) in self.buffer_map.iter() {
+ Self::log_with_label(label, log_lines);
+ }
+ self.buffer_map.clear();
+ }
+
+ /// Log given log lines prefixed by label
+ fn log_with_label(label: &str, log_lines: &[LogLine]) {
+ for log_line in log_lines {
+ Self::log_by_level(label, log_line);
+ }
+ }
+
+ /// Write the given log line prefixed by label
+ fn log_by_level(label: &str, log_line: &LogLine) {
+ match log_line.level {
+ Level::ERROR => error!("[{label}]: {}", log_line.message),
+ Level::WARN => warn!("[{label}]: {}", log_line.message),
+ Level::INFO => info!("[{label}]: {}", log_line.message),
+ Level::DEBUG => debug!("[{label}]: {}", log_line.message),
+ Level::TRACE => trace!("[{label}]: {}", log_line.message),
+ }
+ }
+}
diff --git a/pbs-tools/src/lib.rs b/pbs-tools/src/lib.rs
index f41aef6df..1e3972c92 100644
--- a/pbs-tools/src/lib.rs
+++ b/pbs-tools/src/lib.rs
@@ -1,4 +1,5 @@
pub mod async_lru_cache;
+pub mod buffered_logger;
pub mod cert;
pub mod crypt_config;
pub mod format;
--
2.47.3
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH proxmox-backup v6 04/15] tools: add bounded join set to run concurrent tasks bound by limit
2026-04-17 9:26 [PATCH proxmox{,-backup} v6 00/15] fix #4182: concurrent group pull/push support for sync jobs Christian Ebner
` (2 preceding siblings ...)
2026-04-17 9:26 ` [PATCH proxmox-backup v6 03/15] tools: implement buffered logger for concurrent log messages Christian Ebner
@ 2026-04-17 9:26 ` Christian Ebner
2026-04-17 9:26 ` [PATCH proxmox-backup v6 05/15] client: backup writer: fix upload stats size and rate for push sync Christian Ebner
` (10 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Christian Ebner @ 2026-04-17 9:26 UTC (permalink / raw)
To: pbs-devel
The BoundedJoinSet allows running tasks concurrently via a JoinSet,
but constrains the number of tasks run at once to an upper limit.
In contrast to the ParallelHandler, which is a purely synchronous
implementation and does not provide easy handling of returned
results, this allows executing tasks in an async context with
straightforward handling of results, as required for e.g.
pulling/pushing of backup groups in parallel for sync jobs. Also, the
log context is easily preserved, which is important for task logging.
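For illustration, a hedged sketch of how the join set added below can
be driven (illustrative only, not part of the patch):
```
use pbs_tools::bounded_join_set::BoundedJoinSet;

async fn example() -> Result<(), tokio::task::JoinError> {
    // at most 4 tasks run concurrently on the tokio runtime
    let mut workers: BoundedJoinSet<u64> = BoundedJoinSet::new(4);

    for group in 0..10u64 {
        // spawn_task waits for a free slot and returns the results of any
        // tasks that completed while waiting
        for result in workers.spawn_task(async move { group * 2 }).await? {
            println!("task finished with {result}");
        }
    }

    // drain the remaining active tasks
    for result in workers.join_active().await? {
        println!("task finished with {result}");
    }
    Ok(())
}
```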
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 5:
- not present in previous version, refactored logic from previous
GroupWorker implementation.
pbs-tools/src/bounded_join_set.rs | 69 +++++++++++++++++++++++++++++++
pbs-tools/src/lib.rs | 1 +
2 files changed, 70 insertions(+)
create mode 100644 pbs-tools/src/bounded_join_set.rs
diff --git a/pbs-tools/src/bounded_join_set.rs b/pbs-tools/src/bounded_join_set.rs
new file mode 100644
index 000000000..01b27b2a6
--- /dev/null
+++ b/pbs-tools/src/bounded_join_set.rs
@@ -0,0 +1,69 @@
+//! JoinSet with an upper bound of concurrent tasks.
+//!
+//! Allows to run up to the configured number of tasks concurrently in an async
+//! context.
+
+use std::future::Future;
+
+use tokio::task::{JoinError, JoinSet};
+
+use proxmox_log::LogContext;
+
+/// Run up to preconfigured number of futures concurrently on tokio tasks.
+pub struct BoundedJoinSet<T> {
+ // upper bound for concurrent task execution
+ max_tasks: usize,
+ // handles to currently active tasks
+ workers: JoinSet<T>,
+}
+
+impl<T: Send + 'static> BoundedJoinSet<T> {
+ /// Create a new join set with up to `max_tasks` concurrently executed tasks.
+ pub fn new(max_tasks: usize) -> Self {
+ Self {
+ max_tasks,
+ workers: JoinSet::new(),
+ }
+ }
+
+ /// Spawn the given task on the workers, waiting until there is capacity to do so.
+ ///
+ /// If there is no capacity, this will await until there is, returning the results
+ /// for the finished task(s) providing the now free running slot in order of completion
+ /// or a `JoinError` if joining failed.
+ pub async fn spawn_task<F>(&mut self, task: F) -> Result<Vec<T>, JoinError>
+ where
+ F: Future<Output = T>,
+ F: Send + 'static,
+ {
+ let mut results = Vec::with_capacity(self.workers.len());
+
+ while self.workers.len() >= self.max_tasks {
+ // capacity reached, wait for an active task to complete
+ if let Some(result) = self.workers.join_next().await {
+ results.push(result?);
+ }
+ }
+
+ match LogContext::current() {
+ Some(context) => self.workers.spawn(context.scope(task)),
+ None => self.workers.spawn(task),
+ };
+
+ Ok(results)
+ }
+
+ /// Wait on all active tasks to run to completion.
+ ///
+ /// Returns the results for each task in order of completion or a `JoinError`
+ /// if joining failed.
+ pub async fn join_active(&mut self) -> Result<Vec<T>, JoinError> {
+ let mut results = Vec::with_capacity(self.workers.len());
+
+ while let Some(result) = self.workers.join_next().await {
+ results.push(result?);
+ }
+
+ Ok(results)
+ }
+}
diff --git a/pbs-tools/src/lib.rs b/pbs-tools/src/lib.rs
index 1e3972c92..dc55366b6 100644
--- a/pbs-tools/src/lib.rs
+++ b/pbs-tools/src/lib.rs
@@ -1,4 +1,5 @@
pub mod async_lru_cache;
+pub mod bounded_join_set;
pub mod buffered_logger;
pub mod cert;
pub mod crypt_config;
--
2.47.3
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH proxmox-backup v6 05/15] client: backup writer: fix upload stats size and rate for push sync
2026-04-17 9:26 [PATCH proxmox{,-backup} v6 00/15] fix #4182: concurrent group pull/push support for sync jobs Christian Ebner
` (3 preceding siblings ...)
2026-04-17 9:26 ` [PATCH proxmox-backup v6 04/15] tools: add bounded join set to run concurrent tasks bound by limit Christian Ebner
@ 2026-04-17 9:26 ` Christian Ebner
2026-04-17 9:26 ` [PATCH proxmox-backup v6 06/15] api: config/sync: add optional `worker-threads` property Christian Ebner
` (9 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Christian Ebner @ 2026-04-17 9:26 UTC (permalink / raw)
To: pbs-devel
Currently, the logical size of the uploaded chunks is used for size
and upload rate calculation in case of sync jobs in push direction,
leading to inflated values for the transferred size and rate.
Use the compressed chunk size instead. To get the required
information, return the more verbose `UploadStats` on
`upload_index_chunk_info` calls and use it's compressed size for the
transferred `bytes` of `SyncStats` instead. Since `UploadStats` is
now part of a pub api, increase it's scope as well.
This is then finally being used to display the upload size and
calculate the rate for the push sync job.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 5:
- no changes
pbs-client/src/backup_stats.rs | 20 ++++++++++----------
pbs-client/src/backup_writer.rs | 4 ++--
src/server/push.rs | 4 ++--
3 files changed, 14 insertions(+), 14 deletions(-)
diff --git a/pbs-client/src/backup_stats.rs b/pbs-client/src/backup_stats.rs
index f0563a001..edf7ef3c4 100644
--- a/pbs-client/src/backup_stats.rs
+++ b/pbs-client/src/backup_stats.rs
@@ -15,16 +15,16 @@ pub struct BackupStats {
}
/// Extended backup run statistics and archive checksum
-pub(crate) struct UploadStats {
- pub(crate) chunk_count: usize,
- pub(crate) chunk_reused: usize,
- pub(crate) chunk_injected: usize,
- pub(crate) size: usize,
- pub(crate) size_reused: usize,
- pub(crate) size_injected: usize,
- pub(crate) size_compressed: usize,
- pub(crate) duration: Duration,
- pub(crate) csum: [u8; 32],
+pub struct UploadStats {
+ pub chunk_count: usize,
+ pub chunk_reused: usize,
+ pub chunk_injected: usize,
+ pub size: usize,
+ pub size_reused: usize,
+ pub size_injected: usize,
+ pub size_compressed: usize,
+ pub duration: Duration,
+ pub csum: [u8; 32],
}
impl UploadStats {
diff --git a/pbs-client/src/backup_writer.rs b/pbs-client/src/backup_writer.rs
index 49aff3fdd..4a4391c8b 100644
--- a/pbs-client/src/backup_writer.rs
+++ b/pbs-client/src/backup_writer.rs
@@ -309,7 +309,7 @@ impl BackupWriter {
archive_name: &BackupArchiveName,
stream: impl Stream<Item = Result<MergedChunkInfo, Error>>,
options: UploadOptions,
- ) -> Result<BackupStats, Error> {
+ ) -> Result<UploadStats, Error> {
let mut param = json!({ "archive-name": archive_name });
let (prefix, archive_size) = options.index_type.to_prefix_and_size();
if let Some(size) = archive_size {
@@ -391,7 +391,7 @@ impl BackupWriter {
.post(&format!("{prefix}_close"), Some(param))
.await?;
- Ok(upload_stats.to_backup_stats())
+ Ok(upload_stats)
}
pub async fn upload_stream(
diff --git a/src/server/push.rs b/src/server/push.rs
index 697b94f2f..494e0fbce 100644
--- a/src/server/push.rs
+++ b/src/server/push.rs
@@ -1059,8 +1059,8 @@ async fn push_index(
.await?;
Ok(SyncStats {
- chunk_count: upload_stats.chunk_count as usize,
- bytes: upload_stats.size as usize,
+ chunk_count: upload_stats.chunk_count,
+ bytes: upload_stats.size_compressed,
elapsed: upload_stats.duration,
removed: None,
})
--
2.47.3
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH proxmox-backup v6 06/15] api: config/sync: add optional `worker-threads` property
2026-04-17 9:26 [PATCH proxmox{,-backup} v6 00/15] fix #4182: concurrent group pull/push support for sync jobs Christian Ebner
` (4 preceding siblings ...)
2026-04-17 9:26 ` [PATCH proxmox-backup v6 05/15] client: backup writer: fix upload stats size and rate for push sync Christian Ebner
@ 2026-04-17 9:26 ` Christian Ebner
2026-04-17 9:26 ` [PATCH proxmox-backup v6 07/15] sync: pull: revert avoiding reinstantiation for encountered chunks map Christian Ebner
` (8 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Christian Ebner @ 2026-04-17 9:26 UTC (permalink / raw)
To: pbs-devel
Allow configuring from 1 up to 32 worker threads to perform
multiple group syncs in parallel.
The property is exposed via the sync job config and passed to
the pull/push parameters for the sync job to set up and execute the
thread pool accordingly.
Implements the schema definitions and adds the new property to
`SyncJobConfig`, `PullParameters` and `PushParameters`.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 5:
- no changes
src/api2/config/sync.rs | 10 ++++++++++
src/api2/pull.rs | 9 ++++++++-
src/api2/push.rs | 8 +++++++-
src/server/pull.rs | 4 ++++
src/server/push.rs | 4 ++++
src/server/sync.rs | 1 +
6 files changed, 34 insertions(+), 2 deletions(-)
diff --git a/src/api2/config/sync.rs b/src/api2/config/sync.rs
index 67fa3182c..0f073ca54 100644
--- a/src/api2/config/sync.rs
+++ b/src/api2/config/sync.rs
@@ -344,6 +344,8 @@ pub enum DeletableProperty {
UnmountOnDone,
/// Delete the sync_direction property,
SyncDirection,
+ /// Delete the worker_threads property,
+ WorkerThreads,
}
#[api(
@@ -467,6 +469,9 @@ pub fn update_sync_job(
DeletableProperty::SyncDirection => {
data.sync_direction = None;
}
+ DeletableProperty::WorkerThreads => {
+ data.worker_threads = None;
+ }
}
}
}
@@ -526,6 +531,10 @@ pub fn update_sync_job(
data.sync_direction = Some(sync_direction);
}
+ if let Some(worker_threads) = update.worker_threads {
+ data.worker_threads = Some(worker_threads);
+ }
+
if update.limit.rate_in.is_some() {
data.limit.rate_in = update.limit.rate_in;
}
@@ -698,6 +707,7 @@ acl:1:/remote/remote1/remotestore1:write@pbs:RemoteSyncOperator
run_on_mount: None,
unmount_on_done: None,
sync_direction: None, // use default
+ worker_threads: None,
};
// should work without ACLs
diff --git a/src/api2/pull.rs b/src/api2/pull.rs
index 4b1fd5e60..7cf165f91 100644
--- a/src/api2/pull.rs
+++ b/src/api2/pull.rs
@@ -11,7 +11,7 @@ use pbs_api_types::{
GROUP_FILTER_LIST_SCHEMA, NS_MAX_DEPTH_REDUCED_SCHEMA, PRIV_DATASTORE_BACKUP,
PRIV_DATASTORE_PRUNE, PRIV_REMOTE_READ, REMOTE_ID_SCHEMA, REMOVE_VANISHED_BACKUPS_SCHEMA,
RESYNC_CORRUPT_SCHEMA, SYNC_ENCRYPTED_ONLY_SCHEMA, SYNC_VERIFIED_ONLY_SCHEMA,
- TRANSFER_LAST_SCHEMA,
+ SYNC_WORKER_THREADS_SCHEMA, TRANSFER_LAST_SCHEMA,
};
use pbs_config::CachedUserInfo;
use proxmox_rest_server::WorkerTask;
@@ -91,6 +91,7 @@ impl TryFrom<&SyncJobConfig> for PullParameters {
sync_job.encrypted_only,
sync_job.verified_only,
sync_job.resync_corrupt,
+ sync_job.worker_threads,
)
}
}
@@ -148,6 +149,10 @@ impl TryFrom<&SyncJobConfig> for PullParameters {
schema: RESYNC_CORRUPT_SCHEMA,
optional: true,
},
+ "worker-threads": {
+ schema: SYNC_WORKER_THREADS_SCHEMA,
+ optional: true,
+ },
},
},
access: {
@@ -175,6 +180,7 @@ async fn pull(
encrypted_only: Option<bool>,
verified_only: Option<bool>,
resync_corrupt: Option<bool>,
+ worker_threads: Option<usize>,
rpcenv: &mut dyn RpcEnvironment,
) -> Result<String, Error> {
let auth_id: Authid = rpcenv.get_auth_id().unwrap().parse()?;
@@ -215,6 +221,7 @@ async fn pull(
encrypted_only,
verified_only,
resync_corrupt,
+ worker_threads,
)?;
// fixme: set to_stdout to false?
diff --git a/src/api2/push.rs b/src/api2/push.rs
index e5edc13e0..f27f4ea1a 100644
--- a/src/api2/push.rs
+++ b/src/api2/push.rs
@@ -6,7 +6,7 @@ use pbs_api_types::{
GROUP_FILTER_LIST_SCHEMA, NS_MAX_DEPTH_REDUCED_SCHEMA, PRIV_DATASTORE_BACKUP,
PRIV_DATASTORE_READ, PRIV_REMOTE_DATASTORE_BACKUP, PRIV_REMOTE_DATASTORE_PRUNE,
REMOTE_ID_SCHEMA, REMOVE_VANISHED_BACKUPS_SCHEMA, SYNC_ENCRYPTED_ONLY_SCHEMA,
- SYNC_VERIFIED_ONLY_SCHEMA, TRANSFER_LAST_SCHEMA,
+ SYNC_VERIFIED_ONLY_SCHEMA, SYNC_WORKER_THREADS_SCHEMA, TRANSFER_LAST_SCHEMA,
};
use proxmox_rest_server::WorkerTask;
use proxmox_router::{Permission, Router, RpcEnvironment};
@@ -108,6 +108,10 @@ fn check_push_privs(
schema: TRANSFER_LAST_SCHEMA,
optional: true,
},
+ "worker-threads": {
+ schema: SYNC_WORKER_THREADS_SCHEMA,
+ optional: true,
+ },
},
},
access: {
@@ -133,6 +137,7 @@ async fn push(
verified_only: Option<bool>,
limit: RateLimitConfig,
transfer_last: Option<usize>,
+ worker_threads: Option<usize>,
rpcenv: &mut dyn RpcEnvironment,
) -> Result<String, Error> {
let auth_id: Authid = rpcenv.get_auth_id().unwrap().parse()?;
@@ -164,6 +169,7 @@ async fn push(
verified_only,
limit,
transfer_last,
+ worker_threads,
)
.await?;
diff --git a/src/server/pull.rs b/src/server/pull.rs
index bd3e8bef4..ca17eb243 100644
--- a/src/server/pull.rs
+++ b/src/server/pull.rs
@@ -65,6 +65,8 @@ pub(crate) struct PullParameters {
verified_only: bool,
/// Whether to re-sync corrupted snapshots
resync_corrupt: bool,
+ /// Maximum number of worker threads to pull during sync job
+ worker_threads: Option<usize>,
}
impl PullParameters {
@@ -85,6 +87,7 @@ impl PullParameters {
encrypted_only: Option<bool>,
verified_only: Option<bool>,
resync_corrupt: Option<bool>,
+ worker_threads: Option<usize>,
) -> Result<Self, Error> {
if let Some(max_depth) = max_depth {
ns.check_max_depth(max_depth)?;
@@ -137,6 +140,7 @@ impl PullParameters {
encrypted_only,
verified_only,
resync_corrupt,
+ worker_threads,
})
}
}
diff --git a/src/server/push.rs b/src/server/push.rs
index 494e0fbce..44a204e6b 100644
--- a/src/server/push.rs
+++ b/src/server/push.rs
@@ -83,6 +83,8 @@ pub(crate) struct PushParameters {
verified_only: bool,
/// How many snapshots should be transferred at most (taking the newest N snapshots)
transfer_last: Option<usize>,
+ /// Maximum number of worker threads for push during sync job
+ worker_threads: Option<usize>,
}
impl PushParameters {
@@ -102,6 +104,7 @@ impl PushParameters {
verified_only: Option<bool>,
limit: RateLimitConfig,
transfer_last: Option<usize>,
+ worker_threads: Option<usize>,
) -> Result<Self, Error> {
if let Some(max_depth) = max_depth {
ns.check_max_depth(max_depth)?;
@@ -165,6 +168,7 @@ impl PushParameters {
encrypted_only,
verified_only,
transfer_last,
+ worker_threads,
})
}
diff --git a/src/server/sync.rs b/src/server/sync.rs
index aedf4a271..9e6aeb9b0 100644
--- a/src/server/sync.rs
+++ b/src/server/sync.rs
@@ -675,6 +675,7 @@ pub fn do_sync_job(
sync_job.verified_only,
sync_job.limit.clone(),
sync_job.transfer_last,
+ sync_job.worker_threads,
)
.await?;
push_store(push_params).await?
--
2.47.3
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH proxmox-backup v6 07/15] sync: pull: revert avoiding reinstantiation for encountered chunks map
2026-04-17 9:26 [PATCH proxmox{,-backup} v6 00/15] fix #4182: concurrent group pull/push support for sync jobs Christian Ebner
` (5 preceding siblings ...)
2026-04-17 9:26 ` [PATCH proxmox-backup v6 06/15] api: config/sync: add optional `worker-threads` property Christian Ebner
@ 2026-04-17 9:26 ` Christian Ebner
2026-04-17 9:26 ` [PATCH proxmox-backup v6 08/15] sync: pull: factor out backup group locking and owner check Christian Ebner
` (7 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Christian Ebner @ 2026-04-17 9:26 UTC (permalink / raw)
To: pbs-devel
While keeping a store-wide instance to avoid reinstantiation on each
group is desirable when iteratively processing groups, this cannot
work when syncing multiple groups in parallel.
This is in preparation for parallel group syncs and reverts commit
ecdec5bc ("sync: pull: avoid reinstantiation for encountered chunks
map").
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 5:
- no changes
src/server/pull.rs | 26 +++++---------------------
1 file changed, 5 insertions(+), 21 deletions(-)
diff --git a/src/server/pull.rs b/src/server/pull.rs
index ca17eb243..45fe9f8b1 100644
--- a/src/server/pull.rs
+++ b/src/server/pull.rs
@@ -620,7 +620,6 @@ async fn pull_group(
source_namespace: &BackupNamespace,
group: &BackupGroup,
progress: &mut StoreProgress,
- encountered_chunks: Arc<Mutex<EncounteredChunks>>,
) -> Result<SyncStats, Error> {
let mut already_synced_skip_info = SkipInfo::new(SkipReason::AlreadySynced);
let mut transfer_last_skip_info = SkipInfo::new(SkipReason::TransferLast);
@@ -721,6 +720,9 @@ async fn pull_group(
transfer_last_skip_info.reset();
}
+ // start with 65536 chunks (up to 256 GiB)
+ let encountered_chunks = Arc::new(Mutex::new(EncounteredChunks::with_capacity(1024 * 64)));
+
let backup_group = params
.target
.store
@@ -984,9 +986,6 @@ pub(crate) async fn pull_store(mut params: PullParameters) -> Result<SyncStats,
let mut synced_ns = HashSet::with_capacity(namespaces.len());
let mut sync_stats = SyncStats::default();
- // start with 65536 chunks (up to 256 GiB)
- let encountered_chunks = Arc::new(Mutex::new(EncounteredChunks::with_capacity(1024 * 64)));
-
for namespace in namespaces {
let source_store_ns_str = print_store_and_ns(params.source.get_store(), &namespace);
@@ -1008,7 +1007,7 @@ pub(crate) async fn pull_store(mut params: PullParameters) -> Result<SyncStats,
}
}
- match pull_ns(&namespace, &mut params, encountered_chunks.clone()).await {
+ match pull_ns(&namespace, &mut params).await {
Ok((ns_progress, ns_sync_stats, ns_errors)) => {
errors |= ns_errors;
@@ -1066,7 +1065,6 @@ pub(crate) async fn pull_store(mut params: PullParameters) -> Result<SyncStats,
async fn pull_ns(
namespace: &BackupNamespace,
params: &mut PullParameters,
- encountered_chunks: Arc<Mutex<EncounteredChunks>>,
) -> Result<(StoreProgress, SyncStats, bool), Error> {
let list: Vec<BackupGroup> = params.source.list_groups(namespace, ¶ms.owner).await?;
@@ -1125,16 +1123,7 @@ async fn pull_ns(
);
errors = true; // do not stop here, instead continue
} else {
- encountered_chunks.lock().unwrap().clear();
- match pull_group(
- params,
- namespace,
- &group,
- &mut progress,
- encountered_chunks.clone(),
- )
- .await
- {
+ match pull_group(params, namespace, &group, &mut progress).await {
Ok(stats) => sync_stats.add(stats),
Err(err) => {
info!("sync group {} failed - {err:#}", &group);
@@ -1255,9 +1244,4 @@ impl EncounteredChunks {
}
}
}
-
- /// Clear all entries
- fn clear(&mut self) {
- self.chunk_set.clear();
- }
}
--
2.47.3
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH proxmox-backup v6 08/15] sync: pull: factor out backup group locking and owner check
2026-04-17 9:26 [PATCH proxmox{,-backup} v6 00/15] fix #4182: concurrent group pull/push support for sync jobs Christian Ebner
` (6 preceding siblings ...)
2026-04-17 9:26 ` [PATCH proxmox-backup v6 07/15] sync: pull: revert avoiding reinstantiation for encountered chunks map Christian Ebner
@ 2026-04-17 9:26 ` Christian Ebner
2026-04-17 9:26 ` [PATCH proxmox-backup v6 09/15] sync: pull: prepare pull parameters to be shared across parallel tasks Christian Ebner
` (6 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Christian Ebner @ 2026-04-17 9:26 UTC (permalink / raw)
To: pbs-devel
Creates a dedicated entry point for parallel group pulling and
simplifies the backup group loop logic.
While locking and the owner check could have been moved to pull_group()
as well, that function is already hard to parse as is. Logging of
errors is moved to the helper to prepare it for parallel pulling.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 5:
- adapted to better fit subsequent introduction of log line sender
src/server/pull.rs | 76 +++++++++++++++++++++++++++-------------------
1 file changed, 44 insertions(+), 32 deletions(-)
diff --git a/src/server/pull.rs b/src/server/pull.rs
index 45fe9f8b1..7126a5102 100644
--- a/src/server/pull.rs
+++ b/src/server/pull.rs
@@ -1050,6 +1050,47 @@ pub(crate) async fn pull_store(mut params: PullParameters) -> Result<SyncStats,
Ok(sync_stats)
}
+/// Get an exclusive lock on the backup group, check ownership matches
+/// sync job owner and pull group contents.
+async fn lock_and_pull_group(
+ params: &PullParameters,
+ group: &BackupGroup,
+ namespace: &BackupNamespace,
+ target_namespace: &BackupNamespace,
+ progress: &mut StoreProgress,
+) -> Result<SyncStats, Error> {
+ let (owner, _lock_guard) =
+ match params
+ .target
+ .store
+ .create_locked_backup_group(target_namespace, group, ¶ms.owner)
+ {
+ Ok(res) => res,
+ Err(err) => {
+ info!("sync group {group} failed - group lock failed: {err}");
+ info!("create_locked_backup_group failed");
+ return Err(err);
+ }
+ };
+
+ if params.owner != owner {
+ // only the owner is allowed to create additional snapshots
+ info!(
+ "sync group {group} failed - owner check failed ({} != {owner})",
+ params.owner
+ );
+ return Err(format_err!("owner check failed"));
+ }
+
+ match pull_group(params, namespace, group, progress).await {
+ Ok(stats) => Ok(stats),
+ Err(err) => {
+ info!("sync group {group} failed - {err:#}");
+ Err(err)
+ }
+ }
+}
+
/// Pulls a namespace according to `params`.
///
/// Pulling a namespace consists of the following steps:
@@ -1098,38 +1139,9 @@ async fn pull_ns(
progress.done_snapshots = 0;
progress.group_snapshots = 0;
- let (owner, _lock_guard) =
- match params
- .target
- .store
- .create_locked_backup_group(&target_ns, &group, ¶ms.owner)
- {
- Ok(result) => result,
- Err(err) => {
- info!("sync group {} failed - group lock failed: {err}", &group);
- errors = true;
- // do not stop here, instead continue
- info!("create_locked_backup_group failed");
- continue;
- }
- };
-
- // permission check
- if params.owner != owner {
- // only the owner is allowed to create additional snapshots
- info!(
- "sync group {} failed - owner check failed ({} != {owner})",
- &group, params.owner
- );
- errors = true; // do not stop here, instead continue
- } else {
- match pull_group(params, namespace, &group, &mut progress).await {
- Ok(stats) => sync_stats.add(stats),
- Err(err) => {
- info!("sync group {} failed - {err:#}", &group);
- errors = true; // do not stop here, instead continue
- }
- }
+ match lock_and_pull_group(params, &group, &namespace, &target_ns, &mut progress).await {
+ Ok(stats) => sync_stats.add(stats),
+ Err(_err) => errors = true,
}
}
--
2.47.3
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH proxmox-backup v6 09/15] sync: pull: prepare pull parameters to be shared across parallel tasks
2026-04-17 9:26 [PATCH proxmox{,-backup} v6 00/15] fix #4182: concurrent group pull/push support for sync jobs Christian Ebner
` (7 preceding siblings ...)
2026-04-17 9:26 ` [PATCH proxmox-backup v6 08/15] sync: pull: factor out backup group locking and owner check Christian Ebner
@ 2026-04-17 9:26 ` Christian Ebner
2026-04-17 9:26 ` [PATCH proxmox-backup v6 10/15] fix #4182: server: sync: allow pulling backup groups in parallel Christian Ebner
` (5 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Christian Ebner @ 2026-04-17 9:26 UTC (permalink / raw)
To: pbs-devel
When performing parallel group syncs, the pull parameters must be
shared between all tasks, which is not possible with regular
references due to lifetime and ownership issues. Pack them into an
atomically reference-counted pointer (`Arc`) instead so they can
easily be cloned when required.
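As a minimal sketch of that sharing pattern (using a hypothetical
`Params` stand-in rather than the real `PullParameters`, and plain
`tokio::spawn` instead of the bounded join set):
```
use std::sync::Arc;

struct Params {
    worker_threads: Option<usize>,
}

async fn sync_group(params: Arc<Params>, group: u64) {
    // the task owns its clone of the Arc, so no 'static borrow is needed
    let _threads = params.worker_threads.unwrap_or(1);
    let _ = group;
}

#[tokio::main]
async fn main() {
    let params = Arc::new(Params {
        worker_threads: Some(4),
    });

    let mut handles = Vec::new();
    for group in 0..4u64 {
        // tokio::spawn requires a 'static future, so borrowing `&params`
        // would not compile; cloning the Arc is cheap and satisfies 'static
        handles.push(tokio::spawn(sync_group(Arc::clone(&params), group)));
    }
    for handle in handles {
        handle.await.unwrap();
    }
}
```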
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 5:
- no changes
src/server/pull.rs | 25 +++++++++++++++++--------
1 file changed, 17 insertions(+), 8 deletions(-)
diff --git a/src/server/pull.rs b/src/server/pull.rs
index 7126a5102..5beca6b8d 100644
--- a/src/server/pull.rs
+++ b/src/server/pull.rs
@@ -380,7 +380,7 @@ async fn pull_single_archive<'a>(
/// -- if not, pull it from the remote
/// - Download log if not already existing
async fn pull_snapshot<'a>(
- params: &PullParameters,
+ params: Arc<PullParameters>,
reader: Arc<dyn SyncSourceReader + 'a>,
snapshot: &'a pbs_datastore::BackupDir,
encountered_chunks: Arc<Mutex<EncounteredChunks>>,
@@ -558,7 +558,7 @@ async fn pull_snapshot<'a>(
/// The `reader` is configured to read from the source backup directory, while the
/// `snapshot` is pointing to the local datastore and target namespace.
async fn pull_snapshot_from<'a>(
- params: &PullParameters,
+ params: Arc<PullParameters>,
reader: Arc<dyn SyncSourceReader + 'a>,
snapshot: &'a pbs_datastore::BackupDir,
encountered_chunks: Arc<Mutex<EncounteredChunks>>,
@@ -616,7 +616,7 @@ async fn pull_snapshot_from<'a>(
/// - remote snapshot access is checked by remote (twice: query and opening the backup reader)
/// - local group owner is already checked by pull_store
async fn pull_group(
- params: &PullParameters,
+ params: Arc<PullParameters>,
source_namespace: &BackupNamespace,
group: &BackupGroup,
progress: &mut StoreProgress,
@@ -797,7 +797,7 @@ async fn pull_group(
.reader(source_namespace, &from_snapshot)
.await?;
let result = pull_snapshot_from(
- params,
+ Arc::clone(¶ms),
reader,
&to_snapshot,
encountered_chunks.clone(),
@@ -985,6 +985,7 @@ pub(crate) async fn pull_store(mut params: PullParameters) -> Result<SyncStats,
let (mut groups, mut snapshots) = (0, 0);
let mut synced_ns = HashSet::with_capacity(namespaces.len());
let mut sync_stats = SyncStats::default();
+ let params = Arc::new(params);
for namespace in namespaces {
let source_store_ns_str = print_store_and_ns(params.source.get_store(), &namespace);
@@ -1007,7 +1008,7 @@ pub(crate) async fn pull_store(mut params: PullParameters) -> Result<SyncStats,
}
}
- match pull_ns(&namespace, &mut params).await {
+ match pull_ns(&namespace, Arc::clone(¶ms)).await {
Ok((ns_progress, ns_sync_stats, ns_errors)) => {
errors |= ns_errors;
@@ -1053,7 +1054,7 @@ pub(crate) async fn pull_store(mut params: PullParameters) -> Result<SyncStats,
/// Get and exclusive lock on the backup group, check ownership matches
/// sync job owner and pull group contents.
async fn lock_and_pull_group(
- params: &PullParameters,
+ params: Arc<PullParameters>,
group: &BackupGroup,
namespace: &BackupNamespace,
target_namespace: &BackupNamespace,
@@ -1105,7 +1106,7 @@ async fn lock_and_pull_group(
/// - owner check for vanished groups done here
async fn pull_ns(
namespace: &BackupNamespace,
- params: &mut PullParameters,
+ params: Arc<PullParameters>,
) -> Result<(StoreProgress, SyncStats, bool), Error> {
let list: Vec<BackupGroup> = params.source.list_groups(namespace, ¶ms.owner).await?;
@@ -1139,7 +1140,15 @@ async fn pull_ns(
progress.done_snapshots = 0;
progress.group_snapshots = 0;
- match lock_and_pull_group(params, &group, &namespace, &target_ns, &mut progress).await {
+ match lock_and_pull_group(
+ Arc::clone(¶ms),
+ &group,
+ namespace,
+ &target_ns,
+ &mut progress,
+ )
+ .await
+ {
Ok(stats) => sync_stats.add(stats),
Err(_err) => errors = true,
}
--
2.47.3
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH proxmox-backup v6 10/15] fix #4182: server: sync: allow pulling backup groups in parallel
2026-04-17 9:26 [PATCH proxmox{,-backup} v6 00/15] fix #4182: concurrent group pull/push support for sync jobs Christian Ebner
` (8 preceding siblings ...)
2026-04-17 9:26 ` [PATCH proxmox-backup v6 09/15] sync: pull: prepare pull parameters to be shared across parallel tasks Christian Ebner
@ 2026-04-17 9:26 ` Christian Ebner
2026-04-17 9:26 ` [PATCH proxmox-backup v6 11/15] server: pull: prefix log messages and add error context Christian Ebner
` (4 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Christian Ebner @ 2026-04-17 9:26 UTC (permalink / raw)
To: pbs-devel
Currently, a sync job sequentially pulls the backup groups and the
snapshots contained within them. It is therefore limited in download
speed by the single HTTP/2 connection of the source reader instance
in case of remote syncs. For high-latency networks, this suffers
from limited download speed due to head-of-line blocking.
Improve the throughput by pulling up to a configured number of
backup groups in parallel, using a bounded join set which
concurrently pulls from the remote source with up to the configured
number of tokio tasks. Since these are dedicated tasks, they can run
independently and in parallel on the tokio runtime.
Store progress output is now prefixed by the group, as it depends on
the group being pulled given that the snapshot count differs. To
update the output on a per-group level, the shared group progress
count is passed as an atomic counter; store progress is accounted
globally as well as per-group.
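As a simplified stand-in for that accounting (not the exact
`SharedGroupProgress` code added below, and using scoped threads
instead of tokio tasks for brevity):
```
use std::sync::atomic::{AtomicUsize, Ordering};

// simplified stand-in for the shared, lock-free group progress counter
struct GroupProgress {
    done: AtomicUsize,
    total: usize,
}

fn main() {
    let progress = GroupProgress {
        done: AtomicUsize::new(0),
        total: 11,
    };

    std::thread::scope(|scope| {
        let progress = &progress;
        for group in 0..progress.total {
            scope.spawn(move || {
                // ... pull the group here ...
                // each worker updates the shared counter without a mutex
                let done = progress.done.fetch_add(1, Ordering::AcqRel) + 1;
                println!("group {group}: {done}/{} groups done", progress.total);
            });
        }
    });
}
```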
Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=4182
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 5:
- uses BoundedJoinSet implementation, refactored accordingly
src/server/pull.rs | 70 ++++++++++++++++++++++++++++++++--------------
src/server/sync.rs | 33 ++++++++++++++++++++++
2 files changed, 82 insertions(+), 21 deletions(-)
diff --git a/src/server/pull.rs b/src/server/pull.rs
index 5beca6b8d..611441d2a 100644
--- a/src/server/pull.rs
+++ b/src/server/pull.rs
@@ -26,6 +26,7 @@ use pbs_datastore::index::IndexFile;
use pbs_datastore::manifest::{BackupManifest, FileInfo};
use pbs_datastore::read_chunk::AsyncReadChunk;
use pbs_datastore::{check_backup_owner, DataStore, DatastoreBackend, StoreProgress};
+use pbs_tools::bounded_join_set::BoundedJoinSet;
use pbs_tools::sha::sha256;
use super::sync::{
@@ -34,6 +35,7 @@ use super::sync::{
SkipReason, SyncSource, SyncSourceReader, SyncStats,
};
use crate::backup::{check_ns_modification_privs, check_ns_privs};
+use crate::server::sync::SharedGroupProgress;
use crate::tools::parallel_handler::ParallelHandler;
pub(crate) struct PullTarget {
@@ -619,7 +621,7 @@ async fn pull_group(
params: Arc<PullParameters>,
source_namespace: &BackupNamespace,
group: &BackupGroup,
- progress: &mut StoreProgress,
+ shared_group_progress: Arc<SharedGroupProgress>,
) -> Result<SyncStats, Error> {
let mut already_synced_skip_info = SkipInfo::new(SkipReason::AlreadySynced);
let mut transfer_last_skip_info = SkipInfo::new(SkipReason::TransferLast);
@@ -782,7 +784,8 @@ async fn pull_group(
}
}
- progress.group_snapshots = list.len() as u64;
+ let mut local_progress = StoreProgress::new(shared_group_progress.total_groups());
+ local_progress.group_snapshots = list.len() as u64;
let mut sync_stats = SyncStats::default();
@@ -805,8 +808,10 @@ async fn pull_group(
)
.await;
- progress.done_snapshots = pos as u64 + 1;
- info!("percentage done: {progress}");
+ // Update done groups progress by other parallel running pulls
+ local_progress.done_groups = shared_group_progress.load_done();
+ local_progress.done_snapshots = pos as u64 + 1;
+ info!("percentage done: group {group}: {local_progress}");
let stats = result?; // stop on error
sync_stats.add(stats);
@@ -1058,7 +1063,7 @@ async fn lock_and_pull_group(
group: &BackupGroup,
namespace: &BackupNamespace,
target_namespace: &BackupNamespace,
- progress: &mut StoreProgress,
+ shared_group_progress: Arc<SharedGroupProgress>,
) -> Result<SyncStats, Error> {
let (owner, _lock_guard) =
match params
@@ -1083,7 +1088,7 @@ async fn lock_and_pull_group(
return Err(format_err!("owner check failed"));
}
- match pull_group(params, namespace, group, progress).await {
+ match pull_group(params, namespace, group, shared_group_progress).await {
Ok(stats) => Ok(stats),
Err(err) => {
info!("sync group {group} failed - {err:#}");
@@ -1135,25 +1140,48 @@ async fn pull_ns(
let target_ns = namespace.map_prefix(¶ms.source.get_ns(), ¶ms.target.ns)?;
- for (done, group) in list.into_iter().enumerate() {
- progress.done_groups = done as u64;
- progress.done_snapshots = 0;
- progress.group_snapshots = 0;
+ let shared_group_progress = Arc::new(SharedGroupProgress::with_total_groups(list.len()));
+ let mut group_workers = BoundedJoinSet::new(params.worker_threads.unwrap_or(1));
- match lock_and_pull_group(
- Arc::clone(¶ms),
- &group,
- namespace,
- &target_ns,
- &mut progress,
- )
- .await
- {
- Ok(stats) => sync_stats.add(stats),
- Err(_err) => errors = true,
+ let mut process_results = |results| {
+ for result in results {
+ match result {
+ Ok(stats) => {
+ sync_stats.add(stats);
+ progress.done_groups = shared_group_progress.increment_done();
+ }
+ Err(_err) => errors = true,
+ }
}
+ };
+
+ for group in list.into_iter() {
+ let namespace = namespace.clone();
+ let target_ns = target_ns.clone();
+ let params = Arc::clone(¶ms);
+ let group_progress_cloned = Arc::clone(&shared_group_progress);
+ let results = group_workers
+ .spawn_task(async move {
+ lock_and_pull_group(
+ Arc::clone(¶ms),
+ &group,
+ &namespace,
+ &target_ns,
+ group_progress_cloned,
+ )
+ .await
+ })
+ .await
+ .map_err(|err| format_err!("failed to join on worker task: {err:#}"))?;
+ process_results(results);
}
+ let results = group_workers
+ .join_active()
+ .await
+ .map_err(|err| format_err!("failed to join on worker task: {err:#}"))?;
+ process_results(results);
+
if params.remove_vanished {
let result: Result<(), Error> = proxmox_lang::try_block!({
for local_group in params.target.store.iter_backup_groups(target_ns.clone())? {
diff --git a/src/server/sync.rs b/src/server/sync.rs
index 9e6aeb9b0..e88418442 100644
--- a/src/server/sync.rs
+++ b/src/server/sync.rs
@@ -4,6 +4,7 @@ use std::collections::HashMap;
use std::io::{Seek, Write};
use std::ops::Deref;
use std::path::{Path, PathBuf};
+use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::time::Duration;
@@ -12,6 +13,7 @@ use futures::{future::FutureExt, select};
use hyper::http::StatusCode;
use pbs_config::BackupLockGuard;
use serde_json::json;
+use tokio::task::JoinSet;
use tracing::{info, warn};
use proxmox_human_byte::HumanByte;
@@ -792,3 +794,34 @@ pub(super) fn exclude_not_verified_or_encrypted(
false
}
+
+/// Track group progress during parallel push/pull in sync jobs
+pub(crate) struct SharedGroupProgress {
+ done: AtomicUsize,
+ total: usize,
+}
+
+impl SharedGroupProgress {
+ /// Create a new instance to track group progress with expected total number of groups
+ pub(crate) fn with_total_groups(total: usize) -> Self {
+ Self {
+ done: AtomicUsize::new(0),
+ total,
+ }
+ }
+
+ /// Return current counter value for done groups
+ pub(crate) fn load_done(&self) -> u64 {
+ self.done.load(Ordering::Acquire) as u64
+ }
+
+ /// Increment counter for done groups and return new value
+ pub(crate) fn increment_done(&self) -> u64 {
+ self.done.fetch_add(1, Ordering::AcqRel) as u64 + 1
+ }
+
+ /// Return the number of total backup groups
+ pub(crate) fn total_groups(&self) -> u64 {
+ self.total as u64
+ }
+}
--
2.47.3
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH proxmox-backup v6 11/15] server: pull: prefix log messages and add error context
2026-04-17 9:26 [PATCH proxmox{,-backup} v6 00/15] fix #4182: concurrent group pull/push support for sync jobs Christian Ebner
` (9 preceding siblings ...)
2026-04-17 9:26 ` [PATCH proxmox-backup v6 10/15] fix #4182: server: sync: allow pulling backup groups in parallel Christian Ebner
@ 2026-04-17 9:26 ` Christian Ebner
2026-04-17 9:26 ` [PATCH proxmox-backup v6 12/15] sync: push: prepare push parameters to be shared across parallel tasks Christian Ebner
` (3 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Christian Ebner @ 2026-04-17 9:26 UTC (permalink / raw)
To: pbs-devel
Pulling groups and therefore also snapshots in parallel leads to
unordered log output, making it mostly impossible to relate a log
message to a backup snapshot/group.
Therefore, prefix pull job log messages by the corresponding group or
snapshot and set the error context accordingly.
Also, reword some messages, inline variables in format strings and
start log lines with capital letters to get consistent output.
By using the buffered logger implementation and buffering up to 5
lines with a timeout of 1 second, log lines arriving in fast
succession are kept together, reducing the mixing of lines.
Example output for a sequential pull job:
```
...
[ct/100]: 2025-11-17T10:11:42Z: start sync
[ct/100]: 2025-11-17T10:11:42Z: pct.conf.blob: sync archive
[ct/100]: 2025-11-17T10:11:42Z: root.ppxar.didx: sync archive
[ct/100]: 2025-11-17T10:11:42Z: root.ppxar.didx: downloaded 16.785 MiB (280.791 MiB/s)
[ct/100]: 2025-11-17T10:11:42Z: root.mpxar.didx: sync archive
[ct/100]: 2025-11-17T10:11:42Z: root.mpxar.didx: downloaded 65.703 KiB (29.1 MiB/s)
[ct/100]: 2025-11-17T10:11:42Z: sync done
[ct/100]: percentage done: 9.09% (1/11 groups)
[ct/101]: 2026-03-31T12:20:16Z: start sync
[ct/101]: 2026-03-31T12:20:16Z: pct.conf.blob: sync archive
[ct/101]: 2026-03-31T12:20:16Z: root.pxar.didx: sync archive
[ct/101]: 2026-03-31T12:20:16Z: root.pxar.didx: downloaded 199.806 MiB (311.91 MiB/s)
[ct/101]: 2026-03-31T12:20:16Z: catalog.pcat1.didx: sync archive
[ct/101]: 2026-03-31T12:20:16Z: catalog.pcat1.didx: downloaded 180.379 KiB (22.748 MiB/s)
[ct/101]: 2026-03-31T12:20:16Z: sync done
...
```
Example output for a parallel pull job:
```
...
[ct/107]: 2025-07-16T09:14:01Z: start sync
[ct/107]: 2025-07-16T09:14:01Z: pct.conf.blob: sync archive
[ct/107]: 2025-07-16T09:14:01Z: root.ppxar.didx: sync archive
[vm/108]: 2025-09-19T07:37:19Z: start sync
[vm/108]: 2025-09-19T07:37:19Z: qemu-server.conf.blob: sync archive
[vm/108]: 2025-09-19T07:37:19Z: drive-scsi0.img.fidx: sync archive
[ct/107]: 2025-07-16T09:14:01Z: root.ppxar.didx: downloaded 609.233 MiB (112.628 MiB/s)
[ct/107]: 2025-07-16T09:14:01Z: root.mpxar.didx: sync archive
[ct/107]: 2025-07-16T09:14:01Z: root.mpxar.didx: downloaded 1.172 MiB (17.838 MiB/s)
[ct/107]: 2025-07-16T09:14:01Z: sync done
[ct/107]: percentage done: 72.73% (8/11 groups)
[vm/108]: 2025-09-19T07:37:19Z: drive-scsi0.img.fidx: downloaded 1.196 GiB (156.892 MiB/s)
[vm/108]: 2025-09-19T07:37:19Z: sync done
...
```
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 5:
- uses BufferedLogger implementation, refactored accordingly
- improve log line prefixes
- add missing error contexts
src/server/pull.rs | 314 +++++++++++++++++++++++++++++++++------------
src/server/sync.rs | 8 +-
2 files changed, 237 insertions(+), 85 deletions(-)
diff --git a/src/server/pull.rs b/src/server/pull.rs
index 611441d2a..f7aae4d59 100644
--- a/src/server/pull.rs
+++ b/src/server/pull.rs
@@ -5,11 +5,11 @@ use std::collections::{HashMap, HashSet};
use std::io::Seek;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
-use std::time::SystemTime;
+use std::time::{Duration, SystemTime};
use anyhow::{bail, format_err, Context, Error};
use proxmox_human_byte::HumanByte;
-use tracing::{info, warn};
+use tracing::{info, Level};
use pbs_api_types::{
print_store_and_ns, ArchiveType, Authid, BackupArchiveName, BackupDir, BackupGroup,
@@ -27,6 +27,7 @@ use pbs_datastore::manifest::{BackupManifest, FileInfo};
use pbs_datastore::read_chunk::AsyncReadChunk;
use pbs_datastore::{check_backup_owner, DataStore, DatastoreBackend, StoreProgress};
use pbs_tools::bounded_join_set::BoundedJoinSet;
+use pbs_tools::buffered_logger::{BufferedLogger, LogLineSender};
use pbs_tools::sha::sha256;
use super::sync::{
@@ -153,6 +154,8 @@ async fn pull_index_chunks<I: IndexFile>(
index: I,
encountered_chunks: Arc<Mutex<EncounteredChunks>>,
backend: &DatastoreBackend,
+ archive_prefix: &str,
+ log_sender: Arc<LogLineSender>,
) -> Result<SyncStats, Error> {
use futures::stream::{self, StreamExt, TryStreamExt};
@@ -247,11 +250,16 @@ async fn pull_index_chunks<I: IndexFile>(
let bytes = bytes.load(Ordering::SeqCst);
let chunk_count = chunk_count.load(Ordering::SeqCst);
- info!(
- "downloaded {} ({}/s)",
- HumanByte::from(bytes),
- HumanByte::new_binary(bytes as f64 / elapsed.as_secs_f64()),
- );
+ log_sender
+ .log(
+ Level::INFO,
+ format!(
+ "{archive_prefix}: downloaded {} ({}/s)",
+ HumanByte::from(bytes),
+ HumanByte::new_binary(bytes as f64 / elapsed.as_secs_f64()),
+ ),
+ )
+ .await?;
Ok(SyncStats {
chunk_count,
@@ -292,6 +300,7 @@ async fn pull_single_archive<'a>(
archive_info: &'a FileInfo,
encountered_chunks: Arc<Mutex<EncounteredChunks>>,
backend: &DatastoreBackend,
+ log_sender: Arc<LogLineSender>,
) -> Result<SyncStats, Error> {
let archive_name = &archive_info.filename;
let mut path = snapshot.full_path();
@@ -302,72 +311,104 @@ async fn pull_single_archive<'a>(
let mut sync_stats = SyncStats::default();
- info!("sync archive {archive_name}");
+ let archive_prefix = format!("{}: {archive_name}", snapshot.backup_time_string());
+
+ log_sender
+ .log(Level::INFO, format!("{archive_prefix}: sync archive"))
+ .await?;
- reader.load_file_into(archive_name, &tmp_path).await?;
+ reader
+ .load_file_into(archive_name, &tmp_path)
+ .await
+ .with_context(|| archive_prefix.clone())?;
- let mut tmpfile = std::fs::OpenOptions::new().read(true).open(&tmp_path)?;
+ let mut tmpfile = std::fs::OpenOptions::new()
+ .read(true)
+ .open(&tmp_path)
+ .with_context(|| archive_prefix.clone())?;
match ArchiveType::from_path(archive_name)? {
ArchiveType::DynamicIndex => {
let index = DynamicIndexReader::new(tmpfile).map_err(|err| {
- format_err!("unable to read dynamic index {:?} - {}", tmp_path, err)
+ format_err!("{archive_prefix}: unable to read dynamic index {tmp_path:?} - {err}")
})?;
let (csum, size) = index.compute_csum();
- verify_archive(archive_info, &csum, size)?;
+ verify_archive(archive_info, &csum, size).with_context(|| archive_prefix.clone())?;
if reader.skip_chunk_sync(snapshot.datastore().name()) {
- info!("skipping chunk sync for same datastore");
+ log_sender
+ .log(
+ Level::INFO,
+ format!("{archive_prefix}: skipping chunk sync for same datastore"),
+ )
+ .await?;
} else {
let stats = pull_index_chunks(
reader
.chunk_reader(archive_info.crypt_mode)
- .context("failed to get chunk reader")?,
+ .context("failed to get chunk reader")
+ .with_context(|| archive_prefix.clone())?,
snapshot.datastore().clone(),
index,
encountered_chunks,
backend,
+ &archive_prefix,
+ Arc::clone(&log_sender),
)
- .await?;
+ .await
+ .with_context(|| archive_prefix.clone())?;
sync_stats.add(stats);
}
}
ArchiveType::FixedIndex => {
let index = FixedIndexReader::new(tmpfile).map_err(|err| {
- format_err!("unable to read fixed index '{:?}' - {}", tmp_path, err)
+ format_err!("{archive_name}: unable to read fixed index '{tmp_path:?}' - {err}")
})?;
let (csum, size) = index.compute_csum();
- verify_archive(archive_info, &csum, size)?;
+ verify_archive(archive_info, &csum, size).with_context(|| archive_prefix.clone())?;
if reader.skip_chunk_sync(snapshot.datastore().name()) {
- info!("skipping chunk sync for same datastore");
+ log_sender
+ .log(
+ Level::INFO,
+ format!("{archive_prefix}: skipping chunk sync for same datastore"),
+ )
+ .await?;
} else {
let stats = pull_index_chunks(
reader
.chunk_reader(archive_info.crypt_mode)
- .context("failed to get chunk reader")?,
+ .context("failed to get chunk reader")
+ .with_context(|| archive_prefix.clone())?,
snapshot.datastore().clone(),
index,
encountered_chunks,
backend,
+ &archive_prefix,
+ Arc::clone(&log_sender),
)
- .await?;
+ .await
+ .with_context(|| archive_prefix.clone())?;
sync_stats.add(stats);
}
}
ArchiveType::Blob => {
- tmpfile.rewind()?;
- let (csum, size) = sha256(&mut tmpfile)?;
- verify_archive(archive_info, &csum, size)?;
+ proxmox_lang::try_block!({
+ tmpfile.rewind()?;
+ let (csum, size) = sha256(&mut tmpfile)?;
+ verify_archive(archive_info, &csum, size)
+ })
+ .with_context(|| archive_prefix.clone())?;
}
}
if let Err(err) = std::fs::rename(&tmp_path, &path) {
- bail!("Atomic rename file {:?} failed - {}", path, err);
+ bail!("{archive_prefix}: Atomic rename file {path:?} failed - {err}");
}
backend
.upload_index_to_backend(snapshot, archive_name)
- .await?;
+ .await
+ .with_context(|| archive_prefix.clone())?;
Ok(sync_stats)
}
@@ -388,13 +429,24 @@ async fn pull_snapshot<'a>(
encountered_chunks: Arc<Mutex<EncounteredChunks>>,
corrupt: bool,
is_new: bool,
+ log_sender: Arc<LogLineSender>,
) -> Result<SyncStats, Error> {
+ let prefix = snapshot.backup_time_string().to_owned();
if is_new {
- info!("sync snapshot {}", snapshot.dir());
+ log_sender
+ .log(Level::INFO, format!("{prefix}: start sync"))
+ .await?;
} else if corrupt {
- info!("re-sync snapshot {} due to corruption", snapshot.dir());
+ log_sender
+ .log(
+ Level::INFO,
+ format!("re-sync snapshot {prefix} due to corruption"),
+ )
+ .await?;
} else {
- info!("re-sync snapshot {}", snapshot.dir());
+ log_sender
+ .log(Level::INFO, format!("re-sync snapshot {prefix}"))
+ .await?;
}
let mut sync_stats = SyncStats::default();
@@ -409,7 +461,8 @@ async fn pull_snapshot<'a>(
let tmp_manifest_blob;
if let Some(data) = reader
.load_file_into(MANIFEST_BLOB_NAME.as_ref(), &tmp_manifest_name)
- .await?
+ .await
+ .with_context(|| prefix.clone())?
{
tmp_manifest_blob = data;
} else {
@@ -419,28 +472,34 @@ async fn pull_snapshot<'a>(
if manifest_name.exists() && !corrupt {
let manifest_blob = proxmox_lang::try_block!({
let mut manifest_file = std::fs::File::open(&manifest_name).map_err(|err| {
- format_err!("unable to open local manifest {manifest_name:?} - {err}")
+ format_err!("{prefix}: unable to open local manifest {manifest_name:?} - {err}")
})?;
- let manifest_blob = DataBlob::load_from_reader(&mut manifest_file)?;
+ let manifest_blob =
+ DataBlob::load_from_reader(&mut manifest_file).with_context(|| prefix.clone())?;
Ok(manifest_blob)
})
.map_err(|err: Error| {
- format_err!("unable to read local manifest {manifest_name:?} - {err}")
+ format_err!("{prefix}: unable to read local manifest {manifest_name:?} - {err}")
})?;
if manifest_blob.raw_data() == tmp_manifest_blob.raw_data() {
if !client_log_name.exists() {
- reader.try_download_client_log(&client_log_name).await?;
+ reader
+ .try_download_client_log(&client_log_name)
+ .await
+ .with_context(|| prefix.clone())?;
};
- info!("no data changes");
+ log_sender
+ .log(Level::INFO, format!("{prefix}: no data changes"))
+ .await?;
let _ = std::fs::remove_file(&tmp_manifest_name);
return Ok(sync_stats); // nothing changed
}
}
let manifest_data = tmp_manifest_blob.raw_data().to_vec();
- let manifest = BackupManifest::try_from(tmp_manifest_blob)?;
+ let manifest = BackupManifest::try_from(tmp_manifest_blob).with_context(|| prefix.clone())?;
if ignore_not_verified_or_encrypted(
&manifest,
@@ -464,35 +523,54 @@ async fn pull_snapshot<'a>(
path.push(&item.filename);
if !corrupt && path.exists() {
- let filename: BackupArchiveName = item.filename.as_str().try_into()?;
+ let filename: BackupArchiveName = item
+ .filename
+ .as_str()
+ .try_into()
+ .with_context(|| prefix.clone())?;
match filename.archive_type() {
ArchiveType::DynamicIndex => {
- let index = DynamicIndexReader::open(&path)?;
+ let index = DynamicIndexReader::open(&path).with_context(|| prefix.clone())?;
let (csum, size) = index.compute_csum();
match manifest.verify_file(&filename, &csum, size) {
Ok(_) => continue,
Err(err) => {
- info!("detected changed file {path:?} - {err}");
+ log_sender
+ .log(
+ Level::INFO,
+ format!("{prefix}: detected changed file {path:?} - {err}"),
+ )
+ .await?;
}
}
}
ArchiveType::FixedIndex => {
- let index = FixedIndexReader::open(&path)?;
+ let index = FixedIndexReader::open(&path).with_context(|| prefix.clone())?;
let (csum, size) = index.compute_csum();
match manifest.verify_file(&filename, &csum, size) {
Ok(_) => continue,
Err(err) => {
- info!("detected changed file {path:?} - {err}");
+ log_sender
+ .log(
+ Level::INFO,
+ format!("{prefix}: detected changed file {path:?} - {err}"),
+ )
+ .await?;
}
}
}
ArchiveType::Blob => {
- let mut tmpfile = std::fs::File::open(&path)?;
- let (csum, size) = sha256(&mut tmpfile)?;
+ let mut tmpfile = std::fs::File::open(&path).with_context(|| prefix.clone())?;
+ let (csum, size) = sha256(&mut tmpfile).with_context(|| prefix.clone())?;
match manifest.verify_file(&filename, &csum, size) {
Ok(_) => continue,
Err(err) => {
- info!("detected changed file {path:?} - {err}");
+ log_sender
+ .log(
+ Level::INFO,
+ format!("{prefix}: detected changed file {path:?} - {err}"),
+ )
+ .await?;
}
}
}
@@ -505,13 +583,14 @@ async fn pull_snapshot<'a>(
item,
encountered_chunks.clone(),
backend,
+ Arc::clone(&log_sender),
)
.await?;
sync_stats.add(stats);
}
if let Err(err) = std::fs::rename(&tmp_manifest_name, &manifest_name) {
- bail!("Atomic rename file {:?} failed - {}", manifest_name, err);
+ bail!("{prefix}: Atomic rename file {manifest_name:?} failed - {err}");
}
if let DatastoreBackend::S3(s3_client) = backend {
let object_key = pbs_datastore::s3::object_key_from_path(
@@ -524,33 +603,40 @@ async fn pull_snapshot<'a>(
let _is_duplicate = s3_client
.upload_replace_with_retry(object_key, data)
.await
- .context("failed to upload manifest to s3 backend")?;
+ .context("failed to upload manifest to s3 backend")
+ .with_context(|| prefix.clone())?;
}
if !client_log_name.exists() {
- reader.try_download_client_log(&client_log_name).await?;
+ reader
+ .try_download_client_log(&client_log_name)
+ .await
+ .with_context(|| prefix.clone())?;
if client_log_name.exists() {
if let DatastoreBackend::S3(s3_client) = backend {
let object_key = pbs_datastore::s3::object_key_from_path(
&snapshot.relative_path(),
CLIENT_LOG_BLOB_NAME.as_ref(),
)
- .context("invalid archive object key")?;
+ .context("invalid archive object key")
+ .with_context(|| prefix.clone())?;
let data = tokio::fs::read(&client_log_name)
.await
- .context("failed to read log file contents")?;
+ .context("failed to read log file contents")
+ .with_context(|| prefix.clone())?;
let contents = hyper::body::Bytes::from(data);
let _is_duplicate = s3_client
.upload_replace_with_retry(object_key, contents)
.await
- .context("failed to upload client log to s3 backend")?;
+ .context("failed to upload client log to s3 backend")
+ .with_context(|| prefix.clone())?;
}
}
};
snapshot
.cleanup_unreferenced_files(&manifest)
- .map_err(|err| format_err!("failed to cleanup unreferenced files - {err}"))?;
+ .map_err(|err| format_err!("{prefix}: failed to cleanup unreferenced files - {err}"))?;
Ok(sync_stats)
}
@@ -565,10 +651,14 @@ async fn pull_snapshot_from<'a>(
snapshot: &'a pbs_datastore::BackupDir,
encountered_chunks: Arc<Mutex<EncounteredChunks>>,
corrupt: bool,
+ log_sender: Arc<LogLineSender>,
) -> Result<SyncStats, Error> {
+ let prefix = format!("{}", snapshot.backup_time_string());
+
let (_path, is_new, _snap_lock) = snapshot
.datastore()
- .create_locked_backup_dir(snapshot.backup_ns(), snapshot.as_ref())?;
+ .create_locked_backup_dir(snapshot.backup_ns(), snapshot.as_ref())
+ .context(prefix.clone())?;
let result = pull_snapshot(
params,
@@ -577,6 +667,7 @@ async fn pull_snapshot_from<'a>(
encountered_chunks,
corrupt,
is_new,
+ Arc::clone(&log_sender),
)
.await;
@@ -589,11 +680,20 @@ async fn pull_snapshot_from<'a>(
snapshot.as_ref(),
true,
) {
- info!("cleanup error - {cleanup_err}");
+ log_sender
+ .log(
+ Level::INFO,
+ format!("{prefix}: cleanup error - {cleanup_err}"),
+ )
+ .await?;
}
return Err(err);
}
- Ok(_) => info!("sync snapshot {} done", snapshot.dir()),
+ Ok(_) => {
+ log_sender
+ .log(Level::INFO, format!("{prefix}: sync done"))
+ .await?
+ }
}
}
@@ -622,7 +722,9 @@ async fn pull_group(
source_namespace: &BackupNamespace,
group: &BackupGroup,
shared_group_progress: Arc<SharedGroupProgress>,
+ log_sender: Arc<LogLineSender>,
) -> Result<SyncStats, Error> {
+ let prefix = format!("{group}");
let mut already_synced_skip_info = SkipInfo::new(SkipReason::AlreadySynced);
let mut transfer_last_skip_info = SkipInfo::new(SkipReason::TransferLast);
@@ -714,11 +816,15 @@ async fn pull_group(
.collect();
if already_synced_skip_info.count > 0 {
- info!("{already_synced_skip_info}");
+ log_sender
+ .log(Level::INFO, format!("{prefix}: {already_synced_skip_info}"))
+ .await?;
already_synced_skip_info.reset();
}
if transfer_last_skip_info.count > 0 {
- info!("{transfer_last_skip_info}");
+ log_sender
+ .log(Level::INFO, format!("{prefix}: {transfer_last_skip_info}"))
+ .await?;
transfer_last_skip_info.reset();
}
@@ -730,8 +836,8 @@ async fn pull_group(
.store
.backup_group(target_ns.clone(), group.clone());
if let Some(info) = backup_group.last_backup(true).unwrap_or(None) {
- let mut reusable_chunks = encountered_chunks.lock().unwrap();
if let Err(err) = proxmox_lang::try_block!({
+ let mut reusable_chunks = encountered_chunks.lock().unwrap();
let _snapshot_guard = info
.backup_dir
.lock_shared()
@@ -780,7 +886,12 @@ async fn pull_group(
}
Ok::<(), Error>(())
}) {
- warn!("Failed to collect reusable chunk from last backup: {err:#?}");
+ log_sender
+ .log(
+ Level::WARN,
+ format!("Failed to collect reusable chunk from last backup: {err:#?}"),
+ )
+ .await?;
}
}
@@ -805,13 +916,16 @@ async fn pull_group(
&to_snapshot,
encountered_chunks.clone(),
corrupt,
+ Arc::clone(&log_sender),
)
.await;
// Update done groups progress by other parallel running pulls
local_progress.done_groups = shared_group_progress.load_done();
local_progress.done_snapshots = pos as u64 + 1;
- info!("percentage done: group {group}: {local_progress}");
+ log_sender
+ .log(Level::INFO, format!("percentage done: {local_progress}"))
+ .await?;
let stats = result?; // stop on error
sync_stats.add(stats);
@@ -829,13 +943,23 @@ async fn pull_group(
continue;
}
if snapshot.is_protected() {
- info!(
- "don't delete vanished snapshot {} (protected)",
- snapshot.dir()
- );
+ log_sender
+ .log(
+ Level::INFO,
+ format!(
+ "{prefix}: don't delete vanished snapshot {} (protected)",
+ snapshot.dir(),
+ ),
+ )
+ .await?;
continue;
}
- info!("delete vanished snapshot {}", snapshot.dir());
+ log_sender
+ .log(
+ Level::INFO,
+ format!("delete vanished snapshot {}", snapshot.dir()),
+ )
+ .await?;
params
.target
.store
@@ -1035,10 +1159,7 @@ pub(crate) async fn pull_store(mut params: PullParameters) -> Result<SyncStats,
}
Err(err) => {
errors = true;
- info!(
- "Encountered errors while syncing namespace {} - {err}",
- &namespace,
- );
+ info!("Encountered errors while syncing namespace {namespace} - {err}");
}
};
}
@@ -1064,6 +1185,7 @@ async fn lock_and_pull_group(
namespace: &BackupNamespace,
target_namespace: &BackupNamespace,
shared_group_progress: Arc<SharedGroupProgress>,
+ log_sender: Arc<LogLineSender>,
) -> Result<SyncStats, Error> {
let (owner, _lock_guard) =
match params
@@ -1073,25 +1195,47 @@ async fn lock_and_pull_group(
{
Ok(res) => res,
Err(err) => {
- info!("sync group {group} failed - group lock failed: {err}");
- info!("create_locked_backup_group failed");
+ log_sender
+ .log(
+ Level::INFO,
+ format!("sync group {group} failed - group lock failed: {err}"),
+ )
+ .await?;
+ log_sender
+ .log(Level::INFO, "create_locked_backup_group failed".to_string())
+ .await?;
return Err(err);
}
};
if params.owner != owner {
// only the owner is allowed to create additional snapshots
- info!(
- "sync group {group} failed - owner check failed ({} != {owner})",
- params.owner
- );
+ log_sender
+ .log(
+ Level::INFO,
+ format!(
+ "sync group {group} failed - owner check failed ({} != {owner})",
+ params.owner,
+ ),
+ )
+ .await?;
return Err(format_err!("owner check failed"));
}
- match pull_group(params, namespace, group, shared_group_progress).await {
+ match pull_group(
+ params,
+ namespace,
+ group,
+ shared_group_progress,
+ Arc::clone(&log_sender),
+ )
+ .await
+ {
Ok(stats) => Ok(stats),
Err(err) => {
- info!("sync group {group} failed - {err:#}");
+ log_sender
+ .log(Level::INFO, format!("sync group {group} failed - {err:#}"))
+ .await?;
Err(err)
}
}
@@ -1124,7 +1268,7 @@ async fn pull_ns(
list.sort_unstable();
info!(
- "found {} groups to sync (out of {unfiltered_count} total)",
+ "Found {} groups to sync (out of {unfiltered_count} total)",
list.len()
);
@@ -1143,6 +1287,10 @@ async fn pull_ns(
let shared_group_progress = Arc::new(SharedGroupProgress::with_total_groups(list.len()));
let mut group_workers = BoundedJoinSet::new(params.worker_threads.unwrap_or(1));
+ let (buffered_logger, sender_builder) = BufferedLogger::new(5, Duration::from_secs(1));
+ // runs until sender_builder and all senders build from it are being dropped
+ buffered_logger.run_log_collection();
+
let mut process_results = |results| {
for result in results {
match result {
@@ -1160,16 +1308,20 @@ async fn pull_ns(
let target_ns = target_ns.clone();
let params = Arc::clone(¶ms);
let group_progress_cloned = Arc::clone(&shared_group_progress);
+ let log_sender = Arc::new(sender_builder.sender_with_label(group.to_string()));
let results = group_workers
.spawn_task(async move {
- lock_and_pull_group(
+ let result = lock_and_pull_group(
Arc::clone(¶ms),
&group,
&namespace,
&target_ns,
group_progress_cloned,
+ Arc::clone(&log_sender),
)
- .await
+ .await;
+ let _ = log_sender.flush().await;
+ result
})
.await
.map_err(|err| format_err!("failed to join on worker task: {err:#}"))?;
@@ -1197,7 +1349,7 @@ async fn pull_ns(
if !local_group.apply_filters(¶ms.group_filter) {
continue;
}
- info!("delete vanished group '{local_group}'");
+ info!("Delete vanished group '{local_group}'");
let delete_stats_result = params
.target
.store
@@ -1206,7 +1358,7 @@ async fn pull_ns(
match delete_stats_result {
Ok(stats) => {
if !stats.all_removed() {
- info!("kept some protected snapshots of group '{local_group}'");
+ info!("Kept some protected snapshots of group '{local_group}'");
sync_stats.add(SyncStats::from(RemovedVanishedStats {
snapshots: stats.removed_snapshots(),
groups: 0,
@@ -1229,7 +1381,7 @@ async fn pull_ns(
Ok(())
});
if let Err(err) = result {
- info!("error during cleanup: {err}");
+ info!("Error during cleanup: {err}");
errors = true;
};
}
diff --git a/src/server/sync.rs b/src/server/sync.rs
index e88418442..17ed4839f 100644
--- a/src/server/sync.rs
+++ b/src/server/sync.rs
@@ -13,7 +13,6 @@ use futures::{future::FutureExt, select};
use hyper::http::StatusCode;
use pbs_config::BackupLockGuard;
use serde_json::json;
-use tokio::task::JoinSet;
use tracing::{info, warn};
use proxmox_human_byte::HumanByte;
@@ -136,13 +135,13 @@ impl SyncSourceReader for RemoteSourceReader {
Some(HttpError { code, message }) => match *code {
StatusCode::NOT_FOUND => {
info!(
- "skipping snapshot {} - vanished since start of sync",
+ "Snapshot {}: skipped because vanished since start of sync",
&self.dir
);
return Ok(None);
}
_ => {
- bail!("HTTP error {code} - {message}");
+ bail!("Snapshot {}: HTTP error {code} - {message}", &self.dir);
}
},
None => {
@@ -176,7 +175,8 @@ impl SyncSourceReader for RemoteSourceReader {
bail!("Atomic rename file {to_path:?} failed - {err}");
}
info!(
- "got backup log file {client_log_name}",
+ "Snapshot {snapshot}: got backup log file {client_log_name}",
+ snapshot = &self.dir,
client_log_name = client_log_name.deref()
);
}
--
2.47.3
* [PATCH proxmox-backup v6 12/15] sync: push: prepare push parameters to be shared across parallel tasks
2026-04-17 9:26 [PATCH proxmox{,-backup} v6 00/15] fix #4182: concurrent group pull/push support for sync jobs Christian Ebner
` (10 preceding siblings ...)
2026-04-17 9:26 ` [PATCH proxmox-backup v6 11/15] server: pull: prefix log messages and add error context Christian Ebner
@ 2026-04-17 9:26 ` Christian Ebner
2026-04-17 9:26 ` [PATCH proxmox-backup v6 13/15] server: sync: allow pushing groups concurrently Christian Ebner
` (2 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Christian Ebner @ 2026-04-17 9:26 UTC (permalink / raw)
To: pbs-devel
When performing parallel group syncs, the push parameters must be
shared between all tasks, which is not possible with regular
references due to lifetime and ownership issues. Pack them into an
atomically reference counted pointer (`Arc`) instead so they can
easily be cloned when required.
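For illustration, a minimal sketch of that pattern, assuming a
stripped-down stand-in for the real `PushParameters` struct and a
hypothetical task body: each spawned tokio task owns a cheap clone of
the `Arc`, so nothing borrows from the caller's stack frame.
```
use std::sync::Arc;

// hypothetical, stripped-down stand-in for the real PushParameters
struct PushParameters {
    worker_threads: Option<usize>,
}

async fn push_group_sketch(params: Arc<PushParameters>, group: String) {
    // the task owns its own Arc clone, so there is no lifetime tie to the caller
    let _limit = params.worker_threads.unwrap_or(1);
    println!("pushing group {group}");
}

#[tokio::main]
async fn main() {
    let params = Arc::new(PushParameters { worker_threads: Some(4) });
    let mut handles = Vec::new();
    for group in ["vm/100", "ct/101"] {
        // Arc::clone only bumps a reference count; the parameters themselves
        // are shared read-only between all spawned tasks
        let params = Arc::clone(&params);
        handles.push(tokio::spawn(push_group_sketch(params, group.to_string())));
    }
    for handle in handles {
        handle.await.expect("task panicked");
    }
}
```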
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 5:
- no changes
src/server/push.rs | 28 +++++++++++++++++-----------
1 file changed, 17 insertions(+), 11 deletions(-)
diff --git a/src/server/push.rs b/src/server/push.rs
index 44a204e6b..14395fe61 100644
--- a/src/server/push.rs
+++ b/src/server/push.rs
@@ -405,6 +405,7 @@ pub(crate) async fn push_store(mut params: PushParameters) -> Result<SyncStats,
let (mut groups, mut snapshots) = (0, 0);
let mut stats = SyncStats::default();
+ let params = Arc::new(params);
for source_namespace in &source_namespaces {
let source_store_and_ns = print_store_and_ns(params.source.store.name(), source_namespace);
let target_namespace = params.map_to_target(source_namespace)?;
@@ -428,7 +429,7 @@ pub(crate) async fn push_store(mut params: PushParameters) -> Result<SyncStats,
continue;
}
- match push_namespace(source_namespace, ¶ms).await {
+ match push_namespace(source_namespace, Arc::clone(¶ms)).await {
Ok((sync_progress, sync_stats, sync_errors)) => {
errors |= sync_errors;
stats.add(sync_stats);
@@ -523,11 +524,11 @@ pub(crate) async fn push_store(mut params: PushParameters) -> Result<SyncStats,
/// Iterate over all backup groups in the namespace and push them to the target.
pub(crate) async fn push_namespace(
namespace: &BackupNamespace,
- params: &PushParameters,
+ params: Arc<PushParameters>,
) -> Result<(StoreProgress, SyncStats, bool), Error> {
let target_namespace = params.map_to_target(namespace)?;
// Check if user is allowed to perform backups on remote datastore
- check_ns_remote_datastore_privs(params, &target_namespace, PRIV_REMOTE_DATASTORE_BACKUP)
+ check_ns_remote_datastore_privs(¶ms, &target_namespace, PRIV_REMOTE_DATASTORE_BACKUP)
.context("Pushing to remote namespace not allowed")?;
let mut list: Vec<BackupGroup> = params
@@ -555,7 +556,7 @@ pub(crate) async fn push_namespace(
let mut stats = SyncStats::default();
let (owned_target_groups, not_owned_target_groups) =
- fetch_target_groups(params, &target_namespace).await?;
+ fetch_target_groups(¶ms, &target_namespace).await?;
for (done, group) in list.into_iter().enumerate() {
progress.done_groups = done as u64;
@@ -571,7 +572,7 @@ pub(crate) async fn push_namespace(
}
synced_groups.insert(group.clone());
- match push_group(params, namespace, &group, &mut progress).await {
+ match push_group(Arc::clone(¶ms), namespace, &group, &mut progress).await {
Ok(sync_stats) => stats.add(sync_stats),
Err(err) => {
warn!("Encountered errors: {err:#}");
@@ -591,7 +592,7 @@ pub(crate) async fn push_namespace(
continue;
}
- match remove_target_group(params, &target_namespace, &target_group).await {
+ match remove_target_group(¶ms, &target_namespace, &target_group).await {
Ok(delete_stats) => {
info!("Removed vanished group {target_group} from remote");
if delete_stats.protected_snapshots() > 0 {
@@ -673,7 +674,7 @@ async fn forget_target_snapshot(
/// - Iterate the snapshot list and push each snapshot individually
/// - (Optional): Remove vanished groups if `remove_vanished` flag is set
pub(crate) async fn push_group(
- params: &PushParameters,
+ params: Arc<PushParameters>,
namespace: &BackupNamespace,
group: &BackupGroup,
progress: &mut StoreProgress,
@@ -692,7 +693,7 @@ pub(crate) async fn push_group(
}
let target_namespace = params.map_to_target(namespace)?;
- let mut target_snapshots = fetch_target_snapshots(params, &target_namespace, group).await?;
+ let mut target_snapshots = fetch_target_snapshots(¶ms, &target_namespace, group).await?;
target_snapshots.sort_unstable_by_key(|a| a.backup.time);
let last_snapshot_time = target_snapshots
@@ -749,8 +750,13 @@ pub(crate) async fn push_group(
let mut stats = SyncStats::default();
let mut fetch_previous_manifest = !target_snapshots.is_empty();
for (pos, source_snapshot) in snapshots.into_iter().enumerate() {
- let result =
- push_snapshot(params, namespace, &source_snapshot, fetch_previous_manifest).await;
+ let result = push_snapshot(
+ ¶ms,
+ namespace,
+ &source_snapshot,
+ fetch_previous_manifest,
+ )
+ .await;
fetch_previous_manifest = true;
progress.done_snapshots = pos as u64 + 1;
@@ -773,7 +779,7 @@ pub(crate) async fn push_group(
);
continue;
}
- match forget_target_snapshot(params, &target_namespace, &snapshot.backup).await {
+ match forget_target_snapshot(¶ms, &target_namespace, &snapshot.backup).await {
Ok(()) => {
info!(
"Removed vanished snapshot {name} from remote",
--
2.47.3
* [PATCH proxmox-backup v6 13/15] server: sync: allow pushing groups concurrently
2026-04-17 9:26 [PATCH proxmox{,-backup} v6 00/15] fix #4182: concurrent group pull/push support for sync jobs Christian Ebner
` (11 preceding siblings ...)
2026-04-17 9:26 ` [PATCH proxmox-backup v6 12/15] sync: push: prepare push parameters to be shared across parallel tasks Christian Ebner
@ 2026-04-17 9:26 ` Christian Ebner
2026-04-17 9:26 ` [PATCH proxmox-backup v6 14/15] server: push: prefix log messages and add additional logging Christian Ebner
2026-04-17 9:26 ` [PATCH proxmox-backup v6 15/15] ui: expose group worker setting in sync job edit window Christian Ebner
14 siblings, 0 replies; 16+ messages in thread
From: Christian Ebner @ 2026-04-17 9:26 UTC (permalink / raw)
To: pbs-devel
Improve the throughput over high latency connections for sync jobs in
push direction by allowing up to a configured number of backup groups
to be pushed concurrently. Just like for pull sync jobs, use a
bounded join set to run up to the configured number of group worker
tokio tasks in parallel, each connecting and pushing a group to the
remote target.
The store progress and sync group housekeeping are placed behind an
atomic reference counted mutex to allow for concurrent access to
status updates.
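As a rough sketch of the bounded-join-set idea (not the pbs-tools
implementation, whose `spawn_task`/`join_active` API and spawning
behaviour differ), a tokio `Semaphore` can cap how many of the
spawned group tasks make progress at once:
```
use std::sync::Arc;
use tokio::sync::Semaphore;
use tokio::task::JoinSet;

// Sketch: a JoinSet whose number of concurrently *running* tasks is
// capped by a semaphore. All tasks are spawned immediately but only
// start their real work once a permit is available.
struct Bounded<T: Send + 'static> {
    limit: Arc<Semaphore>,
    set: JoinSet<T>,
}

impl<T: Send + 'static> Bounded<T> {
    fn new(limit: usize) -> Self {
        Self { limit: Arc::new(Semaphore::new(limit)), set: JoinSet::new() }
    }

    fn spawn<F>(&mut self, fut: F)
    where
        F: std::future::Future<Output = T> + Send + 'static,
    {
        let limit = Arc::clone(&self.limit);
        self.set.spawn(async move {
            // wait for a free slot before doing any work
            let _permit = limit.acquire_owned().await.expect("semaphore closed");
            fut.await
        });
    }

    async fn join_all(&mut self) -> Vec<T> {
        let mut results = Vec::new();
        while let Some(res) = self.set.join_next().await {
            results.push(res.expect("task panicked"));
        }
        results
    }
}

#[tokio::main]
async fn main() {
    let mut workers = Bounded::new(2); // e.g. worker-threads = 2
    for group in 0..5 {
        workers.spawn(async move { format!("group {group} pushed") });
    }
    for line in workers.join_all().await {
        println!("{line}");
    }
}
```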
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 5:
- uses BoundedJoinSet implementation, refactored accordingly
src/server/push.rs | 102 ++++++++++++++++++++++++++++++++++-----------
1 file changed, 77 insertions(+), 25 deletions(-)
diff --git a/src/server/push.rs b/src/server/push.rs
index 14395fe61..9b7fb4522 100644
--- a/src/server/push.rs
+++ b/src/server/push.rs
@@ -27,6 +27,7 @@ use pbs_datastore::fixed_index::FixedIndexReader;
use pbs_datastore::index::IndexFile;
use pbs_datastore::read_chunk::AsyncReadChunk;
use pbs_datastore::{DataStore, StoreProgress};
+use pbs_tools::bounded_join_set::BoundedJoinSet;
use super::sync::{
check_namespace_depth_limit, exclude_not_verified_or_encrypted,
@@ -34,6 +35,7 @@ use super::sync::{
SyncSource, SyncStats,
};
use crate::api2::config::remote;
+use crate::server::sync::SharedGroupProgress;
/// Target for backups to be pushed to
pub(crate) struct PushTarget {
@@ -551,41 +553,62 @@ pub(crate) async fn push_namespace(
let mut errors = false;
// Remember synced groups, remove others when the remove vanished flag is set
- let mut synced_groups = HashSet::new();
+ let synced_groups = Arc::new(Mutex::new(HashSet::new()));
let mut progress = StoreProgress::new(list.len() as u64);
let mut stats = SyncStats::default();
let (owned_target_groups, not_owned_target_groups) =
fetch_target_groups(¶ms, &target_namespace).await?;
+ let not_owned_target_groups = Arc::new(not_owned_target_groups);
- for (done, group) in list.into_iter().enumerate() {
- progress.done_groups = done as u64;
- progress.done_snapshots = 0;
- progress.group_snapshots = 0;
+ let mut group_workers = BoundedJoinSet::new(params.worker_threads.unwrap_or(1));
+ let shared_group_progress = Arc::new(SharedGroupProgress::with_total_groups(list.len()));
- if not_owned_target_groups.contains(&group) {
- warn!(
- "Group '{group}' not owned by remote user '{}' on target, skipping upload",
- params.target.remote_user(),
- );
- continue;
- }
- synced_groups.insert(group.clone());
-
- match push_group(Arc::clone(¶ms), namespace, &group, &mut progress).await {
- Ok(sync_stats) => stats.add(sync_stats),
- Err(err) => {
- warn!("Encountered errors: {err:#}");
- warn!("Failed to push group {group} to remote!");
- errors = true;
+ let mut process_results = |results| {
+ for result in results {
+ match result {
+ Ok(sync_stats) => {
+ stats.add(sync_stats);
+ progress.done_groups = shared_group_progress.increment_done();
+ }
+ Err(()) => errors = true,
}
}
+ };
+
+ for group in list.into_iter() {
+ let namespace = namespace.clone();
+ let params = Arc::clone(¶ms);
+ let not_owned_target_groups = Arc::clone(¬_owned_target_groups);
+ let synced_groups = Arc::clone(&synced_groups);
+ let group_progress_cloned = Arc::clone(&shared_group_progress);
+ let results = group_workers
+ .spawn_task(async move {
+ push_group_do(
+ params,
+ &namespace,
+ &group,
+ group_progress_cloned,
+ synced_groups,
+ not_owned_target_groups,
+ )
+ .await
+ })
+ .await
+ .map_err(|err| format_err!("failed to join on worker task: {err:#}"))?;
+ process_results(results);
}
+ let results = group_workers
+ .join_active()
+ .await
+ .map_err(|err| format_err!("failed to join on worker task: {err:#}"))?;
+ process_results(results);
+
if params.remove_vanished {
// only ever allow to prune owned groups on target
for target_group in owned_target_groups {
- if synced_groups.contains(&target_group) {
+ if synced_groups.lock().unwrap().contains(&target_group) {
continue;
}
if !target_group.apply_filters(¶ms.group_filter) {
@@ -664,6 +687,32 @@ async fn forget_target_snapshot(
Ok(())
}
+async fn push_group_do(
+ params: Arc<PushParameters>,
+ namespace: &BackupNamespace,
+ group: &BackupGroup,
+ shared_group_progress: Arc<SharedGroupProgress>,
+ synced_groups: Arc<Mutex<HashSet<BackupGroup>>>,
+ not_owned_target_groups: Arc<HashSet<BackupGroup>>,
+) -> Result<SyncStats, ()> {
+ if not_owned_target_groups.contains(group) {
+ warn!(
+ "Group '{group}' not owned by remote user '{}' on target, skipping upload",
+ params.target.remote_user(),
+ );
+ shared_group_progress.increment_done();
+ return Ok(SyncStats::default());
+ }
+
+ synced_groups.lock().unwrap().insert(group.clone());
+ push_group(params, namespace, group, Arc::clone(&shared_group_progress))
+ .await
+ .map_err(|err| {
+ warn!("Group {group}: Encountered errors: {err:#}");
+ warn!("Failed to push group {group} to remote!");
+ })
+}
+
/// Push group including all snaphshots to target
///
/// Iterate over all snapshots in the group and push them to the target.
@@ -677,7 +726,7 @@ pub(crate) async fn push_group(
params: Arc<PushParameters>,
namespace: &BackupNamespace,
group: &BackupGroup,
- progress: &mut StoreProgress,
+ shared_group_progress: Arc<SharedGroupProgress>,
) -> Result<SyncStats, Error> {
let mut already_synced_skip_info = SkipInfo::new(SkipReason::AlreadySynced);
let mut transfer_last_skip_info = SkipInfo::new(SkipReason::TransferLast);
@@ -745,7 +794,8 @@ pub(crate) async fn push_group(
transfer_last_skip_info.reset();
}
- progress.group_snapshots = snapshots.len() as u64;
+ let mut local_progress = StoreProgress::new(shared_group_progress.total_groups());
+ local_progress.group_snapshots = snapshots.len() as u64;
let mut stats = SyncStats::default();
let mut fetch_previous_manifest = !target_snapshots.is_empty();
@@ -759,8 +809,10 @@ pub(crate) async fn push_group(
.await;
fetch_previous_manifest = true;
- progress.done_snapshots = pos as u64 + 1;
- info!("Percentage done: {progress}");
+ // Update done groups progress by other parallel running pushes
+ local_progress.done_groups = shared_group_progress.load_done();
+ local_progress.done_snapshots = pos as u64 + 1;
+ info!("Percentage done: group {group}: {local_progress}");
// stop on error
let sync_stats = result?;
--
2.47.3
* [PATCH proxmox-backup v6 14/15] server: push: prefix log messages and add additional logging
2026-04-17 9:26 [PATCH proxmox{,-backup} v6 00/15] fix #4182: concurrent group pull/push support for sync jobs Christian Ebner
` (12 preceding siblings ...)
2026-04-17 9:26 ` [PATCH proxmox-backup v6 13/15] server: sync: allow pushing groups concurrently Christian Ebner
@ 2026-04-17 9:26 ` Christian Ebner
2026-04-17 9:26 ` [PATCH proxmox-backup v6 15/15] ui: expose group worker setting in sync job edit window Christian Ebner
14 siblings, 0 replies; 16+ messages in thread
From: Christian Ebner @ 2026-04-17 9:26 UTC (permalink / raw)
To: pbs-devel
Pushing groups and therefore also snapshots in parallel leads to
unordered log outputs, making it mostly impossible to relate a log
message to a backup snapshot/group.
Therefore, prefix push job log messages by the corresponding group or
snapshot and use the buffered logger implementation to buffer up to 5
subsequent lines with a timeout of 1 second. This reduces
interwoven log messages stemming from different groups.
Also, be more verbose for push syncs, adding additional log output
for the groups, snapshots and archives being pushed.
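A minimal sketch of that buffering behaviour follows; the real
`BufferedLogger`/`LogLineSender` API differs, and the channel layout
and label handling here are assumptions. Lines arrive tagged with a
group label and a collector flushes a label's buffer once it holds 5
lines or the 1 second tick fires, so lines from one group tend to be
printed together:
```
use std::collections::HashMap;
use std::time::Duration;
use tokio::sync::mpsc;

async fn collect(mut rx: mpsc::Receiver<(String, String)>) {
    let mut buffers: HashMap<String, Vec<String>> = HashMap::new();
    let mut tick = tokio::time::interval(Duration::from_secs(1));
    loop {
        tokio::select! {
            maybe_line = rx.recv() => {
                let Some((label, line)) = maybe_line else { break };
                let buf = buffers.entry(label.clone()).or_default();
                buf.push(line);
                // flush a group's buffer as soon as it is full
                if buf.len() >= 5 {
                    for line in buf.drain(..) {
                        println!("[{label}]: {line}");
                    }
                }
            }
            _ = tick.tick() => {
                // timeout: flush everything that accumulated so far
                for (label, buf) in buffers.iter_mut() {
                    for line in buf.drain(..) {
                        println!("[{label}]: {line}");
                    }
                }
            }
        }
    }
    // all senders dropped: flush whatever is left
    for (label, buf) in buffers.iter_mut() {
        for line in buf.drain(..) {
            println!("[{label}]: {line}");
        }
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel(64);
    let collector = tokio::spawn(collect(rx));
    for i in 0..3 {
        tx.send(("ct/101".to_string(), format!("line {i}"))).await.unwrap();
    }
    drop(tx);
    collector.await.unwrap();
}
```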
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 5:
- uses BufferedLogger implementation, refactored accordingly
- improve log line prefixes
- add missing error contexts
src/server/push.rs | 245 ++++++++++++++++++++++++++++++++++++---------
1 file changed, 199 insertions(+), 46 deletions(-)
diff --git a/src/server/push.rs b/src/server/push.rs
index 9b7fb4522..520bdd250 100644
--- a/src/server/push.rs
+++ b/src/server/push.rs
@@ -2,12 +2,13 @@
use std::collections::HashSet;
use std::sync::{Arc, Mutex};
+use std::time::Duration;
use anyhow::{bail, format_err, Context, Error};
use futures::stream::{self, StreamExt, TryStreamExt};
use tokio::sync::mpsc;
use tokio_stream::wrappers::ReceiverStream;
-use tracing::{info, warn};
+use tracing::{info, warn, Level};
use pbs_api_types::{
print_store_and_ns, ApiVersion, ApiVersionInfo, ArchiveType, Authid, BackupArchiveName,
@@ -28,6 +29,9 @@ use pbs_datastore::index::IndexFile;
use pbs_datastore::read_chunk::AsyncReadChunk;
use pbs_datastore::{DataStore, StoreProgress};
use pbs_tools::bounded_join_set::BoundedJoinSet;
+use pbs_tools::buffered_logger::{BufferedLogger, LogLineSender};
+
+use proxmox_human_byte::HumanByte;
use super::sync::{
check_namespace_depth_limit, exclude_not_verified_or_encrypted,
@@ -564,6 +568,10 @@ pub(crate) async fn push_namespace(
let mut group_workers = BoundedJoinSet::new(params.worker_threads.unwrap_or(1));
let shared_group_progress = Arc::new(SharedGroupProgress::with_total_groups(list.len()));
+ let (buffered_logger, sender_builder) = BufferedLogger::new(5, Duration::from_secs(1));
+ // runs until sender_builder and all senders build from it are being dropped
+ buffered_logger.run_log_collection();
+
let mut process_results = |results| {
for result in results {
match result {
@@ -571,7 +579,7 @@ pub(crate) async fn push_namespace(
stats.add(sync_stats);
progress.done_groups = shared_group_progress.increment_done();
}
- Err(()) => errors = true,
+ Err(_err) => errors = true,
}
}
};
@@ -582,17 +590,21 @@ pub(crate) async fn push_namespace(
let not_owned_target_groups = Arc::clone(¬_owned_target_groups);
let synced_groups = Arc::clone(&synced_groups);
let group_progress_cloned = Arc::clone(&shared_group_progress);
+ let log_sender = Arc::new(sender_builder.sender_with_label(group.to_string()));
let results = group_workers
.spawn_task(async move {
- push_group_do(
+ let result = push_group_do(
params,
&namespace,
&group,
group_progress_cloned,
synced_groups,
not_owned_target_groups,
+ Arc::clone(&log_sender),
)
- .await
+ .await;
+ let _ = log_sender.flush().await;
+ result
})
.await
.map_err(|err| format_err!("failed to join on worker task: {err:#}"))?;
@@ -694,23 +706,46 @@ async fn push_group_do(
shared_group_progress: Arc<SharedGroupProgress>,
synced_groups: Arc<Mutex<HashSet<BackupGroup>>>,
not_owned_target_groups: Arc<HashSet<BackupGroup>>,
-) -> Result<SyncStats, ()> {
+ log_sender: Arc<LogLineSender>,
+) -> Result<SyncStats, Error> {
if not_owned_target_groups.contains(group) {
- warn!(
- "Group '{group}' not owned by remote user '{}' on target, skipping upload",
- params.target.remote_user(),
- );
+ log_sender
+ .log(
+ Level::WARN,
+ format!(
+ "Group '{group}' not owned by remote user '{}' on target, skipping upload",
+ params.target.remote_user(),
+ ),
+ )
+ .await?;
shared_group_progress.increment_done();
return Ok(SyncStats::default());
}
synced_groups.lock().unwrap().insert(group.clone());
- push_group(params, namespace, group, Arc::clone(&shared_group_progress))
- .await
- .map_err(|err| {
- warn!("Group {group}: Encountered errors: {err:#}");
- warn!("Failed to push group {group} to remote!");
- })
+ match push_group(
+ params,
+ namespace,
+ group,
+ Arc::clone(&shared_group_progress),
+ Arc::clone(&log_sender),
+ )
+ .await
+ {
+ Ok(res) => Ok(res),
+ Err(err) => {
+ log_sender
+ .log(Level::WARN, format!("Encountered errors: {err:#}"))
+ .await?;
+ log_sender
+ .log(
+ Level::WARN,
+ format!("Failed to push group {group} to remote!"),
+ )
+ .await?;
+ Err(err)
+ }
+ }
}
/// Push group including all snaphshots to target
@@ -727,6 +762,7 @@ pub(crate) async fn push_group(
namespace: &BackupNamespace,
group: &BackupGroup,
shared_group_progress: Arc<SharedGroupProgress>,
+ log_sender: Arc<LogLineSender>,
) -> Result<SyncStats, Error> {
let mut already_synced_skip_info = SkipInfo::new(SkipReason::AlreadySynced);
let mut transfer_last_skip_info = SkipInfo::new(SkipReason::TransferLast);
@@ -738,7 +774,12 @@ pub(crate) async fn push_group(
snapshots.sort_unstable_by_key(|a| a.backup.time);
if snapshots.is_empty() {
- info!("Group '{group}' contains no snapshots to sync to remote");
+ log_sender
+ .log(
+ Level::INFO,
+ format!("Group '{group}' contains no snapshots to sync to remote"),
+ )
+ .await?;
}
let target_namespace = params.map_to_target(namespace)?;
@@ -786,11 +827,15 @@ pub(crate) async fn push_group(
.collect();
if already_synced_skip_info.count > 0 {
- info!("{already_synced_skip_info}");
+ log_sender
+ .log(Level::INFO, already_synced_skip_info.to_string())
+ .await?;
already_synced_skip_info.reset();
}
if transfer_last_skip_info.count > 0 {
- info!("{transfer_last_skip_info}");
+ log_sender
+ .log(Level::INFO, transfer_last_skip_info.to_string())
+ .await?;
transfer_last_skip_info.reset();
}
@@ -800,11 +845,18 @@ pub(crate) async fn push_group(
let mut stats = SyncStats::default();
let mut fetch_previous_manifest = !target_snapshots.is_empty();
for (pos, source_snapshot) in snapshots.into_iter().enumerate() {
+ let prefix = proxmox_time::epoch_to_rfc3339_utc(source_snapshot.time)
+ .context("invalid timestamp")?;
+ log_sender
+ .log(Level::INFO, format!("{prefix}: start sync"))
+ .await?;
let result = push_snapshot(
¶ms,
namespace,
&source_snapshot,
fetch_previous_manifest,
+ Arc::clone(&log_sender),
+ &prefix,
)
.await;
fetch_previous_manifest = true;
@@ -812,10 +864,18 @@ pub(crate) async fn push_group(
// Update done groups progress by other parallel running pushes
local_progress.done_groups = shared_group_progress.load_done();
local_progress.done_snapshots = pos as u64 + 1;
- info!("Percentage done: group {group}: {local_progress}");
// stop on error
let sync_stats = result?;
+ log_sender
+ .log(Level::INFO, format!("{prefix}: sync done"))
+ .await?;
+ log_sender
+ .log(
+ Level::INFO,
+ format!("Percentage done: group {group}: {local_progress}"),
+ )
+ .await?;
stats.add(sync_stats);
}
@@ -825,25 +885,42 @@ pub(crate) async fn push_group(
continue;
}
if snapshot.protected {
- info!(
- "Kept protected snapshot {name} on remote",
- name = snapshot.backup
- );
+ log_sender
+ .log(
+ Level::INFO,
+ format!(
+ "Kept protected snapshot {name} on remote",
+ name = snapshot.backup
+ ),
+ )
+ .await?;
continue;
}
match forget_target_snapshot(¶ms, &target_namespace, &snapshot.backup).await {
Ok(()) => {
- info!(
- "Removed vanished snapshot {name} from remote",
- name = snapshot.backup
- );
+ log_sender
+ .log(
+ Level::INFO,
+ format!(
+ "Removed vanished snapshot {name} from remote",
+ name = snapshot.backup
+ ),
+ )
+ .await?;
}
Err(err) => {
- warn!("Encountered errors: {err:#}");
- warn!(
- "Failed to remove vanished snapshot {name} from remote!",
- name = snapshot.backup
- );
+ log_sender
+ .log(Level::WARN, format!("Encountered errors: {err:#}"))
+ .await?;
+ log_sender
+ .log(
+ Level::WARN,
+ format!(
+ "Failed to remove vanished snapshot {name} from remote!",
+ name = snapshot.backup
+ ),
+ )
+ .await?;
}
}
stats.add(SyncStats::from(RemovedVanishedStats {
@@ -868,24 +945,40 @@ pub(crate) async fn push_snapshot(
namespace: &BackupNamespace,
snapshot: &BackupDir,
fetch_previous_manifest: bool,
+ log_sender: Arc<LogLineSender>,
+ prefix: &String,
) -> Result<SyncStats, Error> {
let mut stats = SyncStats::default();
- let target_ns = params.map_to_target(namespace)?;
+ let target_ns = params
+ .map_to_target(namespace)
+ .with_context(|| prefix.clone())?;
let backup_dir = params
.source
.store
- .backup_dir(namespace.clone(), snapshot.clone())?;
+ .backup_dir(namespace.clone(), snapshot.clone())
+ .with_context(|| prefix.clone())?;
// Reader locks the snapshot
- let reader = params.source.reader(namespace, snapshot).await?;
+ let reader = params
+ .source
+ .reader(namespace, snapshot)
+ .await
+ .with_context(|| prefix.clone())?;
// Does not lock the manifest, but the reader already assures a locked snapshot
let source_manifest = match backup_dir.load_manifest() {
Ok((manifest, _raw_size)) => manifest,
Err(err) => {
// No manifest in snapshot or failed to read, warn and skip
- log::warn!("Encountered errors: {err:#}");
- log::warn!("Failed to load manifest for '{snapshot}'!");
+ log_sender
+ .log(
+ Level::WARN,
+ format!("{prefix}: Encountered errors: {err:#}"),
+ )
+ .await?;
+ log_sender
+ .log(Level::WARN, format!("{prefix}: Failed to load manifest!"))
+ .await?;
return Ok(stats);
}
};
@@ -912,14 +1005,22 @@ pub(crate) async fn push_snapshot(
no_cache: false,
},
)
- .await?;
+ .await
+ .with_context(|| prefix.clone())?;
let mut previous_manifest = None;
// Use manifest of previous snapshots in group on target for chunk upload deduplication
if fetch_previous_manifest {
match backup_writer.download_previous_manifest().await {
Ok(manifest) => previous_manifest = Some(Arc::new(manifest)),
- Err(err) => log::info!("Could not download previous manifest - {err}"),
+ Err(err) => {
+ log_sender
+ .log(
+ Level::INFO,
+ format!("{prefix}: Could not download previous manifest - {err}"),
+ )
+ .await?
+ }
}
};
@@ -948,12 +1049,32 @@ pub(crate) async fn push_snapshot(
path.push(&entry.filename);
if path.try_exists()? {
let archive_name = BackupArchiveName::from_path(&entry.filename)?;
+ log_sender
+ .log(
+ Level::INFO,
+ format!("{prefix}: sync archive {archive_name}"),
+ )
+ .await?;
+ let archive_prefix = format!("{prefix}: {archive_name}");
match archive_name.archive_type() {
ArchiveType::Blob => {
let file = std::fs::File::open(&path)?;
let backup_stats = backup_writer
.upload_blob(file, archive_name.as_ref())
.await?;
+ log_sender
+ .log(
+ Level::INFO,
+ format!(
+ "{archive_prefix}: uploaded {} ({}/s)",
+ HumanByte::from(backup_stats.size),
+ HumanByte::new_binary(
+ backup_stats.size as f64 / backup_stats.duration.as_secs_f64()
+ ),
+ ),
+ )
+ .await
+ .with_context(|| archive_prefix.clone())?;
stats.add(SyncStats {
chunk_count: backup_stats.chunk_count as usize,
bytes: backup_stats.size as usize,
@@ -972,7 +1093,7 @@ pub(crate) async fn push_snapshot(
)
.await;
}
- let index = DynamicIndexReader::open(&path)?;
+ let index = DynamicIndexReader::open(&path).with_context(|| prefix.clone())?;
let chunk_reader = reader
.chunk_reader(entry.chunk_crypt_mode())
.context("failed to get chunk reader")?;
@@ -984,7 +1105,20 @@ pub(crate) async fn push_snapshot(
IndexType::Dynamic,
known_chunks.clone(),
)
- .await?;
+ .await
+ .with_context(|| archive_prefix.clone())?;
+ log_sender
+ .log(
+ Level::INFO,
+ format!(
+ "{archive_prefix}: uploaded {} ({}/s)",
+ HumanByte::from(sync_stats.bytes),
+ HumanByte::new_binary(
+ sync_stats.bytes as f64 / sync_stats.elapsed.as_secs_f64()
+ ),
+ ),
+ )
+ .await?;
stats.add(sync_stats);
}
ArchiveType::FixedIndex => {
@@ -1001,7 +1135,8 @@ pub(crate) async fn push_snapshot(
let index = FixedIndexReader::open(&path)?;
let chunk_reader = reader
.chunk_reader(entry.chunk_crypt_mode())
- .context("failed to get chunk reader")?;
+ .context("failed to get chunk reader")
+ .with_context(|| archive_prefix.clone())?;
let size = index.index_bytes();
let sync_stats = push_index(
&archive_name,
@@ -1011,7 +1146,20 @@ pub(crate) async fn push_snapshot(
IndexType::Fixed(Some(size)),
known_chunks.clone(),
)
- .await?;
+ .await
+ .with_context(|| archive_prefix.clone())?;
+ log_sender
+ .log(
+ Level::INFO,
+ format!(
+ "{archive_prefix}: uploaded {} ({}/s)",
+ HumanByte::from(sync_stats.bytes),
+ HumanByte::new_binary(
+ sync_stats.bytes as f64 / sync_stats.elapsed.as_secs_f64()
+ ),
+ ),
+ )
+ .await?;
stats.add(sync_stats);
}
}
@@ -1032,7 +1180,8 @@ pub(crate) async fn push_snapshot(
client_log_name.as_ref(),
upload_options.clone(),
)
- .await?;
+ .await
+ .with_context(|| prefix.clone())?;
}
// Rewrite manifest for pushed snapshot, recreating manifest from source on target
@@ -1044,8 +1193,12 @@ pub(crate) async fn push_snapshot(
MANIFEST_BLOB_NAME.as_ref(),
upload_options,
)
- .await?;
- backup_writer.finish().await?;
+ .await
+ .with_context(|| prefix.clone())?;
+ backup_writer
+ .finish()
+ .await
+ .with_context(|| prefix.clone())?;
stats.add(SyncStats {
chunk_count: backup_stats.chunk_count as usize,
--
2.47.3
* [PATCH proxmox-backup v6 15/15] ui: expose group worker setting in sync job edit window
2026-04-17 9:26 [PATCH proxmox{,-backup} v6 00/15] fix #4182: concurrent group pull/push support for sync jobs Christian Ebner
` (13 preceding siblings ...)
2026-04-17 9:26 ` [PATCH proxmox-backup v6 14/15] server: push: prefix log messages and add additional logging Christian Ebner
@ 2026-04-17 9:26 ` Christian Ebner
14 siblings, 0 replies; 16+ messages in thread
From: Christian Ebner @ 2026-04-17 9:26 UTC (permalink / raw)
To: pbs-devel
Allows configuring the number of parallel group workers via the web
interface.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 5:
- no changes
www/window/SyncJobEdit.js | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/www/window/SyncJobEdit.js b/www/window/SyncJobEdit.js
index 074c7855a..26c82bc71 100644
--- a/www/window/SyncJobEdit.js
+++ b/www/window/SyncJobEdit.js
@@ -448,6 +448,17 @@ Ext.define('PBS.window.SyncJobEdit', {
deleteEmpty: '{!isCreate}',
},
},
+ {
+ xtype: 'proxmoxintegerfield',
+ name: 'worker-threads',
+ fieldLabel: gettext('# of Group Workers'),
+ emptyText: '1',
+ minValue: 1,
+ maxValue: 32,
+ cbind: {
+ deleteEmpty: '{!isCreate}',
+ },
+ },
{
xtype: 'proxmoxcheckbox',
fieldLabel: gettext('Re-sync Corrupt'),
--
2.47.3