From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Proxmox Backup Server development discussion
<pbs-devel@lists.proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup 1/2] fix #6750: api: avoid possible deadlock on datastores with s3 backend
Date: Thu, 25 Sep 2025 14:41:19 +0200
Message-ID: <1758803678.fo1e90c9lf.astroid@yuna.none>
In-Reply-To: <20250924145612.188579-2-c.ebner@proxmox.com>
On September 24, 2025 4:56 pm, Christian Ebner wrote:
> Closing of the fixed or dynamic index files with s3 backend will call
> async code, which must be avoided because of possible deadlocks [0].
> Therefore, perform all changes on the shared backup state and drop the
> guard before uploading the fixed index file to the s3 backend.
>
> Account for active backend operations and check consistency, since it
> must be assured that all active backend operations are finished before
> the finish call can succeed.
>
> [0] https://docs.rs/tokio/latest/tokio/sync/struct.Mutex.html#which-kind-of-mutex-should-you-use
>
> Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=6750
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
> src/api2/backup/environment.rs | 77 +++++++++++++++++++++++-----------
> 1 file changed, 53 insertions(+), 24 deletions(-)
>
> diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs
> index d5e6869cd..e535891a4 100644
> --- a/src/api2/backup/environment.rs
> +++ b/src/api2/backup/environment.rs
> @@ -82,6 +82,7 @@ struct SharedBackupState {
> finished: bool,
> uid_counter: usize,
> file_counter: usize, // successfully uploaded files
> + active_backend_operations: usize,
> dynamic_writers: HashMap<usize, DynamicWriterState>,
> fixed_writers: HashMap<usize, FixedWriterState>,
> known_chunks: KnownChunksMap,
> @@ -135,6 +136,7 @@ impl BackupEnvironment {
> finished: false,
> uid_counter: 0,
> file_counter: 0,
> + active_backend_operations: 0,
> dynamic_writers: HashMap::new(),
> fixed_writers: HashMap::new(),
> known_chunks: HashMap::new(),
> @@ -483,15 +485,10 @@ impl BackupEnvironment {
> );
> }
>
> - // For S3 backends, upload the index file to the object store after closing
> - if let DatastoreBackend::S3(s3_client) = &self.backend {
> - self.s3_upload_index(s3_client, &data.name)
> - .context("failed to upload dynamic index to s3 backend")?;
> - self.log(format!(
> - "Uploaded dynamic index file to s3 backend: {}",
> - data.name
> - ))
> - }
> + state.file_counter += 1;
> + state.backup_size += size;
> + state.backup_stat = state.backup_stat + data.upload_stat;
> + state.active_backend_operations += 1;
>
> self.log_upload_stat(
> &data.name,
> @@ -502,9 +499,21 @@ impl BackupEnvironment {
> &data.upload_stat,
> );
>
> - state.file_counter += 1;
> - state.backup_size += size;
> - state.backup_stat = state.backup_stat + data.upload_stat;
> + // never hold mutex guard during s3 upload due to possible deadlocks
> + drop(state);
> +
> + // For S3 backends, upload the index file to the object store after closing
> + if let DatastoreBackend::S3(s3_client) = &self.backend {
> + self.s3_upload_index(s3_client, &data.name)
> + .context("failed to upload dynamic index to s3 backend")?;
> + self.log(format!(
> + "Uploaded dynamic index file to s3 backend: {}",
> + data.name
> + ))
> + }
> +
> + let mut state = self.state.lock().unwrap();
> + state.active_backend_operations -= 1;

these two hunks are okay, although we could also reuse the registered
writers map to encode whether a writer is active, being processed/closed,
or doesn't exist. that would allow more fine-grained logging.

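A minimal sketch of that idea, assuming a simplified environment: the map value records a per-writer phase instead of a separate counter. `WriterPhase`, `Env`, `close_writer` and `ensure_no_writers` are hypothetical names for illustration only, errors are plain `String`s rather than `anyhow`, and the `upload` closure stands in for the s3 index upload.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical lifecycle of a registered writer (not from the PBS source).
/// A missing map entry means the writer does not exist or was closed cleanly.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WriterPhase {
    /// Registered and still accepting chunks.
    Active,
    /// Close was requested; the index upload may still be running.
    Closing,
}

#[derive(Default)]
struct SharedState {
    writers: HashMap<usize, WriterPhase>,
}

#[derive(Default)]
struct Env {
    state: Mutex<SharedState>,
}

impl Env {
    fn register(&self, wid: usize) {
        let mut state = self.state.lock().unwrap();
        state.writers.insert(wid, WriterPhase::Active);
    }

    fn phase(&self, wid: usize) -> Option<WriterPhase> {
        self.state.lock().unwrap().writers.get(&wid).copied()
    }

    /// Marks the writer as closing, runs `upload` without holding the
    /// mutex guard, and removes the entry once the upload succeeded.
    fn close_writer(
        &self,
        wid: usize,
        upload: impl FnOnce() -> Result<(), String>,
    ) -> Result<(), String> {
        {
            let mut state = self.state.lock().unwrap();
            let phase = state.writers.get(&wid).copied();
            match phase {
                Some(WriterPhase::Active) => {
                    state.writers.insert(wid, WriterPhase::Closing);
                }
                Some(WriterPhase::Closing) => {
                    return Err(format!("writer {wid} is already being closed"));
                }
                None => return Err(format!("writer {wid} does not exist")),
            }
        } // guard dropped here, before the upload runs

        // On failure the entry stays in `Closing`, so a later finish check fails.
        upload()?;

        // Re-acquire the lock only to drop the entry after a successful upload.
        self.state.lock().unwrap().writers.remove(&wid);
        Ok(())
    }

    /// Finish-time check that can report which writer is in which phase.
    fn ensure_no_writers(&self) -> Result<(), String> {
        let state = self.state.lock().unwrap();
        match state.writers.iter().next() {
            Some((wid, phase)) => Err(format!("writer {wid} still in phase {phase:?}")),
            None => Ok(()),
        }
    }
}
```

With this shape the existing "found open index writer" style check would also cover in-flight uploads, and the error message can name the writer and its phase.
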
> Ok(())
> }
> @@ -567,15 +576,10 @@ impl BackupEnvironment {
> );
> }
>
> - // For S3 backends, upload the index file to the object store after closing
> - if let DatastoreBackend::S3(s3_client) = &self.backend {
> - self.s3_upload_index(s3_client, &data.name)
> - .context("failed to upload fixed index to s3 backend")?;
> - self.log(format!(
> - "Uploaded fixed index file to object store: {}",
> - data.name
> - ))
> - }
> + state.file_counter += 1;
> + state.backup_size += size;
> + state.backup_stat = state.backup_stat + data.upload_stat;
> + state.active_backend_operations += 1;
>
> self.log_upload_stat(
> &data.name,
> @@ -586,9 +590,21 @@ impl BackupEnvironment {
> &data.upload_stat,
> );
>
> - state.file_counter += 1;
> - state.backup_size += size;
> - state.backup_stat = state.backup_stat + data.upload_stat;
> + // never hold mutex guard during s3 upload due to possible deadlocks
> + drop(state);
> +
> + // For S3 backends, upload the index file to the object store after closing
> + if let DatastoreBackend::S3(s3_client) = &self.backend {
> + self.s3_upload_index(s3_client, &data.name)
> + .context("failed to upload fixed index to s3 backend")?;
> + self.log(format!(
> + "Uploaded fixed index file to object store: {}",
> + data.name
> + ))
> + }
> +
> + let mut state = self.state.lock().unwrap();
> + state.active_backend_operations -= 1;
>
> Ok(())
> }
> @@ -645,6 +661,13 @@ impl BackupEnvironment {
> bail!("found open index writer - unable to finish backup");
> }
>
> + if state.active_backend_operations != 0 {
> + bail!(
> + "backup task still has {} active operations.",
> + state.active_backend_operations,
> + );
> + }
> +
> if state.file_counter == 0 {
> bail!("backup does not contain valid files (file count == 0)");
> }
> @@ -753,6 +776,12 @@ impl BackupEnvironment {
> if !state.finished {
> bail!("backup ended but finished flag is not set.");
> }
> + if state.active_backend_operations != 0 {
> + bail!(
> + "backup ended but {} active backend operations.",
> + state.active_backend_operations,
> + );
> + }

there's now an inconsistency between ensure_finished(), which checks both
conditions, and finished() (used to determine whether a connection being
interrupted is benign or not!), which only checks the finished flag.

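One way to keep the two in sync, sketched under the assumption that both methods only need the two fields shown in the diff: derive both from a single predicate on the shared state. The struct is reduced to those two fields, `is_complete` is a hypothetical helper, and errors are plain `String`s instead of `anyhow` to keep the example self-contained.

```rust
use std::sync::Mutex;

/// Reduced to the two fields relevant here; the real struct has more.
struct SharedBackupState {
    finished: bool,
    active_backend_operations: usize,
}

impl SharedBackupState {
    /// Hypothetical helper: the single definition of "backup is done".
    fn is_complete(&self) -> bool {
        self.finished && self.active_backend_operations == 0
    }
}

struct BackupEnvironment {
    state: Mutex<SharedBackupState>,
}

impl BackupEnvironment {
    /// Returns true only if the finished flag is set and no backend
    /// operation is still running, matching `ensure_finished()`.
    fn finished(&self) -> bool {
        self.state.lock().unwrap().is_complete()
    }

    /// Same two conditions, reported as distinct errors.
    fn ensure_finished(&self) -> Result<(), String> {
        let state = self.state.lock().unwrap();
        if !state.finished {
            return Err("backup ended but finished flag is not set.".to_string());
        }
        if state.active_backend_operations != 0 {
            return Err(format!(
                "backup ended but {} active backend operations.",
                state.active_backend_operations
            ));
        }
        debug_assert!(state.is_complete());
        Ok(())
    }
}
```
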
> Ok(())
> }
>
> --
> 2.47.3
>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>
>
>
Thread overview: 7+ messages
2025-09-24 14:56 [pbs-devel] [RFC proxmox-backup 0/2] fix #6750: fix possible deadlock for s3 backed datastore backups Christian Ebner
2025-09-24 14:56 ` [pbs-devel] [PATCH proxmox-backup 1/2] fix #6750: api: avoid possible deadlock on datastores with s3 backend Christian Ebner
2025-09-25 12:41 ` Fabian Grünbichler [this message]
2025-09-25 13:08 ` Christian Ebner
2025-09-24 14:56 ` [pbs-devel] [PATCH proxmox-backup 2/2] api: backup: never hold mutex guard when doing manifest update Christian Ebner
2025-09-25 12:46 ` Fabian Grünbichler
2025-09-25 13:20 ` Christian Ebner