From: Christian Ebner <c.ebner@proxmox.com>
To: "Proxmox Backup Server development discussion"
<pbs-devel@lists.proxmox.com>,
"Fabian Grünbichler" <f.gruenbichler@proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup 1/2] fix #6750: api: avoid possible deadlock on datastores with s3 backend
Date: Thu, 25 Sep 2025 15:08:52 +0200 [thread overview]
Message-ID: <ed58f340-a19c-490c-a9ba-a85cbb4fd3d7@proxmox.com> (raw)
In-Reply-To: <1758803678.fo1e90c9lf.astroid@yuna.none>
On 9/25/25 2:41 PM, Fabian Grünbichler wrote:
> On September 24, 2025 4:56 pm, Christian Ebner wrote:
>> Closing of the fixed or dynamic index files with s3 backend will call
>> async code, which must be avoided because of possible deadlocks [0].
>> Therefore, perform all changes on the shared backup state and drop the
>> guard before uploading the fixed index file to the s3 backend.
>>
>> Account for active backend operations and check consistency, since it
>> must be assured that all active backend operations are finished before
>> the finish call can succeed.
>>
>> [0] https://docs.rs/tokio/latest/tokio/sync/struct.Mutex.html#which-kind-of-mutex-should-you-use
>>
>> Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=6750
>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
>> ---
>> src/api2/backup/environment.rs | 77 +++++++++++++++++++++++-----------
>> 1 file changed, 53 insertions(+), 24 deletions(-)
>>
>> diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs
>> index d5e6869cd..e535891a4 100644
>> --- a/src/api2/backup/environment.rs
>> +++ b/src/api2/backup/environment.rs
>> @@ -82,6 +82,7 @@ struct SharedBackupState {
>> finished: bool,
>> uid_counter: usize,
>> file_counter: usize, // successfully uploaded files
>> + active_backend_operations: usize,
>> dynamic_writers: HashMap<usize, DynamicWriterState>,
>> fixed_writers: HashMap<usize, FixedWriterState>,
>> known_chunks: KnownChunksMap,
>> @@ -135,6 +136,7 @@ impl BackupEnvironment {
>> finished: false,
>> uid_counter: 0,
>> file_counter: 0,
>> + active_backend_operations: 0,
>> dynamic_writers: HashMap::new(),
>> fixed_writers: HashMap::new(),
>> known_chunks: HashMap::new(),
>> @@ -483,15 +485,10 @@ impl BackupEnvironment {
>> );
>> }
>>
>> - // For S3 backends, upload the index file to the object store after closing
>> - if let DatastoreBackend::S3(s3_client) = &self.backend {
>> - self.s3_upload_index(s3_client, &data.name)
>> - .context("failed to upload dynamic index to s3 backend")?;
>> - self.log(format!(
>> - "Uploaded dynamic index file to s3 backend: {}",
>> - data.name
>> - ))
>> - }
>> + state.file_counter += 1;
>> + state.backup_size += size;
>> + state.backup_stat = state.backup_stat + data.upload_stat;
>> + state.active_backend_operations += 1;
>>
>> self.log_upload_stat(
>> &data.name,
>> @@ -502,9 +499,21 @@ impl BackupEnvironment {
>> &data.upload_stat,
>> );
>>
>> - state.file_counter += 1;
>> - state.backup_size += size;
>> - state.backup_stat = state.backup_stat + data.upload_stat;
>> + // never hold mutex guard during s3 upload due to possible deadlocks
>> + drop(state);
>> +
>> + // For S3 backends, upload the index file to the object store after closing
>> + if let DatastoreBackend::S3(s3_client) = &self.backend {
>> + self.s3_upload_index(s3_client, &data.name)
>> + .context("failed to upload dynamic index to s3 backend")?;
>> + self.log(format!(
>> + "Uploaded dynamic index file to s3 backend: {}",
>> + data.name
>> + ))
>> + }
>> +
>> + let mut state = self.state.lock().unwrap();
>> + state.active_backend_operations -= 1;
>
> these two hunks are okay, although we could also reuse the registered
> writers map to encode whether a writer is active, being processed/closed
> or doesn't exist? would allow more fine-grained logging..
Nice idea! So by setting an enum variant for the respective writer
state, we can indeed keep the binding to the writer and improve logging.
Will incorporate this in a v2.
>> Ok(())
>> }
>> @@ -567,15 +576,10 @@ impl BackupEnvironment {
>> );
>> }
>>
>> - // For S3 backends, upload the index file to the object store after closing
>> - if let DatastoreBackend::S3(s3_client) = &self.backend {
>> - self.s3_upload_index(s3_client, &data.name)
>> - .context("failed to upload fixed index to s3 backend")?;
>> - self.log(format!(
>> - "Uploaded fixed index file to object store: {}",
>> - data.name
>> - ))
>> - }
>> + state.file_counter += 1;
>> + state.backup_size += size;
>> + state.backup_stat = state.backup_stat + data.upload_stat;
>> + state.active_backend_operations += 1;
>>
>> self.log_upload_stat(
>> &data.name,
>> @@ -586,9 +590,21 @@ impl BackupEnvironment {
>> &data.upload_stat,
>> );
>>
>> - state.file_counter += 1;
>> - state.backup_size += size;
>> - state.backup_stat = state.backup_stat + data.upload_stat;
>> + // never hold mutex guard during s3 upload due to possible deadlocks
>> + drop(state);
>> +
>> + // For S3 backends, upload the index file to the object store after closing
>> + if let DatastoreBackend::S3(s3_client) = &self.backend {
>> + self.s3_upload_index(s3_client, &data.name)
>> + .context("failed to upload fixed index to s3 backend")?;
>> + self.log(format!(
>> + "Uploaded fixed index file to object store: {}",
>> + data.name
>> + ))
>> + }
>> +
>> + let mut state = self.state.lock().unwrap();
>> + state.active_backend_operations -= 1;
>>
>> Ok(())
>> }
>> @@ -645,6 +661,13 @@ impl BackupEnvironment {
>> bail!("found open index writer - unable to finish backup");
>> }
>>
>> + if state.active_backend_operations != 0 {
>> + bail!(
>> + "backup task still has {} active operations.",
>> + state.active_backend_operations,
>> + );
>> + }
>> +
>> if state.file_counter == 0 {
>> bail!("backup does not contain valid files (file count == 0)");
>> }
>> @@ -753,6 +776,12 @@ impl BackupEnvironment {
>> if !state.finished {
>> bail!("backup ended but finished flag is not set.");
>> }
>> + if state.active_backend_operations != 0 {
>> + bail!(
>> + "backup ended but {} active backend operations.",
>> + state.active_backend_operations,
>> + );
>> + }
>
> there's now an inconsistency between ensure_finished(), which checks both
> conditions, and finished() (used to determine whether a connection being
> interrupted is benign or not!), which just checks the finished flag..
Indeed, both will have to check both cases now, although will need to
check how this will play out with the suggested changes to the other patch.
>
>> Ok(())
>> }
>>
>> --
>> 2.47.3
>>
>>
>>
>> _______________________________________________
>> pbs-devel mailing list
>> pbs-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>
>>
>>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
next prev parent reply other threads:[~2025-09-25 13:08 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-24 14:56 [pbs-devel] [RFC proxmox-backup 0/2] fix #6750: fix possible deadlock for s3 backed datastore backups Christian Ebner
2025-09-24 14:56 ` [pbs-devel] [PATCH proxmox-backup 1/2] fix #6750: api: avoid possible deadlock on datastores with s3 backend Christian Ebner
2025-09-25 12:41 ` Fabian Grünbichler
2025-09-25 13:08 ` Christian Ebner [this message]
2025-09-24 14:56 ` [pbs-devel] [PATCH proxmox-backup 2/2] api: backup: never hold mutex guard when doing manifest update Christian Ebner
2025-09-25 12:46 ` Fabian Grünbichler
2025-09-25 13:20 ` Christian Ebner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ed58f340-a19c-490c-a9ba-a85cbb4fd3d7@proxmox.com \
--to=c.ebner@proxmox.com \
--cc=f.gruenbichler@proxmox.com \
--cc=pbs-devel@lists.proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox