* [pbs-devel] [RFC proxmox-backup 0/2] fix #6750: fix possible deadlock for s3 backed datastore backups @ 2025-09-24 14:56 Christian Ebner 2025-09-24 14:56 ` [pbs-devel] [PATCH proxmox-backup 1/2] fix #6750: api: avoid possible deadlock on datastores with s3 backend Christian Ebner 2025-09-24 14:56 ` [pbs-devel] [PATCH proxmox-backup 2/2] api: backup: never hold mutex guard when doing manifest update Christian Ebner 0 siblings, 2 replies; 7+ messages in thread From: Christian Ebner @ 2025-09-24 14:56 UTC (permalink / raw) To: pbs-devel These patches aim to fix a deadlock which can occur during backup jobs to datastores backed by an S3 backend. The deadlock is most likely caused by the mutex guard for the shared backup state being held while entering the tokio::task::block_in_place context and executing async code, which can lead to deadlocks as described in [0]. Therefore, these patches avoid holding the mutex guard for the shared backup state while performing the s3 backend operations, by dropping it early. To avoid inconsistencies, account and check for active backend operations when closing fixed/dynamic index files and on finish api calls. Sending this as an RFC with the intent of providing the patches for a potential test build, to verify that they resolve the issue for affected users. 
[0] https://docs.rs/tokio/latest/tokio/sync/struct.Mutex.html#which-kind-of-mutex-should-you-use Link to the bugtracker issue: https://bugzilla.proxmox.com/show_bug.cgi?id=6750 Another report in the community forum: https://forum.proxmox.com/threads/171422/ proxmox-backup: Christian Ebner (2): fix #6750: api: avoid possible deadlock on datastores with s3 backend api: backup: never hold mutex guard when doing manifest update src/api2/backup/environment.rs | 89 +++++++++++++++++++++++----------- 1 file changed, 61 insertions(+), 28 deletions(-) Summary over all repositories: 1 files changed, 61 insertions(+), 28 deletions(-) -- Generated by git-murpp 0.8.1 _______________________________________________ pbs-devel mailing list pbs-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel ^ permalink raw reply [flat|nested] 7+ messages in thread
* [pbs-devel] [PATCH proxmox-backup 1/2] fix #6750: api: avoid possible deadlock on datastores with s3 backend 2025-09-24 14:56 [pbs-devel] [RFC proxmox-backup 0/2] fix #6750: fix possible deadlock for s3 backed datastore backups Christian Ebner @ 2025-09-24 14:56 ` Christian Ebner 2025-09-25 12:41 ` Fabian Grünbichler 2025-09-24 14:56 ` [pbs-devel] [PATCH proxmox-backup 2/2] api: backup: never hold mutex guard when doing manifest update Christian Ebner 1 sibling, 1 reply; 7+ messages in thread From: Christian Ebner @ 2025-09-24 14:56 UTC (permalink / raw) To: pbs-devel Closing the fixed or dynamic index files with the s3 backend will call async code, which must be avoided while holding the mutex guard because of possible deadlocks [0]. Therefore, perform all changes on the shared backup state and drop the guard before uploading the index file to the s3 backend. Account for active backend operations and check for consistency, since it must be assured that all active backend operations have finished before the finish call can succeed. 
[0] https://docs.rs/tokio/latest/tokio/sync/struct.Mutex.html#which-kind-of-mutex-should-you-use Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=6750 Signed-off-by: Christian Ebner <c.ebner@proxmox.com> --- src/api2/backup/environment.rs | 77 +++++++++++++++++++++++----------- 1 file changed, 53 insertions(+), 24 deletions(-) diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs index d5e6869cd..e535891a4 100644 --- a/src/api2/backup/environment.rs +++ b/src/api2/backup/environment.rs @@ -82,6 +82,7 @@ struct SharedBackupState { finished: bool, uid_counter: usize, file_counter: usize, // successfully uploaded files + active_backend_operations: usize, dynamic_writers: HashMap<usize, DynamicWriterState>, fixed_writers: HashMap<usize, FixedWriterState>, known_chunks: KnownChunksMap, @@ -135,6 +136,7 @@ impl BackupEnvironment { finished: false, uid_counter: 0, file_counter: 0, + active_backend_operations: 0, dynamic_writers: HashMap::new(), fixed_writers: HashMap::new(), known_chunks: HashMap::new(), @@ -483,15 +485,10 @@ impl BackupEnvironment { ); } - // For S3 backends, upload the index file to the object store after closing - if let DatastoreBackend::S3(s3_client) = &self.backend { - self.s3_upload_index(s3_client, &data.name) - .context("failed to upload dynamic index to s3 backend")?; - self.log(format!( - "Uploaded dynamic index file to s3 backend: {}", - data.name - )) - } + state.file_counter += 1; + state.backup_size += size; + state.backup_stat = state.backup_stat + data.upload_stat; + state.active_backend_operations += 1; self.log_upload_stat( &data.name, @@ -502,9 +499,21 @@ impl BackupEnvironment { &data.upload_stat, ); - state.file_counter += 1; - state.backup_size += size; - state.backup_stat = state.backup_stat + data.upload_stat; + // never hold mutex guard during s3 upload due to possible deadlocks + drop(state); + + // For S3 backends, upload the index file to the object store after closing + if let 
DatastoreBackend::S3(s3_client) = &self.backend { + self.s3_upload_index(s3_client, &data.name) + .context("failed to upload dynamic index to s3 backend")?; + self.log(format!( + "Uploaded dynamic index file to s3 backend: {}", + data.name + )) + } + + let mut state = self.state.lock().unwrap(); + state.active_backend_operations -= 1; Ok(()) } @@ -567,15 +576,10 @@ impl BackupEnvironment { ); } - // For S3 backends, upload the index file to the object store after closing - if let DatastoreBackend::S3(s3_client) = &self.backend { - self.s3_upload_index(s3_client, &data.name) - .context("failed to upload fixed index to s3 backend")?; - self.log(format!( - "Uploaded fixed index file to object store: {}", - data.name - )) - } + state.file_counter += 1; + state.backup_size += size; + state.backup_stat = state.backup_stat + data.upload_stat; + state.active_backend_operations += 1; self.log_upload_stat( &data.name, @@ -586,9 +590,21 @@ impl BackupEnvironment { &data.upload_stat, ); - state.file_counter += 1; - state.backup_size += size; - state.backup_stat = state.backup_stat + data.upload_stat; + // never hold mutex guard during s3 upload due to possible deadlocks + drop(state); + + // For S3 backends, upload the index file to the object store after closing + if let DatastoreBackend::S3(s3_client) = &self.backend { + self.s3_upload_index(s3_client, &data.name) + .context("failed to upload fixed index to s3 backend")?; + self.log(format!( + "Uploaded fixed index file to object store: {}", + data.name + )) + } + + let mut state = self.state.lock().unwrap(); + state.active_backend_operations -= 1; Ok(()) } @@ -645,6 +661,13 @@ impl BackupEnvironment { bail!("found open index writer - unable to finish backup"); } + if state.active_backend_operations != 0 { + bail!( + "backup task still has {} active operations.", + state.active_backend_operations, + ); + } + if state.file_counter == 0 { bail!("backup does not contain valid files (file count == 0)"); } @@ -753,6 +776,12 @@ 
impl BackupEnvironment { if !state.finished { bail!("backup ended but finished flag is not set."); } + if state.active_backend_operations != 0 { + bail!( + "backup ended but {} active backend operations.", + state.active_backend_operations, + ); + } Ok(()) } -- 2.47.3 _______________________________________________ pbs-devel mailing list pbs-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [pbs-devel] [PATCH proxmox-backup 1/2] fix #6750: api: avoid possible deadlock on datastores with s3 backend 2025-09-24 14:56 ` [pbs-devel] [PATCH proxmox-backup 1/2] fix #6750: api: avoid possible deadlock on datastores with s3 backend Christian Ebner @ 2025-09-25 12:41 ` Fabian Grünbichler 2025-09-25 13:08 ` Christian Ebner 0 siblings, 1 reply; 7+ messages in thread From: Fabian Grünbichler @ 2025-09-25 12:41 UTC (permalink / raw) To: Proxmox Backup Server development discussion On September 24, 2025 4:56 pm, Christian Ebner wrote: > Closing of the fixed or dynamic index files with s3 backend will call > async code, which must be avoided because of possible deadlocks [0]. > Therefore, perform all changes on the shared backup state and drop the > guard before uploading the fixed index file to the s3 backend. > > Account for active backend operations and check consistency, since it > must be assured that all active backend operations are finished before > the finish call can succeed. 
> > [0] https://docs.rs/tokio/latest/tokio/sync/struct.Mutex.html#which-kind-of-mutex-should-you-use > > Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=6750 > Signed-off-by: Christian Ebner <c.ebner@proxmox.com> > --- > src/api2/backup/environment.rs | 77 +++++++++++++++++++++++----------- > 1 file changed, 53 insertions(+), 24 deletions(-) > > diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs > index d5e6869cd..e535891a4 100644 > --- a/src/api2/backup/environment.rs > +++ b/src/api2/backup/environment.rs > @@ -82,6 +82,7 @@ struct SharedBackupState { > finished: bool, > uid_counter: usize, > file_counter: usize, // successfully uploaded files > + active_backend_operations: usize, > dynamic_writers: HashMap<usize, DynamicWriterState>, > fixed_writers: HashMap<usize, FixedWriterState>, > known_chunks: KnownChunksMap, > @@ -135,6 +136,7 @@ impl BackupEnvironment { > finished: false, > uid_counter: 0, > file_counter: 0, > + active_backend_operations: 0, > dynamic_writers: HashMap::new(), > fixed_writers: HashMap::new(), > known_chunks: HashMap::new(), > @@ -483,15 +485,10 @@ impl BackupEnvironment { > ); > } > > - // For S3 backends, upload the index file to the object store after closing > - if let DatastoreBackend::S3(s3_client) = &self.backend { > - self.s3_upload_index(s3_client, &data.name) > - .context("failed to upload dynamic index to s3 backend")?; > - self.log(format!( > - "Uploaded dynamic index file to s3 backend: {}", > - data.name > - )) > - } > + state.file_counter += 1; > + state.backup_size += size; > + state.backup_stat = state.backup_stat + data.upload_stat; > + state.active_backend_operations += 1; > > self.log_upload_stat( > &data.name, > @@ -502,9 +499,21 @@ impl BackupEnvironment { > &data.upload_stat, > ); > > - state.file_counter += 1; > - state.backup_size += size; > - state.backup_stat = state.backup_stat + data.upload_stat; > + // never hold mutex guard during s3 upload due to possible deadlocks > + 
drop(state); > + > + // For S3 backends, upload the index file to the object store after closing > + if let DatastoreBackend::S3(s3_client) = &self.backend { > + self.s3_upload_index(s3_client, &data.name) > + .context("failed to upload dynamic index to s3 backend")?; > + self.log(format!( > + "Uploaded dynamic index file to s3 backend: {}", > + data.name > + )) > + } > + > + let mut state = self.state.lock().unwrap(); > + state.active_backend_operations -= 1; these two hunks are okay, although we could also reuse the registered writers map to encode whether a writer is active, being processed/closed or doesn't exist? would allow more fine-grained logging.. > Ok(()) > } > @@ -567,15 +576,10 @@ impl BackupEnvironment { > ); > } > > - // For S3 backends, upload the index file to the object store after closing > - if let DatastoreBackend::S3(s3_client) = &self.backend { > - self.s3_upload_index(s3_client, &data.name) > - .context("failed to upload fixed index to s3 backend")?; > - self.log(format!( > - "Uploaded fixed index file to object store: {}", > - data.name > - )) > - } > + state.file_counter += 1; > + state.backup_size += size; > + state.backup_stat = state.backup_stat + data.upload_stat; > + state.active_backend_operations += 1; > > self.log_upload_stat( > &data.name, > @@ -586,9 +590,21 @@ impl BackupEnvironment { > &data.upload_stat, > ); > > - state.file_counter += 1; > - state.backup_size += size; > - state.backup_stat = state.backup_stat + data.upload_stat; > + // never hold mutex guard during s3 upload due to possible deadlocks > + drop(state); > + > + // For S3 backends, upload the index file to the object store after closing > + if let DatastoreBackend::S3(s3_client) = &self.backend { > + self.s3_upload_index(s3_client, &data.name) > + .context("failed to upload fixed index to s3 backend")?; > + self.log(format!( > + "Uploaded fixed index file to object store: {}", > + data.name > + )) > + } > + > + let mut state = self.state.lock().unwrap(); > + 
state.active_backend_operations -= 1; > > Ok(()) > } > @@ -645,6 +661,13 @@ impl BackupEnvironment { > bail!("found open index writer - unable to finish backup"); > } > > + if state.active_backend_operations != 0 { > + bail!( > + "backup task still has {} active operations.", > + state.active_backend_operations, > + ); > + } > + > if state.file_counter == 0 { > bail!("backup does not contain valid files (file count == 0)"); > } > @@ -753,6 +776,12 @@ impl BackupEnvironment { > if !state.finished { > bail!("backup ended but finished flag is not set."); > } > + if state.active_backend_operations != 0 { > + bail!( > + "backup ended but {} active backend operations.", > + state.active_backend_operations, > + ); > + } there's now an inconsistency between ensure_finished(), which checks both conditions, and finished() (used to determine whether a connection being interrupted is benign or not!), which just checks the finished flag.. > Ok(()) > } > > -- > 2.47.3 > > > > _______________________________________________ > pbs-devel mailing list > pbs-devel@lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel > > > _______________________________________________ pbs-devel mailing list pbs-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [pbs-devel] [PATCH proxmox-backup 1/2] fix #6750: api: avoid possible deadlock on datastores with s3 backend 2025-09-25 12:41 ` Fabian Grünbichler @ 2025-09-25 13:08 ` Christian Ebner 0 siblings, 0 replies; 7+ messages in thread From: Christian Ebner @ 2025-09-25 13:08 UTC (permalink / raw) To: Proxmox Backup Server development discussion, Fabian Grünbichler On 9/25/25 2:41 PM, Fabian Grünbichler wrote: > On September 24, 2025 4:56 pm, Christian Ebner wrote: >> Closing of the fixed or dynamic index files with s3 backend will call >> async code, which must be avoided because of possible deadlocks [0]. >> Therefore, perform all changes on the shared backup state and drop the >> guard before uploading the fixed index file to the s3 backend. >> >> Account for active backend operations and check consistency, since it >> must be assured that all active backend operations are finished before >> the finish call can succeed. >> >> [0] https://docs.rs/tokio/latest/tokio/sync/struct.Mutex.html#which-kind-of-mutex-should-you-use >> >> Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=6750 >> Signed-off-by: Christian Ebner <c.ebner@proxmox.com> >> --- >> src/api2/backup/environment.rs | 77 +++++++++++++++++++++++----------- >> 1 file changed, 53 insertions(+), 24 deletions(-) >> >> diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs >> index d5e6869cd..e535891a4 100644 >> --- a/src/api2/backup/environment.rs >> +++ b/src/api2/backup/environment.rs >> @@ -82,6 +82,7 @@ struct SharedBackupState { >> finished: bool, >> uid_counter: usize, >> file_counter: usize, // successfully uploaded files >> + active_backend_operations: usize, >> dynamic_writers: HashMap<usize, DynamicWriterState>, >> fixed_writers: HashMap<usize, FixedWriterState>, >> known_chunks: KnownChunksMap, >> @@ -135,6 +136,7 @@ impl BackupEnvironment { >> finished: false, >> uid_counter: 0, >> file_counter: 0, >> + active_backend_operations: 0, >> dynamic_writers: HashMap::new(), >> 
fixed_writers: HashMap::new(), >> known_chunks: HashMap::new(), >> @@ -483,15 +485,10 @@ impl BackupEnvironment { >> ); >> } >> >> - // For S3 backends, upload the index file to the object store after closing >> - if let DatastoreBackend::S3(s3_client) = &self.backend { >> - self.s3_upload_index(s3_client, &data.name) >> - .context("failed to upload dynamic index to s3 backend")?; >> - self.log(format!( >> - "Uploaded dynamic index file to s3 backend: {}", >> - data.name >> - )) >> - } >> + state.file_counter += 1; >> + state.backup_size += size; >> + state.backup_stat = state.backup_stat + data.upload_stat; >> + state.active_backend_operations += 1; >> >> self.log_upload_stat( >> &data.name, >> @@ -502,9 +499,21 @@ impl BackupEnvironment { >> &data.upload_stat, >> ); >> >> - state.file_counter += 1; >> - state.backup_size += size; >> - state.backup_stat = state.backup_stat + data.upload_stat; >> + // never hold mutex guard during s3 upload due to possible deadlocks >> + drop(state); >> + >> + // For S3 backends, upload the index file to the object store after closing >> + if let DatastoreBackend::S3(s3_client) = &self.backend { >> + self.s3_upload_index(s3_client, &data.name) >> + .context("failed to upload dynamic index to s3 backend")?; >> + self.log(format!( >> + "Uploaded dynamic index file to s3 backend: {}", >> + data.name >> + )) >> + } >> + >> + let mut state = self.state.lock().unwrap(); >> + state.active_backend_operations -= 1; > > these two hunks are okay, although we could also reuse the registered > writers map to encode whether a writer is active, being processed/closed > or doesn't exist? would allow more fine-grained logging.. Nice idea! So by setting an enum variant for the respective writer state, we can indeed keep the binding to the writer and improve logging. Will incorporate this in a v2. 
>> Ok(()) >> } >> @@ -567,15 +576,10 @@ impl BackupEnvironment { >> ); >> } >> >> - // For S3 backends, upload the index file to the object store after closing >> - if let DatastoreBackend::S3(s3_client) = &self.backend { >> - self.s3_upload_index(s3_client, &data.name) >> - .context("failed to upload fixed index to s3 backend")?; >> - self.log(format!( >> - "Uploaded fixed index file to object store: {}", >> - data.name >> - )) >> - } >> + state.file_counter += 1; >> + state.backup_size += size; >> + state.backup_stat = state.backup_stat + data.upload_stat; >> + state.active_backend_operations += 1; >> >> self.log_upload_stat( >> &data.name, >> @@ -586,9 +590,21 @@ impl BackupEnvironment { >> &data.upload_stat, >> ); >> >> - state.file_counter += 1; >> - state.backup_size += size; >> - state.backup_stat = state.backup_stat + data.upload_stat; >> + // never hold mutex guard during s3 upload due to possible deadlocks >> + drop(state); >> + >> + // For S3 backends, upload the index file to the object store after closing >> + if let DatastoreBackend::S3(s3_client) = &self.backend { >> + self.s3_upload_index(s3_client, &data.name) >> + .context("failed to upload fixed index to s3 backend")?; >> + self.log(format!( >> + "Uploaded fixed index file to object store: {}", >> + data.name >> + )) >> + } >> + >> + let mut state = self.state.lock().unwrap(); >> + state.active_backend_operations -= 1; >> >> Ok(()) >> } >> @@ -645,6 +661,13 @@ impl BackupEnvironment { >> bail!("found open index writer - unable to finish backup"); >> } >> >> + if state.active_backend_operations != 0 { >> + bail!( >> + "backup task still has {} active operations.", >> + state.active_backend_operations, >> + ); >> + } >> + >> if state.file_counter == 0 { >> bail!("backup does not contain valid files (file count == 0)"); >> } >> @@ -753,6 +776,12 @@ impl BackupEnvironment { >> if !state.finished { >> bail!("backup ended but finished flag is not set."); >> } >> + if state.active_backend_operations != 
0 { >> + bail!( >> + "backup ended but {} active backend operations.", >> + state.active_backend_operations, >> + ); >> + } > > there's now an inconsistency between ensure_finished(), which checks both > conditions, and finished() (used to determine whether a connection being > interrupted is benign or not!), which just checks the finished flag.. Indeed, both will have to check both cases now, although I will need to check how this will play out with the suggested changes to the other patch. > >> Ok(()) >> } >> >> -- >> 2.47.3 >> >> >> >> _______________________________________________ >> pbs-devel mailing list >> pbs-devel@lists.proxmox.com >> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel >> >> >> > > > _______________________________________________ > pbs-devel mailing list > pbs-devel@lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel > > _______________________________________________ pbs-devel mailing list pbs-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel ^ permalink raw reply [flat|nested] 7+ messages in thread
* [pbs-devel] [PATCH proxmox-backup 2/2] api: backup: never hold mutex guard when doing manifest update 2025-09-24 14:56 [pbs-devel] [RFC proxmox-backup 0/2] fix #6750: fix possible deadlock for s3 backed datastore backups Christian Ebner 2025-09-24 14:56 ` [pbs-devel] [PATCH proxmox-backup 1/2] fix #6750: api: avoid possible deadlock on datastores with s3 backend Christian Ebner @ 2025-09-24 14:56 ` Christian Ebner 2025-09-25 12:46 ` Fabian Grünbichler 1 sibling, 1 reply; 7+ messages in thread From: Christian Ebner @ 2025-09-24 14:56 UTC (permalink / raw) To: pbs-devel A manifest update with the s3 backend will call async code, which must be avoided while holding the mutex guard because of possible deadlocks [0]. Therefore, perform all changes on the shared backup state and drop the guard before updating the manifest, which performs the backend-specific update. Dropping the guard early is fine, as the state has already been marked as finished, so no other api call belonging to the same backup task can perform further changes anyway. 
[0] https://docs.rs/tokio/latest/tokio/sync/struct.Mutex.html#which-kind-of-mutex-should-you-use Signed-off-by: Christian Ebner <c.ebner@proxmox.com> --- src/api2/backup/environment.rs | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs index e535891a4..073027c51 100644 --- a/src/api2/backup/environment.rs +++ b/src/api2/backup/environment.rs @@ -682,8 +682,15 @@ impl BackupEnvironment { } } - // check for valid manifest and store stats let stats = serde_json::to_value(state.backup_stat)?; + + // marks the backup state as finished, so no other api calls can modify its state anymore + state.finished = true; + + // never hold mutex guard during s3 upload due to possible deadlocks + drop(state); + + // check for valid manifest and store stats self.backup_dir .update_manifest(&self.backend, |manifest| { manifest.unprotected["chunk_upload_stats"] = stats; @@ -692,9 +699,6 @@ impl BackupEnvironment { self.datastore.try_ensure_sync_level()?; - // marks the backup as successful - state.finished = true; - Ok(()) } -- 2.47.3 _______________________________________________ pbs-devel mailing list pbs-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [pbs-devel] [PATCH proxmox-backup 2/2] api: backup: never hold mutex guard when doing manifest update 2025-09-24 14:56 ` [pbs-devel] [PATCH proxmox-backup 2/2] api: backup: never hold mutex guard when doing manifest update Christian Ebner @ 2025-09-25 12:46 ` Fabian Grünbichler 2025-09-25 13:20 ` Christian Ebner 0 siblings, 1 reply; 7+ messages in thread From: Fabian Grünbichler @ 2025-09-25 12:46 UTC (permalink / raw) To: Proxmox Backup Server development discussion On September 24, 2025 4:56 pm, Christian Ebner wrote: > An manifest update with s3 backend will call async code, which must > be avoided because of possible deadlocks [0]. Therefore, perform all > changes on the shared backup state and drop the guard before > updating the manifest, which performs the backend specific update. > Dropping the guard prematurely is fine, as the state has already been > set to be finished, so no other api calls belonging to the same > backup task cannot perform further changes anyways. > > [0] https://docs.rs/tokio/latest/tokio/sync/struct.Mutex.html#which-kind-of-mutex-should-you-use > > Signed-off-by: Christian Ebner <c.ebner@proxmox.com> > --- > src/api2/backup/environment.rs | 12 ++++++++---- > 1 file changed, 8 insertions(+), 4 deletions(-) > > diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs > index e535891a4..073027c51 100644 > --- a/src/api2/backup/environment.rs > +++ b/src/api2/backup/environment.rs > @@ -682,8 +682,15 @@ impl BackupEnvironment { > } > } > > - // check for valid manifest and store stats > let stats = serde_json::to_value(state.backup_stat)?; > + > + // marks the backup state as finished, so no other api calls can modify its state anymore > + state.finished = true; marking it as finished (which prevents cleanup in case the client connection disappears!) 
> + // never hold mutex guard during s3 upload due to possible deadlocks > + drop(state); > + > + // check for valid manifest and store stats > self.backup_dir > .update_manifest(&self.backend, |manifest| { > manifest.unprotected["chunk_upload_stats"] = stats; > @@ -692,9 +699,6 @@ impl BackupEnvironment { > > self.datastore.try_ensure_sync_level()?; before this has been called seems kind of dangerous? why not update the manifest up front, then lock the state etc.? or lock do_some_checks mark_as_finishing (new state that needs to be checked in some places) drop state update_manifest lock do_checks_again mark_as_finished ? that way it should be race-free but still safe.. > - // marks the backup as successful > - state.finished = true; > - > Ok(()) > } > > -- > 2.47.3 > > > > _______________________________________________ > pbs-devel mailing list > pbs-devel@lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel > > > _______________________________________________ pbs-devel mailing list pbs-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [pbs-devel] [PATCH proxmox-backup 2/2] api: backup: never hold mutex guard when doing manifest update 2025-09-25 12:46 ` Fabian Grünbichler @ 2025-09-25 13:20 ` Christian Ebner 0 siblings, 0 replies; 7+ messages in thread From: Christian Ebner @ 2025-09-25 13:20 UTC (permalink / raw) To: Proxmox Backup Server development discussion, Fabian Grünbichler On 9/25/25 2:46 PM, Fabian Grünbichler wrote: > On September 24, 2025 4:56 pm, Christian Ebner wrote: >> An manifest update with s3 backend will call async code, which must >> be avoided because of possible deadlocks [0]. Therefore, perform all >> changes on the shared backup state and drop the guard before >> updating the manifest, which performs the backend specific update. >> Dropping the guard prematurely is fine, as the state has already been >> set to be finished, so no other api calls belonging to the same >> backup task cannot perform further changes anyways. >> >> [0] https://docs.rs/tokio/latest/tokio/sync/struct.Mutex.html#which-kind-of-mutex-should-you-use >> >> Signed-off-by: Christian Ebner <c.ebner@proxmox.com> >> --- >> src/api2/backup/environment.rs | 12 ++++++++---- >> 1 file changed, 8 insertions(+), 4 deletions(-) >> >> diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs >> index e535891a4..073027c51 100644 >> --- a/src/api2/backup/environment.rs >> +++ b/src/api2/backup/environment.rs >> @@ -682,8 +682,15 @@ impl BackupEnvironment { >> } >> } >> >> - // check for valid manifest and store stats >> let stats = serde_json::to_value(state.backup_stat)?; >> + >> + // marks the backup state as finished, so no other api calls can modify its state anymore >> + state.finished = true; > > marking it as finished (which prevents cleanup in case the client > connection disappears!) 
> >> + // never hold mutex guard during s3 upload due to possible deadlocks >> + drop(state); >> + >> + // check for valid manifest and store stats >> self.backup_dir >> .update_manifest(&self.backend, |manifest| { >> manifest.unprotected["chunk_upload_stats"] = stats; >> @@ -692,9 +699,6 @@ impl BackupEnvironment { >> >> self.datastore.try_ensure_sync_level()?; > > before this has been called seems kind of dangerous? True, since this also allows the client to vanish as you mentioned. > > why not update the manifest up front, then lock the state etc.? or The manifest update requires the backup stats from the state, but one wants to disallow other (incorrect) API calls from messing with the state in-between as well (e.g. no more appending of chunks). But you are right, this requires a new, intermediate state as suggested below. I did miss that the finished state allows the client connection to vanish and the snapshot still be considered okay. > > lock > do_some_checks > mark_as_finishing (new state that needs to be checked in some places) > drop state > update_manifest > lock > do_checks_again > mark_as_finished > > ? that way it should be race-free but still safe.. Yes, this sounds better, might encode this using more telling enum variants instead of boolean flags for the finished states. Thanks for review! 
> >> - // marks the backup as successful >> - state.finished = true; >> - >> Ok(()) >> } >> >> -- >> 2.47.3 >> >> >> >> _______________________________________________ >> pbs-devel mailing list >> pbs-devel@lists.proxmox.com >> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel >> >> >> > > > _______________________________________________ > pbs-devel mailing list > pbs-devel@lists.proxmox.com > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel > > _______________________________________________ pbs-devel mailing list pbs-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel ^ permalink raw reply [flat|nested] 7+ messages in thread