* [pbs-devel] [PATCH proxmox-backup v2 0/6] wait for active operations to finish before s3 refresh
@ 2025-11-12 16:36 Christian Ebner
2025-11-12 16:36 ` [pbs-devel] [PATCH proxmox-backup v2 1/6] api: datastore: fix typo in helper function name Christian Ebner
` (7 more replies)
0 siblings, 8 replies; 17+ messages in thread
From: Christian Ebner @ 2025-11-12 16:36 UTC (permalink / raw)
To: pbs-devel
Datastore contents located on the s3 backend can be refreshed on the local
datastore cache by running an s3-refresh, which will set the maintenance mode
s3-refresh to block any read/write operation and fetch the contents to a
temporary directory before moving them in place if the fetching was successful.
Setting the maintenance mode has however no effect on already active operations,
which can cause inconsistencies once the s3 refresh moves the temporary folders
in place.
Fix this by actively waiting for ongoing read/write operations to finish before
starting with the s3-refresh and locking the datastore config while the refresh
is ongoing, so the maintenance mode cannot be altered.
Changes since version 1 (thanks @Fabian for review):
- reuse and check for active operations based on unmount logic
- reuse and check for correct maintenance mode based on unmount logic
proxmox-backup:
Christian Ebner (6):
api: datastore: fix typo in helper function name
api: admin: make expected maintenance type helper generic over type
api: admin: factor out busy waiting on active operations
api: admin: factor out locking and maintenance mode clearing
datastore: s3 refresh: set/unset maintenance mode in api handler
api: datastore: wait for active operations to clear before s3 refresh
pbs-datastore/src/datastore.rs | 31 +------
src/api2/admin/datastore.rs | 153 +++++++++++++++++++++++++--------
2 files changed, 118 insertions(+), 66 deletions(-)
Summary over all repositories:
2 files changed, 118 insertions(+), 66 deletions(-)
--
Generated by git-murpp 0.8.1
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 17+ messages in thread
* [pbs-devel] [PATCH proxmox-backup v2 1/6] api: datastore: fix typo in helper function name
2025-11-12 16:36 [pbs-devel] [PATCH proxmox-backup v2 0/6] wait for active operations to finish before s3 refresh Christian Ebner
@ 2025-11-12 16:36 ` Christian Ebner
2025-11-12 16:36 ` [pbs-devel] [PATCH proxmox-backup v2 2/6] api: admin: make expected maintenance type helper generic over type Christian Ebner
` (6 subsequent siblings)
7 siblings, 0 replies; 17+ messages in thread
From: Christian Ebner @ 2025-11-12 16:36 UTC (permalink / raw)
To: pbs-devel
substituting `maintanance` with `maintenance`
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 1:
- not present in previous version
src/api2/admin/datastore.rs | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
index 6e269ef95..e81985169 100644
--- a/src/api2/admin/datastore.rs
+++ b/src/api2/admin/datastore.rs
@@ -2548,7 +2548,7 @@ pub fn mount(store: String, rpcenv: &mut dyn RpcEnvironment) -> Result<Value, Er
Ok(json!(upid))
}
-fn expect_maintanance_unmounting(
+fn expect_maintenance_unmounting(
store: &str,
) -> Result<(pbs_config::BackupLockGuard, DataStoreConfig), Error> {
let lock = pbs_config::datastore::lock_config()?;
@@ -2590,7 +2590,7 @@ fn do_unmount_device(
let mut aborted = false;
while active_operations.read + active_operations.write > 0 {
if let Some(worker) = worker {
- if worker.abort_requested() || expect_maintanance_unmounting(&datastore.name).is_err() {
+ if worker.abort_requested() || expect_maintenance_unmounting(&datastore.name).is_err() {
aborted = true;
break;
}
@@ -2608,7 +2608,7 @@ fn do_unmount_device(
}
if aborted || worker.is_some_and(|w| w.abort_requested()) {
- let _ = expect_maintanance_unmounting(&datastore.name)
+ let _ = expect_maintenance_unmounting(&datastore.name)
.inspect_err(|e| warn!("maintenance mode was not as expected: {e}"))
.and_then(|(lock, config)| {
unset_maintenance(lock, config)
@@ -2616,7 +2616,7 @@ fn do_unmount_device(
});
bail!("aborted, due to user request");
} else {
- let (lock, config) = expect_maintanance_unmounting(&datastore.name)?;
+ let (lock, config) = expect_maintenance_unmounting(&datastore.name)?;
crate::tools::disks::unmount_by_mountpoint(Path::new(&mount_point))?;
unset_maintenance(lock, config)
.map_err(|e| format_err!("could not reset maintenance mode: {e}"))?;
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 17+ messages in thread
* [pbs-devel] [PATCH proxmox-backup v2 2/6] api: admin: make expected maintenance type helper generic over type
2025-11-12 16:36 [pbs-devel] [PATCH proxmox-backup v2 0/6] wait for active operations to finish before s3 refresh Christian Ebner
2025-11-12 16:36 ` [pbs-devel] [PATCH proxmox-backup v2 1/6] api: datastore: fix typo in helper function name Christian Ebner
@ 2025-11-12 16:36 ` Christian Ebner
2025-11-12 16:36 ` [pbs-devel] [PATCH proxmox-backup v2 3/6] api: admin: factor out busy waiting on active operations Christian Ebner
` (5 subsequent siblings)
7 siblings, 0 replies; 17+ messages in thread
From: Christian Ebner @ 2025-11-12 16:36 UTC (permalink / raw)
To: pbs-devel
By making the helper generic over the type, it can be reused to check
also for other maintenance type enum variants. The error message in
case of incorrect or non-set maintenance mode now relies on the
`MaintenanceType`'s string serialization.
In preparation for reuse with s3 refresh.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 1:
- not present in previous version
src/api2/admin/datastore.rs | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
index e81985169..6e66b5cf0 100644
--- a/src/api2/admin/datastore.rs
+++ b/src/api2/admin/datastore.rs
@@ -2548,8 +2548,9 @@ pub fn mount(store: String, rpcenv: &mut dyn RpcEnvironment) -> Result<Value, Er
Ok(json!(upid))
}
-fn expect_maintenance_unmounting(
+fn expect_maintenance_type(
store: &str,
+ maintenance_type: MaintenanceType,
) -> Result<(pbs_config::BackupLockGuard, DataStoreConfig), Error> {
let lock = pbs_config::datastore::lock_config()?;
let (section_config, _digest) = pbs_config::datastore::config()?;
@@ -2557,9 +2558,9 @@ fn expect_maintenance_unmounting(
if store_config
.get_maintenance_mode()
- .is_none_or(|m| m.ty != MaintenanceType::Unmount)
+ .is_none_or(|m| m.ty != maintenance_type)
{
- bail!("maintenance mode is not 'Unmount'");
+ bail!("maintenance mode is not '{maintenance_type}'");
}
Ok((lock, store_config))
@@ -2590,7 +2591,9 @@ fn do_unmount_device(
let mut aborted = false;
while active_operations.read + active_operations.write > 0 {
if let Some(worker) = worker {
- if worker.abort_requested() || expect_maintenance_unmounting(&datastore.name).is_err() {
+ if worker.abort_requested()
+ || expect_maintenance_type(&datastore.name, MaintenanceType::Unmount).is_err()
+ {
aborted = true;
break;
}
@@ -2608,7 +2611,7 @@ fn do_unmount_device(
}
if aborted || worker.is_some_and(|w| w.abort_requested()) {
- let _ = expect_maintenance_unmounting(&datastore.name)
+ let _ = expect_maintenance_type(&datastore.name, MaintenanceType::Unmount)
.inspect_err(|e| warn!("maintenance mode was not as expected: {e}"))
.and_then(|(lock, config)| {
unset_maintenance(lock, config)
@@ -2616,7 +2619,7 @@ fn do_unmount_device(
});
bail!("aborted, due to user request");
} else {
- let (lock, config) = expect_maintenance_unmounting(&datastore.name)?;
+ let (lock, config) = expect_maintenance_type(&datastore.name, MaintenanceType::Unmount)?;
crate::tools::disks::unmount_by_mountpoint(Path::new(&mount_point))?;
unset_maintenance(lock, config)
.map_err(|e| format_err!("could not reset maintenance mode: {e}"))?;
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 17+ messages in thread
* [pbs-devel] [PATCH proxmox-backup v2 3/6] api: admin: factor out busy waiting on active operations
2025-11-12 16:36 [pbs-devel] [PATCH proxmox-backup v2 0/6] wait for active operations to finish before s3 refresh Christian Ebner
2025-11-12 16:36 ` [pbs-devel] [PATCH proxmox-backup v2 1/6] api: datastore: fix typo in helper function name Christian Ebner
2025-11-12 16:36 ` [pbs-devel] [PATCH proxmox-backup v2 2/6] api: admin: make expected maintenance type helper generic over type Christian Ebner
@ 2025-11-12 16:36 ` Christian Ebner
2025-11-13 8:15 ` Fabian Grünbichler
2025-11-12 16:36 ` [pbs-devel] [PATCH proxmox-backup v2 4/6] api: admin: factor out locking and maintenance mode clearing Christian Ebner
` (4 subsequent siblings)
7 siblings, 1 reply; 17+ messages in thread
From: Christian Ebner @ 2025-11-12 16:36 UTC (permalink / raw)
To: pbs-devel
Move the logic to wait on no more active operations on a given
datastore into a dedicated helper function, to be reused
by s3-refresh.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 1:
- not present in previous version
src/api2/admin/datastore.rs | 54 ++++++++++++++++++++++++++-----------
1 file changed, 38 insertions(+), 16 deletions(-)
diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
index 6e66b5cf0..7daccf9fd 100644
--- a/src/api2/admin/datastore.rs
+++ b/src/api2/admin/datastore.rs
@@ -2586,29 +2586,21 @@ fn do_unmount_device(
}
let mount_point = datastore.absolute_path();
- let mut active_operations = task_tracking::get_active_operations(&datastore.name)?;
let mut old_status = String::new();
- let mut aborted = false;
- while active_operations.read + active_operations.write > 0 {
- if let Some(worker) = worker {
- if worker.abort_requested()
- || expect_maintenance_type(&datastore.name, MaintenanceType::Unmount).is_err()
- {
- aborted = true;
- break;
- }
+ let aborted = wait_on_active_operations(
+ &datastore.name,
+ worker,
+ MaintenanceType::Unmount,
+ |reads, writes| {
let status = format!(
- "cannot unmount yet, still {} read and {} write operations active",
- active_operations.read, active_operations.write
+ "cannot unmount yet, still {reads} read and {writes} write operations active",
);
if status != old_status {
info!("{status}");
old_status = status;
}
- }
- std::thread::sleep(std::time::Duration::from_secs(1));
- active_operations = task_tracking::get_active_operations(&datastore.name)?;
- }
+ },
+ )?;
if aborted || worker.is_some_and(|w| w.abort_requested()) {
let _ = expect_maintenance_type(&datastore.name, MaintenanceType::Unmount)
@@ -2725,6 +2717,36 @@ pub async fn s3_refresh(store: String, rpcenv: &mut dyn RpcEnvironment) -> Resul
Ok(json!(upid))
}
+/// Wait for no more active operations on the given datastore.
+/// If a worker task context is provided, the given callback will be executed for each busy wait
+/// iteration.
+///
+/// Returns with Ok(true) if the worker was aborted or the expected maintenance mode was not set,
+/// Ok(false) if no more operations are active.
+fn wait_on_active_operations(
+ store: &str,
+ worker: Option<&dyn WorkerTaskContext>,
+ maintenance_expected: MaintenanceType,
+ mut status_msg_callback: impl FnMut(i64, i64),
+) -> Result<bool, Error> {
+ let mut active_operations = task_tracking::get_active_operations(&store)?;
+
+ while active_operations.read + active_operations.write > 0 {
+ if let Some(worker) = worker {
+ if worker.abort_requested()
+ || expect_maintenance_type(&store, maintenance_expected).is_err()
+ {
+ return Ok(true);
+ }
+ status_msg_callback(active_operations.read, active_operations.write);
+ }
+ std::thread::sleep(std::time::Duration::from_secs(1));
+ active_operations = task_tracking::get_active_operations(&store)?;
+ }
+
+ Ok(false)
+}
+
#[sortable]
const DATASTORE_INFO_SUBDIRS: SubdirMap = &[
(
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 17+ messages in thread
* [pbs-devel] [PATCH proxmox-backup v2 4/6] api: admin: factor out locking and maintenance mode clearing
2025-11-12 16:36 [pbs-devel] [PATCH proxmox-backup v2 0/6] wait for active operations to finish before s3 refresh Christian Ebner
` (2 preceding siblings ...)
2025-11-12 16:36 ` [pbs-devel] [PATCH proxmox-backup v2 3/6] api: admin: factor out busy waiting on active operations Christian Ebner
@ 2025-11-12 16:36 ` Christian Ebner
2025-11-13 8:18 ` Fabian Grünbichler
2025-11-12 16:36 ` [pbs-devel] [PATCH proxmox-backup v2 5/6] datastore: s3 refresh: set/unset maintenance mode in api handler Christian Ebner
` (3 subsequent siblings)
7 siblings, 1 reply; 17+ messages in thread
From: Christian Ebner @ 2025-11-12 16:36 UTC (permalink / raw)
To: pbs-devel
Provide a helper which allows to either clear the maintenance mode if
the worker was aborted, or call the provided callback while holding
the datastore config lock.
In preparation for reusing the same logic for the s3 refresh.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 1:
- not present in previous version
src/api2/admin/datastore.rs | 50 +++++++++++++++++++++++++------------
1 file changed, 34 insertions(+), 16 deletions(-)
diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
index 7daccf9fd..8d58b5059 100644
--- a/src/api2/admin/datastore.rs
+++ b/src/api2/admin/datastore.rs
@@ -2584,7 +2584,6 @@ fn do_unmount_device(
if datastore.backing_device.is_none() {
bail!("can't unmount non-removable datastore");
}
- let mount_point = datastore.absolute_path();
let mut old_status = String::new();
let aborted = wait_on_active_operations(
@@ -2602,21 +2601,14 @@ fn do_unmount_device(
},
)?;
- if aborted || worker.is_some_and(|w| w.abort_requested()) {
- let _ = expect_maintenance_type(&datastore.name, MaintenanceType::Unmount)
- .inspect_err(|e| warn!("maintenance mode was not as expected: {e}"))
- .and_then(|(lock, config)| {
- unset_maintenance(lock, config)
- .inspect_err(|e| warn!("could not reset maintenance mode: {e}"))
- });
- bail!("aborted, due to user request");
- } else {
- let (lock, config) = expect_maintenance_type(&datastore.name, MaintenanceType::Unmount)?;
- crate::tools::disks::unmount_by_mountpoint(Path::new(&mount_point))?;
- unset_maintenance(lock, config)
- .map_err(|e| format_err!("could not reset maintenance mode: {e}"))?;
- }
- Ok(())
+ let mount_point = datastore.absolute_path();
+ clear_or_run_maintenance_locked(
+ &datastore.name,
+ worker,
+ MaintenanceType::Unmount,
+ aborted,
+ || crate::tools::disks::unmount_by_mountpoint(Path::new(&mount_point)),
+ )
}
#[api(
@@ -2747,6 +2739,32 @@ fn wait_on_active_operations(
Ok(false)
}
+// Either clear the current maintenance mode if the worker was aborted or run the provided callback
+// while keeping the datastore config lock, so the mode cannot be altered. Clears the maintenance
+// mode after successful callback execution.
+fn clear_or_run_maintenance_locked(
+ store: &str,
+ worker: Option<&dyn WorkerTaskContext>,
+ maintenance_expected: MaintenanceType,
+ aborted: bool,
+ callback: impl Fn() -> Result<(), Error>,
+) -> Result<(), Error> {
+ if aborted || worker.is_some_and(|w| w.abort_requested()) {
+ let _ = expect_maintenance_type(store, maintenance_expected)
+ .inspect_err(|e| warn!("maintenance mode was not as expected: {e}"))
+ .and_then(|(lock, config)| {
+ unset_maintenance(lock, config)
+ .inspect_err(|e| warn!("could not reset maintenance mode: {e}"))
+ });
+ bail!("aborted, due to user request");
+ } else {
+ let (lock, config) = expect_maintenance_type(store, maintenance_expected)?;
+ callback()?;
+ unset_maintenance(lock, config)
+ .map_err(|e| format_err!("could not reset maintenance mode: {e}"))
+ }
+}
+
#[sortable]
const DATASTORE_INFO_SUBDIRS: SubdirMap = &[
(
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 17+ messages in thread
* [pbs-devel] [PATCH proxmox-backup v2 5/6] datastore: s3 refresh: set/unset maintenance mode in api handler
2025-11-12 16:36 [pbs-devel] [PATCH proxmox-backup v2 0/6] wait for active operations to finish before s3 refresh Christian Ebner
` (3 preceding siblings ...)
2025-11-12 16:36 ` [pbs-devel] [PATCH proxmox-backup v2 4/6] api: admin: factor out locking and maintenance mode clearing Christian Ebner
@ 2025-11-12 16:36 ` Christian Ebner
2025-11-12 16:36 ` [pbs-devel] [PATCH proxmox-backup v2 6/6] api: datastore: wait for active operations to clear before s3 refresh Christian Ebner
` (2 subsequent siblings)
7 siblings, 0 replies; 17+ messages in thread
From: Christian Ebner @ 2025-11-12 16:36 UTC (permalink / raw)
To: pbs-devel
Instead of setting the maintenance mode in the datastores s3 refresh
helper method, do this in the api handler directly, reusing the pre-
defined helpers. Since this is now mostly a sync task, adapt the api
handler to be a sync function and run the task on a dedicated thread.
This is in preparation for fixing the s3 refresh to only be started
after still ongoing active operations complete.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 1:
- reuse already present maintenance mode helpers from unmount operation
pbs-datastore/src/datastore.rs | 31 ++-----------------------------
src/api2/admin/datastore.rs | 26 ++++++++++++++++++++++----
2 files changed, 24 insertions(+), 33 deletions(-)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index b9d4ed2d2..031fa958b 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -2305,23 +2305,12 @@ impl DataStore {
*OLD_LOCKING
}
- /// Set the datastore's maintenance mode to `S3Refresh`, fetch from S3 object store, clear and
- /// replace the local cache store contents. Once finished disable the maintenance mode again.
- /// Returns with error for other datastore backends without setting the maintenance mode.
+ /// Fetch contents from S3 object store, clear and replace the local cache store contents.
+ /// Returns with error for non-S3 datastore backends.
pub async fn s3_refresh(self: &Arc<Self>) -> Result<(), Error> {
match self.backend()? {
DatastoreBackend::Filesystem => bail!("store '{}' not backed by S3", self.name()),
DatastoreBackend::S3(s3_client) => {
- let self_clone = Arc::clone(self);
- tokio::task::spawn_blocking(move || {
- self_clone.maintenance_mode(Some(MaintenanceMode {
- ty: MaintenanceType::S3Refresh,
- message: None,
- }))
- })
- .await?
- .context("failed to set maintenance mode")?;
-
let tmp_base = proxmox_sys::fs::make_tmp_dir(self.base_path(), None)
.context("failed to create temporary content folder in {store_base}")?;
@@ -2335,27 +2324,11 @@ impl DataStore {
let _ = std::fs::remove_dir_all(&tmp_base);
return Err(err);
}
-
- let self_clone = Arc::clone(self);
- tokio::task::spawn_blocking(move || self_clone.maintenance_mode(None))
- .await?
- .context("failed to clear maintenance mode")?;
}
}
Ok(())
}
- // Set or clear the datastores maintenance mode by locking and updating the datastore config
- fn maintenance_mode(&self, maintenance_mode: Option<MaintenanceMode>) -> Result<(), Error> {
- let _lock = pbs_config::datastore::lock_config()?;
- let (mut section_config, _digest) = pbs_config::datastore::config()?;
- let mut datastore: DataStoreConfig = section_config.lookup("datastore", self.name())?;
- datastore.set_maintenance_mode(maintenance_mode)?;
- section_config.set_data(self.name(), "datastore", &datastore)?;
- pbs_config::datastore::save_config(§ion_config)?;
- Ok(())
- }
-
// Fetch the contents (metadata, no chunks) of the datastore from the S3 object store to the
// provided temporaray directory
async fn fetch_tmp_contents(&self, tmp_base: &Path, s3_client: &S3Client) -> Result<(), Error> {
diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
index 8d58b5059..91189d7ae 100644
--- a/src/api2/admin/datastore.rs
+++ b/src/api2/admin/datastore.rs
@@ -2693,17 +2693,35 @@ pub async fn unmount(store: String, rpcenv: &mut dyn RpcEnvironment) -> Result<V
},
)]
/// Refresh datastore contents from S3 to local cache store.
-pub async fn s3_refresh(store: String, rpcenv: &mut dyn RpcEnvironment) -> Result<Value, Error> {
+pub fn s3_refresh(store: String, rpcenv: &mut dyn RpcEnvironment) -> Result<Value, Error> {
+ let _lock = pbs_config::datastore::lock_config()?;
+ let (mut section_config, _digest) = pbs_config::datastore::config()?;
+ let mut datastore_config: DataStoreConfig = section_config.lookup("datastore", &store)?;
+
+ datastore_config.set_maintenance_mode(Some(MaintenanceMode {
+ ty: MaintenanceType::S3Refresh,
+ message: None,
+ }))?;
+
+ section_config.set_data(&store, "datastore", &datastore_config)?;
+ pbs_config::datastore::save_config(§ion_config)?;
+ drop(_lock);
+
let datastore = DataStore::lookup_datastore(&store, Some(Operation::Lookup))?;
let auth_id: Authid = rpcenv.get_auth_id().unwrap().parse()?;
let to_stdout = rpcenv.env_type() == RpcEnvironmentType::CLI;
- let upid = WorkerTask::spawn(
+ let upid = WorkerTask::new_thread(
"s3-refresh",
- Some(store),
+ Some(store.clone()),
auth_id.to_string(),
to_stdout,
- move |_worker| async move { datastore.s3_refresh().await },
+ move |_worker| {
+ proxmox_async::runtime::block_on(datastore.s3_refresh())?;
+
+ let (_lock, config) = expect_maintenance_type(&store, MaintenanceType::S3Refresh)?;
+ unset_maintenance(_lock, config).context("failed to clear maintenance mode")
+ },
)?;
Ok(json!(upid))
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 17+ messages in thread
* [pbs-devel] [PATCH proxmox-backup v2 6/6] api: datastore: wait for active operations to clear before s3 refresh
2025-11-12 16:36 [pbs-devel] [PATCH proxmox-backup v2 0/6] wait for active operations to finish before s3 refresh Christian Ebner
` (4 preceding siblings ...)
2025-11-12 16:36 ` [pbs-devel] [PATCH proxmox-backup v2 5/6] datastore: s3 refresh: set/unset maintenance mode in api handler Christian Ebner
@ 2025-11-12 16:36 ` Christian Ebner
2025-11-13 8:15 ` Fabian Grünbichler
2025-11-13 8:20 ` [pbs-devel] partially-applied: [PATCH proxmox-backup v2 0/6] wait for active operations to finish " Fabian Grünbichler
2025-11-13 14:23 ` [pbs-devel] superseded: " Christian Ebner
7 siblings, 1 reply; 17+ messages in thread
From: Christian Ebner @ 2025-11-12 16:36 UTC (permalink / raw)
To: pbs-devel
Currently, the s3 refresh does not take into consideration already
ongoing active operations, only blocking new ones.
This will however lead to inconsistencies if there are ongoing read
or write operations. Therefore, actively wait for ongoing operatioins
to complete before running the actual refresh and keep the datastore
config locked so the maintenance mode cannot be altered.
If an abort was requested while waiting, clear the maintenance mode
as well.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 1:
- use refactored helpers from identical unmount logic
src/api2/admin/datastore.rs | 28 +++++++++++++++++++++++-----
1 file changed, 23 insertions(+), 5 deletions(-)
diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
index 91189d7ae..93e085be3 100644
--- a/src/api2/admin/datastore.rs
+++ b/src/api2/admin/datastore.rs
@@ -2716,11 +2716,29 @@ pub fn s3_refresh(store: String, rpcenv: &mut dyn RpcEnvironment) -> Result<Valu
Some(store.clone()),
auth_id.to_string(),
to_stdout,
- move |_worker| {
- proxmox_async::runtime::block_on(datastore.s3_refresh())?;
-
- let (_lock, config) = expect_maintenance_type(&store, MaintenanceType::S3Refresh)?;
- unset_maintenance(_lock, config).context("failed to clear maintenance mode")
+ move |worker| {
+ let mut old_status = String::new();
+ let aborted = wait_on_active_operations(
+ &store,
+ Some(&worker),
+ MaintenanceType::S3Refresh,
+ |reads, writes| {
+ let status = format!(
+ "waiting for active operations to finsish: read {reads}, write {writes}",
+ );
+ if status != old_status {
+ info!("{status}");
+ old_status = status;
+ }
+ },
+ )?;
+ clear_or_run_maintenance_locked(
+ &store,
+ Some(&worker),
+ MaintenanceType::S3Refresh,
+ aborted,
+ || proxmox_async::runtime::block_on(datastore.s3_refresh()),
+ )
},
)?;
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [pbs-devel] [PATCH proxmox-backup v2 6/6] api: datastore: wait for active operations to clear before s3 refresh
2025-11-12 16:36 ` [pbs-devel] [PATCH proxmox-backup v2 6/6] api: datastore: wait for active operations to clear before s3 refresh Christian Ebner
@ 2025-11-13 8:15 ` Fabian Grünbichler
2025-11-13 9:03 ` Christian Ebner
0 siblings, 1 reply; 17+ messages in thread
From: Fabian Grünbichler @ 2025-11-13 8:15 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On November 12, 2025 5:36 pm, Christian Ebner wrote:
> Currently, the s3 refresh does not take into consideration already
> ongoing active operations, only blocking new ones.
>
> This will however lead to inconsistencies if there are ongoing read
> or write operations. Therefore, actively wait for ongoing operatioins
> to complete before running the actual refresh and keep the datastore
> config locked so the maintenance mode cannot be altered.
>
> If an abort was requested while waiting, clear the maintenance mode
> as well.
>
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
> changes since version 1:
> - use refactored helpers from identical unmount logic
>
> src/api2/admin/datastore.rs | 28 +++++++++++++++++++++++-----
> 1 file changed, 23 insertions(+), 5 deletions(-)
>
> diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
> index 91189d7ae..93e085be3 100644
> --- a/src/api2/admin/datastore.rs
> +++ b/src/api2/admin/datastore.rs
> @@ -2716,11 +2716,29 @@ pub fn s3_refresh(store: String, rpcenv: &mut dyn RpcEnvironment) -> Result<Valu
> Some(store.clone()),
> auth_id.to_string(),
> to_stdout,
> - move |_worker| {
> - proxmox_async::runtime::block_on(datastore.s3_refresh())?;
> -
> - let (_lock, config) = expect_maintenance_type(&store, MaintenanceType::S3Refresh)?;
> - unset_maintenance(_lock, config).context("failed to clear maintenance mode")
> + move |worker| {
> + let mut old_status = String::new();
> + let aborted = wait_on_active_operations(
> + &store,
> + Some(&worker),
> + MaintenanceType::S3Refresh,
> + |reads, writes| {
> + let status = format!(
> + "waiting for active operations to finsish: read {reads}, write {writes}",
> + );
> + if status != old_status {
> + info!("{status}");
> + old_status = status;
> + }
> + },
> + )?;
> + clear_or_run_maintenance_locked(
> + &store,
> + Some(&worker),
> + MaintenanceType::S3Refresh,
> + aborted,
> + || proxmox_async::runtime::block_on(datastore.s3_refresh()),
> + )
there is one more call to s3_refresh when (re)creating a datastore, that
seems to me is running completely unprotected..
> },
> )?;
>
> --
> 2.47.3
>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>
>
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [pbs-devel] [PATCH proxmox-backup v2 3/6] api: admin: factor out busy waiting on active operations
2025-11-12 16:36 ` [pbs-devel] [PATCH proxmox-backup v2 3/6] api: admin: factor out busy waiting on active operations Christian Ebner
@ 2025-11-13 8:15 ` Fabian Grünbichler
2025-11-13 8:38 ` Christian Ebner
0 siblings, 1 reply; 17+ messages in thread
From: Fabian Grünbichler @ 2025-11-13 8:15 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On November 12, 2025 5:36 pm, Christian Ebner wrote:
> Move the logic to wait on no more active operations on a given
> datastore into a dedicated helper function, to be reused
> by s3-refresh.
>
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
> changes since version 1:
> - not present in previous version
>
> src/api2/admin/datastore.rs | 54 ++++++++++++++++++++++++++-----------
> 1 file changed, 38 insertions(+), 16 deletions(-)
>
> diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
> index 6e66b5cf0..7daccf9fd 100644
> --- a/src/api2/admin/datastore.rs
> +++ b/src/api2/admin/datastore.rs
> @@ -2586,29 +2586,21 @@ fn do_unmount_device(
> }
> let mount_point = datastore.absolute_path();
>
> - let mut active_operations = task_tracking::get_active_operations(&datastore.name)?;
> let mut old_status = String::new();
> - let mut aborted = false;
> - while active_operations.read + active_operations.write > 0 {
> - if let Some(worker) = worker {
> - if worker.abort_requested()
> - || expect_maintenance_type(&datastore.name, MaintenanceType::Unmount).is_err()
> - {
> - aborted = true;
> - break;
> - }
> + let aborted = wait_on_active_operations(
> + &datastore.name,
> + worker,
> + MaintenanceType::Unmount,
> + |reads, writes| {
> let status = format!(
> - "cannot unmount yet, still {} read and {} write operations active",
> - active_operations.read, active_operations.write
> + "cannot unmount yet, still {reads} read and {writes} write operations active",
> );
> if status != old_status {
> info!("{status}");
> old_status = status;
> }
> - }
> - std::thread::sleep(std::time::Duration::from_secs(1));
> - active_operations = task_tracking::get_active_operations(&datastore.name)?;
> - }
> + },
> + )?;
>
> if aborted || worker.is_some_and(|w| w.abort_requested()) {
> let _ = expect_maintenance_type(&datastore.name, MaintenanceType::Unmount)
> @@ -2725,6 +2717,36 @@ pub async fn s3_refresh(store: String, rpcenv: &mut dyn RpcEnvironment) -> Resul
> Ok(json!(upid))
> }
>
> +/// Wait for no more active operations on the given datastore.
> +/// If a worker task context is provided, the given callback will be executed for each busy wait
> +/// iteration.
> +///
> +/// Returns with Ok(true) if the worker was aborted or the expected maintenance mode was not set,
> +/// Ok(false) if no more operations are active.
> +fn wait_on_active_operations(
> + store: &str,
> + worker: Option<&dyn WorkerTaskContext>,
this can drop the Option, all callers actually run in a worker context..
> + maintenance_expected: MaintenanceType,
> + mut status_msg_callback: impl FnMut(i64, i64),
> +) -> Result<bool, Error> {
> + let mut active_operations = task_tracking::get_active_operations(&store)?;
> +
> + while active_operations.read + active_operations.write > 0 {
> + if let Some(worker) = worker {
> + if worker.abort_requested()
> + || expect_maintenance_type(&store, maintenance_expected).is_err()
> + {
> + return Ok(true);
> + }
> + status_msg_callback(active_operations.read, active_operations.write);
> + }
> + std::thread::sleep(std::time::Duration::from_secs(1));
> + active_operations = task_tracking::get_active_operations(&store)?;
> + }
> +
> + Ok(false)
> +}
> +
> #[sortable]
> const DATASTORE_INFO_SUBDIRS: SubdirMap = &[
> (
> --
> 2.47.3
>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>
>
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [pbs-devel] [PATCH proxmox-backup v2 4/6] api: admin: factor out locking and maintenance mode clearing
2025-11-12 16:36 ` [pbs-devel] [PATCH proxmox-backup v2 4/6] api: admin: factor out locking and maintenance mode clearing Christian Ebner
@ 2025-11-13 8:18 ` Fabian Grünbichler
2025-11-13 8:43 ` Christian Ebner
0 siblings, 1 reply; 17+ messages in thread
From: Fabian Grünbichler @ 2025-11-13 8:18 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On November 12, 2025 5:36 pm, Christian Ebner wrote:
> Provide a helper which allows to either clear the maintenance mode if
> the worker was aborted, or call the provided callback while holding
> the datastore config lock.
>
> In preparation for reusing the same logic for the s3 refresh.
>
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
> changes since version 1:
> - not present in previous version
>
> src/api2/admin/datastore.rs | 50 +++++++++++++++++++++++++------------
> 1 file changed, 34 insertions(+), 16 deletions(-)
>
> diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
> index 7daccf9fd..8d58b5059 100644
> --- a/src/api2/admin/datastore.rs
> +++ b/src/api2/admin/datastore.rs
> @@ -2584,7 +2584,6 @@ fn do_unmount_device(
> if datastore.backing_device.is_none() {
> bail!("can't unmount non-removable datastore");
> }
> - let mount_point = datastore.absolute_path();
>
> let mut old_status = String::new();
> let aborted = wait_on_active_operations(
> @@ -2602,21 +2601,14 @@ fn do_unmount_device(
> },
> )?;
>
> - if aborted || worker.is_some_and(|w| w.abort_requested()) {
> - let _ = expect_maintenance_type(&datastore.name, MaintenanceType::Unmount)
> - .inspect_err(|e| warn!("maintenance mode was not as expected: {e}"))
> - .and_then(|(lock, config)| {
> - unset_maintenance(lock, config)
> - .inspect_err(|e| warn!("could not reset maintenance mode: {e}"))
> - });
> - bail!("aborted, due to user request");
> - } else {
> - let (lock, config) = expect_maintenance_type(&datastore.name, MaintenanceType::Unmount)?;
> - crate::tools::disks::unmount_by_mountpoint(Path::new(&mount_point))?;
> - unset_maintenance(lock, config)
> - .map_err(|e| format_err!("could not reset maintenance mode: {e}"))?;
> - }
> - Ok(())
> + let mount_point = datastore.absolute_path();
> + clear_or_run_maintenance_locked(
> + &datastore.name,
> + worker,
> + MaintenanceType::Unmount,
> + aborted,
> + || crate::tools::disks::unmount_by_mountpoint(Path::new(&mount_point)),
> + )
> }
>
> #[api(
> @@ -2747,6 +2739,32 @@ fn wait_on_active_operations(
> Ok(false)
> }
>
> +// Either clear the current maintenance mode if the worker was aborted or run the provided callback
> +// while keeping the datastore config lock, so the mode cannot be altered. Clears the maintenance
> +// mode after successful callback execution.
> +fn clear_or_run_maintenance_locked(
> + store: &str,
> + worker: Option<&dyn WorkerTaskContext>,
this can also drop the Option ;)
but given that we now have two helpers with two almost identical call
sites, could we not make it a single helper?
e.g., `run_maintenance_locked` that
- waits for operations to finish while checking worker status and
maintenance type (with status updates via a passed in format string or
callback)
- obtains the lock
- runs the actual maintenance task via a provided closure/callback/..
- clears the maintenance mode and drops the lock
> + maintenance_expected: MaintenanceType,
> + aborted: bool,
> + callback: impl Fn() -> Result<(), Error>,
> +) -> Result<(), Error> {
> + if aborted || worker.is_some_and(|w| w.abort_requested()) {
> + let _ = expect_maintenance_type(store, maintenance_expected)
> + .inspect_err(|e| warn!("maintenance mode was not as expected: {e}"))
> + .and_then(|(lock, config)| {
> + unset_maintenance(lock, config)
> + .inspect_err(|e| warn!("could not reset maintenance mode: {e}"))
> + });
> + bail!("aborted, due to user request");
> + } else {
> + let (lock, config) = expect_maintenance_type(store, maintenance_expected)?;
> + callback()?;
> + unset_maintenance(lock, config)
> + .map_err(|e| format_err!("could not reset maintenance mode: {e}"))
> + }
> +}
> +
> #[sortable]
> const DATASTORE_INFO_SUBDIRS: SubdirMap = &[
> (
> --
> 2.47.3
>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>
>
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 17+ messages in thread
* [pbs-devel] partially-applied: [PATCH proxmox-backup v2 0/6] wait for active operations to finish before s3 refresh
2025-11-12 16:36 [pbs-devel] [PATCH proxmox-backup v2 0/6] wait for active operations to finish before s3 refresh Christian Ebner
` (5 preceding siblings ...)
2025-11-12 16:36 ` [pbs-devel] [PATCH proxmox-backup v2 6/6] api: datastore: wait for active operations to clear before s3 refresh Christian Ebner
@ 2025-11-13 8:20 ` Fabian Grünbichler
2025-11-13 14:23 ` [pbs-devel] superseded: " Christian Ebner
7 siblings, 0 replies; 17+ messages in thread
From: Fabian Grünbichler @ 2025-11-13 8:20 UTC (permalink / raw)
To: pbs-devel, Christian Ebner
On Wed, 12 Nov 2025 17:36:18 +0100, Christian Ebner wrote:
> Datastore contents located on the s3 backend can be refreshed on the local
> datastore cache by running an s3-refresh, which will set the maintenance mode
> s3-refresh to block any read/write operation and fetch the contents to a
> temporary directory before moving them in place if the fetching was successful.
>
> Setting the maintenance mode has however no effect on already active operations,
> which can cause inconsistencies once the s3 refresh moves the temporary folders
> in place.
>
> [...]
Applied the first two patches, thanks!
[1/6] api: datastore: fix typo in helper function name
commit: acfa4a59ee1a97feabdf7c2e4e2ec046274d45d4
[2/6] api: admin: make expected maintenance type helper generic over type
commit: 0eb02a15e33f70431ddad6867a96a277062962e2
Left the following for a v3 with minor feedback:
[3/6] api: admin: factor out busy waiting on active operations
(no commit info)
[4/6] api: admin: factor out locking and maintenance mode clearing
(no commit info)
[5/6] datastore: s3 refresh: set/unset maintenance mode in api handler
(no commit info)
[6/6] api: datastore: wait for active operations to clear before s3 refresh
(no commit info)
Best regards,
--
Fabian Grünbichler <f.gruenbichler@proxmox.com>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [pbs-devel] [PATCH proxmox-backup v2 3/6] api: admin: factor out busy waiting on active operations
2025-11-13 8:15 ` Fabian Grünbichler
@ 2025-11-13 8:38 ` Christian Ebner
0 siblings, 0 replies; 17+ messages in thread
From: Christian Ebner @ 2025-11-13 8:38 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
On 11/13/25 9:16 AM, Fabian Grünbichler wrote:
> On November 12, 2025 5:36 pm, Christian Ebner wrote:
>> Move the logic to wait on no more active operations on a given
>> datastore into a dedicated helper function, to be reused
>> by s3-refresh.
>>
>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
>> ---
>> changes since version 1:
>> - not present in previous version
>>
>> src/api2/admin/datastore.rs | 54 ++++++++++++++++++++++++++-----------
>> 1 file changed, 38 insertions(+), 16 deletions(-)
>>
>> diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
>> index 6e66b5cf0..7daccf9fd 100644
>> --- a/src/api2/admin/datastore.rs
>> +++ b/src/api2/admin/datastore.rs
>> @@ -2586,29 +2586,21 @@ fn do_unmount_device(
>> }
>> let mount_point = datastore.absolute_path();
>>
>> - let mut active_operations = task_tracking::get_active_operations(&datastore.name)?;
>> let mut old_status = String::new();
>> - let mut aborted = false;
>> - while active_operations.read + active_operations.write > 0 {
>> - if let Some(worker) = worker {
>> - if worker.abort_requested()
>> - || expect_maintenance_type(&datastore.name, MaintenanceType::Unmount).is_err()
>> - {
>> - aborted = true;
>> - break;
>> - }
>> + let aborted = wait_on_active_operations(
>> + &datastore.name,
>> + worker,
>> + MaintenanceType::Unmount,
>> + |reads, writes| {
>> let status = format!(
>> - "cannot unmount yet, still {} read and {} write operations active",
>> - active_operations.read, active_operations.write
>> + "cannot unmount yet, still {reads} read and {writes} write operations active",
>> );
>> if status != old_status {
>> info!("{status}");
>> old_status = status;
>> }
>> - }
>> - std::thread::sleep(std::time::Duration::from_secs(1));
>> - active_operations = task_tracking::get_active_operations(&datastore.name)?;
>> - }
>> + },
>> + )?;
>>
>> if aborted || worker.is_some_and(|w| w.abort_requested()) {
>> let _ = expect_maintenance_type(&datastore.name, MaintenanceType::Unmount)
>> @@ -2725,6 +2717,36 @@ pub async fn s3_refresh(store: String, rpcenv: &mut dyn RpcEnvironment) -> Resul
>> Ok(json!(upid))
>> }
>>
>> +/// Wait for no more active operations on the given datastore.
>> +/// If a worker task context is provided, the given callback will be executed for each busy wait
>> +/// iteration.
>> +///
>> +/// Returns with Ok(true) if the worker was aborted or the expected maintenance mode was not set,
>> +/// Ok(false) if no more operations are active.
>> +fn wait_on_active_operations(
>> + store: &str,
>> + worker: Option<&dyn WorkerTaskContext>,
>
> this can drop the Option, all callers actually run in a worker context..
Right, now I'm wondering why this was an option to begin with in the
do_unmount_device(), but will drop that for the v3.
>
>> + maintenance_expected: MaintenanceType,
>> + mut status_msg_callback: impl FnMut(i64, i64),
>> +) -> Result<bool, Error> {
>> + let mut active_operations = task_tracking::get_active_operations(&store)?;
>> +
>> + while active_operations.read + active_operations.write > 0 {
>> + if let Some(worker) = worker {
>> + if worker.abort_requested()
>> + || expect_maintenance_type(&store, maintenance_expected).is_err()
>> + {
>> + return Ok(true);
>> + }
>> + status_msg_callback(active_operations.read, active_operations.write);
>> + }
>> + std::thread::sleep(std::time::Duration::from_secs(1));
>> + active_operations = task_tracking::get_active_operations(&store)?;
>> + }
>> +
>> + Ok(false)
>> +}
>> +
>> #[sortable]
>> const DATASTORE_INFO_SUBDIRS: SubdirMap = &[
>> (
>> --
>> 2.47.3
>>
>>
>>
>> _______________________________________________
>> pbs-devel mailing list
>> pbs-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>
>>
>>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [pbs-devel] [PATCH proxmox-backup v2 4/6] api: admin: factor out locking and maintenance mode clearing
2025-11-13 8:18 ` Fabian Grünbichler
@ 2025-11-13 8:43 ` Christian Ebner
2025-11-13 9:08 ` Fabian Grünbichler
0 siblings, 1 reply; 17+ messages in thread
From: Christian Ebner @ 2025-11-13 8:43 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
On 11/13/25 9:18 AM, Fabian Grünbichler wrote:
> On November 12, 2025 5:36 pm, Christian Ebner wrote:
>> Provide a helper which allows to either clear the maintenance mode if
>> the worker was aborted, or call the provided callback while holding
>> the datastore config lock.
>>
>> In preparation for reusing the same logic for the s3 refresh.
>>
>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
>> ---
>> changes since version 1:
>> - not present in previous version
>>
>> src/api2/admin/datastore.rs | 50 +++++++++++++++++++++++++------------
>> 1 file changed, 34 insertions(+), 16 deletions(-)
>>
>> diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
>> index 7daccf9fd..8d58b5059 100644
>> --- a/src/api2/admin/datastore.rs
>> +++ b/src/api2/admin/datastore.rs
>> @@ -2584,7 +2584,6 @@ fn do_unmount_device(
>> if datastore.backing_device.is_none() {
>> bail!("can't unmount non-removable datastore");
>> }
>> - let mount_point = datastore.absolute_path();
>>
>> let mut old_status = String::new();
>> let aborted = wait_on_active_operations(
>> @@ -2602,21 +2601,14 @@ fn do_unmount_device(
>> },
>> )?;
>>
>> - if aborted || worker.is_some_and(|w| w.abort_requested()) {
>> - let _ = expect_maintenance_type(&datastore.name, MaintenanceType::Unmount)
>> - .inspect_err(|e| warn!("maintenance mode was not as expected: {e}"))
>> - .and_then(|(lock, config)| {
>> - unset_maintenance(lock, config)
>> - .inspect_err(|e| warn!("could not reset maintenance mode: {e}"))
>> - });
>> - bail!("aborted, due to user request");
>> - } else {
>> - let (lock, config) = expect_maintenance_type(&datastore.name, MaintenanceType::Unmount)?;
>> - crate::tools::disks::unmount_by_mountpoint(Path::new(&mount_point))?;
>> - unset_maintenance(lock, config)
>> - .map_err(|e| format_err!("could not reset maintenance mode: {e}"))?;
>> - }
>> - Ok(())
>> + let mount_point = datastore.absolute_path();
>> + clear_or_run_maintenance_locked(
>> + &datastore.name,
>> + worker,
>> + MaintenanceType::Unmount,
>> + aborted,
>> + || crate::tools::disks::unmount_by_mountpoint(Path::new(&mount_point)),
>> + )
>> }
>>
>> #[api(
>> @@ -2747,6 +2739,32 @@ fn wait_on_active_operations(
>> Ok(false)
>> }
>>
>> +// Either clear the current maintenance mode if the worker was aborted or run the provided callback
>> +// while keeping the datastore config lock, so the mode cannot be altered. Clears the maintenance
>> +// mode after successful callback execution.
>> +fn clear_or_run_maintenance_locked(
>> + store: &str,
>> + worker: Option<&dyn WorkerTaskContext>,
>
> this can also drop the Option ;)
>
> but given that we now have two helpers with two almost identical call
> sites, could we not make it a single helper?
Okay, can combine them into one. For me it was mentally less friction to
have these separated, as especially given that the waiting on active
operations seemed worth it's own encapsulation and gets a dedicated
callback. Now this requires either 2 callbacks or a format string and
callback.
But since this is not required elsewhere (yet?) I will combine this
helper into one.
> e.g., `run_maintenance_locked` that
> - waits for operations to finish while checking worker status and
> maintenance type (with status updates via a passed in format string or
> callback)
> - obtains the lock
> - runs the actual maintenance task via a provided closure/callback/..
> - clears the maintenance mode and drops the lock
>
>> + maintenance_expected: MaintenanceType,
>> + aborted: bool,
>> + callback: impl Fn() -> Result<(), Error>,
>> +) -> Result<(), Error> {
>> + if aborted || worker.is_some_and(|w| w.abort_requested()) {
>> + let _ = expect_maintenance_type(store, maintenance_expected)
>> + .inspect_err(|e| warn!("maintenance mode was not as expected: {e}"))
>> + .and_then(|(lock, config)| {
>> + unset_maintenance(lock, config)
>> + .inspect_err(|e| warn!("could not reset maintenance mode: {e}"))
>> + });
>> + bail!("aborted, due to user request");
>> + } else {
>> + let (lock, config) = expect_maintenance_type(store, maintenance_expected)?;
>> + callback()?;
>> + unset_maintenance(lock, config)
>> + .map_err(|e| format_err!("could not reset maintenance mode: {e}"))
>> + }
>> +}
>> +
>> #[sortable]
>> const DATASTORE_INFO_SUBDIRS: SubdirMap = &[
>> (
>> --
>> 2.47.3
>>
>>
>>
>> _______________________________________________
>> pbs-devel mailing list
>> pbs-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>
>>
>>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [pbs-devel] [PATCH proxmox-backup v2 6/6] api: datastore: wait for active operations to clear before s3 refresh
2025-11-13 8:15 ` Fabian Grünbichler
@ 2025-11-13 9:03 ` Christian Ebner
0 siblings, 0 replies; 17+ messages in thread
From: Christian Ebner @ 2025-11-13 9:03 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
On 11/13/25 9:15 AM, Fabian Grünbichler wrote:
> On November 12, 2025 5:36 pm, Christian Ebner wrote:
>> Currently, the s3 refresh does not take into consideration already
>> ongoing active operations, only blocking new ones.
>>
>> This will however lead to inconsistencies if there are ongoing read
>> or write operations. Therefore, actively wait for ongoing operatioins
>> to complete before running the actual refresh and keep the datastore
>> config locked so the maintenance mode cannot be altered.
>>
>> If an abort was requested while waiting, clear the maintenance mode
>> as well.
>>
>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
>> ---
>> changes since version 1:
>> - use refactored helpers from identical unmount logic
>>
>> src/api2/admin/datastore.rs | 28 +++++++++++++++++++++++-----
>> 1 file changed, 23 insertions(+), 5 deletions(-)
>>
>> diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
>> index 91189d7ae..93e085be3 100644
>> --- a/src/api2/admin/datastore.rs
>> +++ b/src/api2/admin/datastore.rs
>> @@ -2716,11 +2716,29 @@ pub fn s3_refresh(store: String, rpcenv: &mut dyn RpcEnvironment) -> Result<Valu
>> Some(store.clone()),
>> auth_id.to_string(),
>> to_stdout,
>> - move |_worker| {
>> - proxmox_async::runtime::block_on(datastore.s3_refresh())?;
>> -
>> - let (_lock, config) = expect_maintenance_type(&store, MaintenanceType::S3Refresh)?;
>> - unset_maintenance(_lock, config).context("failed to clear maintenance mode")
>> + move |worker| {
>> + let mut old_status = String::new();
>> + let aborted = wait_on_active_operations(
>> + &store,
>> + Some(&worker),
>> + MaintenanceType::S3Refresh,
>> + |reads, writes| {
>> + let status = format!(
>> + "waiting for active operations to finsish: read {reads}, write {writes}",
>> + );
>> + if status != old_status {
>> + info!("{status}");
>> + old_status = status;
>> + }
>> + },
>> + )?;
>> + clear_or_run_maintenance_locked(
>> + &store,
>> + Some(&worker),
>> + MaintenanceType::S3Refresh,
>> + aborted,
>> + || proxmox_async::runtime::block_on(datastore.s3_refresh()),
>> + )
>
> there is one more call to s3_refresh when (re)creating a datastore, that
> seems to me is running completely unprotected..
Oh, yes indeed, that needs to be adapted as well now.
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [pbs-devel] [PATCH proxmox-backup v2 4/6] api: admin: factor out locking and maintenance mode clearing
2025-11-13 8:43 ` Christian Ebner
@ 2025-11-13 9:08 ` Fabian Grünbichler
2025-11-13 9:11 ` Christian Ebner
0 siblings, 1 reply; 17+ messages in thread
From: Fabian Grünbichler @ 2025-11-13 9:08 UTC (permalink / raw)
To: Christian Ebner, Proxmox Backup Server development discussion
On November 13, 2025 9:43 am, Christian Ebner wrote:
> On 11/13/25 9:18 AM, Fabian Grünbichler wrote:
>> On November 12, 2025 5:36 pm, Christian Ebner wrote:
>>> Provide a helper which allows to either clear the maintenance mode if
>>> the worker was aborted, or call the provided callback while holding
>>> the datastore config lock.
>>>
>>> In preparation for reusing the same logic for the s3 refresh.
>>>
>>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
>>> ---
>>> changes since version 1:
>>> - not present in previous version
>>>
>>> src/api2/admin/datastore.rs | 50 +++++++++++++++++++++++++------------
>>> 1 file changed, 34 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
>>> index 7daccf9fd..8d58b5059 100644
>>> --- a/src/api2/admin/datastore.rs
>>> +++ b/src/api2/admin/datastore.rs
>>> @@ -2584,7 +2584,6 @@ fn do_unmount_device(
>>> if datastore.backing_device.is_none() {
>>> bail!("can't unmount non-removable datastore");
>>> }
>>> - let mount_point = datastore.absolute_path();
>>>
>>> let mut old_status = String::new();
>>> let aborted = wait_on_active_operations(
>>> @@ -2602,21 +2601,14 @@ fn do_unmount_device(
>>> },
>>> )?;
>>>
>>> - if aborted || worker.is_some_and(|w| w.abort_requested()) {
>>> - let _ = expect_maintenance_type(&datastore.name, MaintenanceType::Unmount)
>>> - .inspect_err(|e| warn!("maintenance mode was not as expected: {e}"))
>>> - .and_then(|(lock, config)| {
>>> - unset_maintenance(lock, config)
>>> - .inspect_err(|e| warn!("could not reset maintenance mode: {e}"))
>>> - });
>>> - bail!("aborted, due to user request");
>>> - } else {
>>> - let (lock, config) = expect_maintenance_type(&datastore.name, MaintenanceType::Unmount)?;
>>> - crate::tools::disks::unmount_by_mountpoint(Path::new(&mount_point))?;
>>> - unset_maintenance(lock, config)
>>> - .map_err(|e| format_err!("could not reset maintenance mode: {e}"))?;
>>> - }
>>> - Ok(())
>>> + let mount_point = datastore.absolute_path();
>>> + clear_or_run_maintenance_locked(
>>> + &datastore.name,
>>> + worker,
>>> + MaintenanceType::Unmount,
>>> + aborted,
>>> + || crate::tools::disks::unmount_by_mountpoint(Path::new(&mount_point)),
>>> + )
>>> }
>>>
>>> #[api(
>>> @@ -2747,6 +2739,32 @@ fn wait_on_active_operations(
>>> Ok(false)
>>> }
>>>
>>> +// Either clear the current maintenance mode if the worker was aborted or run the provided callback
>>> +// while keeping the datastore config lock, so the mode cannot be altered. Clears the maintenance
>>> +// mode after successful callback execution.
>>> +fn clear_or_run_maintenance_locked(
>>> + store: &str,
>>> + worker: Option<&dyn WorkerTaskContext>,
>>
>> this can also drop the Option ;)
>>
>> but given that we now have two helpers with two almost identical call
>> sites, could we not make it a single helper?
>
> Okay, can combine them into one. For me it was mentally less friction to
> have these separated, as especially given that the waiting on active
> operations seemed worth it's own encapsulation and gets a dedicated
> callback. Now this requires either 2 callbacks or a format string and
> callback.
>
> But since this is not required elsewhere (yet?) I will combine this
> helper into one.
we could skip the status callback and just make the status line generic
enough, it is almost identical anyway, and if we want to make it more
clear we can print a line ("Starting .." or whatever?) before calling
the helper?
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [pbs-devel] [PATCH proxmox-backup v2 4/6] api: admin: factor out locking and maintenance mode clearing
2025-11-13 9:08 ` Fabian Grünbichler
@ 2025-11-13 9:11 ` Christian Ebner
0 siblings, 0 replies; 17+ messages in thread
From: Christian Ebner @ 2025-11-13 9:11 UTC (permalink / raw)
To: Fabian Grünbichler, Proxmox Backup Server development discussion
On 11/13/25 10:08 AM, Fabian Grünbichler wrote:
> On November 13, 2025 9:43 am, Christian Ebner wrote:
>> On 11/13/25 9:18 AM, Fabian Grünbichler wrote:
>>> On November 12, 2025 5:36 pm, Christian Ebner wrote:
>>>> Provide a helper which allows to either clear the maintenance mode if
>>>> the worker was aborted, or call the provided callback while holding
>>>> the datastore config lock.
>>>>
>>>> In preparation for reusing the same logic for the s3 refresh.
>>>>
>>>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
>>>> ---
>>>> changes since version 1:
>>>> - not present in previous version
>>>>
>>>> src/api2/admin/datastore.rs | 50 +++++++++++++++++++++++++------------
>>>> 1 file changed, 34 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
>>>> index 7daccf9fd..8d58b5059 100644
>>>> --- a/src/api2/admin/datastore.rs
>>>> +++ b/src/api2/admin/datastore.rs
>>>> @@ -2584,7 +2584,6 @@ fn do_unmount_device(
>>>> if datastore.backing_device.is_none() {
>>>> bail!("can't unmount non-removable datastore");
>>>> }
>>>> - let mount_point = datastore.absolute_path();
>>>>
>>>> let mut old_status = String::new();
>>>> let aborted = wait_on_active_operations(
>>>> @@ -2602,21 +2601,14 @@ fn do_unmount_device(
>>>> },
>>>> )?;
>>>>
>>>> - if aborted || worker.is_some_and(|w| w.abort_requested()) {
>>>> - let _ = expect_maintenance_type(&datastore.name, MaintenanceType::Unmount)
>>>> - .inspect_err(|e| warn!("maintenance mode was not as expected: {e}"))
>>>> - .and_then(|(lock, config)| {
>>>> - unset_maintenance(lock, config)
>>>> - .inspect_err(|e| warn!("could not reset maintenance mode: {e}"))
>>>> - });
>>>> - bail!("aborted, due to user request");
>>>> - } else {
>>>> - let (lock, config) = expect_maintenance_type(&datastore.name, MaintenanceType::Unmount)?;
>>>> - crate::tools::disks::unmount_by_mountpoint(Path::new(&mount_point))?;
>>>> - unset_maintenance(lock, config)
>>>> - .map_err(|e| format_err!("could not reset maintenance mode: {e}"))?;
>>>> - }
>>>> - Ok(())
>>>> + let mount_point = datastore.absolute_path();
>>>> + clear_or_run_maintenance_locked(
>>>> + &datastore.name,
>>>> + worker,
>>>> + MaintenanceType::Unmount,
>>>> + aborted,
>>>> + || crate::tools::disks::unmount_by_mountpoint(Path::new(&mount_point)),
>>>> + )
>>>> }
>>>>
>>>> #[api(
>>>> @@ -2747,6 +2739,32 @@ fn wait_on_active_operations(
>>>> Ok(false)
>>>> }
>>>>
>>>> +// Either clear the current maintenance mode if the worker was aborted or run the provided callback
>>>> +// while keeping the datastore config lock, so the mode cannot be altered. Clears the maintenance
>>>> +// mode after successful callback execution.
>>>> +fn clear_or_run_maintenance_locked(
>>>> + store: &str,
>>>> + worker: Option<&dyn WorkerTaskContext>,
>>>
>>> this can also drop the Option ;)
>>>
>>> but given that we now have two helpers with two almost identical call
>>> sites, could we not make it a single helper?
>>
>> Okay, can combine them into one. For me it was mentally less friction to
>> have these separated, as especially given that the waiting on active
>> operations seemed worth it's own encapsulation and gets a dedicated
>> callback. Now this requires either 2 callbacks or a format string and
>> callback.
>>
>> But since this is not required elsewhere (yet?) I will combine this
>> helper into one.
>
> we could skip the status callback and just make the status line generic
> enough, it is almost identical anyway, and if we want to make it more
> clear we can print a line ("Starting .." or whatever?) before calling
> the helper?
Yeah, let's get rid of the extra parameter and make the status output
generic.
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 17+ messages in thread
* [pbs-devel] superseded: [PATCH proxmox-backup v2 0/6] wait for active operations to finish before s3 refresh
2025-11-12 16:36 [pbs-devel] [PATCH proxmox-backup v2 0/6] wait for active operations to finish before s3 refresh Christian Ebner
` (6 preceding siblings ...)
2025-11-13 8:20 ` [pbs-devel] partially-applied: [PATCH proxmox-backup v2 0/6] wait for active operations to finish " Fabian Grünbichler
@ 2025-11-13 14:23 ` Christian Ebner
7 siblings, 0 replies; 17+ messages in thread
From: Christian Ebner @ 2025-11-13 14:23 UTC (permalink / raw)
To: pbs-devel
superseded-by version 3:
https://lore.proxmox.com/pbs-devel/20251113142214.719217-1-c.ebner@proxmox.com/T/
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2025-11-13 14:22 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-11-12 16:36 [pbs-devel] [PATCH proxmox-backup v2 0/6] wait for active operations to finish before s3 refresh Christian Ebner
2025-11-12 16:36 ` [pbs-devel] [PATCH proxmox-backup v2 1/6] api: datastore: fix typo in helper function name Christian Ebner
2025-11-12 16:36 ` [pbs-devel] [PATCH proxmox-backup v2 2/6] api: admin: make expected maintenance type helper generic over type Christian Ebner
2025-11-12 16:36 ` [pbs-devel] [PATCH proxmox-backup v2 3/6] api: admin: factor out busy waiting on active operations Christian Ebner
2025-11-13 8:15 ` Fabian Grünbichler
2025-11-13 8:38 ` Christian Ebner
2025-11-12 16:36 ` [pbs-devel] [PATCH proxmox-backup v2 4/6] api: admin: factor out locking and maintenance mode clearing Christian Ebner
2025-11-13 8:18 ` Fabian Grünbichler
2025-11-13 8:43 ` Christian Ebner
2025-11-13 9:08 ` Fabian Grünbichler
2025-11-13 9:11 ` Christian Ebner
2025-11-12 16:36 ` [pbs-devel] [PATCH proxmox-backup v2 5/6] datastore: s3 refresh: set/unset maintenance mode in api handler Christian Ebner
2025-11-12 16:36 ` [pbs-devel] [PATCH proxmox-backup v2 6/6] api: datastore: wait for active operations to clear before s3 refresh Christian Ebner
2025-11-13 8:15 ` Fabian Grünbichler
2025-11-13 9:03 ` Christian Ebner
2025-11-13 8:20 ` [pbs-devel] partially-applied: [PATCH proxmox-backup v2 0/6] wait for active operations to finish " Fabian Grünbichler
2025-11-13 14:23 ` [pbs-devel] superseded: " Christian Ebner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox