public inbox for pbs-devel@lists.proxmox.com
From: Hannes Laimer <h.laimer@proxmox.com>
To: Christian Ebner <c.ebner@proxmox.com>,
	Proxmox Backup Server development discussion
	<pbs-devel@lists.proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup v2 4/8] api: admin: run configured sync jobs when a datastore is mounted
Date: Fri, 30 May 2025 13:45:26 +0200	[thread overview]
Message-ID: <9b24918c-0f4a-489d-8bf8-d3e2a423749b@proxmox.com> (raw)
In-Reply-To: <4566b2e4-1736-4a30-bd65-0a457d6efd59@proxmox.com>

Added some inline comments; I'll probably send a v3 addressing some of
the things you mentioned.

On 5/30/25 12:02, Christian Ebner wrote:
> please see some comments inline
> 
> On 5/15/25 14:41, Hannes Laimer wrote:
>> When a datastore is mounted, spawn a new task to run all sync jobs
>> marked with `run-on-mount`. These jobs run sequentially and include
>> any job for which the mounted datastore is:
>>
>> - The source or target in a local push/pull job
>> - The source in a push job to a remote datastore
>> - The target in a pull job from a remote datastore
>>
>> Signed-off-by: Hannes Laimer <h.laimer@proxmox.com>
>> ---
>>   src/api2/admin/datastore.rs | 91 +++++++++++++++++++++++++++++++++++--
>>   src/api2/admin/sync.rs      |  2 +-
>>   src/server/sync.rs          |  7 +--
>>   3 files changed, 91 insertions(+), 9 deletions(-)
>>
>> diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
>> index 392494488..8463adb6a 100644
>> --- a/src/api2/admin/datastore.rs
>> +++ b/src/api2/admin/datastore.rs
>> @@ -42,8 +42,8 @@ use pbs_api_types::{
>>       DataStoreConfig, DataStoreListItem, DataStoreMountStatus, DataStoreStatus,
>>       GarbageCollectionJobStatus, GroupListItem, JobScheduleStatus, KeepOptions, MaintenanceMode,
>>       MaintenanceType, Operation, PruneJobOptions, SnapshotListItem, SnapshotVerifyState,
>> -    BACKUP_ARCHIVE_NAME_SCHEMA, BACKUP_ID_SCHEMA, BACKUP_NAMESPACE_SCHEMA, BACKUP_TIME_SCHEMA,
>> -    BACKUP_TYPE_SCHEMA, CATALOG_NAME, CLIENT_LOG_BLOB_NAME, DATASTORE_SCHEMA,
>> +    SyncJobConfig, BACKUP_ARCHIVE_NAME_SCHEMA, BACKUP_ID_SCHEMA, BACKUP_NAMESPACE_SCHEMA,
>> +    BACKUP_TIME_SCHEMA, BACKUP_TYPE_SCHEMA, CATALOG_NAME, CLIENT_LOG_BLOB_NAME, DATASTORE_SCHEMA,
>>       IGNORE_VERIFIED_BACKUPS_SCHEMA, MANIFEST_BLOB_NAME, MAX_NAMESPACE_DEPTH, NS_MAX_DEPTH_SCHEMA,
>>       PRIV_DATASTORE_AUDIT, PRIV_DATASTORE_BACKUP, PRIV_DATASTORE_MODIFY, PRIV_DATASTORE_PRUNE,
>>       PRIV_DATASTORE_READ, PRIV_DATASTORE_VERIFY, PRIV_SYS_MODIFY, UPID, UPID_SCHEMA,
>> @@ -2510,6 +2510,51 @@ pub fn do_mount_device(datastore: DataStoreConfig) -> Result<(), Error> {
>>       Ok(())
>>   }
>> +async fn do_sync_jobs(
>> +    jobs_to_run: Vec<SyncJobConfig>,
>> +    worker: Arc<WorkerTask>,
>> +) -> Result<(), Error> {
>> +    let count = jobs_to_run.len();
>> +    info!(
>> +        "will run {} sync jobs: {}",
>> +        count,
>> +        jobs_to_run
>> +            .iter()
>> +            .map(|j| j.id.clone())
>> +            .collect::<Vec<String>>()
>> +            .join(", ")
>> +    );
> 
> nit:
> 
> above can be rewritten without cloning the job ids and `count` in-lined as:
> ```
>      info!(
>          "will run {count} sync jobs: {}",
>          jobs_to_run
>              .iter()
>              .map(|sync_job_config| sync_job_config.id.as_str())
>              .collect::<Vec<&str>>()
>              .join(", ")
>      );
> ```
> 
>> +
>> +    for (i, job_config) in jobs_to_run.into_iter().enumerate() {
>> +        if worker.abort_requested() {
>> +            bail!("aborted due to user request");
>> +        }
>> +        let job_id = job_config.id.clone();
>> +        let Ok(job) = Job::new("syncjob", &job_id) else {
> 
> nit: this should log an error/warning and the status progress, as this 
> will fail if the job lock cannot be acquired. That is something which is 
> of interest for debugging.
> 
makes sense, will do

>> +            continue;
>> +        };
>> +        let auth_id = Authid::root_auth_id().clone();
> 
> nit: auth_id does not need to be cloned here ...
> 
>> +        info!("[{}/{count}] starting '{job_id}'...", i + 1);
>> +        match crate::server::do_sync_job(
>> +            job,
>> +            job_config,
>> +            &auth_id,
> 
> ... since only passed as reference here.
> 
>> +            Some("mount".to_string()),
>> +            false,
>> +        ) {
>> +            Ok((_upid, handle)) => {
>> +                tokio::select! {
>> +                    sync_done = handle.fuse() => if let Err(err) = sync_done { warn!("could not wait for job to finish: {err}"); },
> 
> question: should this be logged as error rather than a warning ...
> 

Hmm, maybe. But it doesn't necessarily mean anything failed; the sync
itself will continue, we just start the next job without waiting for
this one.

I'm not entirely sure in which cases waiting would fail here; maybe
there are some where an error would be warranted. But with my current
understanding it is not really.

>> +                    _abort = worker.abort_future() => bail!("aborted due to user request"),
>> +                };
>> +            }
>> +            Err(err) => warn!("unable to start sync job {job_id} - {err}"),
> 
> ... same here?
> 

In my head an error means the task could not continue, which is not the
case here; we just skip this job. But no hard feelings, this could also
be an error.

>> +        }
>> +    }
>> +
>> +    Ok(())
>> +}
>> +
>>   #[api(
>>       protected: true,
>>       input: {
>> @@ -2541,12 +2586,48 @@ pub fn mount(store: String, rpcenv: &mut dyn RpcEnvironment) -> Result<Value, Er
>>       let auth_id: Authid = rpcenv.get_auth_id().unwrap().parse()?;
>>       let to_stdout = rpcenv.env_type() == RpcEnvironmentType::CLI;
>> -    let upid = WorkerTask::new_thread(
>> +    let upid = WorkerTask::spawn(
> 
> question: is it okay to run this on the same thread here? 
> `do_mount_device` does some blocking calls after all?
> 

yes, there's no real reason to look for job configs unless mounting is
successful and done. So the blocking is not a problem, actually quite
the opposite.

>>           "mount-device",
>> -        Some(store),
>> +        Some(store.clone()),
>>           auth_id.to_string(),
>>           to_stdout,
>> -        move |_worker| do_mount_device(datastore),
>> +        move |_worker| async move {
>> +            do_mount_device(datastore.clone())?;
>> +            let Ok((sync_config, _digest)) = pbs_config::sync::config() else {
>> +                warn!("unable to read sync job config, won't run any sync jobs");
>> +                return Ok(());
>> +            };
>> +            let Ok(list) = sync_config.convert_to_typed_array("sync") else {
>> +                warn!("unable to parse sync job config, won't run any sync jobs");
>> +                return Ok(());
>> +            };
>> +            let jobs_to_run: Vec<SyncJobConfig> = list
>> +                .into_iter()
>> +                .filter(|job: &SyncJobConfig| {
>> +                    // add job iff (running on mount is enabled and) any of these apply
>> +                    //   - the job is local and we are source or target
>> +                    //   - we are the source of a push to a remote
>> +                    //   - we are the target of a pull from a remote
>> +                    //
>> +                    // `job.store == datastore.name` iff we are the target for pull from remote or we
>> +                    // are the source for push to remote, therefore we don't have to check for the
>> +                    // direction of the job.
>> +                    job.run_on_mount.unwrap_or(false)
>> +                        && (job.remote.is_none() && job.remote_store == datastore.name
>> +                            || job.store == datastore.name)
>> +                })
>> +                .collect();
>> +            if !jobs_to_run.is_empty() {
> 
> comment: an additional log info to the mount task log would be nice, so 
> one sees from it as well that some sync jobs were triggered.
> 

sure

>> +                let _ = WorkerTask::spawn(
>> +                    "mount-sync-jobs",
>> +                    Some(store),
>> +                    auth_id.to_string(),
>> +                    false,
>> +                    move |worker| async move { do_sync_jobs(jobs_to_run, worker).await },
>> +                );
>> +            }
>> +            Ok(())
>> +        },
> 
> comment: all of the above is executed within an API endpoint flagged as
> protected! This however leads to the sync job being executed by the
> privileged proxmox-backup daemon, with all the synced contents therefore
> written and owned by the root user instead of the backup user.
> 

Actually true, good catch! I didn't even check that; I just assumed we'd
explicitly set the owner whenever we create dirs/files during a sync. We
do this for garbage collection and in basically all other places IIRC.

So I think we should also do that for syncs. Can you think of a reason
not to? If not, I'll prepare a patch for that.

>>       )?;
>>       Ok(json!(upid))
>> diff --git a/src/api2/admin/sync.rs b/src/api2/admin/sync.rs
>> index 6722ebea0..01dea5126 100644
>> --- a/src/api2/admin/sync.rs
>> +++ b/src/api2/admin/sync.rs
>> @@ -161,7 +161,7 @@ pub fn run_sync_job(
>>       let to_stdout = rpcenv.env_type() == RpcEnvironmentType::CLI;
>> -    let upid_str = do_sync_job(job, sync_job, &auth_id, None, to_stdout)?;
>> +    let (upid_str, _) = do_sync_job(job, sync_job, &auth_id, None, to_stdout)?;
>>       Ok(upid_str)
>>   }
>> diff --git a/src/server/sync.rs b/src/server/sync.rs
>> index 09814ef0c..c45a8975e 100644
>> --- a/src/server/sync.rs
>> +++ b/src/server/sync.rs
>> @@ -12,6 +12,7 @@ use futures::{future::FutureExt, select};
>>   use hyper::http::StatusCode;
>>   use pbs_config::BackupLockGuard;
>>   use serde_json::json;
>> +use tokio::task::JoinHandle;
>>   use tracing::{info, warn};
>>   use proxmox_human_byte::HumanByte;
>> @@ -598,7 +599,7 @@ pub fn do_sync_job(
>>       auth_id: &Authid,
>>       schedule: Option<String>,
>>       to_stdout: bool,
>> -) -> Result<String, Error> {
>> +) -> Result<(String, JoinHandle<()>), Error> {
>>       let job_id = format!(
>>           "{}:{}:{}:{}:{}",
>>           sync_job.remote.as_deref().unwrap_or("-"),
>> @@ -614,7 +615,7 @@ pub fn do_sync_job(
>>           bail!("can't sync to same datastore");
>>       }
>> -    let upid_str = WorkerTask::spawn(
>> +    let (upid_str, handle) = WorkerTask::spawn_with_handle(
>>           &worker_type,
>>           Some(job_id.clone()),
>>           auth_id.to_string(),
>> @@ -730,7 +731,7 @@ pub fn do_sync_job(
>>           },
>>       )?;
>> -    Ok(upid_str)
>> +    Ok((upid_str, handle))
>>   }
>>   pub(super) fn ignore_not_verified_or_encrypted(
> 



Thread overview: 14+ messages
2025-05-15 12:41 [pbs-devel] [PATCH proxmox/proxmox-backup v2 0/8] trigger sync jobs on mount Hannes Laimer
2025-05-15 12:41 ` [pbs-devel] [PATCH proxmox v2 1/8] rest-server: add function that returns a join handle for spawn Hannes Laimer
2025-05-15 12:41 ` [pbs-devel] [PATCH proxmox v2 2/8] pbs-api-types: add run-on-mount flag to SyncJobConfig Hannes Laimer
2025-05-15 12:41 ` [pbs-devel] [PATCH proxmox-backup v2 3/8] api: config: sync: update run-on-mount correctly Hannes Laimer
2025-05-15 12:41 ` [pbs-devel] [PATCH proxmox-backup v2 4/8] api: admin: run configured sync jobs when a datastore is mounted Hannes Laimer
2025-05-30 10:02   ` Christian Ebner
2025-05-30 11:45     ` Hannes Laimer [this message]
2025-05-30 13:08       ` Christian Ebner
2025-06-04 12:22         ` Hannes Laimer
2025-05-15 12:41 ` [pbs-devel] [PATCH proxmox-backup v2 5/8] api: admin: trigger sync jobs only on datastore mount Hannes Laimer
2025-05-15 12:41 ` [pbs-devel] [PATCH proxmox-backup v2 6/8] bin: manager: run uuid_mount/mount tasks on the proxy Hannes Laimer
2025-05-15 12:41 ` [pbs-devel] [PATCH proxmox-backup v2 7/8] ui: add run-on-mount checkbox to SyncJob form Hannes Laimer
2025-05-15 12:41 ` [pbs-devel] [PATCH proxmox-backup v2 8/8] ui: add task title for triggering sync jobs Hannes Laimer
2025-06-04 12:32 ` [pbs-devel] superseded: Re: [PATCH proxmox/proxmox-backup v2 0/8] trigger sync jobs on mount Hannes Laimer
