From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Proxmox Backup Server development discussion
<pbs-devel@lists.proxmox.com>
Subject: Re: [pbs-devel] [PATCH v5 proxmox-backup 08/31] fix #3044: server: implement push support for sync operations
Date: Fri, 25 Oct 2024 12:10:21 +0200
Message-ID: <1729850354.sdki2la8q6.astroid@yuna.none>
In-Reply-To: <20241018084242.144010-9-c.ebner@proxmox.com>
high-level: there is a slight issue here w.r.t. remove
vanished. compared to pull, where we filter removal candidates by
ownership, for push we just remove the whole namespace or group. this
can lead to problems if the user in remote.cfg is highly privileged and
can see and delete *all* groups.
some suggestions below for how to improve this, but we probably want to:
- call out the dangers of shared push targets in general in the docs
- extend the ACL handling from push to pull with 4.0, so that it is
possible to restrict pulling to certain namespace sub-trees on the
remote side
On October 18, 2024 10:42 am, Christian Ebner wrote:
> Adds the functionality required to push datastore contents from a
> source to a remote target.
> This includes syncing of the namespaces, backup groups and snapshots
> based on the provided filters as well as removing vanished contents
> from the target when requested.
>
> While trying to mimic the pull direction of sync jobs, the
> implementation is different, as access to the remote must be performed
> via the REST API, whereas the pull job can access the local datastore
> directly via the filesystem.
>
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
> changes since version 4:
> - no changes
>
> changes since version 3:
> - Avoid reading known chunks, only re-index based on digest
> - Avoid tempfile for manifest, upload source manifest directly
> - Add map_to_target helper for source to target namespace mapping
> - Only try creating non pre-existing namespace components on target
> - Drop `job_user`, privs are now all checked for `local_user`
> - Removing vanished namespaces now requires PRIV_REMOTE_DATASTORE_MODIFY
>
> src/server/mod.rs | 1 +
> src/server/push.rs | 910 +++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 911 insertions(+)
> create mode 100644 src/server/push.rs
>
> diff --git a/src/server/mod.rs b/src/server/mod.rs
> index 2e40bde3c..7c14ed4b8 100644
> --- a/src/server/mod.rs
> +++ b/src/server/mod.rs
> @@ -36,6 +36,7 @@ pub mod auth;
> pub mod metric_collection;
>
> pub(crate) mod pull;
> +pub(crate) mod push;
> pub(crate) mod sync;
>
> pub(crate) async fn reload_proxy_certificate() -> Result<(), Error> {
> diff --git a/src/server/push.rs b/src/server/push.rs
> new file mode 100644
> index 000000000..bf6045214
> --- /dev/null
> +++ b/src/server/push.rs
> @@ -0,0 +1,910 @@
> +//! Sync datastore by pushing contents to remote server
> +
> +use std::cmp::Ordering;
> +use std::collections::HashSet;
> +use std::sync::{Arc, Mutex};
> +
> +use anyhow::{bail, format_err, Error};
> +use futures::stream::{self, StreamExt, TryStreamExt};
> +use tokio::sync::mpsc;
> +use tokio_stream::wrappers::ReceiverStream;
> +use tracing::info;
> +
> +use pbs_api_types::{
> + print_store_and_ns, Authid, BackupDir, BackupGroup, BackupNamespace, CryptMode, GroupFilter,
> + GroupListItem, NamespaceListItem, Operation, RateLimitConfig, Remote, SnapshotListItem,
> + PRIV_REMOTE_DATASTORE_BACKUP, PRIV_REMOTE_DATASTORE_MODIFY, PRIV_REMOTE_DATASTORE_PRUNE,
> +};
> +use pbs_client::{BackupRepository, BackupWriter, HttpClient, MergedChunkInfo, UploadOptions};
> +use pbs_config::CachedUserInfo;
> +use pbs_datastore::data_blob::ChunkInfo;
> +use pbs_datastore::dynamic_index::DynamicIndexReader;
> +use pbs_datastore::fixed_index::FixedIndexReader;
> +use pbs_datastore::index::IndexFile;
> +use pbs_datastore::manifest::{ArchiveType, CLIENT_LOG_BLOB_NAME, MANIFEST_BLOB_NAME};
> +use pbs_datastore::read_chunk::AsyncReadChunk;
> +use pbs_datastore::{BackupManifest, DataStore, StoreProgress};
> +
> +use super::sync::{
> + check_namespace_depth_limit, LocalSource, RemovedVanishedStats, SkipInfo, SkipReason,
> + SyncSource, SyncStats,
> +};
> +use crate::api2::config::remote;
> +
> +/// Target for backups to be pushed to
> +pub(crate) struct PushTarget {
> + // Name of the remote as found in remote.cfg
> + remote: String,
> + // Target repository on remote
> + repo: BackupRepository,
> + // Target namespace on remote
> + ns: BackupNamespace,
> + // Http client to connect to remote
> + client: HttpClient,
owner: Authid
would need to be added here to allow filtering groups by ownership..
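something along these lines (untested sketch, naming just illustrative):

    /// Owner as which pushed groups are created on the target, i.e. the
    /// Authid configured for this remote in remote.cfg
    owner: Authid,

filled in PushParameters::new() from the remote config that is already
looked up there:

    owner: remote.config.auth_id.clone(),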
> +}
> +
> +/// Parameters for a push operation
> +pub(crate) struct PushParameters {
> + /// Source of backups to be pushed to remote
> + source: Arc<LocalSource>,
> + /// Target for backups to be pushed to
> + target: PushTarget,
> + /// Local user limiting the accessible source contents, makes sure that the sync job sees the
> + /// same source content when executed by different users with different privileges
> + /// User as which the job gets executed, requires the permissions on the remote
> + local_user: Authid,
> + /// Whether to remove groups which exist locally, but not on the remote end
> + remove_vanished: bool,
> + /// How many levels of sub-namespaces to push (0 == no recursion, None == maximum recursion)
> + max_depth: Option<usize>,
> + /// Filters for reducing the push scope
> + group_filter: Vec<GroupFilter>,
> + /// How many snapshots should be transferred at most (taking the newest N snapshots)
> + transfer_last: Option<usize>,
> +}
> +
> +impl PushParameters {
> + /// Creates a new instance of `PushParameters`.
> + #[allow(clippy::too_many_arguments)]
> + pub(crate) fn new(
> + store: &str,
> + ns: BackupNamespace,
> + remote_id: &str,
> + remote_store: &str,
> + remote_ns: BackupNamespace,
> + local_user: Authid,
> + remove_vanished: Option<bool>,
> + max_depth: Option<usize>,
> + group_filter: Option<Vec<GroupFilter>>,
> + limit: RateLimitConfig,
> + transfer_last: Option<usize>,
> + ) -> Result<Self, Error> {
> + if let Some(max_depth) = max_depth {
> + ns.check_max_depth(max_depth)?;
> + remote_ns.check_max_depth(max_depth)?;
> + };
> + let remove_vanished = remove_vanished.unwrap_or(false);
> +
> + let source = Arc::new(LocalSource {
> + store: DataStore::lookup_datastore(store, Some(Operation::Read))?,
> + ns,
> + });
> +
> + let (remote_config, _digest) = pbs_config::remote::config()?;
> + let remote: Remote = remote_config.lookup("remote", remote_id)?;
> +
> + let repo = BackupRepository::new(
> + Some(remote.config.auth_id.clone()),
> + Some(remote.config.host.clone()),
> + remote.config.port,
> + remote_store.to_string(),
> + );
> +
> + let client = remote::remote_client_config(&remote, Some(limit))?;
> + let target = PushTarget {
> + remote: remote_id.to_string(),
> + repo,
> + ns: remote_ns,
> + client,
> + };
> + let group_filter = group_filter.unwrap_or_default();
> +
> + Ok(Self {
> + source,
> + target,
> + local_user,
> + remove_vanished,
> + max_depth,
> + group_filter,
> + transfer_last,
> + })
> + }
> +
> + // Map the given namespace from source to target by adapting the prefix
> + fn map_to_target(&self, namespace: &BackupNamespace) -> Result<BackupNamespace, Error> {
> + namespace.map_prefix(&self.source.ns, &self.target.ns)
> + }
> +}
> +
> +// Check if the job user given in the push parameters has the provided privs on the remote
> +// datastore namespace
> +fn check_ns_remote_datastore_privs(
> + params: &PushParameters,
> + namespace: &BackupNamespace,
> + privs: u64,
> +) -> Result<(), Error> {
> + let user_info = CachedUserInfo::new()?;
> + let mut acl_path: Vec<&str> = vec!["remote", &params.target.remote, params.target.repo.store()];
> +
> + if !namespace.is_root() {
> + let ns_components: Vec<&str> = namespace.components().collect();
> + acl_path.extend(ns_components);
> + }
> +
> + user_info.check_privs(&params.local_user, &acl_path, privs, false)?;
> +
> + Ok(())
> +}
> +
> +// Fetch the list of namespaces found on target
> +async fn fetch_target_namespaces(params: &PushParameters) -> Result<Vec<BackupNamespace>, Error> {
> + let api_path = format!(
> + "api2/json/admin/datastore/{store}/namespace",
> + store = params.target.repo.store(),
> + );
> + let mut result = params.target.client.get(&api_path, None).await?;
> + let namespaces: Vec<NamespaceListItem> = serde_json::from_value(result["data"].take())?;
> + let mut namespaces: Vec<BackupNamespace> = namespaces
> + .into_iter()
> + .map(|namespace| namespace.ns)
> + .collect();
> + namespaces.sort_unstable_by_key(|a| a.name_len());
> +
> + Ok(namespaces)
> +}
> +
> +// Remove the provided namespace from the target
> +async fn remove_target_namespace(
> + params: &PushParameters,
> + namespace: &BackupNamespace,
> +) -> Result<(), Error> {
> + if namespace.is_root() {
> + bail!("cannot remove root namespace from target");
> + }
> +
> + check_ns_remote_datastore_privs(params, namespace, PRIV_REMOTE_DATASTORE_MODIFY)
> + .map_err(|err| format_err!("Pruning remote datastore contents not allowed - {err}"))?;
> +
> + let api_path = format!(
> + "api2/json/admin/datastore/{store}/namespace",
> + store = params.target.repo.store(),
> + );
> +
> + let target_ns = params.map_to_target(namespace)?;
> + let args = serde_json::json!({
> + "ns": target_ns.name(),
> + "delete-groups": true,
> + });
> +
> + params.target.client.delete(&api_path, Some(args)).await?;
> +
> + Ok(())
> +}
> +
> +// Fetch the list of groups found on target in given namespace
> +async fn fetch_target_groups(
> + params: &PushParameters,
> + namespace: &BackupNamespace,
> +) -> Result<Vec<BackupGroup>, Error> {
this should return two sets of groups - first the ones owned by
params.target.owner, second the ones owned by other authids..
> + let api_path = format!(
> + "api2/json/admin/datastore/{store}/groups",
> + store = params.target.repo.store(),
> + );
> +
> + let args = if !namespace.is_root() {
> + let target_ns = params.map_to_target(namespace)?;
> + Some(serde_json::json!({ "ns": target_ns.name() }))
> + } else {
> + None
> + };
> +
> + let mut result = params.target.client.get(&api_path, args).await?;
> + let groups: Vec<GroupListItem> = serde_json::from_value(result["data"].take())?;
> + let mut groups: Vec<BackupGroup> = groups.into_iter().map(|group| group.backup).collect();
so this would need to become a fold to split the groups for the return
value
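roughly like this (untested, assumes the `owner` field on PushTarget
suggested above and the return type changed to
Result<(Vec<BackupGroup>, Vec<BackupGroup>), Error>):

    let (owned, not_owned) = groups.into_iter().fold(
        (Vec::new(), Vec::new()),
        |(mut owned, mut not_owned), item| {
            // a group without visible owner cannot be verified as ours,
            // treat it as foreign to stay on the safe side
            if item.owner.as_ref() == Some(&params.target.owner) {
                owned.push(item.backup);
            } else {
                not_owned.push(item.backup);
            }
            (owned, not_owned)
        },
    );

with the sort below then applied to both vectors before returning them.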
> +
> + groups.sort_unstable_by(|a, b| {
> + let type_order = a.ty.cmp(&b.ty);
> + if type_order == Ordering::Equal {
> + a.id.cmp(&b.id)
> + } else {
> + type_order
> + }
> + });
> +
> + Ok(groups)
> +}
> +
> +// Remove the provided backup group in given namespace from the target
> +async fn remove_target_group(
> + params: &PushParameters,
> + namespace: &BackupNamespace,
> + backup_group: &BackupGroup,
> +) -> Result<(), Error> {
> + check_ns_remote_datastore_privs(params, namespace, PRIV_REMOTE_DATASTORE_PRUNE)
> + .map_err(|err| format_err!("Pruning remote datastore contents not allowed - {err}"))?;
> +
> + let api_path = format!(
> + "api2/json/admin/datastore/{store}/groups",
> + store = params.target.repo.store(),
> + );
> +
> + let mut args = serde_json::json!({
> + "backup-id": backup_group.id,
> + "backup-type": backup_group.ty,
> + });
> + if !namespace.is_root() {
> + let target_ns = params.map_to_target(namespace)?;
> + args["ns"] = serde_json::to_value(target_ns.name())?;
> + }
> +
> + params.target.client.delete(&api_path, Some(args)).await?;
> +
> + Ok(())
> +}
> +
> +// Check if the namespace is already present on the target, create it otherwise
> +async fn check_or_create_target_namespace(
> + params: &PushParameters,
> + target_namespaces: &[BackupNamespace],
> + namespace: &BackupNamespace,
> +) -> Result<bool, Error> {
> + let mut created = false;
> +
> + if !namespace.is_root() && !target_namespaces.contains(namespace) {
> + // Namespace not present on target, create namespace.
> + // Sub-namespaces have to be created by creating parent components first.
> +
> + check_ns_remote_datastore_privs(params, namespace, PRIV_REMOTE_DATASTORE_MODIFY)
> + .map_err(|err| format_err!("Creating namespace not allowed - {err}"))?;
> +
> + let mut parent = BackupNamespace::root();
> + for component in namespace.components() {
> + let current = BackupNamespace::from_parent_ns(&parent, component.to_string())?;
> + // Skip over pre-existing parent namespaces on target
> + if target_namespaces.contains(&current) {
> + parent = current;
> + continue;
> + }
> + let api_path = format!(
> + "api2/json/admin/datastore/{store}/namespace",
> + store = params.target.repo.store(),
> + );
> + let mut args = serde_json::json!({ "name": component.to_string() });
> + if !parent.is_root() {
> + args["parent"] = serde_json::to_value(parent.clone())?;
> + }
> + if let Err(err) = params.target.client.post(&api_path, Some(args)).await {
> + let target_store_and_ns = print_store_and_ns(params.target.repo.store(), &current);
> + bail!("sync into {target_store_and_ns} failed - namespace creation failed: {err}");
> + }
> + created = true;
> + parent = current;
> + }
> + }
> +
> + Ok(created)
> +}
> +
> +/// Push contents of source datastore matched by given push parameters to target.
> +pub(crate) async fn push_store(mut params: PushParameters) -> Result<SyncStats, Error> {
> + let mut errors = false;
> +
> + // Generate list of source namespaces to push to target, limited by max-depth
> + let mut namespaces = params.source.list_namespaces(&mut params.max_depth).await?;
> +
> + check_namespace_depth_limit(&params.source.get_ns(), &params.target.ns, &namespaces)?;
> +
> + namespaces.sort_unstable_by_key(|a| a.name_len());
> +
> + // Fetch all accessible namespaces already present on the target
> + let target_namespaces = fetch_target_namespaces(&params).await?;
> + // Remember synced namespaces, removing non-synced ones when remove vanished flag is set
> + let mut synced_namespaces = HashSet::with_capacity(namespaces.len());
> +
> + let (mut groups, mut snapshots) = (0, 0);
> + let mut stats = SyncStats::default();
> + for namespace in namespaces {
> + let source_store_and_ns = print_store_and_ns(params.source.store.name(), &namespace);
> + let target_namespace = params.map_to_target(&namespace)?;
> + let target_store_and_ns = print_store_and_ns(params.target.repo.store(), &target_namespace);
> +
> + info!("----");
> + info!("Syncing {source_store_and_ns} into {target_store_and_ns}");
> +
> + synced_namespaces.insert(target_namespace.clone());
> +
> + match check_or_create_target_namespace(&params, &target_namespaces, &target_namespace).await
> + {
> + Ok(true) => info!("Created namespace {target_namespace}"),
> + Ok(false) => {}
> + Err(err) => {
> + info!("Cannot sync {source_store_and_ns} into {target_store_and_ns} - {err}");
> + errors = true;
> + continue;
> + }
> + }
> +
> + match push_namespace(&namespace, ¶ms).await {
> + Ok((sync_progress, sync_stats, sync_errors)) => {
> + errors |= sync_errors;
> + stats.add(sync_stats);
> +
> + if params.max_depth != Some(0) {
> + groups += sync_progress.done_groups;
> + snapshots += sync_progress.done_snapshots;
> +
> + let ns = if namespace.is_root() {
> + "root namespace".into()
> + } else {
> + format!("namespace {namespace}")
> + };
> + info!(
> + "Finished syncing {ns}, current progress: {groups} groups, {snapshots} snapshots"
> + );
> + }
> + }
> + Err(err) => {
> + errors = true;
> + info!("Encountered errors while syncing namespace {namespace} - {err}");
> + }
> + }
> + }
> +
> + if params.remove_vanished {
> + for target_namespace in target_namespaces {
this is very dangerous when you have multiple sync jobs overlapping on a
single target, and should be called out in the docs (it does require
handing out access to a highly privileged remote.cfg entry, with the
user in that entry also being highly privileged on the remote, but
still!)
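one way to reduce the blast radius (sketch, assumes the split
fetch_target_groups variant suggested above, and glosses over the
source/target namespace mapping here):

    // only remove the vanished namespace if it contains no groups owned
    // by somebody else on the target
    let (_owned, not_owned) = fetch_target_groups(&params, &target_namespace).await?;
    if !not_owned.is_empty() {
        info!("skipping removal of vanished namespace {target_namespace} - contains foreign groups");
        continue;
    }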
> + if synced_namespaces.contains(&target_namespace) {
> + continue;
> + }
> + if let Err(err) = remove_target_namespace(&params, &target_namespace).await {
> + info!("failed to remove vanished namespace {target_namespace} - {err}");
> + continue;
> + }
> + info!("removed vanished namespace {target_namespace}");
> + }
> + }
> +
> + if errors {
> + bail!("sync failed with some errors.");
> + }
> +
> + Ok(stats)
> +}
> +
> +/// Push namespace including all backup groups to target
> +///
> +/// Iterate over all backup groups in the namespace and push them to the target.
> +pub(crate) async fn push_namespace(
> + namespace: &BackupNamespace,
> + params: &PushParameters,
> +) -> Result<(StoreProgress, SyncStats, bool), Error> {
> + // Check if user is allowed to perform backups on remote datastore
> + check_ns_remote_datastore_privs(params, namespace, PRIV_REMOTE_DATASTORE_BACKUP)
> + .map_err(|err| format_err!("Pushing to remote not allowed - {err}"))?;
> +
> + let mut list: Vec<BackupGroup> = params
> + .source
> + .list_groups(namespace, &params.local_user)
> + .await?;
> +
> + list.sort_unstable_by(|a, b| {
> + let type_order = a.ty.cmp(&b.ty);
> + if type_order == Ordering::Equal {
> + a.id.cmp(&b.id)
> + } else {
> + type_order
> + }
> + });
> +
> + let total = list.len();
> + let list: Vec<BackupGroup> = list
> + .into_iter()
> + .filter(|group| group.apply_filters(&params.group_filter))
> + .collect();
> +
> + info!(
> + "found {filtered} groups to sync (out of {total} total)",
> + filtered = list.len()
> + );
> +
> + let mut errors = false;
> + // Remember synced groups, remove others when the remove vanished flag is set
> + let mut synced_groups = HashSet::new();
> + let mut progress = StoreProgress::new(list.len() as u64);
> + let mut stats = SyncStats::default();
> +
> + for (done, group) in list.into_iter().enumerate() {
> + progress.done_groups = done as u64;
> + progress.done_snapshots = 0;
> + progress.group_snapshots = 0;
(A continued from below) then we could filter out groups with the wrong
owner on the remote side, instead of attempting to push to them, which
can only fail. we might still attempt to push and fail if the remote.cfg
user lacks the privileges to see non-owned groups, but where we can see
them, we can check client-side and give a nicer error..
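e.g. (sketch of the adapted loop, assumes the split group lists are
fetched once up front - target_owned is then also what the
remove-vanished part below iterates over):

    // (A) fetch owned and foreign target groups once, before the loop
    let (target_owned, target_not_owned) = fetch_target_groups(params, namespace).await?;

    for (done, group) in list.into_iter().enumerate() {
        progress.done_groups = done as u64;
        progress.done_snapshots = 0;
        progress.group_snapshots = 0;

        // don't attempt to push groups owned by another user on the
        // target, that can only fail
        if target_not_owned.contains(&group) {
            info!("group '{group}' is owned by another user on the target, skipping");
            errors = true;
            continue;
        }

        synced_groups.insert(group.clone());

        match push_group(params, namespace, &group, &mut progress).await {
            Ok(sync_stats) => stats.add(sync_stats),
            Err(err) => {
                info!("sync group '{group}' failed - {err}");
                errors = true;
            }
        }
    }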
> + synced_groups.insert(group.clone());
> +
> + match push_group(params, namespace, &group, &mut progress).await {
> + Ok(sync_stats) => stats.add(sync_stats),
> + Err(err) => {
> + info!("sync group '{group}' failed - {err}");
> + errors = true;
> + }
> + }
> + }
> +
> + if params.remove_vanished {
> + let target_groups = fetch_target_groups(params, namespace).await?;
here we should only iterate over the set of remote-owned groups, to
avoid removing groups that cannot have been created by this sync job..
if we move this call higher up (continued above at (A)), we already have
that set at hand.
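i.e. something like (sketch, reusing the target_owned list from (A)
above):

    if params.remove_vanished {
        // only groups owned by the remote.cfg user are removal
        // candidates, foreign groups cannot stem from this sync job
        for target_group in target_owned {
            if synced_groups.contains(&target_group) {
                continue;
            }
            if !target_group.apply_filters(&params.group_filter) {
                continue;
            }
            // removal and stats accounting as in the patch below
        }
    }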
> + for target_group in target_groups {
> + if synced_groups.contains(&target_group) {
> + continue;
> + }
> + if !target_group.apply_filters(&params.group_filter) {
> + continue;
> + }
> +
> + info!("delete vanished group '{target_group}'");
> +
> + let count_before = match fetch_target_groups(params, namespace).await {
> + Ok(snapshots) => snapshots.len(),
> + Err(_err) => 0, // ignore errors
> + };
> +
> + if let Err(err) = remove_target_group(params, namespace, &target_group).await {
> + info!("{err}");
> + errors = true;
> + continue;
> + }
> +
> + let mut count_after = match fetch_target_groups(params, namespace).await {
> + Ok(snapshots) => snapshots.len(),
> + Err(_err) => 0, // ignore errors
> + };
> +
> + let deleted_groups = if count_after > 0 {
> + info!("kept some protected snapshots of group '{target_group}'");
> + 0
> + } else {
> + 1
> + };
> +
> + if count_after > count_before {
> + count_after = count_before;
> + }
> +
> + stats.add(SyncStats::from(RemovedVanishedStats {
> + snapshots: count_before - count_after,
> + groups: deleted_groups,
> + namespaces: 0,
> + }));
> + }
> + }
> +
> + Ok((progress, stats, errors))
> +}
> +
> +async fn fetch_target_snapshots(
> + params: &PushParameters,
> + namespace: &BackupNamespace,
> + group: &BackupGroup,
> +) -> Result<Vec<SnapshotListItem>, Error> {
> + let api_path = format!(
> + "api2/json/admin/datastore/{store}/snapshots",
> + store = params.target.repo.store(),
> + );
> + let mut args = serde_json::to_value(group)?;
> + if !namespace.is_root() {
> + let target_ns = params.map_to_target(namespace)?;
> + args["ns"] = serde_json::to_value(target_ns)?;
> + }
> + let mut result = params.target.client.get(&api_path, Some(args)).await?;
> + let snapshots: Vec<SnapshotListItem> = serde_json::from_value(result["data"].take())?;
> +
> + Ok(snapshots)
> +}
> +
> +async fn fetch_previous_backup_time(
> + params: &PushParameters,
> + namespace: &BackupNamespace,
> + group: &BackupGroup,
> +) -> Result<Option<i64>, Error> {
> + let mut snapshots = fetch_target_snapshots(params, namespace, group).await?;
> + snapshots.sort_unstable_by(|a, b| a.backup.time.cmp(&b.backup.time));
> + Ok(snapshots.last().map(|snapshot| snapshot.backup.time))
> +}
> +
> +async fn forget_target_snapshot(
> + params: &PushParameters,
> + namespace: &BackupNamespace,
> + snapshot: &BackupDir,
> +) -> Result<(), Error> {
> + check_ns_remote_datastore_privs(params, namespace, PRIV_REMOTE_DATASTORE_PRUNE)
> + .map_err(|err| format_err!("Pruning remote datastore contents not allowed - {err}"))?;
> +
> + let api_path = format!(
> + "api2/json/admin/datastore/{store}/snapshots",
> + store = params.target.repo.store(),
> + );
> + let mut args = serde_json::to_value(snapshot)?;
> + if !namespace.is_root() {
> + let target_ns = params.map_to_target(namespace)?;
> + args["ns"] = serde_json::to_value(target_ns)?;
> + }
> + params.target.client.delete(&api_path, Some(args)).await?;
> +
> + Ok(())
> +}
> +
> +/// Push group including all snapshots to target
> +///
> +/// Iterate over all snapshots in the group and push them to the target.
> +/// The group sync operation consists of the following steps:
> +/// - Query snapshots of given group from the source
> +/// - Sort snapshots by time
> +/// - Apply transfer last cutoff and filters to list
> +/// - Iterate the snapshot list and push each snapshot individually
> +/// - (Optional): Remove vanished groups if `remove_vanished` flag is set
> +pub(crate) async fn push_group(
> + params: &PushParameters,
> + namespace: &BackupNamespace,
> + group: &BackupGroup,
> + progress: &mut StoreProgress,
> +) -> Result<SyncStats, Error> {
> + let mut already_synced_skip_info = SkipInfo::new(SkipReason::AlreadySynced);
> + let mut transfer_last_skip_info = SkipInfo::new(SkipReason::TransferLast);
> +
> + let mut snapshots: Vec<BackupDir> = params.source.list_backup_dirs(namespace, group).await?;
> + snapshots.sort_unstable_by(|a, b| a.time.cmp(&b.time));
> +
> + let total_snapshots = snapshots.len();
> + let cutoff = params
> + .transfer_last
> + .map(|count| total_snapshots.saturating_sub(count))
> + .unwrap_or_default();
> +
> + let last_snapshot_time = fetch_previous_backup_time(params, namespace, group)
> + .await?
> + .unwrap_or(i64::MIN);
> +
> + let mut source_snapshots = HashSet::new();
> + let snapshots: Vec<BackupDir> = snapshots
> + .into_iter()
> + .enumerate()
> + .filter(|&(pos, ref snapshot)| {
> + source_snapshots.insert(snapshot.time);
> + if last_snapshot_time > snapshot.time {
> + already_synced_skip_info.update(snapshot.time);
> + return false;
> + } else if already_synced_skip_info.count > 0 {
> + info!("{already_synced_skip_info}");
> + already_synced_skip_info.reset();
> + return true;
> + }
> +
> + if pos < cutoff && last_snapshot_time != snapshot.time {
> + transfer_last_skip_info.update(snapshot.time);
> + return false;
> + } else if transfer_last_skip_info.count > 0 {
> + info!("{transfer_last_skip_info}");
> + transfer_last_skip_info.reset();
> + }
> + true
> + })
> + .map(|(_, dir)| dir)
> + .collect();
> +
> + progress.group_snapshots = snapshots.len() as u64;
> +
> + let target_snapshots = fetch_target_snapshots(params, namespace, group).await?;
> + let target_snapshots: Vec<BackupDir> = target_snapshots
> + .into_iter()
> + .map(|snapshot| snapshot.backup)
> + .collect();
> +
> + let mut stats = SyncStats::default();
> + let mut fetch_previous_manifest = !target_snapshots.is_empty();
> + for (pos, source_snapshot) in snapshots.into_iter().enumerate() {
> + if target_snapshots.contains(&source_snapshot) {
> + progress.done_snapshots = pos as u64 + 1;
> + info!("percentage done: {progress}");
> + continue;
> + }
> + let result =
> + push_snapshot(params, namespace, &source_snapshot, fetch_previous_manifest).await;
> + fetch_previous_manifest = true;
> +
> + progress.done_snapshots = pos as u64 + 1;
> + info!("percentage done: {progress}");
> +
> + // stop on error
> + let sync_stats = result?;
> + stats.add(sync_stats);
> + }
> +
> + if params.remove_vanished {
> + let target_snapshots = fetch_target_snapshots(params, namespace, group).await?;
> + for snapshot in target_snapshots {
> + if source_snapshots.contains(&snapshot.backup.time) {
> + continue;
> + }
> + if snapshot.protected {
> + info!(
> + "don't delete vanished snapshot {name} (protected)",
> + name = snapshot.backup
> + );
> + continue;
> + }
> + if let Err(err) = forget_target_snapshot(params, namespace, &snapshot.backup).await {
> + info!(
> + "could not delete vanished snapshot {name} - {err}",
> + name = snapshot.backup
> + );
> + }
> + info!("delete vanished snapshot {name}", name = snapshot.backup);
> + stats.add(SyncStats::from(RemovedVanishedStats {
> + snapshots: 1,
> + groups: 0,
> + namespaces: 0,
> + }));
> + }
> + }
> +
> + Ok(stats)
> +}
> +
> +/// Push snapshot to target
> +///
> +/// Creates a new snapshot on the target and pushes the content of the source snapshot to the
> +/// target by creating a new manifest file and connecting to the remote as backup writer client.
> +/// Chunks are written by recreating the index by uploading the chunk stream as read from the
> +/// source. Data blobs are uploaded as such.
> +pub(crate) async fn push_snapshot(
> + params: &PushParameters,
> + namespace: &BackupNamespace,
> + snapshot: &BackupDir,
> + fetch_previous_manifest: bool,
> +) -> Result<SyncStats, Error> {
> + let mut stats = SyncStats::default();
> + let target_ns = params.map_to_target(namespace)?;
> + let backup_dir = params
> + .source
> + .store
> + .backup_dir(namespace.clone(), snapshot.clone())?;
> +
> + // Reader locks the snapshot
> + let reader = params.source.reader(namespace, snapshot).await?;
> +
> + // Does not lock the manifest, but the reader already assures a locked snapshot
> + let source_manifest = match backup_dir.load_manifest() {
> + Ok((manifest, _raw_size)) => manifest,
> + Err(err) => {
> + // No manifest in snapshot or failed to read, warn and skip
> + log::warn!("failed to load manifest - {err}");
> + return Ok(stats);
> + }
> + };
> +
> + // Manifest to be created on target, referencing all the source archives after upload.
> + let mut manifest = BackupManifest::new(snapshot.clone());
> +
> + // Writer instance locks the snapshot on the remote side
> + let backup_writer = BackupWriter::start(
> + &params.target.client,
> + None,
> + params.target.repo.store(),
> + &target_ns,
> + snapshot,
> + false,
> + false,
> + )
> + .await?;
> +
> + let mut previous_manifest = None;
> + // Use manifest of previous snapshots in group on target for chunk upload deduplication
> + if fetch_previous_manifest {
> + match backup_writer.download_previous_manifest().await {
> + Ok(manifest) => previous_manifest = Some(Arc::new(manifest)),
> + Err(err) => log::info!("Could not download previous manifest - {err}"),
> + }
> + };
> +
> + // Dummy upload options: the actual compression and/or encryption already happened while
> + // the chunks were generated during creation of the backup snapshot, therefore pre-existing
> + // chunks (already compressed and/or encrypted) can be pushed to the target.
> + // Further, these steps are skipped in the backup writer upload stream.
> + //
> + // Therefore, these values do not need to fit the values given in the manifest.
> + // The original manifest is uploaded in the end anyways.
> + //
> + // Compression is set to true so that the uploaded manifest will be compressed.
> + // Encrypt is set to assure that above files are not encrypted.
> + let upload_options = UploadOptions {
> + compress: true,
> + encrypt: false,
> + previous_manifest,
> + ..UploadOptions::default()
> + };
> +
> + // Avoid double upload penalty by remembering already seen chunks
> + let known_chunks = Arc::new(Mutex::new(HashSet::with_capacity(1024 * 1024)));
> +
> + for entry in source_manifest.files() {
> + let mut path = backup_dir.full_path();
> + path.push(&entry.filename);
> + if path.try_exists()? {
> + match ArchiveType::from_path(&entry.filename)? {
> + ArchiveType::Blob => {
> + let file = std::fs::File::open(path.clone())?;
> + let backup_stats = backup_writer.upload_blob(file, &entry.filename).await?;
> + manifest.add_file(
> + entry.filename.to_string(),
> + backup_stats.size,
> + backup_stats.csum,
> + entry.chunk_crypt_mode(),
> + )?;
> + stats.add(SyncStats {
> + chunk_count: backup_stats.chunk_count as usize,
> + bytes: backup_stats.size as usize,
> + elapsed: backup_stats.duration,
> + removed: None,
> + });
> + }
> + ArchiveType::DynamicIndex => {
> + let index = DynamicIndexReader::open(&path)?;
> + let chunk_reader = reader.chunk_reader(entry.chunk_crypt_mode());
> + let sync_stats = push_index(
> + &entry.filename,
> + index,
> + chunk_reader,
> + &backup_writer,
> + &mut manifest,
> + entry.chunk_crypt_mode(),
> + None,
> + known_chunks.clone(),
> + )
> + .await?;
> + stats.add(sync_stats);
> + }
> + ArchiveType::FixedIndex => {
> + let index = FixedIndexReader::open(&path)?;
> + let chunk_reader = reader.chunk_reader(entry.chunk_crypt_mode());
> + let size = index.index_bytes();
> + let sync_stats = push_index(
> + &entry.filename,
> + index,
> + chunk_reader,
> + &backup_writer,
> + &mut manifest,
> + entry.chunk_crypt_mode(),
> + Some(size),
> + known_chunks.clone(),
> + )
> + .await?;
> + stats.add(sync_stats);
> + }
> + }
> + } else {
> + info!("{path:?} does not exist, skipped.");
> + }
> + }
> +
> + // Fetch client log from source and push to target
> + // this has to be handled individually since the log is never part of the manifest
> + let mut client_log_path = backup_dir.full_path();
> + client_log_path.push(CLIENT_LOG_BLOB_NAME);
> + if client_log_path.is_file() {
> + backup_writer
> + .upload_blob_from_file(
> + &client_log_path,
> + CLIENT_LOG_BLOB_NAME,
> + upload_options.clone(),
> + )
> + .await?;
> + }
> + //TODO: only add log line for conditions as described in feedback
> +
> + // Rewrite manifest for pushed snapshot, recreating manifest from source on target
> + let manifest_json = serde_json::to_value(source_manifest)?;
> + let manifest_string = serde_json::to_string_pretty(&manifest_json)?;
> + let backup_stats = backup_writer
> + .upload_blob_from_data(
> + manifest_string.into_bytes(),
> + MANIFEST_BLOB_NAME,
> + upload_options,
> + )
> + .await?;
> + backup_writer.finish().await?;
> +
> + stats.add(SyncStats {
> + chunk_count: backup_stats.chunk_count as usize,
> + bytes: backup_stats.size as usize,
> + elapsed: backup_stats.duration,
> + removed: None,
> + });
> +
> + Ok(stats)
> +}
> +
> +// Read fixed or dynamic index and push to target by uploading via the backup writer instance
> +//
> +// For fixed indexes, the size must be provided as given by the index reader.
> +#[allow(clippy::too_many_arguments)]
> +async fn push_index<'a>(
> + filename: &'a str,
> + index: impl IndexFile + Send + 'static,
> + chunk_reader: Arc<dyn AsyncReadChunk>,
> + backup_writer: &BackupWriter,
> + manifest: &mut BackupManifest,
> + crypt_mode: CryptMode,
> + size: Option<u64>,
> + known_chunks: Arc<Mutex<HashSet<[u8; 32]>>>,
> +) -> Result<SyncStats, Error> {
> + let (upload_channel_tx, upload_channel_rx) = mpsc::channel(20);
> + let mut chunk_infos =
> + stream::iter(0..index.index_count()).map(move |pos| index.chunk_info(pos).unwrap());
> +
> + tokio::spawn(async move {
> + while let Some(chunk_info) = chunk_infos.next().await {
> + // Avoid reading known chunks, as they are not uploaded by the backup writer anyways
> + let needs_upload = {
> + // Need to limit the scope of the lock, otherwise the async block is not `Send`
> + let mut known_chunks = known_chunks.lock().unwrap();
> + // Check if present and insert, chunk will be read and uploaded below if not present
> + known_chunks.insert(chunk_info.digest)
> + };
> +
> + let merged_chunk_info = if needs_upload {
> + chunk_reader
> + .read_raw_chunk(&chunk_info.digest)
> + .await
> + .map(|chunk| {
> + MergedChunkInfo::New(ChunkInfo {
> + chunk,
> + digest: chunk_info.digest,
> + chunk_len: chunk_info.size(),
> + offset: chunk_info.range.start,
> + })
> + })
> + } else {
> + Ok(MergedChunkInfo::Known(vec![(
> + // Pass size instead of offset, will be replaced with offset by the backup
> + // writer
> + chunk_info.size(),
> + chunk_info.digest,
> + )]))
> + };
> + let _ = upload_channel_tx.send(merged_chunk_info).await;
> + }
> + });
> +
> + let merged_chunk_info_stream = ReceiverStream::new(upload_channel_rx).map_err(Error::from);
> +
> + let upload_options = UploadOptions {
> + compress: true,
> + encrypt: false,
> + fixed_size: size,
> + ..UploadOptions::default()
> + };
> +
> + let upload_stats = backup_writer
> + .upload_index_chunk_info(filename, merged_chunk_info_stream, upload_options)
> + .await?;
> +
> + manifest.add_file(
> + filename.to_string(),
> + upload_stats.size,
> + upload_stats.csum,
> + crypt_mode,
> + )?;
> +
> + Ok(SyncStats {
> + chunk_count: upload_stats.chunk_count as usize,
> + bytes: upload_stats.size as usize,
> + elapsed: upload_stats.duration,
> + removed: None,
> + })
> +}
> --
> 2.39.5
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel