public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Proxmox Backup Server development discussion
	<pbs-devel@lists.proxmox.com>
Subject: Re: [pbs-devel] [RFC v2 proxmox-backup 14/36] client: backup: split payload to dedicated stream
Date: Mon, 11 Mar 2024 15:57:52 +0100	[thread overview]
Message-ID: <1710164125.yzk79dmpim.astroid@yuna.none> (raw)
In-Reply-To: <20240305092703.126906-15-c.ebner@proxmox.com>

On March 5, 2024 10:26 am, Christian Ebner wrote:
> This patch is in preparation for being able to quickly lookup
> metadata for previous snapshots, by splitting the upload of
> a pxar archive into two dedicated streams, one for metadata,
> being assigned a .pxar.meta.didx suffix and one for payload
> data, being assigned a .pxar.pld.didx suffix.
> 
> The patch constructs all the required duplicate chunk stream,
> backup writer and upload stream instances required for the
> split archive uploads.
> 
> This not only makes it possible reuse the payload chunks for
> further backup runs but keeps the metadata archive small,
> with the outlook of even making the currently used catalog
> obsolete.
> 
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
> changes since version 1:
> - refactor pxar backup stream geneartion for split stream case
> - refactor archive name generation for split archive case
> 
>  pbs-client/src/pxar/create.rs                 |  4 +
>  pbs-client/src/pxar_backup_stream.rs          | 48 +++++++++---
>  proxmox-backup-client/src/main.rs             | 75 +++++++++++++++++--
>  .../src/proxmox_restore_daemon/api.rs         | 12 ++-
>  pxar-bin/src/main.rs                          |  1 +
>  tests/catar.rs                                |  1 +
>  6 files changed, 119 insertions(+), 22 deletions(-)
> 
> diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
> index de8c0696..59aa4450 100644
> --- a/pbs-client/src/pxar/create.rs
> +++ b/pbs-client/src/pxar/create.rs
> @@ -141,6 +141,7 @@ pub async fn create_archive<T, F>(
>      feature_flags: Flags,
>      callback: F,
>      catalog: Option<Arc<Mutex<dyn BackupCatalogWriter + Send>>>,
> +    mut payload_writer: Option<T>,

this parameter position is a bit arbitrary - and the later additions in
this series don't really make it better.. maybe we could use this as an
opportunity for some house keeping, thinking about what should go into
the `options`, and whether some of the rest could be meaningfully
grouped?

>      options: PxarCreateOptions,
>  ) -> Result<(), Error>
>  where
> @@ -171,6 +172,9 @@ where
>      }
>  
>      let mut encoder = Encoder::new(&mut writer, &metadata).await?;
> +    if let Some(writer) = payload_writer.as_mut() {
> +        encoder = encoder.attach_payload_output(writer);
> +    }
>  
>      let mut patterns = options.patterns;
>  
> diff --git a/pbs-client/src/pxar_backup_stream.rs b/pbs-client/src/pxar_backup_stream.rs
> index 22a6ffdc..9a600cc1 100644
> --- a/pbs-client/src/pxar_backup_stream.rs
> +++ b/pbs-client/src/pxar_backup_stream.rs
> @@ -40,20 +40,34 @@ impl PxarBackupStream {
>          dir: Dir,
>          catalog: Arc<Mutex<CatalogWriter<W>>>,
>          options: crate::pxar::PxarCreateOptions,
> -    ) -> Result<Self, Error> {
> -        let (tx, rx) = std::sync::mpsc::sync_channel(10);
> -
> +        separate_payload_stream: bool,
> +    ) -> Result<(Self, Option<Self>), Error> {
>          let buffer_size = 256 * 1024;
>  
> -        let error = Arc::new(Mutex::new(None));
> -        let error2 = Arc::clone(&error);
> -        let handler = async move {
> -            let writer = TokioWriterAdapter::new(std::io::BufWriter::with_capacity(
> +        let (tx, rx) = std::sync::mpsc::sync_channel(10);
> +        let writer = TokioWriterAdapter::new(std::io::BufWriter::with_capacity(
> +            buffer_size,
> +            StdChannelWriter::new(tx),
> +        ));
> +        let writer = pxar::encoder::sync::StandardWriter::new(writer);
> +
> +        let (payload_writer, payload_rx) = if separate_payload_stream {
> +            let (tx, rx) = std::sync::mpsc::sync_channel(10);
> +            let payload_writer = TokioWriterAdapter::new(std::io::BufWriter::with_capacity(
>                  buffer_size,
>                  StdChannelWriter::new(tx),
>              ));
> +            (
> +                Some(pxar::encoder::sync::StandardWriter::new(payload_writer)),
> +                Some(rx),
> +            )
> +        } else {
> +            (None, None)
> +        };
>  
> -            let writer = pxar::encoder::sync::StandardWriter::new(writer);
> +        let error = Arc::new(Mutex::new(None));
> +        let error2 = Arc::clone(&error);
> +        let handler = async move {
>              if let Err(err) = crate::pxar::create_archive(
>                  dir,
>                  writer,
> @@ -63,6 +77,7 @@ impl PxarBackupStream {
>                      Ok(())
>                  },
>                  Some(catalog),
> +                payload_writer,
>                  options,
>              )
>              .await
> @@ -76,21 +91,30 @@ impl PxarBackupStream {
>          let future = Abortable::new(handler, registration);
>          tokio::spawn(future);
>  
> -        Ok(Self {
> +        let backup_stream = Self {
> +            rx: Some(rx),
> +            handle: Some(handle.clone()),
> +            error: error.clone(),
> +        };
> +
> +        let backup_stream_payload = payload_rx.map(|rx| Self {

nit: IMHO this is a `backup_payload_stream` (a stream of payload(s)),
not a `backup_stream_payload` (the payload of a backup stream)

>              rx: Some(rx),
>              handle: Some(handle),
>              error,
> -        })
> +        });
> +
> +        Ok((backup_stream, backup_stream_payload))
>      }
>  
>      pub fn open<W: Write + Send + 'static>(
>          dirname: &Path,
>          catalog: Arc<Mutex<CatalogWriter<W>>>,
>          options: crate::pxar::PxarCreateOptions,
> -    ) -> Result<Self, Error> {
> +        separate_payload_stream: bool,
> +    ) -> Result<(Self, Option<Self>), Error> {
>          let dir = nix::dir::Dir::open(dirname, OFlag::O_DIRECTORY, Mode::empty())?;
>  
> -        Self::new(dir, catalog, options)
> +        Self::new(dir, catalog, options, separate_payload_stream)
>      }
>  }
>  
> diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
> index 256080be..fd9a4b97 100644
> --- a/proxmox-backup-client/src/main.rs
> +++ b/proxmox-backup-client/src/main.rs
> @@ -187,17 +187,24 @@ async fn backup_directory<P: AsRef<Path>>(
>      client: &BackupWriter,
>      dir_path: P,
>      archive_name: &str,
> +    payload_target: Option<&str>,
>      chunk_size: Option<usize>,
>      catalog: Arc<Mutex<CatalogWriter<TokioWriterAdapter<StdChannelWriter<Error>>>>>,
>      pxar_create_options: pbs_client::pxar::PxarCreateOptions,
>      upload_options: UploadOptions,
> -) -> Result<BackupStats, Error> {
> -    let pxar_stream = PxarBackupStream::open(dir_path.as_ref(), catalog, pxar_create_options)?;
> -    let mut chunk_stream = ChunkStream::new(pxar_stream, chunk_size);
> +) -> Result<(BackupStats, Option<BackupStats>), Error> {
>      if upload_options.fixed_size.is_some() {
>          bail!("cannot backup directory with fixed chunk size!");
>      }
>  
> +    let (pxar_stream, payload_stream) = PxarBackupStream::open(
> +        dir_path.as_ref(),
> +        catalog,
> +        pxar_create_options,
> +        payload_target.is_some(),
> +    )?;
> +
> +    let mut chunk_stream = ChunkStream::new(pxar_stream, chunk_size);
>      let (tx, rx) = mpsc::channel(10); // allow to buffer 10 chunks
>  
>      let stream = ReceiverStream::new(rx).map_err(Error::from);
> @@ -209,12 +216,43 @@ async fn backup_directory<P: AsRef<Path>>(
>          }
>      });
>  
> +    let stats = client.upload_stream(archive_name, stream, upload_options.clone());
>  
> -    let stats = client
> -        .upload_stream(archive_name, stream, upload_options)
> -        .await?;
> +    if let Some(payload_stream) = payload_stream {
> +        let payload_target = payload_target
> +            .ok_or_else(|| format_err!("got payload stream, but no target archive name"))?;
>  
> -    Ok(stats)
> +        let mut payload_chunk_stream = ChunkStream::new(
> +            payload_stream,
> +            chunk_size,
> +        );
> +        let (payload_tx, payload_rx) = mpsc::channel(10); // allow to buffer 10 chunks
> +        let stream = ReceiverStream::new(payload_rx).map_err(Error::from);
> +
> +        // spawn payload chunker inside a separate task so that it can run parallel
> +        tokio::spawn(async move {
> +            while let Some(v) = payload_chunk_stream.next().await {
> +                let _ = payload_tx.send(v).await;
> +            }
> +        });
> +
> +        let payload_stats = client.upload_stream(
> +            &payload_target,
> +            stream,
> +            upload_options,
> +        );
> +
> +        match futures::join!(stats, payload_stats) {
> +            (Ok(stats), Ok(payload_stats)) => Ok((stats, Some(payload_stats))),
> +            (Err(err), Ok(_)) => Err(format_err!("upload failed: {err}")),
> +            (Ok(_), Err(err)) => Err(format_err!("upload failed: {err}")),
> +            (Err(err), Err(payload_err)) => {
> +                Err(format_err!("upload failed: {err} - {payload_err}"))
> +            }
> +        }
> +    } else {
> +        Ok((stats.await?, None))
> +    }
>  }
>  
>  async fn backup_image<P: AsRef<Path>>(
> @@ -985,6 +1023,16 @@ async fn create_backup(
>                  manifest.add_file(target, stats.size, stats.csum, crypto.mode)?;
>              }
>              (BackupSpecificationType::PXAR, false) => {
> +                let metadata_mode = false; // Until enabled via param
> +                let (target, payload_target) = if metadata_mode {
> +                    (
> +                        format!("{target_base}.meta.{extension}"),
> +                        Some(format!("{target_base}.pld.{extension}")),

*bikeshed mode on* - .pld is rather opaque from a user's perspective,
maybe .data would be a more human readable counterpart to .meta ?

> +                    )
> +                } else {
> +                    (target, None)
> +                };
> +
>                  // start catalog upload on first use
>                  if catalog.is_none() {
>                      let catalog_upload_res =
> @@ -1015,16 +1063,27 @@ async fn create_backup(
>                      ..UploadOptions::default()
>                  };
>  
> -                let stats = backup_directory(
> +                let (stats, payload_stats) = backup_directory(
>                      &client,
>                      &filename,
>                      &target,
> +                    payload_target.as_deref(),
>                      chunk_size_opt,
>                      catalog.clone(),
>                      pxar_options,
>                      upload_options,
>                  )
>                  .await?;
> +
> +                if let Some(payload_stats) = payload_stats {
> +                    manifest.add_file(
> +                        payload_target
> +                            .ok_or_else(|| format_err!("missing payload target archive"))?,
> +                        payload_stats.size,
> +                        payload_stats.csum,
> +                        crypto.mode,
> +                    )?;
> +                }
>                  manifest.add_file(target, stats.size, stats.csum, crypto.mode)?;
>                  catalog.lock().unwrap().end_directory()?;
>              }
> diff --git a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
> index c2055222..bd8ddb20 100644
> --- a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
> +++ b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
> @@ -356,8 +356,16 @@ fn extract(
>                      };
>  
>                      let pxar_writer = TokioWriter::new(writer);
> -                    create_archive(dir, pxar_writer, Flags::DEFAULT, |_| Ok(()), None, options)
> -                        .await
> +                    create_archive(
> +                        dir,
> +                        pxar_writer,
> +                        Flags::DEFAULT,
> +                        |_| Ok(()),
> +                        None,
> +                        None,
> +                        options,
> +                    )
> +                    .await
>                  }
>                  .await;
>                  if let Err(err) = result {
> diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
> index 2bbe90e3..e3b0faac 100644
> --- a/pxar-bin/src/main.rs
> +++ b/pxar-bin/src/main.rs
> @@ -383,6 +383,7 @@ async fn create_archive(
>              Ok(())
>          },
>          None,
> +        None,
>          options,
>      )
>      .await?;
> diff --git a/tests/catar.rs b/tests/catar.rs
> index 36bb4f3b..04af4ffd 100644
> --- a/tests/catar.rs
> +++ b/tests/catar.rs
> @@ -39,6 +39,7 @@ fn run_test(dir_name: &str) -> Result<(), Error> {
>          Flags::DEFAULT,
>          |_| Ok(()),
>          None,
> +        None,
>          options,
>      ))?;
>  
> -- 
> 2.39.2
> 
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
> 
> 
> 




  reply	other threads:[~2024-03-11 14:58 UTC|newest]

Thread overview: 94+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-05  9:26 [pbs-devel] [RFC pxar proxmox-backup 00/36] fix #3174: improve file-level backup Christian Ebner
2024-03-05  9:26 ` [pbs-devel] [RFC v2 pxar 01/36] format/examples: add PXAR_PAYLOAD_REF entry header Christian Ebner
2024-03-05  9:26 ` [pbs-devel] [RFC v2 pxar 02/36] encoder: add optional output writer for file payloads Christian Ebner
2024-03-11 13:21   ` Fabian Grünbichler
2024-03-11 13:50     ` Christian Ebner
2024-03-11 15:41       ` Fabian Grünbichler
2024-03-05  9:26 ` [pbs-devel] [RFC v2 pxar 03/36] format/decoder: add method to read payload references Christian Ebner
2024-03-05  9:26 ` [pbs-devel] [RFC v2 pxar 04/36] decoder: add optional payload input stream Christian Ebner
2024-03-11 13:21   ` Fabian Grünbichler
2024-03-11 14:05     ` Christian Ebner
2024-03-11 15:27       ` Fabian Grünbichler
2024-03-11 15:51         ` Christian Ebner
2024-03-05  9:26 ` [pbs-devel] [RFC v2 pxar 05/36] accessor: " Christian Ebner
2024-03-11 13:21   ` Fabian Grünbichler
2024-03-05  9:26 ` [pbs-devel] [RFC v2 pxar 06/36] encoder: move to stack based state tracking Christian Ebner
2024-03-11 13:21   ` Fabian Grünbichler
2024-03-11 14:12     ` Christian Ebner
2024-03-05  9:26 ` [pbs-devel] [RFC v2 pxar 07/36] encoder: add payload reference capability Christian Ebner
2024-03-11 13:21   ` Fabian Grünbichler
2024-03-11 14:15     ` Christian Ebner
2024-03-05  9:26 ` [pbs-devel] [RFC v2 pxar 08/36] encoder: add payload position capability Christian Ebner
2024-03-05  9:26 ` [pbs-devel] [RFC v2 pxar 09/36] encoder: add payload advance capability Christian Ebner
2024-03-11 13:22   ` Fabian Grünbichler
2024-03-11 14:22     ` Christian Ebner
2024-03-11 15:27       ` Fabian Grünbichler
2024-03-11 15:41         ` Christian Ebner
2024-03-05  9:26 ` [pbs-devel] [RFC v2 pxar 10/36] encoder/format: finish payload stream with marker Christian Ebner
2024-03-05  9:26 ` [pbs-devel] [RFC v2 proxmox-backup 11/36] client: pxar: switch to stack based encoder state Christian Ebner
2024-03-05  9:26 ` [pbs-devel] [RFC v2 proxmox-backup 12/36] client: backup: factor out extension from backup target Christian Ebner
2024-03-05  9:26 ` [pbs-devel] [RFC v2 proxmox-backup 13/36] client: backup: early check for fixed index type Christian Ebner
2024-03-11 14:57   ` Fabian Grünbichler
2024-03-11 15:12     ` Christian Ebner
2024-03-05  9:26 ` [pbs-devel] [RFC v2 proxmox-backup 14/36] client: backup: split payload to dedicated stream Christian Ebner
2024-03-11 14:57   ` Fabian Grünbichler [this message]
2024-03-11 15:22     ` Christian Ebner
2024-03-05  9:26 ` [pbs-devel] [RFC v2 proxmox-backup 15/36] client: restore: read payload from dedicated index Christian Ebner
2024-03-11 14:58   ` Fabian Grünbichler
2024-03-11 15:26     ` Christian Ebner
2024-03-05  9:26 ` [pbs-devel] [RFC v2 proxmox-backup 16/36] tools: cover meta extension for pxar archives Christian Ebner
2024-03-05  9:26 ` [pbs-devel] [RFC v2 proxmox-backup 17/36] restore: " Christian Ebner
2024-03-05  9:26 ` [pbs-devel] [RFC v2 proxmox-backup 18/36] client: mount: make split pxar archives mountable Christian Ebner
2024-03-11 14:58   ` Fabian Grünbichler
2024-03-11 15:29     ` Christian Ebner
2024-03-05  9:26 ` [pbs-devel] [RFC v2 proxmox-backup 19/36] api: datastore: refactor getting local chunk reader Christian Ebner
2024-03-05  9:26 ` [pbs-devel] [RFC v2 proxmox-backup 20/36] api: datastore: attach optional payload " Christian Ebner
2024-03-05  9:26 ` [pbs-devel] [RFC v2 proxmox-backup 21/36] catalog: shell: factor out pxar fuse reader instantiation Christian Ebner
2024-03-11 14:58   ` Fabian Grünbichler
2024-03-11 15:31     ` Christian Ebner
2024-03-05  9:26 ` [pbs-devel] [RFC v2 proxmox-backup 22/36] catalog: shell: redirect payload reader for split streams Christian Ebner
2024-03-11 14:58   ` Fabian Grünbichler
2024-03-11 15:24     ` Christian Ebner
2024-03-05  9:26 ` [pbs-devel] [RFC v2 proxmox-backup 23/36] www: cover meta extension for pxar archives Christian Ebner
2024-03-11 14:58   ` Fabian Grünbichler
2024-03-11 15:31     ` Christian Ebner
2024-03-05  9:26 ` [pbs-devel] [RFC v2 proxmox-backup 24/36] index: fetch chunk form index by start/end-offset Christian Ebner
2024-03-12  8:50   ` Fabian Grünbichler
2024-03-14  8:23     ` Christian Ebner
2024-03-12 12:47   ` Dietmar Maurer
2024-03-12 12:51     ` Christian Ebner
2024-03-12 13:03       ` Dietmar Maurer
2024-03-05  9:26 ` [pbs-devel] [RFC v2 proxmox-backup 25/36] upload stream: impl reused chunk injector Christian Ebner
2024-03-13  9:43   ` Dietmar Maurer
2024-03-14 14:03     ` Christian Ebner
2024-03-05  9:26 ` [pbs-devel] [RFC v2 proxmox-backup 26/36] client: chunk stream: add chunk injection queues Christian Ebner
2024-03-12  9:46   ` Fabian Grünbichler
2024-03-19 10:52     ` Christian Ebner
2024-03-05  9:26 ` [pbs-devel] [RFC v2 proxmox-backup 27/36] client: implement prepare reference method Christian Ebner
2024-03-12 10:07   ` Fabian Grünbichler
2024-03-19 11:51     ` Christian Ebner
2024-03-19 12:49       ` Fabian Grünbichler
2024-03-20  8:37         ` Christian Ebner
2024-03-05  9:26 ` [pbs-devel] [RFC v2 proxmox-backup 28/36] client: pxar: implement store to insert chunks on caching Christian Ebner
2024-03-05  9:26 ` [pbs-devel] [RFC v2 proxmox-backup 29/36] client: pxar: add previous reference to archiver Christian Ebner
2024-03-12 12:12   ` Fabian Grünbichler
2024-03-12 12:25     ` Christian Ebner
2024-03-19 12:59     ` Christian Ebner
2024-03-19 13:04       ` Fabian Grünbichler
2024-03-20  8:52         ` Christian Ebner
2024-03-05  9:26 ` [pbs-devel] [RFC v2 proxmox-backup 30/36] client: pxar: add method for metadata comparison Christian Ebner
2024-03-05  9:26 ` [pbs-devel] [RFC v2 proxmox-backup 31/36] specs: add backup detection mode specification Christian Ebner
2024-03-12 12:17   ` Fabian Grünbichler
2024-03-12 12:31     ` Christian Ebner
2024-03-20  9:28       ` Christian Ebner
2024-03-05  9:26 ` [pbs-devel] [RFC v2 proxmox-backup 32/36] pxar: caching: add look-ahead cache types Christian Ebner
2024-03-05  9:27 ` [pbs-devel] [RFC v2 proxmox-backup 33/36] client: pxar: add look-ahead caching Christian Ebner
2024-03-12 14:08   ` Fabian Grünbichler
2024-03-20 10:28     ` Christian Ebner
2024-03-05  9:27 ` [pbs-devel] [RFC v2 proxmox-backup 34/36] fix #3174: client: pxar: enable caching and meta comparison Christian Ebner
2024-03-13 11:12   ` Fabian Grünbichler
2024-03-05  9:27 ` [pbs-devel] [RFC v2 proxmox-backup 35/36] test-suite: add detection mode change benchmark Christian Ebner
2024-03-13 11:48   ` Fabian Grünbichler
2024-03-05  9:27 ` [pbs-devel] [RFC v2 proxmox-backup 36/36] test-suite: Add bin to deb, add shell completions Christian Ebner
2024-03-13 11:18   ` Fabian Grünbichler
2024-03-13 11:44 ` [pbs-devel] [RFC pxar proxmox-backup 00/36] fix #3174: improve file-level backup Fabian Grünbichler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1710164125.yzk79dmpim.astroid@yuna.none \
    --to=f.gruenbichler@proxmox.com \
    --cc=pbs-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal