public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: Wolfgang Bumiller <w.bumiller@proxmox.com>
To: Christian Ebner <c.ebner@proxmox.com>
Cc: pbs-devel@lists.proxmox.com
Subject: Re: [pbs-devel] [RFC pxar 7/20] fix #3174: encoder: add helper to incr encoder pos
Date: Thu, 28 Sep 2023 10:32:56 +0200	[thread overview]
Message-ID: <wodlwvq6ospnpfkyiitc5km42giui65gegilngi7cl5ppfqomx@chdodxt6oxec> (raw)
In-Reply-To: <1478379062.5245.1695887403647@webmail.proxmox.com>

On Thu, Sep 28, 2023 at 09:50:03AM +0200, Christian Ebner wrote:
> 
> > On 28.09.2023 09:04 CEST Wolfgang Bumiller <w.bumiller@proxmox.com> wrote:
> > 
> >  
> > On Wed, Sep 27, 2023 at 02:20:18PM +0200, Christian Ebner wrote:
> > > 
> > > > On 27.09.2023 14:07 CEST Wolfgang Bumiller <w.bumiller@proxmox.com> wrote:
> > > > 
> > > >  
> > > > 'incr' :S
> > > > 
> > > > On Fri, Sep 22, 2023 at 09:16:08AM +0200, Christian Ebner wrote:
> > > > > Adds a helper to allow to increase the encoder position by a given
> > > > > size. This is used to increase the position when adding an appendix
> > > > > section to the pxar stream, as these bytes are never encoded directly
> > > > > but rather referenced by already existing chunks.
> > > > 
> > > > Exposing this seems like a weird choice to me. Why exactly is this
> > > > needed? Why don't we instead expose methods to actually write the
> > > > appendix section instead?
> > > 
> > > This is needed in order to increase the byte offset of the encoder itself.
> > > The appendix section is a list of chunks which are injected in the chunk
> > > stream on upload, but never really consumed by the encoder and subsequently
> > > the chunker itself. So there is no direct writing of the appendix section to
> > > the stream.
> > > 
> > > By adding the bytes, consistency with the rest of the pxar archive is assured,
> > > as these chunks/bytes are present during decoding.
> > 
> > Ah so we inject the *contents* of the old pxar archive by way of sending
> > the chunks a writing "layer" above. Initially I thought the archive
> > would contain chunk ids, but this makes more sense. And is unfortunate
> > for the API :-)
> 
> Yes, an initial approach was to store the chunk ids inline, but that is not
> necessary and added unneeded storage overhead. As is, the chunks are appended
> to a list to be injected after encoding the regular part of the archive,
> while instead of the actual file payload the PXAR_APPENIDX_REF entry with
> payload size and offset relative to the PXAR_APPENDIX entry is stored.
> 
> This section then contains the concatenated referenced chunks, allowing to
> restore file payloads by sequentially skipping to the correct offset and
> restoring the payload from there.
> 
> > 
> > Maybe consider marking the position modification as `unsafe fn`, though?
> > I mean it is a foot gun to break the resulting archive with, after all
> > ;-)
> 
> You are right in that this is to be seen as an unsafe operation. Maybe instead
> of the function to be unsafe, the interface could take the list of chunks as
> input and shift the position accordingly?
> Thereby consuming the chunks and store them for injection afterwards.
> 
> That way the ownership of the chunk list would be moved to the encoder rather than
> being part of the archiver, as is now. The chunk list might then be passed from the
> encoder to be injected to the backup upload stream, although I am not sure if and
> how to bypass the chunker in that case.
> 
> > 
> > But this means we don't have a direct way of creating incremental pxars
> > without a PBS context, doesn't it?
> 
> This is correct. At the moment the only way to create an incremental pxar
> archive is to use the PBS context. Both, index file and catalog are required,
> which could in principle also be provided by a command line parameter, but
> finally also the actual chunk data is needed. That is currently only provided
> during restore of the archive from backup.
> 
> > Would it make sense to have a method here which returns a Writer to
> > the `EncoderOutput` where we could in theory also just "dump in"
> > contents of another actual pxar file (where the byte counting happens
> > implicitly), which also has an extra `unsafe fn add_out_of_band_bytes()`
> > to do the raw byte count modification?
> 
> Yes, this might be possible, but for creating the backup I completely want to
> avoid that. This would require to download the chunk data just to inject it
> for reuse, which is probably way more expensive and defies the purpose of
> reusing the chunks to begin with.

No I meant the `unsafe fn add_out_of_band_bytes()` was supposed to bump
just the counter exactly as we do now, and its `Write` interface
specifically only for *non* PBS backup creation.
But we don't need to flesh out the non-PBS-related API right now at all,
my main concern was to make the pxar API more difficult to use wrongly,
specifically the flushing ;-)
But sure, the PBS part is so separated from the pxar code that there's
never anything preventing you from inserting bogus data into the stream
anyway I suppose... but that's on the PBS code side and doesn't really
need to be taken into account from the pxar crate's API point of view.

> 
> If you intended this to be an addition to the current code, in order to create
> a pxar archive with appendix locally, without the PBS context, then yes.
> This might be possible by passing the data in form of a `MergedKnownChunk`,
> which contains either the raw chunk data or the reused chunks hash and size,
> allowing to pass either the data or the digest needed to index it.
> 
> > 
> > One advantage of having a "starting point" for this type of operation is
> > that we'd also force a `flush()` before out-of-band data gets written.
> > Otherwise, if we don't need/want this, we should probably just add a
> > `flush()` to the encoder we should call before adding any chunks out of
> > band, given that Max already tried to sneak in a BufRead/Writers into
> > the pxar crate for optimization purposes, IIRC ;-)
> 
> Good point, flushing is definitely  required if writes will be buffered to
> not break the byte stream.




  reply	other threads:[~2023-09-28  8:33 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-22  7:16 [pbs-devel] [RFC pxar proxmox-backup 00/20] fix #3174: improve file-level backup Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC pxar 1/20] fix #3174: encoder: impl fn new for LinkOffset Christian Ebner
2023-09-27 12:08   ` Wolfgang Bumiller
2023-09-27 12:26     ` Christian Ebner
2023-09-28  6:49       ` Wolfgang Bumiller
2023-09-28  7:52         ` Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC pxar 2/20] fix #3174: decoder: factor out skip_bytes from skip_entry Christian Ebner
2023-09-27 11:32   ` Wolfgang Bumiller
2023-09-27 11:53     ` Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC pxar 3/20] fix #3174: decoder: impl skip_bytes for sync dec Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC pxar 4/20] fix #3174: metadata: impl fn to calc byte size Christian Ebner
2023-09-27 11:38   ` Wolfgang Bumiller
2023-09-27 11:55     ` Christian Ebner
2023-09-28  8:07       ` Christian Ebner
2023-09-28  9:00         ` Wolfgang Bumiller
2023-09-28  9:27           ` Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC pxar 5/20] fix #3174: enc/dec: impl PXAR_APPENDIX_REF entrytype Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC pxar 6/20] fix #3174: enc/dec: impl PXAR_APPENDIX entrytype Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC pxar 7/20] fix #3174: encoder: add helper to incr encoder pos Christian Ebner
2023-09-27 12:07   ` Wolfgang Bumiller
2023-09-27 12:20     ` Christian Ebner
2023-09-28  7:04       ` Wolfgang Bumiller
2023-09-28  7:50         ` Christian Ebner
2023-09-28  8:32           ` Wolfgang Bumiller [this message]
2023-09-22  7:16 ` [pbs-devel] [RFC pxar 8/20] fix #3174: enc/dec: impl PXAR_APPENDIX_TAIL entrytype Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC proxmox-backup 09/20] fix #3174: index: add fn index list from start/end-offsets Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC proxmox-backup 10/20] fix #3174: index: add fn digest for DynamicEntry Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC proxmox-backup 11/20] fix #3174: api: double catalog upload size Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC proxmox-backup 12/20] fix #3174: catalog: incl pxar archives file offset Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC proxmox-backup 13/20] fix #3174: archiver/extractor: impl appendix ref Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC proxmox-backup 14/20] fix #3174: extractor: impl seq restore from appendix Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC proxmox-backup 15/20] fix #3174: archiver: store ref to previous backup Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC proxmox-backup 16/20] fix #3174: upload stream: impl reused chunk injector Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC proxmox-backup 17/20] fix #3174: chunker: add forced boundaries Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC proxmox-backup 18/20] fix #3174: backup writer: inject queued chunk in upload steam Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC proxmox-backup 19/20] fix #3174: archiver: reuse files with unchanged metadata Christian Ebner
2023-09-26  7:01   ` Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC proxmox-backup 20/20] fix #3174: client: Add incremental flag to backup creation Christian Ebner
2023-09-26  7:11   ` Christian Ebner
2023-09-26  7:15 ` [pbs-devel] [RFC pxar proxmox-backup 00/20] fix #3174: improve file-level backup Christian Ebner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=wodlwvq6ospnpfkyiitc5km42giui65gegilngi7cl5ppfqomx@chdodxt6oxec \
    --to=w.bumiller@proxmox.com \
    --cc=c.ebner@proxmox.com \
    --cc=pbs-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal