public inbox for pbs-devel@lists.proxmox.com
From: Christian Ebner <c.ebner@proxmox.com>
To: Wolfgang Bumiller <w.bumiller@proxmox.com>
Cc: pbs-devel@lists.proxmox.com, m.carrara@proxmox.com
Subject: Re: [pbs-devel] [RFC pxar 7/20] fix #3174: encoder: add helper to incr encoder pos
Date: Thu, 28 Sep 2023 09:50:03 +0200 (CEST)
Message-ID: <1478379062.5245.1695887403647@webmail.proxmox.com>
In-Reply-To: <r6m56576tv6l2k7th7qmlyq73qxarvg34fxiumwkb7uwcxpuvn@m3m3hyfa3qij>


> On 28.09.2023 09:04 CEST Wolfgang Bumiller <w.bumiller@proxmox.com> wrote:
> 
>  
> On Wed, Sep 27, 2023 at 02:20:18PM +0200, Christian Ebner wrote:
> > 
> > > On 27.09.2023 14:07 CEST Wolfgang Bumiller <w.bumiller@proxmox.com> wrote:
> > > 
> > >  
> > > 'incr' :S
> > > 
> > > On Fri, Sep 22, 2023 at 09:16:08AM +0200, Christian Ebner wrote:
> > > > Adds a helper to allow to increase the encoder position by a given
> > > > size. This is used to increase the position when adding an appendix
> > > > section to the pxar stream, as these bytes are never encoded directly
> > > > but rather referenced by already existing chunks.
> > > 
> > > Exposing this seems like a weird choice to me. Why exactly is this
> > > needed? Why don't we instead expose methods to actually write the
> > > appendix section instead?
> > 
> > This is needed in order to increase the byte offset of the encoder itself.
> > The appendix section is a list of chunks which are injected in the chunk
> > stream on upload, but never really consumed by the encoder and subsequently
> > the chunker itself. So there is no direct writing of the appendix section to
> > the stream.
> > 
> > By adding the bytes, consistency with the rest of the pxar archive is assured,
> > as these chunks/bytes are present during decoding.
> 
> Ah so we inject the *contents* of the old pxar archive by way of sending
> the chunks a writing "layer" above. Initially I thought the archive
> would contain chunk ids, but this makes more sense. And is unfortunate
> for the API :-)

Yes, an initial approach was to store the chunk ids inline, but that is not
necessary and added unneeded storage overhead. As is, the chunks are appended
to a list to be injected after encoding the regular part of the archive.
Instead of the actual file payload, a PXAR_APPENDIX_REF entry is stored,
holding the payload size and the offset relative to the PXAR_APPENDIX entry.

The appendix section then contains the concatenated referenced chunks, which
allows restoring file payloads by sequentially skipping to the correct offset
and reading the payload from there.
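
To illustrate the sequential restore, here is a minimal sketch. It is not the
actual decoder code: `AppendixRef`, `skip_bytes` and `restore_payload` are
hypothetical names, and the appendix is modelled as a plain `Read` stream of
concatenated chunk data that is never seeked backwards.

```rust
use std::io::{self, Read};

/// Hypothetical stand-in for a PXAR_APPENDIX_REF entry: payload location
/// relative to the start of the appendix section.
struct AppendixRef {
    offset: u64,
    size: u64,
}

/// Sequentially skip `count` bytes of a non-seekable reader.
fn skip_bytes<R: Read>(reader: &mut R, count: u64) -> io::Result<()> {
    let skipped = io::copy(&mut reader.by_ref().take(count), &mut io::sink())?;
    if skipped != count {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "appendix ended before the referenced offset",
        ));
    }
    Ok(())
}

/// Restore a single payload. `pos` tracks how many appendix bytes have been
/// consumed so far, so references must be processed in ascending offset order.
fn restore_payload<R: Read>(
    appendix: &mut R,
    pos: &mut u64,
    entry: &AppendixRef,
) -> io::Result<Vec<u8>> {
    if entry.offset < *pos {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "referenced offset lies behind the current position",
        ));
    }
    skip_bytes(appendix, entry.offset - *pos)?;
    let mut payload = vec![0u8; entry.size as usize];
    appendix.read_exact(&mut payload)?;
    *pos = entry.offset + entry.size;
    Ok(payload)
}

/// Usage example: two payloads embedded in an 18 byte appendix.
fn example() -> io::Result<(Vec<u8>, Vec<u8>)> {
    let mut appendix = io::Cursor::new(b"chunkAAAAchunkBBBB".to_vec());
    let mut pos = 0;
    let a = restore_payload(&mut appendix, &mut pos, &AppendixRef { offset: 5, size: 4 })?;
    let b = restore_payload(&mut appendix, &mut pos, &AppendixRef { offset: 14, size: 4 })?;
    Ok((a, b))
}
```

The sketch rejects references that point behind the current position, since a
purely sequential reader cannot go back.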

> 
> Maybe consider marking the position modification as `unsafe fn`, though?
> I mean it is a foot gun to break the resulting archive with, after all
> ;-)

You are right in that this is to be seen as an unsafe operation. Maybe instead
of making the function unsafe, the interface could take the list of chunks as
input and shift the position accordingly, thereby consuming the chunks and
storing them for injection afterwards?

That way the ownership of the chunk list would be moved to the encoder rather
than being part of the archiver, as it is now. The chunk list might then be
passed on by the encoder to be injected into the backup upload stream, although
I am not sure if and how to bypass the chunker in that case.
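
Roughly what I have in mind, as a sketch only: `EncoderSketch`, `ReusedChunk`
and both method names are made up for illustration and are not part of the
pxar crate. The point is that the position can only move by the size of chunks
the encoder actually takes ownership of.

```rust
/// Hypothetical description of a reused chunk: digest plus size in bytes.
#[derive(Debug, Clone, PartialEq)]
struct ReusedChunk {
    digest: [u8; 32],
    size: u64,
}

/// Stripped-down stand-in for the encoder state, not the pxar `Encoder`.
#[derive(Default)]
struct EncoderSketch {
    /// Current byte offset in the archive.
    position: u64,
    /// Chunks queued for injection after the regular part of the archive.
    appendix_chunks: Vec<ReusedChunk>,
}

impl EncoderSketch {
    /// Take ownership of the chunks, advance the position by their total
    /// size and return the offset at which the first of them is located.
    /// Returns `None` (leaving the state untouched) on arithmetic overflow.
    fn add_appendix_chunks(&mut self, chunks: Vec<ReusedChunk>) -> Option<u64> {
        let start = self.position;
        let total = chunks
            .iter()
            .try_fold(0u64, |acc, chunk| acc.checked_add(chunk.size))?;
        self.position = start.checked_add(total)?;
        self.appendix_chunks.extend(chunks);
        Some(start)
    }

    /// Hand the queued chunks over, for example to the upload stream.
    fn take_appendix_chunks(&mut self) -> Vec<ReusedChunk> {
        std::mem::take(&mut self.appendix_chunks)
    }
}
```

With such an interface there would be no way to shift the position by an
arbitrary amount, which removes most of the foot gun without needing
`unsafe fn`.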

> 
> But this means we don't have a direct way of creating incremental pxars
> without a PBS context, doesn't it?

This is correct. At the moment the only way to create an incremental pxar
archive is to use the PBS context. Both the index file and the catalog are
required, which could in principle also be provided via command line
parameters, but ultimately the actual chunk data is needed as well. That is
currently only provided during restore of the archive from backup.

> Would it make sense to have a method here which returns a Writer to
> the `EncoderOutput` where we could in theory also just "dump in"
> contents of another actual pxar file (where the byte counting happens
> implicitly), which also has an extra `unsafe fn add_out_of_band_bytes()`
> to do the raw byte count modification?

Yes, this might be possible, but for creating the backup I want to avoid that
completely. It would require downloading the chunk data just to inject it
for reuse, which is probably way more expensive and defeats the purpose of
reusing the chunks to begin with.

If you intended this to be an addition to the current code, in order to create
a pxar archive with appendix locally, without the PBS context, then yes.
This might be possible by passing the data in the form of a `MergedKnownChunk`,
which contains either the raw chunk data or the reused chunk's hash and size,
so that either the data or the digest needed to index it can be passed.
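
As a rough illustration of that "either data or digest" idea: the enum below is
a hypothetical stand-in written for this mail. It is only loosely modelled on
the description above and is not the actual `MergedKnownChunk` definition.

```rust
/// Hypothetical input type: either raw chunk data (local creation without
/// PBS context) or a reference to a chunk already known to the server.
enum ChunkInput {
    /// Raw chunk data which has to be written or uploaded.
    Raw(Vec<u8>),
    /// Digest and size of a reused chunk; no data needs to be transferred.
    Reused { digest: [u8; 32], size: u64 },
}

impl ChunkInput {
    /// Number of archive bytes this chunk accounts for in either case.
    fn size(&self) -> u64 {
        match self {
            ChunkInput::Raw(data) => data.len() as u64,
            ChunkInput::Reused { size, .. } => *size,
        }
    }

    /// Whether the chunk data itself still has to be transferred.
    fn needs_data_transfer(&self) -> bool {
        matches!(self, ChunkInput::Raw(_))
    }
}
```

Both variants advance the archive position by the same amount, only the raw
variant carries the bytes with it.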

> 
> One advantage of having a "starting point" for this type of operation is
> that we'd also force a `flush()` before out-of-band data gets written.
> Otherwise, if we don't need/want this, we should probably just add a
> `flush()` to the encoder we should call before adding any chunks out of
> band, given that Max already tried to sneak in a BufRead/Writers into
> the pxar crate for optimization purposes, IIRC ;-)

Good point, flushing is definitely required if writes will be buffered, so as
not to break the byte stream.
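
A minimal sketch of why the flush matters, assuming a `BufWriter` based output.
`BufferedOutput` is a hypothetical type, and `add_out_of_band_bytes` only
borrows the name from your suggestion (as a safe fn here, to keep the example
short):

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical buffered output that tracks the logical archive position.
struct BufferedOutput<W: Write> {
    inner: BufWriter<W>,
    position: u64,
}

impl<W: Write> BufferedOutput<W> {
    fn new(writer: W) -> Self {
        Self {
            inner: BufWriter::new(writer),
            position: 0,
        }
    }

    /// Regular encoded bytes: these may sit in the buffer for a while.
    fn write_all(&mut self, data: &[u8]) -> io::Result<()> {
        self.inner.write_all(data)?;
        self.position += data.len() as u64;
        Ok(())
    }

    /// Account for bytes injected out of band. The buffer is flushed first,
    /// so everything encoded so far reaches the stream before those bytes.
    fn add_out_of_band_bytes(&mut self, size: u64) -> io::Result<u64> {
        self.inner.flush()?;
        self.position += size;
        Ok(self.position)
    }
}
```

Without the `flush()`, bytes still sitting in the buffer would end up in the
stream after the injected chunks, while the position counter claims they come
before.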





