public inbox for pbs-devel@lists.proxmox.com
From: Christian Ebner <c.ebner@proxmox.com>
To: Wolfgang Bumiller <w.bumiller@proxmox.com>
Cc: pbs-devel@lists.proxmox.com
Subject: Re: [pbs-devel] [RFC pxar 4/20] fix #3174: metadata: impl fn to calc byte size
Date: Thu, 28 Sep 2023 11:27:28 +0200 (CEST)
Message-ID: <532454423.5350.1695893248395@webmail.proxmox.com>
In-Reply-To: <37sal4okdwhrkqslzsdbtxtah53zl5z6vyu6x44wv6xr3gha6n@zkbdjo42wrlj>


> On 28.09.2023 11:00 CEST Wolfgang Bumiller <w.bumiller@proxmox.com> wrote:
> 
>  
> On Thu, Sep 28, 2023 at 10:07:40AM +0200, Christian Ebner wrote:
> > I was giving this some more thought and am not really convinced that sending
> > this through an encoder instance, which digests the encoded byte stream and counts
> > the bytes, is the right approach here.
> 
> How about moving the logic `encode_metadata` from `Encoder` into
> `Metadata` with an `Option<&mut SeqWrite>` parameter, not a full
> Encoder, and just having the encoding vs counting logic live right next
> to each other depending on whether the writer is Some?
> That should be as cheap as it gets?
>

Hmm, the Metadata should not have to know how it might be encoded in different
contexts, though. That is something only the encoder should be concerned about.
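
For the sake of discussion, a minimal sketch of the "encode or just count" idea, kept as a free function so it could live on the encoder side rather than in Metadata. Everything here is an assumption for illustration: `Meta` is a heavily simplified stand-in for the real Metadata, `std::io::Write` stands in for `SeqWrite`, the length-prefixed layout is invented and is not the pxar format, and `put` and `encode_or_count` are hypothetical names.

```rust
use std::io::{self, Write};

/// Hypothetical, heavily simplified stand-in for pxar's `Metadata`.
struct Meta {
    stat: [u8; 16],
    xattrs: Vec<(Vec<u8>, Vec<u8>)>, // (name, value) pairs
}

/// Separator between xattr name and value in this made-up layout.
const NUL: &[u8] = &[0];

/// Emit one item (8-byte little-endian length prefix plus payload parts)
/// if a writer is present. The item's encoded size is returned either way,
/// and nothing is allocated or copied when only counting.
fn put(out: &mut Option<&mut dyn Write>, parts: &[&[u8]]) -> io::Result<u64> {
    let len: u64 = parts.iter().map(|p| p.len() as u64).sum();
    if let Some(w) = out.as_mut() {
        w.write_all(&len.to_le_bytes())?;
        for p in parts {
            w.write_all(p)?;
        }
    }
    Ok(8 + len)
}

/// Encode `meta` into `out` when it is `Some`, otherwise only count.
/// Both modes walk the same code path, so the sizes cannot diverge.
fn encode_or_count(meta: &Meta, mut out: Option<&mut dyn Write>) -> io::Result<u64> {
    let mut size = put(&mut out, &[&meta.stat[..]])?;
    for (name, value) in &meta.xattrs {
        size += put(&mut out, &[name.as_slice(), NUL, value.as_slice()])?;
    }
    Ok(size)
}
```

Calling `encode_or_count(&meta, None)` yields the byte size without producing any output, while passing `Some(&mut writer)` writes the bytes and returns the same number.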

> > 
> > The purpose of this function is to calculate the bytes, which I can easily skip over
> > *without* having to call any expensive encoding/decoding functionality.
> > I might get around this by simply calling the decoder on the byte stream, then I do
> > not need this at all (if I'm not missing something). Might that be the better approach?
> 
> I'm not sure decoding is that much cheaper than dummy-encoding...
> depending on the data I'd say it could even be more expensive in some
> cases? (rare cases though, only with lots of ACLs/xattrs around I
> suppose...)

Probably so, although the data structures are somewhat simple in this case.

> 
> > 
> > Additionally, and maybe even better, I might get rid of this also by letting the
> > PXAR_APPENDIX_REF offset point to the start of the file payload entry, instead of the
> > file entry as is now, thereby being able to blindly skip over this already to begin with.
> > Although I am not sure if that is the best approach for handling the metadata, which should
> > ideally not be encoded twice, once before the PXAR_APPENDIX_REF and the PXAR_PAYLOAD.
> 
> Not sure why skipping data would encode it twice? Or did you mean to
> imply that previously we pointed to metadata, but when instead pointing
> to the payload we need to instead encode it in the new archive which we
> previously did not need to do?

Yes, I was referring to the latter: having to encode the metadata in the regular part of the archive as well
would store the same data twice, once newly encoded and once in the appended chunks. Storing it
twice bloats the size unnecessarily.

Which brings me to another point I have not taken into consideration so far: how to handle files whose metadata
changed without that change being detected. Since the catalog only contains size and mtime, only these can be compared.
But I need the current xattrs, ACLs, etc. I will have to look up whether changing those actually changes the mtime as well.
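
As far as I know, changing xattrs, ACLs, ownership or permissions on Linux updates the ctime rather than the mtime, so a comparison on size and mtime alone would miss such changes. A rough sketch of how a ctime check could separate the cases follows. It assumes the recorded entry also carried a ctime, which the catalog currently does not; `Recorded`, `Change` and `classify` are hypothetical names, and only whole seconds are compared here (`mtime_nsec()`/`ctime_nsec()` exist for finer granularity).

```rust
use std::fs::Metadata;
use std::os::unix::fs::MetadataExt;

/// Hypothetical per-file record. The real catalog only has size and mtime;
/// the ctime field is an assumption made for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Recorded {
    size: u64,
    mtime: i64,
    ctime: i64,
}

impl Recorded {
    /// Build a record from the current on-disk state (seconds granularity).
    fn from_metadata(m: &Metadata) -> Self {
        Recorded { size: m.size(), mtime: m.mtime(), ctime: m.ctime() }
    }
}

#[derive(Debug, PartialEq, Eq)]
enum Change {
    Unchanged,
    /// Only ctime differs: inode metadata (mode, owner, xattrs, ACLs, ...)
    /// may have changed, so it would have to be re-read.
    MetadataOnly,
    /// Size or mtime differ: treat the file contents as changed.
    Content,
}

/// Compare the previously recorded state against the current one.
fn classify(old: &Recorded, new: &Recorded) -> Change {
    if old.size != new.size || old.mtime != new.mtime {
        Change::Content
    } else if old.ctime != new.ctime {
        Change::MetadataOnly
    } else {
        Change::Unchanged
    }
}
```

With something like this, a file classified as `MetadataOnly` could still reuse its payload chunks while the metadata gets encoded anew.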



