From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: Dominik Csapak <d.csapak@proxmox.com>,
Proxmox Backup Server development discussion
<pbs-devel@lists.proxmox.com>
Subject: Re: [pbs-devel] [RFC PATCH proxmox-backup] pbs-tools: zip: add EFS flag to zip files
Date: Mon, 13 Sep 2021 09:25:52 +0200 [thread overview]
Message-ID: <916d18cc-b4ac-ff07-e0fe-6c4dab61105d@proxmox.com> (raw)
In-Reply-To: <758f28ef-4c6c-ccaf-54a7-17e4c0f8b10c@proxmox.com>
On 13.09.21 09:14, Dominik Csapak wrote:
> On 9/11/21 17:08, Thomas Lamprecht wrote:
>> On 10.09.21 11:09, Dominik Csapak wrote:
>> Also interesting, just below above quote:
>>
>>> D.3 Applications MAY choose to supplement this file name storage through the use
>>> of the 0x0008 Extra Field. Storage for this optional field is currently
>>> undefined, however it will be used to allow storing extended information
>>> on source or target encoding that MAY further assist applications with file
>>> name, or file content encoding tasks. Please contact PKWARE with any
>>> requirements on how this field SHOULD be used.
>
> AFAIU, that part of the 'spec' is not open and would require a license of PKWARE
yeah that's what I figured, but observe and replicate could in theory still be OK I guess?
>
>>
>>
>> So I'd like to know what standard tools like info-zip (i.e., Debian's "zip" package) or
>> other cross-platform tools like 7zip do.
>>
>> It seems that at least Debian's version of info zip had some thoughts about this and can
>> (or always does, did not checked that closely) safe utf8 filenames in an extra field, one
>> that some other tools maybe check for?
>>
>> https://sources.debian.org/src/zip/3.0-12/zip.c/#L967
>>
>> I say Debian's version, as upstream still talks about Unicode support on their home page,
>> which itself may be just outdated too, but it could also be that Debian patched that in.
>>
>> Any how, it seems to me that there'd be some more compatible options that do not plainly
>> state that they're 100% utf-8 while actually not being so sure of that, so I'd explore that
>> angle quite some more; data restoration is probably the most important aspect of a backup
>> system - so every way we expose doing so should work as as good as possible - even if going
>> outside our Linux bubble.
>
> I tested it with debians zip (Info-Zip), and despite was the spec says,
> i could not get it to write an 'extra-field' with the filenames.
>
hmm, meh...
> Instead, when it finds a filename that has a filename with any byte that
> has the high-bit set (>127) it sets the EFS bit in the filename iff
> the filename is valid utf-8, and does nothing if not.
>
> IMHO, that sounds like a reasonable thing to do, so i'd suggest
> that we test each filename for valid UTF-8, and set the bit
> if that's true. This marks ASCII only filenames also as UTF-8,
> but thats technically true and simplifies the rust code a bit
> (and should be faster for UTF-8 Filenames,
> since we do not have to check for a high bit first and then try to
> convert...)
>
> Does that sound ok to you?
Yes, sounds reasonable to me for nwo. It'd be an improvement and we basically only use
this for on-the-fly generation, so if we find a possible "better" (hard to measure that
with such a mess of format) approach in the future it would help all users immediately
too then.
>
> For non-valid UTF-8 filenames that have a high bit i'd produce
> CP437 filenames on windows, and on linux it'd just be the
> byte value.
just to be sure: what does "i'd produce" mean here? As we certainly do not differ ZIP
generation semantics by user OS (thank goodness), this means that a unzip by most
reasonable tools would produce the outcome you described, or?
>
>>
>>> pbs-tools/src/zip.rs | 6 ++++--
>>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/pbs-tools/src/zip.rs b/pbs-tools/src/zip.rs
>>> index 605480a8..88eea07b 100644
>>> --- a/pbs-tools/src/zip.rs
>>> +++ b/pbs-tools/src/zip.rs
>>> @@ -34,6 +34,8 @@ const VERSION_MADE_BY: u16 = 0x032d;
>>> const ZIP64_EOCD_RECORD: u32 = 0x06064B50;
>>> const ZIP64_EOCD_LOCATOR: u32 = 0x07064B50;
>>> +const GENERAL_PURUPOSE_FLAGS: u16 = (1 << 3) | (1 << 11); // EFS + Data Descriptor
>>> +
>>
>> - typo in constant name: purupose vs. purpose
>
> yeah thanks...
>
>> - comment order do not match the bits used, bit 11 is EFS and bit 3 is telling
>> the parser that the crc32 is not in the header but in the data descriptor after
>> the compressed data; your bitwise-OR+comment order suggests different.
>
> sorry
>
>> - isn't this related to BZ entry #3618, but that is neither mentioned here nor in the
>> bug report...
>
> that bug-report wasn't there when i wrote the patch.
I naturally only compared mail vs BZ time and missed the date difference of one day,
argh, sorry.
>
>>
>> _If_ we'd go down this way then the following const name and formatting would make this
>> easier to read IMO:
>>
>> const LFH_GENERAL_PURPOSE_FLAGS: u16 = (1 << 3) // we place crc32 in data descriptor
>> | (1 << 11); // EFS, mark filenames & comments as UTF-8 (not guaranteed but more often OK than CP437)
>>
>
> makes, sense, though if we only set it conditionally i'd split the
> EFS_MARK into its own constant.
>
sounds good to me.
next prev parent reply other threads:[~2021-09-13 7:26 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-09-10 9:09 Dominik Csapak
2021-09-11 15:08 ` Thomas Lamprecht
2021-09-13 7:14 ` Dominik Csapak
2021-09-13 7:25 ` Thomas Lamprecht [this message]
2021-09-13 7:53 ` Dominik Csapak
2021-09-13 7:57 ` Thomas Lamprecht
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=916d18cc-b4ac-ff07-e0fe-6c4dab61105d@proxmox.com \
--to=t.lamprecht@proxmox.com \
--cc=d.csapak@proxmox.com \
--cc=pbs-devel@lists.proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.