[pbs-devel] [RFC PATCH proxmox-backup] pbs-tools: zip: add EFS flag to zip files

public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed

* [pbs-devel] [RFC PATCH proxmox-backup] pbs-tools: zip: add EFS flag to zip files
@ 2021-09-10  9:09 Dominik Csapak
  2021-09-11 15:08 ` Thomas Lamprecht
  0 siblings, 1 reply; 6+ messages in thread
From: Dominik Csapak @ 2021-09-10  9:09 UTC (permalink / raw)
  To: pbs-devel

this flag marks the file names as 'UTF-8' encoded.

By default, encoding of file names in zips are defined as code page 437,
but we save the filenames as bytes (like in linux fs).

For linux systems this neither would be a problem since most tools
simply use the filenames as bytes, but for the zip utility under
windows it's important since NTFS uses UTF-16 for file names.

Since we generate zips only on pxars (file based backup on linux) or
via file-restore-daemons (linux; ntfs mounted as UTF-8), it's a fair
assumption that we can mark most filenames as UTF-8.

For zips generated from linux backups to be extracted on windows it is
impossible to do the correct thing anyway, since windows can not have
arbitrary bytes in file names, and for each encoding chosen, there is
some file that cannot be shown correctly.
so either all filenames are decoded as CP437 ('ö' -> '├╢')
or non UTF-8 encoded file-names have garbage characters in them (�)

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
sending as RFC since there is no way to have it correct in all cases,
and we have to decide if we want CP437 or UTF-8 by default

 pbs-tools/src/zip.rs | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/pbs-tools/src/zip.rs b/pbs-tools/src/zip.rs
index 605480a8..88eea07b 100644
--- a/pbs-tools/src/zip.rs
+++ b/pbs-tools/src/zip.rs
@@ -34,6 +34,8 @@ const VERSION_MADE_BY: u16 = 0x032d;
 const ZIP64_EOCD_RECORD: u32 = 0x06064B50;
 const ZIP64_EOCD_LOCATOR: u32 = 0x07064B50;

+const GENERAL_PURUPOSE_FLAGS: u16 = (1 << 3) | (1 << 11); // EFS + Data Descriptor
+
 // bits for time:
 // 0-4: day of the month (1-31)
 // 5-8: month: (1 = jan, etc.)
@@ -249,7 +251,7 @@ impl ZipEntry {
             LocalFileHeader {
                 signature: LOCAL_FH_SIG,
                 version_needed: 0x2d,
-                flags: 1 << 3,
+                flags: GENERAL_PURUPOSE_FLAGS,
                 compression: 0x8,
                 time,
                 date,
@@ -332,7 +334,7 @@ impl ZipEntry {
                 signature: CENTRAL_DIRECTORY_FH_SIG,
                 version_made_by: VERSION_MADE_BY,
                 version_needed: VERSION_NEEDED,
-                flags: 1 << 3,
+                flags: GENERAL_PURUPOSE_FLAGS,
                 compression: 0x8,
                 time,
                 date,
-- 
2.30.2

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [pbs-devel] [RFC PATCH proxmox-backup] pbs-tools: zip: add EFS flag to zip files
  2021-09-10  9:09 [pbs-devel] [RFC PATCH proxmox-backup] pbs-tools: zip: add EFS flag to zip files Dominik Csapak
@ 2021-09-11 15:08 ` Thomas Lamprecht
  2021-09-13  7:14   ` Dominik Csapak
  0 siblings, 1 reply; 6+ messages in thread
From: Thomas Lamprecht @ 2021-09-11 15:08 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Dominik Csapak

On 10.09.21 11:09, Dominik Csapak wrote:
> this flag marks the file names as 'UTF-8' encoded.
> 
> By default, encoding of file names in zips are defined as code page 437,
> but we save the filenames as bytes (like in linux fs).
> 
> For linux systems this neither would be a problem since most tools
> simply use the filenames as bytes, but for the zip utility under
> windows it's important since NTFS uses UTF-16 for file names.
> 
> Since we generate zips only on pxars (file based backup on linux) or
> via file-restore-daemons (linux; ntfs mounted as UTF-8), it's a fair
> assumption that we can mark most filenames as UTF-8.
> 
> For zips generated from linux backups to be extracted on windows it is
> impossible to do the correct thing anyway, since windows can not have
> arbitrary bytes in file names, and for each encoding chosen, there is
> some file that cannot be shown correctly.
> so either all filenames are decoded as CP437 ('ö' -> '├╢')
> or non UTF-8 encoded file-names have garbage characters in them (�)
> 
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> sending as RFC since there is no way to have it correct in all cases,
> and we have to decide if we want CP437 or UTF-8 by default
> 

Yeah, it's not only that we may not be incorrect, the closest definition of a ZIP spec
says "not set == should be cp437 but meh" and "set == MUST be valid UTF-8" about this
bit:

> D.2 If general purpose bit 11 is unset, the file name and comment SHOULD conform 
> to the original ZIP character encoding.  If general purpose bit 11 is set, the 
> filename and comment MUST support The Unicode Standard, Version 4.1.0 or 
> greater using the character encoding form defined by the UTF-8 storage 
> specification.  The Unicode Standard is published by the The Unicode
> Consortium (www.unicode.org).  UTF-8 encoded data stored within ZIP files 
> is expected to not include a byte order mark (BOM). 
> 
- Appendix D, https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT

Also interesting, just below above quote:

> D.3 Applications MAY choose to supplement this file name storage through the use 
> of the 0x0008 Extra Field.  Storage for this optional field is currently 
> undefined, however it will be used to allow storing extended information 
> on source or target encoding that MAY further assist applications with file 
> name, or file content encoding tasks.  Please contact PKWARE with any
> requirements on how this field SHOULD be used.


So I'd like to know what standard tools like info-zip (i.e., Debian's "zip" package) or
other cross-platform tools like 7zip do.

It seems that at least Debian's version of info zip had some thoughts about this and can
(or always does, did not checked that closely) safe utf8 filenames in an extra field, one
that some other tools maybe check for?

https://sources.debian.org/src/zip/3.0-12/zip.c/#L967

I say Debian's version, as upstream still talks about Unicode support on their home page,
which itself may be just outdated too, but it could also be that Debian patched that in.

Any how, it seems to me that there'd be some more compatible options that do not plainly
state that they're 100% utf-8 while actually not being so sure of that, so I'd explore that
angle quite some more; data restoration is probably the most important aspect of a backup
system - so every way we expose doing so should work as as good as possible - even if going
outside our Linux bubble.

>  pbs-tools/src/zip.rs | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/pbs-tools/src/zip.rs b/pbs-tools/src/zip.rs
> index 605480a8..88eea07b 100644
> --- a/pbs-tools/src/zip.rs
> +++ b/pbs-tools/src/zip.rs
> @@ -34,6 +34,8 @@ const VERSION_MADE_BY: u16 = 0x032d;
>  const ZIP64_EOCD_RECORD: u32 = 0x06064B50;
>  const ZIP64_EOCD_LOCATOR: u32 = 0x07064B50;
>  
> +const GENERAL_PURUPOSE_FLAGS: u16 = (1 << 3) | (1 << 11); // EFS + Data Descriptor
> +

- typo in constant name: purupose vs. purpose
- comment order do not match the bits used, bit 11 is EFS and bit 3 is telling
  the parser that the crc32 is not in the header but in the data descriptor after
  the compressed data; your bitwise-OR+comment order suggests different.
- isn't this related to BZ entry #3618, but that is neither mentioned here nor in the
  bug report...

_If_ we'd go down this way then the following const name and formatting would make this
easier to read IMO:

const LFH_GENERAL_PURPOSE_FLAGS: u16 = (1 << 3) // we place crc32 in data descriptor
    | (1 << 11); // EFS, mark filenames & comments as UTF-8 (not guaranteed but more often OK than CP437)





^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [pbs-devel] [RFC PATCH proxmox-backup] pbs-tools: zip: add EFS flag to zip files
  2021-09-11 15:08 ` Thomas Lamprecht
@ 2021-09-13  7:14   ` Dominik Csapak
  2021-09-13  7:25     ` Thomas Lamprecht
  0 siblings, 1 reply; 6+ messages in thread
From: Dominik Csapak @ 2021-09-13  7:14 UTC (permalink / raw)
  To: Thomas Lamprecht, Proxmox Backup Server development discussion

On 9/11/21 17:08, Thomas Lamprecht wrote:
> On 10.09.21 11:09, Dominik Csapak wrote:
>> this flag marks the file names as 'UTF-8' encoded.
>>
>> By default, encoding of file names in zips are defined as code page 437,
>> but we save the filenames as bytes (like in linux fs).
>>
>> For linux systems this neither would be a problem since most tools
>> simply use the filenames as bytes, but for the zip utility under
>> windows it's important since NTFS uses UTF-16 for file names.
>>
>> Since we generate zips only on pxars (file based backup on linux) or
>> via file-restore-daemons (linux; ntfs mounted as UTF-8), it's a fair
>> assumption that we can mark most filenames as UTF-8.
>>
>> For zips generated from linux backups to be extracted on windows it is
>> impossible to do the correct thing anyway, since windows can not have
>> arbitrary bytes in file names, and for each encoding chosen, there is
>> some file that cannot be shown correctly.
>> so either all filenames are decoded as CP437 ('ö' -> '├╢')
>> or non UTF-8 encoded file-names have garbage characters in them (�)
>>
>> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
>> ---
>> sending as RFC since there is no way to have it correct in all cases,
>> and we have to decide if we want CP437 or UTF-8 by default
>>
> 
> Yeah, it's not only that we may not be incorrect, the closest definition of a ZIP spec
> says "not set == should be cp437 but meh" and "set == MUST be valid UTF-8" about this
> bit:
> 
>> D.2 If general purpose bit 11 is unset, the file name and comment SHOULD conform
>> to the original ZIP character encoding.  If general purpose bit 11 is set, the
>> filename and comment MUST support The Unicode Standard, Version 4.1.0 or
>> greater using the character encoding form defined by the UTF-8 storage
>> specification.  The Unicode Standard is published by the The Unicode
>> Consortium (www.unicode.org).  UTF-8 encoded data stored within ZIP files
>> is expected to not include a byte order mark (BOM).
>>
> - Appendix D, https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT

Yes, you're right, i did read that part too loosely i think...

> 
> Also interesting, just below above quote:
> 
>> D.3 Applications MAY choose to supplement this file name storage through the use
>> of the 0x0008 Extra Field.  Storage for this optional field is currently
>> undefined, however it will be used to allow storing extended information
>> on source or target encoding that MAY further assist applications with file
>> name, or file content encoding tasks.  Please contact PKWARE with any
>> requirements on how this field SHOULD be used.

AFAIU, that part of the 'spec' is not open and would require a license 
of PKWARE

> 
> 
> So I'd like to know what standard tools like info-zip (i.e., Debian's "zip" package) or
> other cross-platform tools like 7zip do.
> 
> It seems that at least Debian's version of info zip had some thoughts about this and can
> (or always does, did not checked that closely) safe utf8 filenames in an extra field, one
> that some other tools maybe check for?
> 
> https://sources.debian.org/src/zip/3.0-12/zip.c/#L967
> 
> I say Debian's version, as upstream still talks about Unicode support on their home page,
> which itself may be just outdated too, but it could also be that Debian patched that in.
> 
> Any how, it seems to me that there'd be some more compatible options that do not plainly
> state that they're 100% utf-8 while actually not being so sure of that, so I'd explore that
> angle quite some more; data restoration is probably the most important aspect of a backup
> system - so every way we expose doing so should work as as good as possible - even if going
> outside our Linux bubble.

I tested it with debians zip (Info-Zip), and despite was the spec says,
i could not get it to write an 'extra-field' with the filenames.

Instead, when it finds a filename that has a filename with any byte that
has the high-bit set (>127) it sets the EFS bit in the filename iff
the filename is valid utf-8, and does nothing if not.

IMHO, that sounds like a reasonable thing to do, so i'd suggest
that we test each filename for valid UTF-8, and set the bit
if that's true. This marks ASCII only filenames also as UTF-8,
but thats technically true and simplifies the rust code a bit
(and should be faster for UTF-8 Filenames,
since we do not have to check for a high bit first and then try to
convert...)

Does that sound ok to you?

For non-valid UTF-8 filenames that have a high bit i'd produce
CP437 filenames on windows, and on linux it'd just be the
byte value.

> 
>>   pbs-tools/src/zip.rs | 6 ++++--
>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/pbs-tools/src/zip.rs b/pbs-tools/src/zip.rs
>> index 605480a8..88eea07b 100644
>> --- a/pbs-tools/src/zip.rs
>> +++ b/pbs-tools/src/zip.rs
>> @@ -34,6 +34,8 @@ const VERSION_MADE_BY: u16 = 0x032d;
>>   const ZIP64_EOCD_RECORD: u32 = 0x06064B50;
>>   const ZIP64_EOCD_LOCATOR: u32 = 0x07064B50;
>>   
>> +const GENERAL_PURUPOSE_FLAGS: u16 = (1 << 3) | (1 << 11); // EFS + Data Descriptor
>> +
> 
> - typo in constant name: purupose vs. purpose

yeah thanks...

> - comment order do not match the bits used, bit 11 is EFS and bit 3 is telling
>    the parser that the crc32 is not in the header but in the data descriptor after
>    the compressed data; your bitwise-OR+comment order suggests different.

sorry

> - isn't this related to BZ entry #3618, but that is neither mentioned here nor in the
>    bug report...

that bug-report wasn't there when i wrote the patch.

> 
> _If_ we'd go down this way then the following const name and formatting would make this
> easier to read IMO:
> 
> const LFH_GENERAL_PURPOSE_FLAGS: u16 = (1 << 3) // we place crc32 in data descriptor
>      | (1 << 11); // EFS, mark filenames & comments as UTF-8 (not guaranteed but more often OK than CP437)
> 

makes, sense, though if we only set it conditionally i'd split the
EFS_MARK into its own constant.




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [pbs-devel] [RFC PATCH proxmox-backup] pbs-tools: zip: add EFS flag to zip files
  2021-09-13  7:14   ` Dominik Csapak
@ 2021-09-13  7:25     ` Thomas Lamprecht
  2021-09-13  7:53       ` Dominik Csapak
  0 siblings, 1 reply; 6+ messages in thread
From: Thomas Lamprecht @ 2021-09-13  7:25 UTC (permalink / raw)
  To: Dominik Csapak, Proxmox Backup Server development discussion

On 13.09.21 09:14, Dominik Csapak wrote:
> On 9/11/21 17:08, Thomas Lamprecht wrote:
>> On 10.09.21 11:09, Dominik Csapak wrote:
>> Also interesting, just below above quote:
>>
>>> D.3 Applications MAY choose to supplement this file name storage through the use
>>> of the 0x0008 Extra Field.  Storage for this optional field is currently
>>> undefined, however it will be used to allow storing extended information
>>> on source or target encoding that MAY further assist applications with file
>>> name, or file content encoding tasks.  Please contact PKWARE with any
>>> requirements on how this field SHOULD be used.
> 
> AFAIU, that part of the 'spec' is not open and would require a license of PKWARE

yeah that's what I figured, but observe and replicate could in theory still be OK I guess?

> 
>>
>>
>> So I'd like to know what standard tools like info-zip (i.e., Debian's "zip" package) or
>> other cross-platform tools like 7zip do.
>>
>> It seems that at least Debian's version of info zip had some thoughts about this and can
>> (or always does, did not checked that closely) safe utf8 filenames in an extra field, one
>> that some other tools maybe check for?
>>
>> https://sources.debian.org/src/zip/3.0-12/zip.c/#L967
>>
>> I say Debian's version, as upstream still talks about Unicode support on their home page,
>> which itself may be just outdated too, but it could also be that Debian patched that in.
>>
>> Any how, it seems to me that there'd be some more compatible options that do not plainly
>> state that they're 100% utf-8 while actually not being so sure of that, so I'd explore that
>> angle quite some more; data restoration is probably the most important aspect of a backup
>> system - so every way we expose doing so should work as as good as possible - even if going
>> outside our Linux bubble.
> 
> I tested it with debians zip (Info-Zip), and despite was the spec says,
> i could not get it to write an 'extra-field' with the filenames.
> 

hmm, meh...

> Instead, when it finds a filename that has a filename with any byte that
> has the high-bit set (>127) it sets the EFS bit in the filename iff
> the filename is valid utf-8, and does nothing if not.
> 
> IMHO, that sounds like a reasonable thing to do, so i'd suggest
> that we test each filename for valid UTF-8, and set the bit
> if that's true. This marks ASCII only filenames also as UTF-8,
> but thats technically true and simplifies the rust code a bit
> (and should be faster for UTF-8 Filenames,
> since we do not have to check for a high bit first and then try to
> convert...)
> 
> Does that sound ok to you?

Yes, sounds reasonable to me for nwo. It'd be an improvement and we basically only use
this for on-the-fly generation, so if we find a possible "better" (hard to measure that
with such a mess of format) approach in the future it would help all users immediately
too then.

> 
> For non-valid UTF-8 filenames that have a high bit i'd produce
> CP437 filenames on windows, and on linux it'd just be the
> byte value.

just to be sure: what does "i'd produce" mean here? As we certainly do not differ ZIP
generation semantics by user OS (thank goodness), this means that a unzip by most
reasonable tools would produce the outcome you described, or? 

> 
>>
>>>   pbs-tools/src/zip.rs | 6 ++++--
>>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/pbs-tools/src/zip.rs b/pbs-tools/src/zip.rs
>>> index 605480a8..88eea07b 100644
>>> --- a/pbs-tools/src/zip.rs
>>> +++ b/pbs-tools/src/zip.rs
>>> @@ -34,6 +34,8 @@ const VERSION_MADE_BY: u16 = 0x032d;
>>>   const ZIP64_EOCD_RECORD: u32 = 0x06064B50;
>>>   const ZIP64_EOCD_LOCATOR: u32 = 0x07064B50;
>>>   +const GENERAL_PURUPOSE_FLAGS: u16 = (1 << 3) | (1 << 11); // EFS + Data Descriptor
>>> +
>>
>> - typo in constant name: purupose vs. purpose
> 
> yeah thanks...
> 
>> - comment order do not match the bits used, bit 11 is EFS and bit 3 is telling
>>    the parser that the crc32 is not in the header but in the data descriptor after
>>    the compressed data; your bitwise-OR+comment order suggests different.
> 
> sorry
> 
>> - isn't this related to BZ entry #3618, but that is neither mentioned here nor in the
>>    bug report...
> 
> that bug-report wasn't there when i wrote the patch.

I naturally only compared mail vs BZ time and missed the date difference of one day,
argh, sorry.

> 
>>
>> _If_ we'd go down this way then the following const name and formatting would make this
>> easier to read IMO:
>>
>> const LFH_GENERAL_PURPOSE_FLAGS: u16 = (1 << 3) // we place crc32 in data descriptor
>>      | (1 << 11); // EFS, mark filenames & comments as UTF-8 (not guaranteed but more often OK than CP437)
>>
> 
> makes, sense, though if we only set it conditionally i'd split the
> EFS_MARK into its own constant.
> 

sounds good to me.





^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [pbs-devel] [RFC PATCH proxmox-backup] pbs-tools: zip: add EFS flag to zip files
  2021-09-13  7:25     ` Thomas Lamprecht
@ 2021-09-13  7:53       ` Dominik Csapak
  2021-09-13  7:57         ` Thomas Lamprecht
  0 siblings, 1 reply; 6+ messages in thread
From: Dominik Csapak @ 2021-09-13  7:53 UTC (permalink / raw)
  To: Thomas Lamprecht, Proxmox Backup Server development discussion

On 9/13/21 09:25, Thomas Lamprecht wrote:
> On 13.09.21 09:14, Dominik Csapak wrote:
>> On 9/11/21 17:08, Thomas Lamprecht wrote:
>>> On 10.09.21 11:09, Dominik Csapak wrote:
>>> Also interesting, just below above quote:
>>>
>>>> D.3 Applications MAY choose to supplement this file name storage through the use
>>>> of the 0x0008 Extra Field.  Storage for this optional field is currently
>>>> undefined, however it will be used to allow storing extended information
>>>> on source or target encoding that MAY further assist applications with file
>>>> name, or file content encoding tasks.  Please contact PKWARE with any
>>>> requirements on how this field SHOULD be used.
>>
>> AFAIU, that part of the 'spec' is not open and would require a license of PKWARE
> 
> yeah that's what I figured, but observe and replicate could in theory still be OK I guess?
> 
>>
>>>
>>>
>>> So I'd like to know what standard tools like info-zip (i.e., Debian's "zip" package) or
>>> other cross-platform tools like 7zip do.
>>>
>>> It seems that at least Debian's version of info zip had some thoughts about this and can
>>> (or always does, did not checked that closely) safe utf8 filenames in an extra field, one
>>> that some other tools maybe check for?
>>>
>>> https://sources.debian.org/src/zip/3.0-12/zip.c/#L967
>>>
>>> I say Debian's version, as upstream still talks about Unicode support on their home page,
>>> which itself may be just outdated too, but it could also be that Debian patched that in.
>>>
>>> Any how, it seems to me that there'd be some more compatible options that do not plainly
>>> state that they're 100% utf-8 while actually not being so sure of that, so I'd explore that
>>> angle quite some more; data restoration is probably the most important aspect of a backup
>>> system - so every way we expose doing so should work as as good as possible - even if going
>>> outside our Linux bubble.
>>
>> I tested it with debians zip (Info-Zip), and despite was the spec says,
>> i could not get it to write an 'extra-field' with the filenames.
>>
> 
> hmm, meh...
> 
>> Instead, when it finds a filename that has a filename with any byte that
>> has the high-bit set (>127) it sets the EFS bit in the filename iff
>> the filename is valid utf-8, and does nothing if not.
>>
>> IMHO, that sounds like a reasonable thing to do, so i'd suggest
>> that we test each filename for valid UTF-8, and set the bit
>> if that's true. This marks ASCII only filenames also as UTF-8,
>> but thats technically true and simplifies the rust code a bit
>> (and should be faster for UTF-8 Filenames,
>> since we do not have to check for a high bit first and then try to
>> convert...)
>>
>> Does that sound ok to you?
> 
> Yes, sounds reasonable to me for nwo. It'd be an improvement and we basically only use
> this for on-the-fly generation, so if we find a possible "better" (hard to measure that
> with such a mess of format) approach in the future it would help all users immediately
> too then.
> 
>>
>> For non-valid UTF-8 filenames that have a high bit i'd produce
>> CP437 filenames on windows, and on linux it'd just be the
>> byte value.
> 
> just to be sure: what does "i'd produce" mean here? As we certainly do not differ ZIP
> generation semantics by user OS (thank goodness), this means that a unzip by most
> reasonable tools would produce the outcome you described, or?

oops, this should say "it'd" so "it would".

To be completely clear, such ZIPs extracts as
* Windows(built-in + 7zip): UTF-8 filenames get correctly encoded,
   all other filenames interpreted as CP437
* Debian(zip): filenames as bytes in any case
* Debian(7z): weirdly it uses valid utf-8 as utf-8 bytes,
   but other filenames, it interprets as Windows 1252 and encodes as UTF-8..

i'll send a v2 with this semantic though, as i don't think we can do
much better for now

> 
>>
>>>
>>>>    pbs-tools/src/zip.rs | 6 ++++--
>>>>    1 file changed, 4 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/pbs-tools/src/zip.rs b/pbs-tools/src/zip.rs
>>>> index 605480a8..88eea07b 100644
>>>> --- a/pbs-tools/src/zip.rs
>>>> +++ b/pbs-tools/src/zip.rs
>>>> @@ -34,6 +34,8 @@ const VERSION_MADE_BY: u16 = 0x032d;
>>>>    const ZIP64_EOCD_RECORD: u32 = 0x06064B50;
>>>>    const ZIP64_EOCD_LOCATOR: u32 = 0x07064B50;
>>>>    +const GENERAL_PURUPOSE_FLAGS: u16 = (1 << 3) | (1 << 11); // EFS + Data Descriptor
>>>> +
>>>
>>> - typo in constant name: purupose vs. purpose
>>
>> yeah thanks...
>>
>>> - comment order do not match the bits used, bit 11 is EFS and bit 3 is telling
>>>     the parser that the crc32 is not in the header but in the data descriptor after
>>>     the compressed data; your bitwise-OR+comment order suggests different.
>>
>> sorry
>>
>>> - isn't this related to BZ entry #3618, but that is neither mentioned here nor in the
>>>     bug report...
>>
>> that bug-report wasn't there when i wrote the patch.
> 
> I naturally only compared mail vs BZ time and missed the date difference of one day,
> argh, sorry.
> 
>>
>>>
>>> _If_ we'd go down this way then the following const name and formatting would make this
>>> easier to read IMO:
>>>
>>> const LFH_GENERAL_PURPOSE_FLAGS: u16 = (1 << 3) // we place crc32 in data descriptor
>>>       | (1 << 11); // EFS, mark filenames & comments as UTF-8 (not guaranteed but more often OK than CP437)
>>>
>>
>> makes, sense, though if we only set it conditionally i'd split the
>> EFS_MARK into its own constant.
>>
> 
> sounds good to me.
> 





^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [pbs-devel] [RFC PATCH proxmox-backup] pbs-tools: zip: add EFS flag to zip files
  2021-09-13  7:53       ` Dominik Csapak
@ 2021-09-13  7:57         ` Thomas Lamprecht
  0 siblings, 0 replies; 6+ messages in thread
From: Thomas Lamprecht @ 2021-09-13  7:57 UTC (permalink / raw)
  To: Dominik Csapak, Proxmox Backup Server development discussion

On 13.09.21 09:53, Dominik Csapak wrote:
> 
> oops, this should say "it'd" so "it would".
> 
> To be completely clear, such ZIPs extracts as
> * Windows(built-in + 7zip): UTF-8 filenames get correctly encoded,
>   all other filenames interpreted as CP437
> * Debian(zip): filenames as bytes in any case
> * Debian(7z): weirdly it uses valid utf-8 as utf-8 bytes,
>   but other filenames, it interprets as Windows 1252 and encodes as UTF-8..
> 
> i'll send a v2 with this semantic though, as i don't think we can do
> much better for now

ack, thanks for the clear up!

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-09-13  7:57 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-10  9:09 [pbs-devel] [RFC PATCH proxmox-backup] pbs-tools: zip: add EFS flag to zip files Dominik Csapak
2021-09-11 15:08 ` Thomas Lamprecht
2021-09-13  7:14   ` Dominik Csapak
2021-09-13  7:25     ` Thomas Lamprecht
2021-09-13  7:53       ` Dominik Csapak
2021-09-13  7:57         ` Thomas Lamprecht

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox

Service provided by Proxmox Server Solutions GmbH | Privacy | Legal