* [pbs-devel] [RFC PATCH proxmox-backup] pbs-tools: zip: add EFS flag to zip files @ 2021-09-10 9:09 Dominik Csapak 2021-09-11 15:08 ` Thomas Lamprecht 0 siblings, 1 reply; 6+ messages in thread From: Dominik Csapak @ 2021-09-10 9:09 UTC (permalink / raw) To: pbs-devel this flag marks the file names as 'UTF-8' encoded. By default, encoding of file names in zips are defined as code page 437, but we save the filenames as bytes (like in linux fs). For linux systems this neither would be a problem since most tools simply use the filenames as bytes, but for the zip utility under windows it's important since NTFS uses UTF-16 for file names. Since we generate zips only on pxars (file based backup on linux) or via file-restore-daemons (linux; ntfs mounted as UTF-8), it's a fair assumption that we can mark most filenames as UTF-8. For zips generated from linux backups to be extracted on windows it is impossible to do the correct thing anyway, since windows can not have arbitrary bytes in file names, and for each encoding chosen, there is some file that cannot be shown correctly. so either all filenames are decoded as CP437 ('ö' -> '├╢') or non UTF-8 encoded file-names have garbage characters in them (�) Signed-off-by: Dominik Csapak <d.csapak@proxmox.com> --- sending as RFC since there is no way to have it correct in all cases, and we have to decide if we want CP437 or UTF-8 by default pbs-tools/src/zip.rs | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/pbs-tools/src/zip.rs b/pbs-tools/src/zip.rs index 605480a8..88eea07b 100644 --- a/pbs-tools/src/zip.rs +++ b/pbs-tools/src/zip.rs @@ -34,6 +34,8 @@ const VERSION_MADE_BY: u16 = 0x032d; const ZIP64_EOCD_RECORD: u32 = 0x06064B50; const ZIP64_EOCD_LOCATOR: u32 = 0x07064B50; +const GENERAL_PURUPOSE_FLAGS: u16 = (1 << 3) | (1 << 11); // EFS + Data Descriptor + // bits for time: // 0-4: day of the month (1-31) // 5-8: month: (1 = jan, etc.) @@ -249,7 +251,7 @@ impl ZipEntry { LocalFileHeader { signature: LOCAL_FH_SIG, version_needed: 0x2d, - flags: 1 << 3, + flags: GENERAL_PURUPOSE_FLAGS, compression: 0x8, time, date, @@ -332,7 +334,7 @@ impl ZipEntry { signature: CENTRAL_DIRECTORY_FH_SIG, version_made_by: VERSION_MADE_BY, version_needed: VERSION_NEEDED, - flags: 1 << 3, + flags: GENERAL_PURUPOSE_FLAGS, compression: 0x8, time, date, -- 2.30.2 ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [pbs-devel] [RFC PATCH proxmox-backup] pbs-tools: zip: add EFS flag to zip files 2021-09-10 9:09 [pbs-devel] [RFC PATCH proxmox-backup] pbs-tools: zip: add EFS flag to zip files Dominik Csapak @ 2021-09-11 15:08 ` Thomas Lamprecht 2021-09-13 7:14 ` Dominik Csapak 0 siblings, 1 reply; 6+ messages in thread From: Thomas Lamprecht @ 2021-09-11 15:08 UTC (permalink / raw) To: Proxmox Backup Server development discussion, Dominik Csapak On 10.09.21 11:09, Dominik Csapak wrote: > this flag marks the file names as 'UTF-8' encoded. > > By default, encoding of file names in zips are defined as code page 437, > but we save the filenames as bytes (like in linux fs). > > For linux systems this neither would be a problem since most tools > simply use the filenames as bytes, but for the zip utility under > windows it's important since NTFS uses UTF-16 for file names. > > Since we generate zips only on pxars (file based backup on linux) or > via file-restore-daemons (linux; ntfs mounted as UTF-8), it's a fair > assumption that we can mark most filenames as UTF-8. > > For zips generated from linux backups to be extracted on windows it is > impossible to do the correct thing anyway, since windows can not have > arbitrary bytes in file names, and for each encoding chosen, there is > some file that cannot be shown correctly. > so either all filenames are decoded as CP437 ('ö' -> '├╢') > or non UTF-8 encoded file-names have garbage characters in them (�) > > Signed-off-by: Dominik Csapak <d.csapak@proxmox.com> > --- > sending as RFC since there is no way to have it correct in all cases, > and we have to decide if we want CP437 or UTF-8 by default > Yeah, it's not only that we may not be incorrect, the closest definition of a ZIP spec says "not set == should be cp437 but meh" and "set == MUST be valid UTF-8" about this bit: > D.2 If general purpose bit 11 is unset, the file name and comment SHOULD conform > to the original ZIP character encoding. If general purpose bit 11 is set, the > filename and comment MUST support The Unicode Standard, Version 4.1.0 or > greater using the character encoding form defined by the UTF-8 storage > specification. The Unicode Standard is published by the The Unicode > Consortium (www.unicode.org). UTF-8 encoded data stored within ZIP files > is expected to not include a byte order mark (BOM). > - Appendix D, https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT Also interesting, just below above quote: > D.3 Applications MAY choose to supplement this file name storage through the use > of the 0x0008 Extra Field. Storage for this optional field is currently > undefined, however it will be used to allow storing extended information > on source or target encoding that MAY further assist applications with file > name, or file content encoding tasks. Please contact PKWARE with any > requirements on how this field SHOULD be used. So I'd like to know what standard tools like info-zip (i.e., Debian's "zip" package) or other cross-platform tools like 7zip do. It seems that at least Debian's version of info zip had some thoughts about this and can (or always does, did not checked that closely) safe utf8 filenames in an extra field, one that some other tools maybe check for? https://sources.debian.org/src/zip/3.0-12/zip.c/#L967 I say Debian's version, as upstream still talks about Unicode support on their home page, which itself may be just outdated too, but it could also be that Debian patched that in. Any how, it seems to me that there'd be some more compatible options that do not plainly state that they're 100% utf-8 while actually not being so sure of that, so I'd explore that angle quite some more; data restoration is probably the most important aspect of a backup system - so every way we expose doing so should work as as good as possible - even if going outside our Linux bubble. > pbs-tools/src/zip.rs | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/pbs-tools/src/zip.rs b/pbs-tools/src/zip.rs > index 605480a8..88eea07b 100644 > --- a/pbs-tools/src/zip.rs > +++ b/pbs-tools/src/zip.rs > @@ -34,6 +34,8 @@ const VERSION_MADE_BY: u16 = 0x032d; > const ZIP64_EOCD_RECORD: u32 = 0x06064B50; > const ZIP64_EOCD_LOCATOR: u32 = 0x07064B50; > > +const GENERAL_PURUPOSE_FLAGS: u16 = (1 << 3) | (1 << 11); // EFS + Data Descriptor > + - typo in constant name: purupose vs. purpose - comment order do not match the bits used, bit 11 is EFS and bit 3 is telling the parser that the crc32 is not in the header but in the data descriptor after the compressed data; your bitwise-OR+comment order suggests different. - isn't this related to BZ entry #3618, but that is neither mentioned here nor in the bug report... _If_ we'd go down this way then the following const name and formatting would make this easier to read IMO: const LFH_GENERAL_PURPOSE_FLAGS: u16 = (1 << 3) // we place crc32 in data descriptor | (1 << 11); // EFS, mark filenames & comments as UTF-8 (not guaranteed but more often OK than CP437) ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [pbs-devel] [RFC PATCH proxmox-backup] pbs-tools: zip: add EFS flag to zip files 2021-09-11 15:08 ` Thomas Lamprecht @ 2021-09-13 7:14 ` Dominik Csapak 2021-09-13 7:25 ` Thomas Lamprecht 0 siblings, 1 reply; 6+ messages in thread From: Dominik Csapak @ 2021-09-13 7:14 UTC (permalink / raw) To: Thomas Lamprecht, Proxmox Backup Server development discussion On 9/11/21 17:08, Thomas Lamprecht wrote: > On 10.09.21 11:09, Dominik Csapak wrote: >> this flag marks the file names as 'UTF-8' encoded. >> >> By default, encoding of file names in zips are defined as code page 437, >> but we save the filenames as bytes (like in linux fs). >> >> For linux systems this neither would be a problem since most tools >> simply use the filenames as bytes, but for the zip utility under >> windows it's important since NTFS uses UTF-16 for file names. >> >> Since we generate zips only on pxars (file based backup on linux) or >> via file-restore-daemons (linux; ntfs mounted as UTF-8), it's a fair >> assumption that we can mark most filenames as UTF-8. >> >> For zips generated from linux backups to be extracted on windows it is >> impossible to do the correct thing anyway, since windows can not have >> arbitrary bytes in file names, and for each encoding chosen, there is >> some file that cannot be shown correctly. >> so either all filenames are decoded as CP437 ('ö' -> '├╢') >> or non UTF-8 encoded file-names have garbage characters in them (�) >> >> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com> >> --- >> sending as RFC since there is no way to have it correct in all cases, >> and we have to decide if we want CP437 or UTF-8 by default >> > > Yeah, it's not only that we may not be incorrect, the closest definition of a ZIP spec > says "not set == should be cp437 but meh" and "set == MUST be valid UTF-8" about this > bit: > >> D.2 If general purpose bit 11 is unset, the file name and comment SHOULD conform >> to the original ZIP character encoding. If general purpose bit 11 is set, the >> filename and comment MUST support The Unicode Standard, Version 4.1.0 or >> greater using the character encoding form defined by the UTF-8 storage >> specification. The Unicode Standard is published by the The Unicode >> Consortium (www.unicode.org). UTF-8 encoded data stored within ZIP files >> is expected to not include a byte order mark (BOM). >> > - Appendix D, https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT Yes, you're right, i did read that part too loosely i think... > > Also interesting, just below above quote: > >> D.3 Applications MAY choose to supplement this file name storage through the use >> of the 0x0008 Extra Field. Storage for this optional field is currently >> undefined, however it will be used to allow storing extended information >> on source or target encoding that MAY further assist applications with file >> name, or file content encoding tasks. Please contact PKWARE with any >> requirements on how this field SHOULD be used. AFAIU, that part of the 'spec' is not open and would require a license of PKWARE > > > So I'd like to know what standard tools like info-zip (i.e., Debian's "zip" package) or > other cross-platform tools like 7zip do. > > It seems that at least Debian's version of info zip had some thoughts about this and can > (or always does, did not checked that closely) safe utf8 filenames in an extra field, one > that some other tools maybe check for? > > https://sources.debian.org/src/zip/3.0-12/zip.c/#L967 > > I say Debian's version, as upstream still talks about Unicode support on their home page, > which itself may be just outdated too, but it could also be that Debian patched that in. > > Any how, it seems to me that there'd be some more compatible options that do not plainly > state that they're 100% utf-8 while actually not being so sure of that, so I'd explore that > angle quite some more; data restoration is probably the most important aspect of a backup > system - so every way we expose doing so should work as as good as possible - even if going > outside our Linux bubble. I tested it with debians zip (Info-Zip), and despite was the spec says, i could not get it to write an 'extra-field' with the filenames. Instead, when it finds a filename that has a filename with any byte that has the high-bit set (>127) it sets the EFS bit in the filename iff the filename is valid utf-8, and does nothing if not. IMHO, that sounds like a reasonable thing to do, so i'd suggest that we test each filename for valid UTF-8, and set the bit if that's true. This marks ASCII only filenames also as UTF-8, but thats technically true and simplifies the rust code a bit (and should be faster for UTF-8 Filenames, since we do not have to check for a high bit first and then try to convert...) Does that sound ok to you? For non-valid UTF-8 filenames that have a high bit i'd produce CP437 filenames on windows, and on linux it'd just be the byte value. > >> pbs-tools/src/zip.rs | 6 ++++-- >> 1 file changed, 4 insertions(+), 2 deletions(-) >> >> diff --git a/pbs-tools/src/zip.rs b/pbs-tools/src/zip.rs >> index 605480a8..88eea07b 100644 >> --- a/pbs-tools/src/zip.rs >> +++ b/pbs-tools/src/zip.rs >> @@ -34,6 +34,8 @@ const VERSION_MADE_BY: u16 = 0x032d; >> const ZIP64_EOCD_RECORD: u32 = 0x06064B50; >> const ZIP64_EOCD_LOCATOR: u32 = 0x07064B50; >> >> +const GENERAL_PURUPOSE_FLAGS: u16 = (1 << 3) | (1 << 11); // EFS + Data Descriptor >> + > > - typo in constant name: purupose vs. purpose yeah thanks... > - comment order do not match the bits used, bit 11 is EFS and bit 3 is telling > the parser that the crc32 is not in the header but in the data descriptor after > the compressed data; your bitwise-OR+comment order suggests different. sorry > - isn't this related to BZ entry #3618, but that is neither mentioned here nor in the > bug report... that bug-report wasn't there when i wrote the patch. > > _If_ we'd go down this way then the following const name and formatting would make this > easier to read IMO: > > const LFH_GENERAL_PURPOSE_FLAGS: u16 = (1 << 3) // we place crc32 in data descriptor > | (1 << 11); // EFS, mark filenames & comments as UTF-8 (not guaranteed but more often OK than CP437) > makes, sense, though if we only set it conditionally i'd split the EFS_MARK into its own constant. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [pbs-devel] [RFC PATCH proxmox-backup] pbs-tools: zip: add EFS flag to zip files 2021-09-13 7:14 ` Dominik Csapak @ 2021-09-13 7:25 ` Thomas Lamprecht 2021-09-13 7:53 ` Dominik Csapak 0 siblings, 1 reply; 6+ messages in thread From: Thomas Lamprecht @ 2021-09-13 7:25 UTC (permalink / raw) To: Dominik Csapak, Proxmox Backup Server development discussion On 13.09.21 09:14, Dominik Csapak wrote: > On 9/11/21 17:08, Thomas Lamprecht wrote: >> On 10.09.21 11:09, Dominik Csapak wrote: >> Also interesting, just below above quote: >> >>> D.3 Applications MAY choose to supplement this file name storage through the use >>> of the 0x0008 Extra Field. Storage for this optional field is currently >>> undefined, however it will be used to allow storing extended information >>> on source or target encoding that MAY further assist applications with file >>> name, or file content encoding tasks. Please contact PKWARE with any >>> requirements on how this field SHOULD be used. > > AFAIU, that part of the 'spec' is not open and would require a license of PKWARE yeah that's what I figured, but observe and replicate could in theory still be OK I guess? > >> >> >> So I'd like to know what standard tools like info-zip (i.e., Debian's "zip" package) or >> other cross-platform tools like 7zip do. >> >> It seems that at least Debian's version of info zip had some thoughts about this and can >> (or always does, did not checked that closely) safe utf8 filenames in an extra field, one >> that some other tools maybe check for? >> >> https://sources.debian.org/src/zip/3.0-12/zip.c/#L967 >> >> I say Debian's version, as upstream still talks about Unicode support on their home page, >> which itself may be just outdated too, but it could also be that Debian patched that in. >> >> Any how, it seems to me that there'd be some more compatible options that do not plainly >> state that they're 100% utf-8 while actually not being so sure of that, so I'd explore that >> angle quite some more; data restoration is probably the most important aspect of a backup >> system - so every way we expose doing so should work as as good as possible - even if going >> outside our Linux bubble. > > I tested it with debians zip (Info-Zip), and despite was the spec says, > i could not get it to write an 'extra-field' with the filenames. > hmm, meh... > Instead, when it finds a filename that has a filename with any byte that > has the high-bit set (>127) it sets the EFS bit in the filename iff > the filename is valid utf-8, and does nothing if not. > > IMHO, that sounds like a reasonable thing to do, so i'd suggest > that we test each filename for valid UTF-8, and set the bit > if that's true. This marks ASCII only filenames also as UTF-8, > but thats technically true and simplifies the rust code a bit > (and should be faster for UTF-8 Filenames, > since we do not have to check for a high bit first and then try to > convert...) > > Does that sound ok to you? Yes, sounds reasonable to me for nwo. It'd be an improvement and we basically only use this for on-the-fly generation, so if we find a possible "better" (hard to measure that with such a mess of format) approach in the future it would help all users immediately too then. > > For non-valid UTF-8 filenames that have a high bit i'd produce > CP437 filenames on windows, and on linux it'd just be the > byte value. just to be sure: what does "i'd produce" mean here? As we certainly do not differ ZIP generation semantics by user OS (thank goodness), this means that a unzip by most reasonable tools would produce the outcome you described, or? > >> >>> pbs-tools/src/zip.rs | 6 ++++-- >>> 1 file changed, 4 insertions(+), 2 deletions(-) >>> >>> diff --git a/pbs-tools/src/zip.rs b/pbs-tools/src/zip.rs >>> index 605480a8..88eea07b 100644 >>> --- a/pbs-tools/src/zip.rs >>> +++ b/pbs-tools/src/zip.rs >>> @@ -34,6 +34,8 @@ const VERSION_MADE_BY: u16 = 0x032d; >>> const ZIP64_EOCD_RECORD: u32 = 0x06064B50; >>> const ZIP64_EOCD_LOCATOR: u32 = 0x07064B50; >>> +const GENERAL_PURUPOSE_FLAGS: u16 = (1 << 3) | (1 << 11); // EFS + Data Descriptor >>> + >> >> - typo in constant name: purupose vs. purpose > > yeah thanks... > >> - comment order do not match the bits used, bit 11 is EFS and bit 3 is telling >> the parser that the crc32 is not in the header but in the data descriptor after >> the compressed data; your bitwise-OR+comment order suggests different. > > sorry > >> - isn't this related to BZ entry #3618, but that is neither mentioned here nor in the >> bug report... > > that bug-report wasn't there when i wrote the patch. I naturally only compared mail vs BZ time and missed the date difference of one day, argh, sorry. > >> >> _If_ we'd go down this way then the following const name and formatting would make this >> easier to read IMO: >> >> const LFH_GENERAL_PURPOSE_FLAGS: u16 = (1 << 3) // we place crc32 in data descriptor >> | (1 << 11); // EFS, mark filenames & comments as UTF-8 (not guaranteed but more often OK than CP437) >> > > makes, sense, though if we only set it conditionally i'd split the > EFS_MARK into its own constant. > sounds good to me. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [pbs-devel] [RFC PATCH proxmox-backup] pbs-tools: zip: add EFS flag to zip files 2021-09-13 7:25 ` Thomas Lamprecht @ 2021-09-13 7:53 ` Dominik Csapak 2021-09-13 7:57 ` Thomas Lamprecht 0 siblings, 1 reply; 6+ messages in thread From: Dominik Csapak @ 2021-09-13 7:53 UTC (permalink / raw) To: Thomas Lamprecht, Proxmox Backup Server development discussion On 9/13/21 09:25, Thomas Lamprecht wrote: > On 13.09.21 09:14, Dominik Csapak wrote: >> On 9/11/21 17:08, Thomas Lamprecht wrote: >>> On 10.09.21 11:09, Dominik Csapak wrote: >>> Also interesting, just below above quote: >>> >>>> D.3 Applications MAY choose to supplement this file name storage through the use >>>> of the 0x0008 Extra Field. Storage for this optional field is currently >>>> undefined, however it will be used to allow storing extended information >>>> on source or target encoding that MAY further assist applications with file >>>> name, or file content encoding tasks. Please contact PKWARE with any >>>> requirements on how this field SHOULD be used. >> >> AFAIU, that part of the 'spec' is not open and would require a license of PKWARE > > yeah that's what I figured, but observe and replicate could in theory still be OK I guess? > >> >>> >>> >>> So I'd like to know what standard tools like info-zip (i.e., Debian's "zip" package) or >>> other cross-platform tools like 7zip do. >>> >>> It seems that at least Debian's version of info zip had some thoughts about this and can >>> (or always does, did not checked that closely) safe utf8 filenames in an extra field, one >>> that some other tools maybe check for? >>> >>> https://sources.debian.org/src/zip/3.0-12/zip.c/#L967 >>> >>> I say Debian's version, as upstream still talks about Unicode support on their home page, >>> which itself may be just outdated too, but it could also be that Debian patched that in. >>> >>> Any how, it seems to me that there'd be some more compatible options that do not plainly >>> state that they're 100% utf-8 while actually not being so sure of that, so I'd explore that >>> angle quite some more; data restoration is probably the most important aspect of a backup >>> system - so every way we expose doing so should work as as good as possible - even if going >>> outside our Linux bubble. >> >> I tested it with debians zip (Info-Zip), and despite was the spec says, >> i could not get it to write an 'extra-field' with the filenames. >> > > hmm, meh... > >> Instead, when it finds a filename that has a filename with any byte that >> has the high-bit set (>127) it sets the EFS bit in the filename iff >> the filename is valid utf-8, and does nothing if not. >> >> IMHO, that sounds like a reasonable thing to do, so i'd suggest >> that we test each filename for valid UTF-8, and set the bit >> if that's true. This marks ASCII only filenames also as UTF-8, >> but thats technically true and simplifies the rust code a bit >> (and should be faster for UTF-8 Filenames, >> since we do not have to check for a high bit first and then try to >> convert...) >> >> Does that sound ok to you? > > Yes, sounds reasonable to me for nwo. It'd be an improvement and we basically only use > this for on-the-fly generation, so if we find a possible "better" (hard to measure that > with such a mess of format) approach in the future it would help all users immediately > too then. > >> >> For non-valid UTF-8 filenames that have a high bit i'd produce >> CP437 filenames on windows, and on linux it'd just be the >> byte value. > > just to be sure: what does "i'd produce" mean here? As we certainly do not differ ZIP > generation semantics by user OS (thank goodness), this means that a unzip by most > reasonable tools would produce the outcome you described, or? oops, this should say "it'd" so "it would". To be completely clear, such ZIPs extracts as * Windows(built-in + 7zip): UTF-8 filenames get correctly encoded, all other filenames interpreted as CP437 * Debian(zip): filenames as bytes in any case * Debian(7z): weirdly it uses valid utf-8 as utf-8 bytes, but other filenames, it interprets as Windows 1252 and encodes as UTF-8.. i'll send a v2 with this semantic though, as i don't think we can do much better for now > >> >>> >>>> pbs-tools/src/zip.rs | 6 ++++-- >>>> 1 file changed, 4 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/pbs-tools/src/zip.rs b/pbs-tools/src/zip.rs >>>> index 605480a8..88eea07b 100644 >>>> --- a/pbs-tools/src/zip.rs >>>> +++ b/pbs-tools/src/zip.rs >>>> @@ -34,6 +34,8 @@ const VERSION_MADE_BY: u16 = 0x032d; >>>> const ZIP64_EOCD_RECORD: u32 = 0x06064B50; >>>> const ZIP64_EOCD_LOCATOR: u32 = 0x07064B50; >>>> +const GENERAL_PURUPOSE_FLAGS: u16 = (1 << 3) | (1 << 11); // EFS + Data Descriptor >>>> + >>> >>> - typo in constant name: purupose vs. purpose >> >> yeah thanks... >> >>> - comment order do not match the bits used, bit 11 is EFS and bit 3 is telling >>> the parser that the crc32 is not in the header but in the data descriptor after >>> the compressed data; your bitwise-OR+comment order suggests different. >> >> sorry >> >>> - isn't this related to BZ entry #3618, but that is neither mentioned here nor in the >>> bug report... >> >> that bug-report wasn't there when i wrote the patch. > > I naturally only compared mail vs BZ time and missed the date difference of one day, > argh, sorry. > >> >>> >>> _If_ we'd go down this way then the following const name and formatting would make this >>> easier to read IMO: >>> >>> const LFH_GENERAL_PURPOSE_FLAGS: u16 = (1 << 3) // we place crc32 in data descriptor >>> | (1 << 11); // EFS, mark filenames & comments as UTF-8 (not guaranteed but more often OK than CP437) >>> >> >> makes, sense, though if we only set it conditionally i'd split the >> EFS_MARK into its own constant. >> > > sounds good to me. > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [pbs-devel] [RFC PATCH proxmox-backup] pbs-tools: zip: add EFS flag to zip files 2021-09-13 7:53 ` Dominik Csapak @ 2021-09-13 7:57 ` Thomas Lamprecht 0 siblings, 0 replies; 6+ messages in thread From: Thomas Lamprecht @ 2021-09-13 7:57 UTC (permalink / raw) To: Dominik Csapak, Proxmox Backup Server development discussion On 13.09.21 09:53, Dominik Csapak wrote: > > oops, this should say "it'd" so "it would". > > To be completely clear, such ZIPs extracts as > * Windows(built-in + 7zip): UTF-8 filenames get correctly encoded, > all other filenames interpreted as CP437 > * Debian(zip): filenames as bytes in any case > * Debian(7z): weirdly it uses valid utf-8 as utf-8 bytes, > but other filenames, it interprets as Windows 1252 and encodes as UTF-8.. > > i'll send a v2 with this semantic though, as i don't think we can do > much better for now ack, thanks for the clear up! ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2021-09-13 7:57 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-09-10 9:09 [pbs-devel] [RFC PATCH proxmox-backup] pbs-tools: zip: add EFS flag to zip files Dominik Csapak 2021-09-11 15:08 ` Thomas Lamprecht 2021-09-13 7:14 ` Dominik Csapak 2021-09-13 7:25 ` Thomas Lamprecht 2021-09-13 7:53 ` Dominik Csapak 2021-09-13 7:57 ` Thomas Lamprecht
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox