* [pbs-devel] [PATCH proxmox v2] fix #3618: proxmox-async: zip: add conditional EFS flag to zip files
@ 2022-01-10 11:23 Dominik Csapak
2022-01-11 5:49 ` [pbs-devel] applied: " Thomas Lamprecht
0 siblings, 1 reply; 3+ messages in thread
From: Dominik Csapak @ 2022-01-10 11:23 UTC (permalink / raw)
To: pbs-devel
this flag marks the file names as 'UTF-8' encoded if they are valid UTF-8.
By default, encoding of file names in zips are defined as code page 437,
but we save the filenames as bytes (like in linux fs).
For linux systems this would not be a problem since most tools
simply use the filenames as bytes, but for the zip utility under
windows it's important since NTFS uses UTF-16 for file names.
For filenames that are valid UTF-8, they are decoded as UTF-8 everywhere
correctly (Linux as UTF-8 bytes, Windows as correct UTF-16 sequence) and
for other filenames with a high bit set, it depends on the OS/Software
what exactly happens. Some cases below:
* Windows + Built-in/7zip: decoded as CP437
* Debian + zip: Bytes taken as-is
* Debian + 7z: interpreted as Windows1252, decoded as UTF-8
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
changes from v1:
* moved to proxmox/proxmox-async from proxmox-backup/pbs-tools
* included bug# in the subject
* removed two spurious newlines
proxmox-async/src/zip.rs | 22 +++++++++++++++++++---
1 file changed, 19 insertions(+), 3 deletions(-)
diff --git a/proxmox-async/src/zip.rs b/proxmox-async/src/zip.rs
index 12d9f29..d1df713 100644
--- a/proxmox-async/src/zip.rs
+++ b/proxmox-async/src/zip.rs
@@ -34,6 +34,9 @@ const VERSION_MADE_BY: u16 = 0x032d;
const ZIP64_EOCD_RECORD: u32 = 0x06064B50;
const ZIP64_EOCD_LOCATOR: u32 = 0x07064B50;
+const LFH_GENERAL_PURPOSE_FLAGS: u16 = 1 << 3; // we place crc32 in the data descriptor
+const LFH_GPF_EFS_BIT: u16 = 1 << 11; // EFS, marks filename & comment as UTF-8
+
// bits for time:
// 0-4: day of the month (1-31)
// 5-8: month: (1 = jan, etc.)
@@ -200,6 +203,7 @@ pub struct ZipEntry {
compressed_size: u64,
offset: u64,
is_file: bool,
+ is_utf8_filename: bool,
}
impl ZipEntry {
@@ -220,8 +224,11 @@ impl ZipEntry {
relpath.push(""); // adds trailing slash
}
+ let filename: OsString = relpath.into();
+ let is_utf8_filename = filename.to_str().is_some();
+
Self {
- filename: relpath.into(),
+ filename,
crc32: 0,
mtime,
mode,
@@ -229,6 +236,15 @@ impl ZipEntry {
compressed_size: 0,
offset: 0,
is_file,
+ is_utf8_filename,
+ }
+ }
+
+ fn get_general_purpose_flags(&self) -> u16 {
+ if self.is_utf8_filename {
+ LFH_GENERAL_PURPOSE_FLAGS | LFH_GPF_EFS_BIT
+ } else {
+ LFH_GENERAL_PURPOSE_FLAGS
}
}
@@ -249,7 +265,7 @@ impl ZipEntry {
LocalFileHeader {
signature: LOCAL_FH_SIG,
version_needed: 0x2d,
- flags: 1 << 3,
+ flags: self.get_general_purpose_flags(),
compression: 0x8,
time,
date,
@@ -332,7 +348,7 @@ impl ZipEntry {
signature: CENTRAL_DIRECTORY_FH_SIG,
version_made_by: VERSION_MADE_BY,
version_needed: VERSION_NEEDED,
- flags: 1 << 3,
+ flags: self.get_general_purpose_flags(),
compression: 0x8,
time,
date,
--
2.30.2
^ permalink raw reply [flat|nested] 3+ messages in thread
* [pbs-devel] applied: [PATCH proxmox v2] fix #3618: proxmox-async: zip: add conditional EFS flag to zip files
2022-01-10 11:23 [pbs-devel] [PATCH proxmox v2] fix #3618: proxmox-async: zip: add conditional EFS flag to zip files Dominik Csapak
@ 2022-01-11 5:49 ` Thomas Lamprecht
2022-01-11 7:40 ` Dominik Csapak
0 siblings, 1 reply; 3+ messages in thread
From: Thomas Lamprecht @ 2022-01-11 5:49 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Dominik Csapak
On 10.01.22 12:23, Dominik Csapak wrote:
> this flag marks the file names as 'UTF-8' encoded if they are valid UTF-8.
>
> By default, encoding of file names in zips are defined as code page 437,
> but we save the filenames as bytes (like in linux fs).
>
> For linux systems this would not be a problem since most tools
> simply use the filenames as bytes, but for the zip utility under
> windows it's important since NTFS uses UTF-16 for file names.
>
> For filenames that are valid UTF-8, they are decoded as UTF-8 everywhere
> correctly (Linux as UTF-8 bytes, Windows as correct UTF-16 sequence) and
> for other filenames with a high bit set, it depends on the OS/Software
> what exactly happens. Some cases below:
>
> * Windows + Built-in/7zip: decoded as CP437
> * Debian + zip: Bytes taken as-is
> * Debian + 7z: interpreted as Windows1252, decoded as UTF-8
>
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> changes from v1:
> * moved to proxmox/proxmox-async from proxmox-backup/pbs-tools
> * included bug# in the subject
> * removed two spurious newlines
>
> proxmox-async/src/zip.rs | 22 +++++++++++++++++++---
> 1 file changed, 19 insertions(+), 3 deletions(-)
>
>
applied, thanks!
Out of interest, did you benchmark if this changes makes an impact in zip-streaming?
I'd think that if, then only for the case with many small files?
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [pbs-devel] applied: [PATCH proxmox v2] fix #3618: proxmox-async: zip: add conditional EFS flag to zip files
2022-01-11 5:49 ` [pbs-devel] applied: " Thomas Lamprecht
@ 2022-01-11 7:40 ` Dominik Csapak
0 siblings, 0 replies; 3+ messages in thread
From: Dominik Csapak @ 2022-01-11 7:40 UTC (permalink / raw)
To: Thomas Lamprecht, Proxmox Backup Server development discussion
On 1/11/22 06:49, Thomas Lamprecht wrote:
> On 10.01.22 12:23, Dominik Csapak wrote:
>> this flag marks the file names as 'UTF-8' encoded if they are valid UTF-8.
>>
>> By default, encoding of file names in zips are defined as code page 437,
>> but we save the filenames as bytes (like in linux fs).
>>
>> For linux systems this would not be a problem since most tools
>> simply use the filenames as bytes, but for the zip utility under
>> windows it's important since NTFS uses UTF-16 for file names.
>>
>> For filenames that are valid UTF-8, they are decoded as UTF-8 everywhere
>> correctly (Linux as UTF-8 bytes, Windows as correct UTF-16 sequence) and
>> for other filenames with a high bit set, it depends on the OS/Software
>> what exactly happens. Some cases below:
>>
>> * Windows + Built-in/7zip: decoded as CP437
>> * Debian + zip: Bytes taken as-is
>> * Debian + 7z: interpreted as Windows1252, decoded as UTF-8
>>
>> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
>> ---
>> changes from v1:
>> * moved to proxmox/proxmox-async from proxmox-backup/pbs-tools
>> * included bug# in the subject
>> * removed two spurious newlines
>>
>> proxmox-async/src/zip.rs | 22 +++++++++++++++++++---
>> 1 file changed, 19 insertions(+), 3 deletions(-)
>>
>>
>
> applied, thanks!
>
> Out of interest, did you benchmark if this changes makes an impact in zip-streaming?
> I'd think that if, then only for the case with many small files?
no i did not benchmark it, but during zip streaming i am here almost
always disk limited (accessing random chunks), so i don't think i would
have gotten interesting results...
ofc i can do some benchmarks with/without the patch this week
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2022-01-11 7:40 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-10 11:23 [pbs-devel] [PATCH proxmox v2] fix #3618: proxmox-async: zip: add conditional EFS flag to zip files Dominik Csapak
2022-01-11 5:49 ` [pbs-devel] applied: " Thomas Lamprecht
2022-01-11 7:40 ` Dominik Csapak
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox