all lists on lists.proxmox.com
 help / color / mirror / Atom feed
* [pbs-devel] [PATCH proxmox v2] fix #3618: proxmox-async: zip: add conditional EFS flag to zip files
@ 2022-01-10 11:23 Dominik Csapak
  2022-01-11  5:49 ` [pbs-devel] applied: " Thomas Lamprecht
  0 siblings, 1 reply; 3+ messages in thread
From: Dominik Csapak @ 2022-01-10 11:23 UTC (permalink / raw)
  To: pbs-devel

this flag marks the file names as 'UTF-8' encoded if they are valid UTF-8.

By default, encoding of file names in zips are defined as code page 437,
but we save the filenames as bytes (like in linux fs).

For linux systems this would not be a problem since most tools
simply use the filenames as bytes, but for the zip utility under
windows it's important since NTFS uses UTF-16 for file names.

For filenames that are valid UTF-8, they are decoded as UTF-8 everywhere
correctly (Linux as UTF-8 bytes, Windows as correct UTF-16 sequence) and
for other filenames with a high bit set, it depends on the OS/Software
what exactly happens. Some cases below:

* Windows + Built-in/7zip: decoded as CP437
* Debian + zip: Bytes taken as-is
* Debian + 7z: interpreted as Windows1252, decoded as UTF-8

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
changes from v1:
* moved to proxmox/proxmox-async from proxmox-backup/pbs-tools
* included bug# in the subject
* removed two spurious newlines

 proxmox-async/src/zip.rs | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/proxmox-async/src/zip.rs b/proxmox-async/src/zip.rs
index 12d9f29..d1df713 100644
--- a/proxmox-async/src/zip.rs
+++ b/proxmox-async/src/zip.rs
@@ -34,6 +34,9 @@ const VERSION_MADE_BY: u16 = 0x032d;
 const ZIP64_EOCD_RECORD: u32 = 0x06064B50;
 const ZIP64_EOCD_LOCATOR: u32 = 0x07064B50;
 
+const LFH_GENERAL_PURPOSE_FLAGS: u16 = 1 << 3; // we place crc32 in the data descriptor
+const LFH_GPF_EFS_BIT: u16 = 1 << 11; // EFS, marks filename & comment as UTF-8
+
 // bits for time:
 // 0-4: day of the month (1-31)
 // 5-8: month: (1 = jan, etc.)
@@ -200,6 +203,7 @@ pub struct ZipEntry {
     compressed_size: u64,
     offset: u64,
     is_file: bool,
+    is_utf8_filename: bool,
 }
 
 impl ZipEntry {
@@ -220,8 +224,11 @@ impl ZipEntry {
             relpath.push(""); // adds trailing slash
         }
 
+        let filename: OsString = relpath.into();
+        let is_utf8_filename = filename.to_str().is_some();
+
         Self {
-            filename: relpath.into(),
+            filename,
             crc32: 0,
             mtime,
             mode,
@@ -229,6 +236,15 @@ impl ZipEntry {
             compressed_size: 0,
             offset: 0,
             is_file,
+            is_utf8_filename,
+        }
+    }
+
+    fn get_general_purpose_flags(&self) -> u16 {
+        if self.is_utf8_filename {
+            LFH_GENERAL_PURPOSE_FLAGS | LFH_GPF_EFS_BIT
+        } else {
+            LFH_GENERAL_PURPOSE_FLAGS
         }
     }
 
@@ -249,7 +265,7 @@ impl ZipEntry {
             LocalFileHeader {
                 signature: LOCAL_FH_SIG,
                 version_needed: 0x2d,
-                flags: 1 << 3,
+                flags: self.get_general_purpose_flags(),
                 compression: 0x8,
                 time,
                 date,
@@ -332,7 +348,7 @@ impl ZipEntry {
                 signature: CENTRAL_DIRECTORY_FH_SIG,
                 version_made_by: VERSION_MADE_BY,
                 version_needed: VERSION_NEEDED,
-                flags: 1 << 3,
+                flags: self.get_general_purpose_flags(),
                 compression: 0x8,
                 time,
                 date,
-- 
2.30.2





^ permalink raw reply	[flat|nested] 3+ messages in thread

* [pbs-devel] applied: [PATCH proxmox v2] fix #3618: proxmox-async: zip: add conditional EFS flag to zip files
  2022-01-10 11:23 [pbs-devel] [PATCH proxmox v2] fix #3618: proxmox-async: zip: add conditional EFS flag to zip files Dominik Csapak
@ 2022-01-11  5:49 ` Thomas Lamprecht
  2022-01-11  7:40   ` Dominik Csapak
  0 siblings, 1 reply; 3+ messages in thread
From: Thomas Lamprecht @ 2022-01-11  5:49 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Dominik Csapak

On 10.01.22 12:23, Dominik Csapak wrote:
> this flag marks the file names as 'UTF-8' encoded if they are valid UTF-8.
> 
> By default, encoding of file names in zips are defined as code page 437,
> but we save the filenames as bytes (like in linux fs).
> 
> For linux systems this would not be a problem since most tools
> simply use the filenames as bytes, but for the zip utility under
> windows it's important since NTFS uses UTF-16 for file names.
> 
> For filenames that are valid UTF-8, they are decoded as UTF-8 everywhere
> correctly (Linux as UTF-8 bytes, Windows as correct UTF-16 sequence) and
> for other filenames with a high bit set, it depends on the OS/Software
> what exactly happens. Some cases below:
> 
> * Windows + Built-in/7zip: decoded as CP437
> * Debian + zip: Bytes taken as-is
> * Debian + 7z: interpreted as Windows1252, decoded as UTF-8
> 
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> changes from v1:
> * moved to proxmox/proxmox-async from proxmox-backup/pbs-tools
> * included bug# in the subject
> * removed two spurious newlines
> 
>  proxmox-async/src/zip.rs | 22 +++++++++++++++++++---
>  1 file changed, 19 insertions(+), 3 deletions(-)
> 
>

applied, thanks!

Out of interest, did you benchmark if this changes makes an impact in zip-streaming?
I'd think that if, then only for the case with many small files?




^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [pbs-devel] applied: [PATCH proxmox v2] fix #3618: proxmox-async: zip: add conditional EFS flag to zip files
  2022-01-11  5:49 ` [pbs-devel] applied: " Thomas Lamprecht
@ 2022-01-11  7:40   ` Dominik Csapak
  0 siblings, 0 replies; 3+ messages in thread
From: Dominik Csapak @ 2022-01-11  7:40 UTC (permalink / raw)
  To: Thomas Lamprecht, Proxmox Backup Server development discussion

On 1/11/22 06:49, Thomas Lamprecht wrote:
> On 10.01.22 12:23, Dominik Csapak wrote:
>> this flag marks the file names as 'UTF-8' encoded if they are valid UTF-8.
>>
>> By default, encoding of file names in zips are defined as code page 437,
>> but we save the filenames as bytes (like in linux fs).
>>
>> For linux systems this would not be a problem since most tools
>> simply use the filenames as bytes, but for the zip utility under
>> windows it's important since NTFS uses UTF-16 for file names.
>>
>> For filenames that are valid UTF-8, they are decoded as UTF-8 everywhere
>> correctly (Linux as UTF-8 bytes, Windows as correct UTF-16 sequence) and
>> for other filenames with a high bit set, it depends on the OS/Software
>> what exactly happens. Some cases below:
>>
>> * Windows + Built-in/7zip: decoded as CP437
>> * Debian + zip: Bytes taken as-is
>> * Debian + 7z: interpreted as Windows1252, decoded as UTF-8
>>
>> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
>> ---
>> changes from v1:
>> * moved to proxmox/proxmox-async from proxmox-backup/pbs-tools
>> * included bug# in the subject
>> * removed two spurious newlines
>>
>>   proxmox-async/src/zip.rs | 22 +++++++++++++++++++---
>>   1 file changed, 19 insertions(+), 3 deletions(-)
>>
>>
> 
> applied, thanks!
> 
> Out of interest, did you benchmark if this changes makes an impact in zip-streaming?
> I'd think that if, then only for the case with many small files?

no i did not benchmark it, but during zip streaming i am here almost 
always disk limited (accessing random chunks), so i don't think i would
have gotten interesting results...

ofc i can do some benchmarks with/without the patch this week




^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-01-11  7:40 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-10 11:23 [pbs-devel] [PATCH proxmox v2] fix #3618: proxmox-async: zip: add conditional EFS flag to zip files Dominik Csapak
2022-01-11  5:49 ` [pbs-devel] applied: " Thomas Lamprecht
2022-01-11  7:40   ` Dominik Csapak

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal