public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: Max Carrara <m.carrara@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [pbs-devel] [PATCH proxmox] proxmox-io: fix `sparse_copy` not copying sparsely on irregular read operations
Date: Mon, 17 Jul 2023 16:19:40 +0200	[thread overview]
Message-ID: <20230717141940.1320395-1-m.carrara@proxmox.com> (raw)

In the uncommon circumstance that calls to `read()` end up reading any number of
bytes other than 4096, the subsequently read bytes become misaligned, causing
blocks of zeroes to be written unnecessarily.

To illustrate, imagine you have a 12KiB file:

  [x][x][x][x][ ][ ][ ][ ][x][x][x][x]
   └──4096──┘  └──4096──┘  └──4096──┘

The first and last block are filled with some data, whereas the middle block is
empty and will therefore result in only zeroes being read.

In order for the empty block to be skipped with `seek()`, the entire buffer has
to be filled with zeroes.

If, for example, the first `read()` ends up putting only 3KiB into the buffer,
the empty block in the middle won't be detected properly, as the buffer will
now always contain some data. What results are four misaligned reads:

  [x][x][x][x][ ][ ][ ][ ][x][x][x][x]
   ├─────┘  ├────────┘  ├────────┘  │
   1        2           3           4


This is fixed by ensuring chunks of 4KiB are always read into the buffer,
except when the last block is truncated. In order to prevent frequent small
reads, the incoming reader is also buffered via `io::BufReader`.

Signed-off-by: Max Carrara <m.carrara@proxmox.com>
---
 proxmox-io/src/sparse_copy.rs | 43 +++++++++++++++++++++++++++--------
 1 file changed, 33 insertions(+), 10 deletions(-)

diff --git a/proxmox-io/src/sparse_copy.rs b/proxmox-io/src/sparse_copy.rs
index 94035de..9d09e6a 100644
--- a/proxmox-io/src/sparse_copy.rs
+++ b/proxmox-io/src/sparse_copy.rs
@@ -1,7 +1,13 @@
-use std::io::{self, ErrorKind, Read, Seek, SeekFrom, Write};
+use std::io::{self, BufReader, BufWriter, ErrorKind, Read, Seek, SeekFrom, Write};
 
 #[cfg(feature = "tokio")]
-use tokio::io::{AsyncRead, AsyncSeek, AsyncSeekExt, AsyncWrite, AsyncWriteExt};
+use tokio::io::{
+    AsyncRead, AsyncReadExt, AsyncSeek, AsyncSeekExt, AsyncWrite, AsyncWriteExt,
+    BufReader as AsyncBufReader, BufWriter as AsyncBufWriter,
+};
+
+const CHUNK_SIZE: usize = 4096;
+const BUF_SIZE: usize = 8192;
 
 /// Efficiently check whether a byte slice contains only zeroes.
 ///
@@ -45,19 +51,27 @@ pub fn sparse_copy<R: Read + ?Sized, W: Write + Seek + ?Sized>(
     reader: &mut R,
     writer: &mut W,
 ) -> Result<SparseCopyResult, io::Error> {
-    let mut buf = crate::byte_buffer::ByteBuffer::with_capacity(4096);
+    let mut reader = BufReader::with_capacity(BUF_SIZE, reader);
+    let mut writer = BufWriter::with_capacity(BUF_SIZE, writer);
+
+    let mut buf: Vec<u8> = crate::vec::undefined(CHUNK_SIZE);
     let mut written = 0;
     let mut seek_amount: i64 = 0;
     let mut seeked_last = false;
+
     loop {
         buf.clear();
-        let len = match buf.read_from(reader) {
+        let len = match reader
+            .by_ref()
+            .take(CHUNK_SIZE as u64)
+            .read_to_end(&mut buf)
+        {
             Ok(len) => len,
             Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
             Err(e) => return Err(e),
         };
 
-        if len > 0 && buffer_is_zero(&buf[..]) {
+        if len > 0 && buffer_is_zero(&buf[..len]) {
             seek_amount += len as i64;
             continue;
         }
@@ -70,7 +84,7 @@ pub fn sparse_copy<R: Read + ?Sized, W: Write + Seek + ?Sized>(
         }
 
         if len > 0 {
-            writer.write_all(&buf[..])?;
+            writer.write_all(&buf[..len])?;
             written += len as u64;
             seeked_last = false;
         } else {
@@ -112,19 +126,26 @@ where
     R: AsyncRead + Unpin,
     W: AsyncWrite + AsyncSeek + Unpin,
 {
-    let mut buf = crate::byte_buffer::ByteBuffer::with_capacity(4096);
+    let mut reader = AsyncBufReader::with_capacity(BUF_SIZE, reader);
+    let mut writer = AsyncBufWriter::with_capacity(BUF_SIZE, writer);
+
+    let mut buf: Vec<u8> = crate::vec::undefined(CHUNK_SIZE);
     let mut written = 0;
     let mut seek_amount: i64 = 0;
     let mut seeked_last = false;
     loop {
         buf.clear();
-        let len = match buf.read_from_async(reader).await {
+        let len = match (&mut reader)
+            .take(CHUNK_SIZE as u64)
+            .read_to_end(&mut buf)
+            .await
+        {
             Ok(len) => len,
             Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
             Err(e) => return Err(e),
         };
 
-        if len > 0 && buffer_is_zero(&buf[..]) {
+        if len > 0 && buffer_is_zero(&buf[..len]) {
             seek_amount += len as i64;
             continue;
         }
@@ -137,10 +158,12 @@ where
         }
 
         if len > 0 {
-            writer.write_all(&buf[..]).await?;
+            writer.write_all(&buf[..len]).await?;
             written += len as u64;
             seeked_last = false;
         } else {
+            // buffer must be flushed before it goes out of scope
+            writer.flush().await?;
             return Ok(SparseCopyResult {
                 written,
                 seeked_last,
-- 
2.39.2





             reply	other threads:[~2023-07-17 14:19 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-17 14:19 Max Carrara [this message]
2023-08-09 11:07 ` [pbs-devel] applied: " Wolfgang Bumiller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230717141940.1320395-1-m.carrara@proxmox.com \
    --to=m.carrara@proxmox.com \
    --cc=pbs-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal