From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <w.bumiller@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id 4CD3575058;
 Fri,  4 Jun 2021 14:22:41 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id 3DE3C1C96F;
 Fri,  4 Jun 2021 14:22:11 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [94.136.29.106])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS id 1A70E1C964;
 Fri,  4 Jun 2021 14:22:07 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id E01E646717;
 Fri,  4 Jun 2021 14:22:06 +0200 (CEST)
Date: Fri, 4 Jun 2021 14:22:05 +0200
From: Wolfgang Bumiller <w.bumiller@proxmox.com>
To: Stefan Reiter <s.reiter@proxmox.com>
Cc: pve-devel@lists.proxmox.com, pbs-devel@lists.proxmox.com
Message-ID: <20210604122205.ivembuqsue7osqah@olga.proxmox.com>
References: <20210602143833.4423-1-s.reiter@proxmox.com>
 <20210602143833.4423-4-s.reiter@proxmox.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20210602143833.4423-4-s.reiter@proxmox.com>
User-Agent: NeoMutt/20180716
Subject: Re: [pve-devel] [pbs-devel] [PATCH proxmox-backup 3/9] backup: add
 CachedChunkReader utilizing AsyncLruCache
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Fri, 04 Jun 2021 12:22:41 -0000

On Wed, Jun 02, 2021 at 04:38:27PM +0200, Stefan Reiter wrote:
> Provides a fast cache read implementation with full async and
> concurrency support.
> 
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
> 
> This is technically all that's needed for proxmox-backup-qemu to build and
> function as intended, but I decided to also use this IMHO cleaner implementation
> to replace the AsyncIndexReader with the following patches.
> 
>  src/backup.rs                     |  3 ++
>  src/backup/cached_chunk_reader.rs | 87 +++++++++++++++++++++++++++++++
>  2 files changed, 90 insertions(+)
>  create mode 100644 src/backup/cached_chunk_reader.rs
> 
> diff --git a/src/backup.rs b/src/backup.rs
> index ae937be0..5e1147b4 100644
> --- a/src/backup.rs
> +++ b/src/backup.rs
> @@ -259,3 +259,6 @@ pub use catalog_shell::*;
>  
>  mod async_index_reader;
>  pub use async_index_reader::*;
> +
> +mod cached_chunk_reader;
> +pub use cached_chunk_reader::*;
> diff --git a/src/backup/cached_chunk_reader.rs b/src/backup/cached_chunk_reader.rs
> new file mode 100644
> index 00000000..fd5a049f
> --- /dev/null
> +++ b/src/backup/cached_chunk_reader.rs
> @@ -0,0 +1,87 @@
> +//! An async and concurrency safe data reader backed by a local LRU cache.
> +
> +use anyhow::Error;
> +
> +use std::future::Future;
> +use std::sync::Arc;
> +
> +use crate::backup::{AsyncReadChunk, IndexFile};
> +use crate::tools::async_lru_cache::{AsyncCacher, AsyncLruCache};
> +
> +struct AsyncChunkCacher<T> {
> +    reader: Arc<T>,
> +}
> +
> +impl<T: AsyncReadChunk + Send + Sync + 'static> AsyncCacher<[u8; 32], Arc<Vec<u8>>>
> +    for AsyncChunkCacher<T>
> +{
> +    fn fetch(
> +        &self,
> +        key: [u8; 32],
> +    ) -> Box<dyn Future<Output = Result<Option<Arc<Vec<u8>>>, Error>> + Send> {
> +        let reader = Arc::clone(&self.reader);
> +        Box::new(async move {
> +            AsyncReadChunk::read_chunk(reader.as_ref(), &key)
> +                .await
> +                .map(|x| Some(Arc::new(x)))
> +        })
> +    }
> +}
> +
> +/// Represents an AsyncLruCache used for storing data chunks.
> +pub type ChunkCache = Arc<AsyncLruCache<[u8; 32], Arc<Vec<u8>>>>;

Given that you now use this type in an external crate but still have to
*instantiate* it via `Arc::new(AsyncLruCache::new(...))`, I wonder if
this should just be a struct providing `new`, `clone` and
`Deref<Target = AsyncLruCache>`?
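
A rough sketch of what I mean, not a finished implementation.
`LruCacheStandIn` is a hypothetical placeholder so the snippet compiles on
its own; it does no eviction and is not the real `AsyncLruCache` API. Only
the shape of the `ChunkCache` wrapper matters here:

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::ops::Deref;
use std::sync::{Arc, Mutex};

/// Hypothetical placeholder for `AsyncLruCache` (assumption: the real type
/// has a different, async API; this one never evicts anything).
pub struct LruCacheStandIn<K, V> {
    capacity: usize,
    map: Mutex<HashMap<K, V>>,
}

impl<K: Eq + Hash, V> LruCacheStandIn<K, V> {
    pub fn new(capacity: usize) -> Self {
        Self {
            capacity,
            map: Mutex::new(HashMap::new()),
        }
    }

    pub fn capacity(&self) -> usize {
        self.capacity
    }

    pub fn insert(&self, key: K, value: V) {
        self.map.lock().unwrap().insert(key, value);
    }

    pub fn len(&self) -> usize {
        self.map.lock().unwrap().len()
    }
}

/// Newtype instead of a type alias: users never spell out the inner `Arc`.
pub struct ChunkCache(Arc<LruCacheStandIn<[u8; 32], Arc<Vec<u8>>>>);

impl ChunkCache {
    /// Create a cache holding up to `capacity` chunks.
    pub fn new(capacity: usize) -> Self {
        Self(Arc::new(LruCacheStandIn::new(capacity)))
    }
}

impl Clone for ChunkCache {
    /// Cheap clone: both handles refer to the same underlying cache.
    fn clone(&self) -> Self {
        Self(Arc::clone(&self.0))
    }
}

impl Deref for ChunkCache {
    type Target = LruCacheStandIn<[u8; 32], Arc<Vec<u8>>>;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}
```

With this, an external crate would only call `ChunkCache::new(capacity)` and
`cache.clone()` to share it between readers, while all cache methods stay
reachable through `Deref`.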

> +
> +/// Allows arbitrary data reads from an Index via an AsyncReadChunk implementation, using an LRU
> +/// cache internally to cache chunks and provide support for multiple concurrent reads (potentially
> +/// to the same chunk).
> +pub struct CachedChunkReader<I: IndexFile, R: AsyncReadChunk + Send + Sync + 'static> {
> +    cache: ChunkCache,
> +    cacher: AsyncChunkCacher<R>,
> +    index: I,
> +}
> +
> +impl<I: IndexFile, R: AsyncReadChunk + Send + Sync + 'static> CachedChunkReader<I, R> {
> +    /// Create a new reader with a local LRU cache containing 'capacity' chunks.
> +    pub fn new(reader: R, index: I, capacity: usize) -> Self {
> +        let cache = Arc::new(AsyncLruCache::new(capacity));
> +        Self::new_with_cache(reader, index, cache)
> +    }
> +
> +    /// Create a new reader with a custom LRU cache. Use this to share a cache between multiple
> +    /// readers.
> +    pub fn new_with_cache(reader: R, index: I, cache: ChunkCache) -> Self {
> +        Self {
> +            cache,
> +            cacher: AsyncChunkCacher {
> +                reader: Arc::new(reader),
> +            },
> +            index,
> +        }
> +    }
> +
> +    /// Read data at a given byte offset into a variable size buffer. Returns the amount of bytes
> +    /// read, which will always be the size of the buffer except when reaching EOF.
> +    pub async fn read_at(&self, buf: &mut [u8], offset: u64) -> Result<usize, Error> {
> +        let size = buf.len();
> +        let mut read: usize = 0;
> +        while read < size {
> +            let cur_offset = offset + read as u64;
> +            if let Some(chunk) = self.index.chunk_from_offset(cur_offset) {
> +                let info = self.index.chunk_info(chunk.0).unwrap();
> +
> +                // will never be None, see AsyncChunkCacher

Why comment on this unwrap but not on the one above? ;-)
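
If the first unwrap is not provably safe, one option might be to turn a
missing entry into an error instead. A minimal sketch under assumptions:
`FixedIndex`, `ChunkInfo` and `lookup` are hypothetical stand-ins for the
`IndexFile` pieces used above, and a plain `String` stands in for
`anyhow::Error` to keep the snippet self-contained:

```rust
use std::ops::Range;

/// Hypothetical stand-in for the value returned by `chunk_info`.
pub struct ChunkInfo {
    pub range: Range<u64>,
    pub digest: [u8; 32],
}

/// Hypothetical index with fixed-size chunks (assumption, not `IndexFile`).
pub struct FixedIndex {
    pub chunk_size: u64,
    pub chunks: Vec<[u8; 32]>,
}

impl FixedIndex {
    /// Returns (chunk position, offset within that chunk), or None past EOF.
    pub fn chunk_from_offset(&self, offset: u64) -> Option<(usize, u64)> {
        let pos = (offset / self.chunk_size) as usize;
        if pos < self.chunks.len() {
            Some((pos, offset % self.chunk_size))
        } else {
            None
        }
    }

    /// Returns byte range and digest of the chunk at `pos`, if it exists.
    pub fn chunk_info(&self, pos: usize) -> Option<ChunkInfo> {
        let digest = *self.chunks.get(pos)?;
        let start = pos as u64 * self.chunk_size;
        Some(ChunkInfo {
            range: start..start + self.chunk_size,
            digest,
        })
    }
}

/// Ok(None) means EOF; a position without chunk info becomes an error
/// rather than a panic.
pub fn lookup(index: &FixedIndex, offset: u64) -> Result<Option<ChunkInfo>, String> {
    match index.chunk_from_offset(offset) {
        None => Ok(None),
        Some((pos, _intra_chunk)) => index
            .chunk_info(pos)
            .map(Some)
            .ok_or_else(|| format!("index inconsistency: no chunk info for position {}", pos)),
    }
}
```

Alternatively, keep the unwrap and add a comment stating why
`chunk_info` cannot fail for a position returned by `chunk_from_offset`.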

> +                let data = self.cache.access(info.digest, &self.cacher).await?.unwrap();
> +
> +                let want_bytes = ((info.range.end - cur_offset) as usize).min(size - read);
> +                let slice = &mut buf[read..(read + want_bytes)];
> +                let intra_chunk = chunk.1 as usize;
> +                slice.copy_from_slice(&data[intra_chunk..(intra_chunk + want_bytes)]);
> +                read += want_bytes;
> +            } else {
> +                // EOF
> +                break;
> +            }
> +        }
> +        Ok(read)
> +    }
> +}
> -- 
> 2.30.2