From: Lukas Wagner
To: Max Carrara, Proxmox VE development discussion
Subject: Re: [pve-devel] [RFC proxmox 4/7] cache: add new crate 'proxmox-cache'
Date: Tue, 22 Aug 2023 13:33:44 +0200
Message-ID: <8aaefb26-be4b-477c-b6a7-9e2fbe598215@proxmox.com>
References: <20230821134444.620021-1-l.wagner@proxmox.com> <20230821134444.620021-5-l.wagner@proxmox.com>
List-Id: Proxmox VE development discussion

Thanks for the review! Comments inline.

On 8/22/23 12:08, Max Carrara wrote:
> On 8/21/23 15:44, Lukas Wagner wrote:
>> For now, it contains a file-backed cache with expiration logic.
>> The cache should be safe to be accessed from multiple processes at
>> once.
>>
>
> This seems pretty neat! The cache implementation seems straightforward
> enough. I'll see if I can test it more thoroughly later.
>
> However, in my opinion we should have a crate like
> "proxmox-collections" (or something of the sort) with modules for each
> data structure / collection similar to the standard library; I'm
> curious what others think about that. imo it would be a great
> opportunity to introduce that crate in this series, since you're
> already introducing one for the cache anyway.
>
> So, proxmox-collections would look something like this:
>
>   proxmox-collections
>   └── src
>       ├── cache
>       │   ├── mod.rs
>       │   └── shared_cache.rs
>       └── lib.rs
>
> Let me know what you think!
>

I guess this would make sense. Not sure if this will gain any other data
structures soon, but I think going in that direction makes sense.

(...)

>> +    ///
>> +    /// Expired entries will *not* be returned.
>> +    fn get<S: AsRef<str>>(&self, key: S) -> Result<Option<Value>, Error>;
>> +}
>
> I don't necessarily think that a trait would be necessary in this
> case, as there's not really any other structure (that can be used as
> caching mechanism) that you're abstracting over. (more below)
>

Yes, you are right. Clear case of premature optimi...
refactoring ;)

>> diff --git a/proxmox-cache/src/shared_cache.rs b/proxmox-cache/src/shared_cache.rs
>> new file mode 100644
>> index 0000000..be6212c
>> --- /dev/null
>> +++ b/proxmox-cache/src/shared_cache.rs
>> @@ -0,0 +1,263 @@
>> +use std::path::{Path, PathBuf};
>> +
>> +use anyhow::{bail, Error};
>> +use serde::{Deserialize, Serialize};
>> +use serde_json::Value;
>> +
>> +use proxmox_schema::api_types::SAFE_ID_FORMAT;
>> +use proxmox_sys::fs::CreateOptions;
>> +
>> +use crate::{Cache, DefaultTimeProvider, TimeProvider};
>> +
>> +/// A simple, file-backed cache that can be used from multiple processes concurrently.
>> +///
>> +/// Cache entries are stored as individual files inside a base directory. For instance,
>> +/// a cache entry with the key 'disk_stats' will result in a file 'disk_stats.json' inside
>> +/// the base directory. As the extension implies, the cached data will be stored as a JSON
>> +/// string.
>> +///
>> +/// For optimal performance, `SharedCache` should have its base directory in a `tmpfs`.
>> +///
>> +/// ## Key Space
>> +/// Due to the fact that cache keys are being directly used as filenames, they have to match the
>> +/// following regular expression: `[A-Za-z0-9_][A-Za-z0-9._\-]*`
>> +///
>> +/// ## Concurrency
>> +/// All cache operations are based on atomic file operations, thus accessing/updating the cache from
>> +/// multiple processes at the same time is safe.
>> +///
>> +/// ## Performance
>> +/// On a tmpfs:
>> +/// ```sh
>> +/// $ cargo run --release --example=performance
>> +/// inserting 100000 keys took 896.609758ms (8.966µs per key)
>> +/// getting 100000 keys took 584.874842ms (5.848µs per key)
>> +/// deleting 100000 keys took 247.742702ms (2.477µs per key)
>> +///
>> +/// Inserting/getting large objects might of course result in lower performance due to the cost
>> +/// of serialization.
>> +/// ```
>> +///
>> +pub struct SharedCache {
>> +    base_path: PathBuf,
>> +    time_provider: Box<dyn TimeProvider>,
>> +    create_options: CreateOptions,
>> +}
>
> Instead, this should be generic:
>
> pub struct SharedCache<K, V> { ... }

True, I could:

K: AsRef<str> and V: Serialize + Deserialize

But yeah, as this is just an RFC to get feedback for the whole concept,
some of the implementation details are not completely fleshed out,
partly on purpose and partly due to oversight.

>
> .. and maybe rename it to SharedFileCache to make it explicit that this
> operates on a file. (but that's more dependent on one's taste tbh)
>

Actually I originally named it `SharedFileCache`, but then changed it to
`SharedCache`, because the former sounds a bit like it caches *files*
rather than values - at least in my head.

(...)

> ... can be replaced as follows, in order to make it similar to
> std::collections::{HashMap, BTreeMap}:
>
> impl<K: AsRef<str>> SharedCache<K, Value> {
>     // Returns old value on successful insert, if given
>     fn insert(&self, k: K, v: Value) -> Result<Option<Value>, Error> {
>         // ...
>     }
>
>     fn get(&self, k: K) -> Result<Option<Value>, Error> {
>         // ...
>     }
>
>     fn remove(&self, k: K) -> Result<Option<Value>, Error> {
>         // ...
>     }
> }
>
> If necessary / sensible, other methods (inspired by {HashMap, BTreeMap}) can
> be added as well, such as remove_entry, retain, clear, etc.
>

I don't have any hard feelings regarding the naming, but not returning a
Value from `delete` was a conscious decision - we simply don't need it
right now. I don't want to deserialize a value just to throw it away.
Also, reading *and* deleting at the same time *might* introduce the need
for file locking - although I'm not completely sure about that yet.
If we ever need a `remove` that also returns the value, we could just
introduce a second method, e.g. `take`.
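
To make the distinction concrete, here is a rough sketch using only the
standard library. The `FileCache` type and its `set`, `delete` and `take`
methods are hypothetical stand-ins, not the `SharedCache` code from the
patch. Entries are kept as raw strings so that nothing has to be
deserialized, and the plain `fs::write` is an assumption made for brevity,
not the atomic replace the real cache relies on.

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::path::PathBuf;

/// Hypothetical stand-in for a file-backed cache (not the patch's SharedCache).
pub struct FileCache {
    base_path: PathBuf,
}

impl FileCache {
    /// Creates the base directory if it does not exist yet.
    pub fn new(base_path: impl Into<PathBuf>) -> io::Result<Self> {
        let base_path = base_path.into();
        fs::create_dir_all(&base_path)?;
        Ok(Self { base_path })
    }

    /// Maps a key to `<base>/<key>.json`. Key validation is omitted here.
    fn path_for_key(&self, key: &str) -> PathBuf {
        self.base_path.join(format!("{}.json", key))
    }

    /// Stores already-serialized contents. NOTE: plain write, not atomic.
    pub fn set(&self, key: &str, raw: &str) -> io::Result<()> {
        fs::write(self.path_for_key(key), raw)
    }

    /// Removes an entry without reading it. A missing entry is not an error.
    pub fn delete(&self, key: &str) -> io::Result<()> {
        match fs::remove_file(self.path_for_key(key)) {
            Ok(()) => Ok(()),
            Err(err) if err.kind() == ErrorKind::NotFound => Ok(()),
            Err(err) => Err(err),
        }
    }

    /// Reads an entry and then removes it. These are two separate steps:
    /// another process could replace the file in between, and that newer
    /// value would be deleted without ever being returned.
    pub fn take(&self, key: &str) -> io::Result<Option<String>> {
        let raw = match fs::read_to_string(self.path_for_key(key)) {
            Ok(raw) => raw,
            Err(err) if err.kind() == ErrorKind::NotFound => return Ok(None),
            Err(err) => return Err(err),
        };
        self.delete(key)?;
        Ok(Some(raw))
    }
}
```

In this sketch `delete` never opens the file, so it stays a single unlink.
`take` has a window between the read and the unlink, which is where
locking might become necessary.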
>
>> +
>> +impl SharedCache {
>> +    pub fn new<P: AsRef<Path>>(base_path: P, options: CreateOptions) -> Result<Self, Error> {
>> +        proxmox_sys::fs::create_path(
>> +            base_path.as_ref(),
>> +            Some(options.clone()),
>> +            Some(options.clone()),
>> +        )?;
>> +
>> +        Ok(SharedCache {
>> +            base_path: base_path.as_ref().to_owned(),
>> +            time_provider: Box::new(DefaultTimeProvider),
>> +            create_options: options,
>> +        })
>> +    }
>> +
>> +    fn enforce_safe_key(key: &str) -> Result<(), Error> {
>> +        let safe_id_regex = SAFE_ID_FORMAT.unwrap_pattern_format();
>> +        if safe_id_regex.is_match(key) {
>> +            Ok(())
>> +        } else {
>> +            bail!("invalid key format")
>> +        }
>> +    }
>> +
>> +    fn get_path_for_key(&self, key: &str) -> Result<PathBuf, Error> {
>> +        Self::enforce_safe_key(key)?;
>> +        let mut path = self.base_path.join(key);
>> +        path.set_extension("json");
>> +        Ok(path)
>> +    }
>> +}
>> +
>> +#[derive(Debug, Clone, Serialize, Deserialize)]
>> +struct CachedItem {
>> +    value: Value,
>> +    added_at: i64,
>> +    expires_in: Option<i64>,
>> +}
>> +
>
> ... and for completion's sake: This can stay, as it's specific to the
> alternative implementation I've written above.
>
> All in all, I think this would make your implementation more flexible.
> Let me know what you think!
>

--
- Lukas