public inbox for pve-devel@lists.proxmox.com
From: Lukas Wagner <l.wagner@proxmox.com>
To: Max Carrara <m.carrara@proxmox.com>,
	Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Subject: Re: [pve-devel] [RFC proxmox 4/7] cache: add new crate 'proxmox-cache'
Date: Tue, 22 Aug 2023 13:33:44 +0200
Message-ID: <8aaefb26-be4b-477c-b6a7-9e2fbe598215@proxmox.com>
In-Reply-To: <cbc11250-6646-85c9-9a5e-b8f629ecd8a7@proxmox.com>

Thanks for the review! Comments inline.

On 8/22/23 12:08, Max Carrara wrote:
> On 8/21/23 15:44, Lukas Wagner wrote:
>> For now, it contains a file-backed cache with expiration logic.
>> The cache should be safe to be accessed from multiple processes at
>> once.
>>
> 
> This seems pretty neat! The cache implementation seems straightforward
> enough. I'll see if I can test it more thoroughly later.
> 
> However, in my opinion we should have a crate like
> "proxmox-collections" (or something of the sort) with modules for each
> data structure / collection similar to the standard library; I'm
> curious what others think about that. imo it would be a great
> opportunity to introduce that crate in this series, since you're
> already introducing one for the cache anyway.
> 
> So, proxmox-collections would look something like this:
> 
>    proxmox-collections
>    └── src
>        ├── cache
>        │   ├── mod.rs
>        │   └── shared_cache.rs
>        └── lib.rs
> 
> Let me know what you think!
> 

I guess this would make sense. I'm not sure whether the crate will gain
any other data structures soon, but going in that direction seems
reasonable.


(...)

>> +    ///
>> +    /// Expired entries will *not* be returned.
>> +    fn get<S: AsRef<str>>(&self, key: S) -> Result<Option<Value>, Error>;
>> +}
> 
> I don't necessarily think that a trait would be necessary in this
> case, as there's not really any other structure (that can be used as
> caching mechanism) that you're abstracting over. (more below)
> 

Yes, you are right. Clear case of premature optimi... refactoring ;)

>> diff --git a/proxmox-cache/src/shared_cache.rs b/proxmox-cache/src/shared_cache.rs
>> new file mode 100644
>> index 0000000..be6212c
>> --- /dev/null
>> +++ b/proxmox-cache/src/shared_cache.rs
>> @@ -0,0 +1,263 @@
>> +use std::path::{Path, PathBuf};
>> +
>> +use anyhow::{bail, Error};
>> +use serde::{Deserialize, Serialize};
>> +use serde_json::Value;
>> +
>> +use proxmox_schema::api_types::SAFE_ID_FORMAT;
>> +use proxmox_sys::fs::CreateOptions;
>> +
>> +use crate::{Cache, DefaultTimeProvider, TimeProvider};
>> +
>> +/// A simple, file-backed cache that can be used from multiple processes concurrently.
>> +///
>> +/// Cache entries are stored as individual files inside a base directory. For instance,
>> +/// a cache entry with the key 'disk_stats' will result in a file 'disk_stats.json' inside
>> +/// the base directory. As the extension implies, the cached data will be stored as a JSON
>> +/// string.
>> +///
>> +/// For optimal performance, `SharedCache` should have its base directory in a `tmpfs`.
>> +///
>> +/// ## Key Space
>> +/// Due to the fact that cache keys are being directly used as filenames, they have to match the
>> +/// following regular expression: `[A-Za-z0-9_][A-Za-z0-9._\-]*`
>> +///
>> +/// ## Concurrency
>> +/// All cache operations are based on atomic file operations, thus accessing/updating the cache from
>> +/// multiple processes at the same time is safe.
>> +///
>> +/// ## Performance
>> +/// On a tmpfs:
>> +/// ```sh
>> +///   $ cargo run --release --example=performance
>> +///   inserting 100000 keys took 896.609758ms (8.966µs per key)
>> +///   getting 100000 keys took 584.874842ms (5.848µs per key)
>> +///   deleting 100000 keys took 247.742702ms (2.477µs per key)
>> +/// ```
>> +///
>> +/// Inserting/getting large objects might of course result in lower performance due to the cost
>> +/// of serialization.
>> +///
>> +pub struct SharedCache {
>> +    base_path: PathBuf,
>> +    time_provider: Box<dyn TimeProvider>,
>> +    create_options: CreateOptions,
>> +}
> 
> Instead, this should be generic:
> 
> pub struct SharedCache<K, V> { ... }

True, I could use bounds like K: AsRef<str> and V: Serialize + Deserialize.

But yeah, as this is just an RFC to get feedback for the whole concept,
some of the implementation details are not completely fleshed out, 
partly on purpose and partly due to oversight.
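
To make the shape of such bounds concrete, here is a rough, std-only
sketch. It is not the code from the patch: `Display + FromStr` are only
stand-ins for `Serialize + Deserialize` so that it compiles without serde,
and the method names, the `.txt` extension and the missing key validation
are assumptions made for illustration.

```rust
use std::fmt::Display;
use std::fs;
use std::io;
use std::marker::PhantomData;
use std::path::PathBuf;
use std::str::FromStr;

/// Hypothetical generic variant of the cache (illustration only).
pub struct SharedCache<K, V> {
    base_path: PathBuf,
    // K and V only appear in method signatures, so mark them here.
    _marker: PhantomData<(K, V)>,
}

impl<K, V> SharedCache<K, V>
where
    K: AsRef<str>,
    // Stand-in for `Serialize + Deserialize` (assumption, std-only).
    V: Display + FromStr,
{
    pub fn new(base_path: PathBuf) -> io::Result<Self> {
        fs::create_dir_all(&base_path)?;
        Ok(Self { base_path, _marker: PhantomData })
    }

    fn path_for(&self, key: &K) -> PathBuf {
        // Key validation is intentionally left out of this sketch.
        self.base_path.join(format!("{}.txt", key.as_ref()))
    }

    /// Store the textual form of `value`. Note: `fs::write` is not atomic.
    pub fn set(&self, key: K, value: &V) -> io::Result<()> {
        fs::write(self.path_for(&key), value.to_string())
    }

    /// Return `Ok(None)` if no entry exists for `key`.
    pub fn get(&self, key: K) -> io::Result<Option<V>> {
        match fs::read_to_string(self.path_for(&key)) {
            Ok(raw) => raw.parse::<V>().map(Some).map_err(|_| {
                io::Error::new(io::ErrorKind::InvalidData, "unparsable cache entry")
            }),
            Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(None),
            Err(err) => Err(err),
        }
    }
}
```

With this, something like `SharedCache::<&str, u64>::new(dir)` would give a
cache keyed by string slices that stores integers; expiration is left out
entirely.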

> 
> .. and maybe rename it to SharedFileCache to make it explicit that this
> operates on a file. (but that's more dependent on one's taste tbh)
> 
Actually, I originally named it `SharedFileCache`, but then changed it to
`SharedCache`, because the former sounds a bit like it caches *files*
rather than values - at least in my head.

(...)
> ... can be replaced as follows, in order to make it similar to
> std::collections::{HashMap, BTreeMap}:
> 
> impl<K: AsRef<str>> SharedCache<K, Value> {
>      // Returns old value on successful insert, if given
>      fn insert(&self, k: K, v: Value) -> Result<Option<Value>, Error> {
>          // ...
>      }
> 
>      fn get(&self, k: K) -> Result<Option<Value>, Error> {
>          // ...
>      }
> 
>      fn remove(&self, k: K) -> Result<Option<Value>, Error> {
>          // ...
>      }
> }
> 
> If necessary / sensible, other methods (inspired by {HashMap, BTreeMap}) can
> be added as well, such as remove_entry, retain, clear, etc.
> 

I don't have any hard feelings regarding the naming, but not returning a 
Value from `delete` was a conscious decision - we simply don't need it 
right now. I don't want to deserialize to just throw away the value.
Also, reading *and* deleting at the same time *might* introduce the need 
for file locking - although I'm not completely sure about that yet.
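
For context, plain reads and writes can get by without locks when every
write goes to a temporary file that is then renamed over the target. A
minimal std-only sketch, assuming that is roughly what "atomic file
operations" boils down to (the helper name and temp-file naming are
hypothetical, not taken from the patch):

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Write `data` to a temporary sibling file and rename it over `target`.
/// Readers see either the old or the new content, never a partial write.
pub fn replace_file_atomically(target: &Path, data: &[u8]) -> io::Result<()> {
    // Hypothetical naming scheme: unique per process, not per thread.
    let mut tmp = target.as_os_str().to_owned();
    tmp.push(format!(".tmp_{}", std::process::id()));
    let tmp = PathBuf::from(tmp);

    let mut file = fs::File::create(&tmp)?;
    file.write_all(data)?;
    file.sync_all()?;
    drop(file);

    // rename(2) replaces the target atomically on POSIX filesystems.
    fs::rename(&tmp, target)
}
```

A read followed by a delete is two such steps, and the pair is not atomic
as a whole, which is where locking would come in.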

If we ever need a `remove` that also returns the value, we could just 
introduce a second method, e.g. `take`.
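
Roughly, the split between the two could look like the following std-only
sketch. The function names and the raw `String` return type are
assumptions for illustration; the real thing would operate on keys and
deserialize the entry.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Remove a cache entry without reading it.
/// A missing file is not treated as an error.
pub fn delete(entry: &Path) -> io::Result<()> {
    match fs::remove_file(entry) {
        Ok(()) => Ok(()),
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(()),
        Err(err) => Err(err),
    }
}

/// Hypothetical `take`: read the raw entry, then remove it.
/// Reading and unlinking are separate steps, so two processes calling
/// this at the same time could both obtain the value; ruling that out
/// would need a lock or a similar mechanism.
pub fn take(entry: &Path) -> io::Result<Option<String>> {
    let raw = match fs::read_to_string(entry) {
        Ok(raw) => raw,
        Err(err) if err.kind() == io::ErrorKind::NotFound => return Ok(None),
        Err(err) => return Err(err),
    };
    delete(entry)?;
    Ok(Some(raw))
}
```

`delete` stays cheap because it never touches the file content, while
`take` pays for the read only when a caller actually wants the value.
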
> 
>> +
>> +impl SharedCache {
>> +    pub fn new<P: AsRef<Path>>(base_path: P, options: CreateOptions) -> Result<Self, Error> {
>> +        proxmox_sys::fs::create_path(
>> +            base_path.as_ref(),
>> +            Some(options.clone()),
>> +            Some(options.clone()),
>> +        )?;
>> +
>> +        Ok(SharedCache {
>> +            base_path: base_path.as_ref().to_owned(),
>> +            time_provider: Box::new(DefaultTimeProvider),
>> +            create_options: options,
>> +        })
>> +    }
>> +
>> +    fn enforce_safe_key(key: &str) -> Result<(), Error> {
>> +        let safe_id_regex = SAFE_ID_FORMAT.unwrap_pattern_format();
>> +        if safe_id_regex.is_match(key) {
>> +            Ok(())
>> +        } else {
>> +            bail!("invalid key format")
>> +        }
>> +    }
>> +
>> +    fn get_path_for_key(&self, key: &str) -> Result<PathBuf, Error> {
>> +        Self::enforce_safe_key(key)?;
>> +        let mut path = self.base_path.join(key);
>> +        path.set_extension("json");
>> +        Ok(path)
>> +    }
>> +}
>> +
>> +#[derive(Debug, Clone, Serialize, Deserialize)]
>> +struct CachedItem {
>> +    value: Value,
>> +    added_at: i64,
>> +    expires_in: Option<i64>,
>> +}
>> +
> 
> ... and for completion's sake: This can stay, as it's specific to the
> alternative implementation I've written above.
> 
> All in all, I think this would make your implementation more flexible.
> Let me know what you think!
> 


-- 
- Lukas



