public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pve-devel] [RFC storage/proxmox{, -perl-rs} 0/7] cache storage plugin status for pvestatd/API status update calls
@ 2023-08-21 13:44 Lukas Wagner
  2023-08-21 13:44 ` [pve-devel] [RFC proxmox 1/7] sys: fs: move tests to a sub-module Lukas Wagner
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Lukas Wagner @ 2023-08-21 13:44 UTC (permalink / raw)
  To: pve-devel

This patch series introduces a caching mechanism for expensive status
update calls made from pvestatd.

As a first example for this RFC I introduced the cache to the arguably
most expensive call, namely `storage_info` from pve-storage. Instead
of caching the results of the `storage_info` function as a whole, we
only cache the results from status updates from individual storage
plugins.

Regarding the cache implementation: I really tried it keep it as simple
as possible. Cached values are not queried terribly often and are
rather small, so I ended up with with simply creating a json-file per
cache key in a  directory in tmpfs. Benefits of this approach is that
it does not need any locking mechanisms (files are replaced atomically).
Also it is easily debugable/introspectable.

Open questions:
  - not sure what a good expiration time for cached entries is. For
    now I picked 30s, but there was not much thought behind that value.

The first three patches for `proxmox` are purely preparatory and cleanup.



proxmox:

Lukas Wagner (5):
  sys: fs: move tests to a sub-module
  sys: add make_tmp_dir
  sys: fs: remove unnecessary clippy allow directive
  cache: add new crate 'proxmox-cache'
  cache: add debian packaging

 Cargo.toml                            |   1 +
 proxmox-cache/Cargo.toml              |  20 ++
 proxmox-cache/debian/changelog        |   5 +
 proxmox-cache/debian/control          |  47 +++++
 proxmox-cache/debian/copyright        |  18 ++
 proxmox-cache/debian/debcargo.toml    |   7 +
 proxmox-cache/examples/performance.rs |  82 ++++++++
 proxmox-cache/src/lib.rs              |  40 ++++
 proxmox-cache/src/shared_cache.rs     | 263 ++++++++++++++++++++++++++
 proxmox-sys/src/fs/dir.rs             | 106 +++++++++--
 proxmox-sys/src/fs/mod.rs             |   2 -
 11 files changed, 571 insertions(+), 20 deletions(-)
 create mode 100644 proxmox-cache/Cargo.toml
 create mode 100644 proxmox-cache/debian/changelog
 create mode 100644 proxmox-cache/debian/control
 create mode 100644 proxmox-cache/debian/copyright
 create mode 100644 proxmox-cache/debian/debcargo.toml
 create mode 100644 proxmox-cache/examples/performance.rs
 create mode 100644 proxmox-cache/src/lib.rs
 create mode 100644 proxmox-cache/src/shared_cache.rs


proxmox-perl-rs:

Lukas Wagner (1):
  cache: add bindings for `SharedCache` from `proxmox-cache`

 common/pkg/Makefile |  1 +
 common/src/cache.rs | 59 +++++++++++++++++++++++++++++++++++++++++++++
 common/src/mod.rs   |  1 +
 pve-rs/Cargo.toml   |  5 ++++
 pve-rs/src/lib.rs   |  1 +
 5 files changed, 67 insertions(+)
 create mode 100644 common/src/cache.rs


pve-storage:

Lukas Wagner (1):
  stats: api: cache storage plugin status

 src/PVE/API2/Storage/Config.pm | 10 +++++++++
 src/PVE/Storage.pm             | 40 ++++++++++++++++++++++++----------
 2 files changed, 39 insertions(+), 11 deletions(-)


Summary over all repositories:
  18 files changed, 677 insertions(+), 31 deletions(-)

-- 
murpp v0.4.0





^ permalink raw reply	[flat|nested] 19+ messages in thread

* [pve-devel] [RFC proxmox 1/7] sys: fs: move tests to a sub-module
  2023-08-21 13:44 [pve-devel] [RFC storage/proxmox{, -perl-rs} 0/7] cache storage plugin status for pvestatd/API status update calls Lukas Wagner
@ 2023-08-21 13:44 ` Lukas Wagner
  2023-08-30 15:38   ` [pve-devel] applied: " Thomas Lamprecht
  2023-08-21 13:44 ` [pve-devel] [RFC proxmox 2/7] sys: add make_tmp_dir Lukas Wagner
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Lukas Wagner @ 2023-08-21 13:44 UTC (permalink / raw)
  To: pve-devel

This ensures that test code is not compiled in regular builds

Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
---
 proxmox-sys/src/fs/dir.rs | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/proxmox-sys/src/fs/dir.rs b/proxmox-sys/src/fs/dir.rs
index bdef85e..6aee316 100644
--- a/proxmox-sys/src/fs/dir.rs
+++ b/proxmox-sys/src/fs/dir.rs
@@ -152,16 +152,21 @@ fn create_path_at_do(
     }
 }
 
-#[test]
-fn test_create_path() {
-    create_path(
-        "testdir/testsub/testsub2/testfinal",
-        Some(CreateOptions::new().perm(stat::Mode::from_bits_truncate(0o755))),
-        Some(
-            CreateOptions::new()
-                .owner(nix::unistd::Uid::effective())
-                .group(nix::unistd::Gid::effective()),
-        ),
-    )
-    .expect("expected create_path to work");
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn test_create_path() {
+        create_path(
+            "testdir/testsub/testsub2/testfinal",
+            Some(CreateOptions::new().perm(stat::Mode::from_bits_truncate(0o755))),
+            Some(
+                CreateOptions::new()
+                    .owner(nix::unistd::Uid::effective())
+                    .group(nix::unistd::Gid::effective()),
+            ),
+        )
+        .expect("expected create_path to work");
+    }
 }
-- 
2.39.2





^ permalink raw reply	[flat|nested] 19+ messages in thread

* [pve-devel] [RFC proxmox 2/7] sys: add make_tmp_dir
  2023-08-21 13:44 [pve-devel] [RFC storage/proxmox{, -perl-rs} 0/7] cache storage plugin status for pvestatd/API status update calls Lukas Wagner
  2023-08-21 13:44 ` [pve-devel] [RFC proxmox 1/7] sys: fs: move tests to a sub-module Lukas Wagner
@ 2023-08-21 13:44 ` Lukas Wagner
  2023-08-22  8:39   ` Wolfgang Bumiller
  2023-08-21 13:44 ` [pve-devel] [RFC proxmox 3/7] sys: fs: remove unnecessary clippy allow directive Lukas Wagner
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Lukas Wagner @ 2023-08-21 13:44 UTC (permalink / raw)
  To: pve-devel

Under the hood, this function calls `mkdtemp` from libc. Unfortunatly
the nix crate did not provide bindings for this function, so we have
to call into libc directly.

Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
---
 proxmox-sys/src/fs/dir.rs | 73 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 71 insertions(+), 2 deletions(-)

diff --git a/proxmox-sys/src/fs/dir.rs b/proxmox-sys/src/fs/dir.rs
index 6aee316..72bf1ad 100644
--- a/proxmox-sys/src/fs/dir.rs
+++ b/proxmox-sys/src/fs/dir.rs
@@ -1,6 +1,8 @@
-use std::ffi::CStr;
+use std::ffi::{CStr, CString, OsStr};
+use std::fs::File;
+use std::os::unix::ffi::OsStrExt;
 use std::os::unix::io::{AsRawFd, OwnedFd};
-use std::path::Path;
+use std::path::{Path, PathBuf};
 
 use anyhow::{bail, Error};
 use nix::errno::Errno;
@@ -8,6 +10,8 @@ use nix::fcntl::OFlag;
 use nix::sys::stat;
 use nix::unistd;
 
+use proxmox_lang::try_block;
+
 use crate::fs::{fchown, CreateOptions};
 
 /// Creates directory at the provided path with specified ownership.
@@ -152,6 +156,54 @@ fn create_path_at_do(
     }
 }
 
+///  Create a temporary directory.
+///
+/// `prefix` determines where the temporary directory will be created. For instance, if
+/// `prefix` is `/tmp`, on success the function will return a path in the style of
+/// `/tmp/tmp_XXXXXX`, where X stands for a random string, ensuring that the path is unique.
+///
+/// By default, the created directory has `0o700` permissions. If this is not desired, custom
+/// [`CreateOptions`] can be passed via the `option` parameter.
+pub fn make_tmp_dir<P: AsRef<Path>>(
+    prefix: P,
+    options: Option<CreateOptions>,
+) -> Result<PathBuf, Error> {
+    let mut template = prefix.as_ref().to_owned();
+    template = template.join("tmp_XXXXXX");
+    let template = CString::new(template.into_os_string().as_bytes())?;
+
+    let raw_template_buffer = template.into_raw();
+
+    let path = unsafe {
+        let raw_returned_buffer = libc::mkdtemp(raw_template_buffer);
+        if raw_returned_buffer.is_null() {
+            // The returned pointer points to the same buffer, so in case
+            // of an error we need to make sure to claim it back to that
+            // it is freed properly.
+            drop(CString::from_raw(raw_template_buffer));
+            return Err(std::io::Error::last_os_error().into());
+        }
+        CString::from_raw(raw_returned_buffer)
+    };
+
+    let path = OsStr::from_bytes(path.as_bytes());
+    let path = PathBuf::from(path);
+
+    if let Some(options) = options {
+        if let Err(err) = try_block!({
+            let fd = crate::fd::open(&path, OFlag::O_DIRECTORY, stat::Mode::empty())?;
+            let mut file = File::from(fd);
+            options.apply_to(&mut file, &path)?;
+            Ok::<(), Error>(())
+        }) {
+            let _ = unistd::unlink(&path);
+            bail!("could not apply create options to new temporary directory: {err}");
+        }
+    }
+
+    Ok(path)
+}
+
 #[cfg(test)]
 mod tests {
     use super::*;
@@ -169,4 +221,21 @@ mod tests {
         )
         .expect("expected create_path to work");
     }
+
+    #[test]
+    fn test_make_tmp_dir() -> Result<(), Error> {
+        let options = CreateOptions::new()
+            .owner(unistd::Uid::effective())
+            .group(unistd::Gid::effective())
+            .perm(stat::Mode::from_bits_truncate(0o755));
+
+        let path = make_tmp_dir("/tmp", Some(options))?;
+
+        assert!(path.exists());
+        assert!(path.is_dir());
+
+        std::fs::remove_dir_all(&path)?;
+
+        Ok(())
+    }
 }
-- 
2.39.2





^ permalink raw reply	[flat|nested] 19+ messages in thread

* [pve-devel] [RFC proxmox 3/7] sys: fs: remove unnecessary clippy allow directive
  2023-08-21 13:44 [pve-devel] [RFC storage/proxmox{, -perl-rs} 0/7] cache storage plugin status for pvestatd/API status update calls Lukas Wagner
  2023-08-21 13:44 ` [pve-devel] [RFC proxmox 1/7] sys: fs: move tests to a sub-module Lukas Wagner
  2023-08-21 13:44 ` [pve-devel] [RFC proxmox 2/7] sys: add make_tmp_dir Lukas Wagner
@ 2023-08-21 13:44 ` Lukas Wagner
  2023-08-21 13:44 ` [pve-devel] [RFC proxmox 4/7] cache: add new crate 'proxmox-cache' Lukas Wagner
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Lukas Wagner @ 2023-08-21 13:44 UTC (permalink / raw)
  To: pve-devel

It seems like the mentioned clippy bug has since been fixed.

Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
---
 proxmox-sys/src/fs/dir.rs | 4 ----
 proxmox-sys/src/fs/mod.rs | 2 --
 2 files changed, 6 deletions(-)

diff --git a/proxmox-sys/src/fs/dir.rs b/proxmox-sys/src/fs/dir.rs
index 72bf1ad..0c1d151 100644
--- a/proxmox-sys/src/fs/dir.rs
+++ b/proxmox-sys/src/fs/dir.rs
@@ -18,8 +18,6 @@ use crate::fs::{fchown, CreateOptions};
 ///
 /// Errors if the directory already exists.
 pub fn create_dir<P: AsRef<Path>>(path: P, options: CreateOptions) -> Result<(), nix::Error> {
-    // clippy bug?: from_bits_truncate is actually a const fn...
-    #[allow(clippy::or_fun_call)]
     let mode: stat::Mode = options
         .perm
         .unwrap_or(stat::Mode::from_bits_truncate(0o770));
@@ -130,8 +128,6 @@ fn create_path_at_do(
                     final_opts.as_ref()
                 };
 
-                // clippy bug?: from_bits_truncate is actually a const fn...
-                #[allow(clippy::or_fun_call)]
                 let mode = opts
                     .and_then(|o| o.perm)
                     .unwrap_or(stat::Mode::from_bits_truncate(0o755));
diff --git a/proxmox-sys/src/fs/mod.rs b/proxmox-sys/src/fs/mod.rs
index 8fb677c..ae54d78 100644
--- a/proxmox-sys/src/fs/mod.rs
+++ b/proxmox-sys/src/fs/mod.rs
@@ -69,8 +69,6 @@ impl CreateOptions {
     }
 
     pub fn apply_to(&self, file: &mut File, path: &Path) -> Result<(), Error> {
-        // clippy bug?: from_bits_truncate is actually a const fn...
-        #[allow(clippy::or_fun_call)]
         let mode: stat::Mode = self.perm.unwrap_or(stat::Mode::from_bits_truncate(0o644));
 
         if let Err(err) = stat::fchmod(file.as_raw_fd(), mode) {
-- 
2.39.2





^ permalink raw reply	[flat|nested] 19+ messages in thread

* [pve-devel] [RFC proxmox 4/7] cache: add new crate 'proxmox-cache'
  2023-08-21 13:44 [pve-devel] [RFC storage/proxmox{, -perl-rs} 0/7] cache storage plugin status for pvestatd/API status update calls Lukas Wagner
                   ` (2 preceding siblings ...)
  2023-08-21 13:44 ` [pve-devel] [RFC proxmox 3/7] sys: fs: remove unnecessary clippy allow directive Lukas Wagner
@ 2023-08-21 13:44 ` Lukas Wagner
  2023-08-22 10:08   ` Max Carrara
  2023-08-21 13:44 ` [pve-devel] [RFC proxmox 5/7] cache: add debian packaging Lukas Wagner
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Lukas Wagner @ 2023-08-21 13:44 UTC (permalink / raw)
  To: pve-devel

For now, it contains a file-backed cache with expiration logic.
The cache should be safe to be accessed from multiple processes at
once.

The cache stores values in a directory, based on the key.
E.g. key "foo" results in a file 'foo.json' in the given base
directory. If a new value is set, the file is atomically replaced.
The JSON file also contains some metadata, namely 'added_at' and
'expire_in' - they are used for cache expiration.

Note: This cache is not suited to applications that
 - Might want to cache huge amounts of data, and/or access the cache
   very frequently (due to the overhead of JSON de/serialization)
 - Require arbitrary keys - right now, keys are limited by
   SAFE_ID_REGEX

The cache was developed for the use in pvestatd, in order to cache
e.g. storage plugin status. There, these limitations do not really
play any role.

Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
---
 Cargo.toml                            |   1 +
 proxmox-cache/Cargo.toml              |  20 ++
 proxmox-cache/examples/performance.rs |  82 ++++++++
 proxmox-cache/src/lib.rs              |  40 ++++
 proxmox-cache/src/shared_cache.rs     | 263 ++++++++++++++++++++++++++
 5 files changed, 406 insertions(+)
 create mode 100644 proxmox-cache/Cargo.toml
 create mode 100644 proxmox-cache/examples/performance.rs
 create mode 100644 proxmox-cache/src/lib.rs
 create mode 100644 proxmox-cache/src/shared_cache.rs

diff --git a/Cargo.toml b/Cargo.toml
index e334ac1..940e1d0 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -5,6 +5,7 @@ members = [
     "proxmox-async",
     "proxmox-auth-api",
     "proxmox-borrow",
+    "proxmox-cache",
     "proxmox-client",
     "proxmox-compression",
     "proxmox-http",
diff --git a/proxmox-cache/Cargo.toml b/proxmox-cache/Cargo.toml
new file mode 100644
index 0000000..b20921f
--- /dev/null
+++ b/proxmox-cache/Cargo.toml
@@ -0,0 +1,20 @@
+[package]
+name = "proxmox-cache"
+version = "0.1.0"
+authors.workspace = true
+edition.workspace = true
+license.workspace = true
+repository.workspace = true
+exclude.workspace = true
+description = "Cache implementations"
+
+[dependencies]
+anyhow.workspace = true
+proxmox-sys.workspace = true
+proxmox-time.workspace = true
+proxmox-schema = { workspace = true, features = ["api-types"]}
+serde_json.workspace = true
+serde = { workspace = true, features = ["derive"]}
+
+[dev-dependencies]
+nix.workspace = true
diff --git a/proxmox-cache/examples/performance.rs b/proxmox-cache/examples/performance.rs
new file mode 100644
index 0000000..420f61c
--- /dev/null
+++ b/proxmox-cache/examples/performance.rs
@@ -0,0 +1,82 @@
+use proxmox_cache::Cache;
+use proxmox_cache::SharedCache;
+use proxmox_sys::fs::CreateOptions;
+use std::time::Instant;
+
+fn main() {
+    let options = CreateOptions::new()
+        .owner(nix::unistd::Uid::effective())
+        .group(nix::unistd::Gid::effective())
+        .perm(nix::sys::stat::Mode::from_bits_truncate(0o755));
+
+    let cache = SharedCache::new("/tmp/pmx-cache", options).unwrap();
+
+    let mut keys = Vec::new();
+
+    for i in 0..100000 {
+        keys.push(format!("key_{i}"));
+    }
+
+    let data = serde_json::json!({
+        "member1": "foo",
+        "member2": "foo",
+        "member3": "foo",
+        "member4": "foo",
+        "member5": "foo",
+        "member5": "foo",
+        "member6": "foo",
+        "member7": "foo",
+        "member8": "foo",
+        "array": [10, 20, 30, 40, 50],
+        "object": {
+            "member1": "foo",
+            "member2": "foo",
+            "member3": "foo",
+            "member4": "foo",
+            "member5": "foo",
+            "member5": "foo",
+            "member6": "foo",
+            "member7": "foo",
+            "member8": "foo",
+        }
+    });
+
+    let before = Instant::now();
+
+    for key in &keys {
+        cache
+            .set(key, data.clone(), None)
+            .expect("could not insert value");
+    }
+
+    let time = Instant::now() - before;
+    let time_per_op = time / keys.len() as u32;
+    println!(
+        "inserting {len} keys took {time:?} ({time_per_op:?} per key)",
+        len = keys.len(),
+    );
+
+    let before = Instant::now();
+    for key in &keys {
+        let _ = cache.get(key).expect("could not get value");
+    }
+
+    let time = Instant::now() - before;
+    let time_per_op = time / keys.len() as u32;
+    println!(
+        "getting {len} keys took {time:?} ({time_per_op:?} per key)",
+        len = keys.len(),
+    );
+
+    let before = Instant::now();
+    for key in &keys {
+        cache.delete(key).expect("could not delete value");
+    }
+
+    let time = Instant::now() - before;
+    let time_per_op = time / keys.len() as u32;
+    println!(
+        "deleting {len} keys took {time:?} ({time_per_op:?} per key)",
+        len = keys.len(),
+    );
+}
diff --git a/proxmox-cache/src/lib.rs b/proxmox-cache/src/lib.rs
new file mode 100644
index 0000000..d496dc7
--- /dev/null
+++ b/proxmox-cache/src/lib.rs
@@ -0,0 +1,40 @@
+use anyhow::Error;
+use serde_json::Value;
+
+pub mod shared_cache;
+
+pub use shared_cache::SharedCache;
+
+trait TimeProvider {
+    /// Returns the current time as a UNIX epoch (second resolution)
+    fn now(&self) -> i64;
+}
+
+struct DefaultTimeProvider;
+
+impl TimeProvider for DefaultTimeProvider {
+    fn now(&self) -> i64 {
+        proxmox_time::epoch_i64()
+    }
+}
+
+pub trait Cache {
+    /// Set or insert a cache entry.
+    ///
+    /// If `expires_in` is set, this entry will expire in the desired number of
+    /// seconds.
+    fn set<S: AsRef<str>>(
+        &self,
+        key: S,
+        value: Value,
+        expires_in: Option<i64>,
+    ) -> Result<(), Error>;
+
+    /// Delete a cache entry.
+    fn delete<S: AsRef<str>>(&self, key: S) -> Result<(), Error>;
+
+    /// Get a value from the cache.
+    ///
+    /// Expired entries will *not* be returned.
+    fn get<S: AsRef<str>>(&self, key: S) -> Result<Option<Value>, Error>;
+}
diff --git a/proxmox-cache/src/shared_cache.rs b/proxmox-cache/src/shared_cache.rs
new file mode 100644
index 0000000..be6212c
--- /dev/null
+++ b/proxmox-cache/src/shared_cache.rs
@@ -0,0 +1,263 @@
+use std::path::{Path, PathBuf};
+
+use anyhow::{bail, Error};
+use serde::{Deserialize, Serialize};
+use serde_json::Value;
+
+use proxmox_schema::api_types::SAFE_ID_FORMAT;
+use proxmox_sys::fs::CreateOptions;
+
+use crate::{Cache, DefaultTimeProvider, TimeProvider};
+
+/// A simple, file-backed cache that can be used from multiple processes concurrently.
+///
+/// Cache entries are stored as individual files inside a base directory. For instance,
+/// a cache entry with the key 'disk_stats' will result in a file 'disk_stats.json' inside
+/// the base directory. As the extension implies, the cached data will be stored as a JSON
+/// string.
+///
+/// For optimal performance, `SharedCache` should have its base directory in a `tmpfs`.
+///
+/// ## Key Space
+/// Due to the fact that cache keys are being directly used as filenames, they have to match the
+/// following regular expression: `[A-Za-z0-9_][A-Za-z0-9._\-]*`
+///
+/// ## Concurrency
+/// All cache operations are based on atomic file operations, thus accessing/updating the cache from
+/// multiple processes at the same time is safe.
+///
+/// ## Performance
+/// On a tmpfs:
+/// ```sh
+///   $ cargo run --release --example=performance
+///   inserting 100000 keys took 896.609758ms (8.966µs per key)
+///   getting 100000 keys took 584.874842ms (5.848µs per key)
+///   deleting 100000 keys took 247.742702ms (2.477µs per key)
+///
+/// Inserting/getting large objects might of course result in lower performance due to the cost
+/// of serialization.
+/// ```
+///
+pub struct SharedCache {
+    base_path: PathBuf,
+    time_provider: Box<dyn TimeProvider>,
+    create_options: CreateOptions,
+}
+
+impl Cache for SharedCache {
+    fn set<S: AsRef<str>>(
+        &self,
+        key: S,
+        value: Value,
+        expires_in: Option<i64>,
+    ) -> Result<(), Error> {
+        let path = self.get_path_for_key(key.as_ref())?;
+        let added_at = self.time_provider.now();
+
+        let item = CachedItem {
+            value,
+            added_at,
+            expires_in,
+        };
+
+        let serialized = serde_json::to_vec_pretty(&item)?;
+
+        // Atomically replace file
+        proxmox_sys::fs::replace_file(path, &serialized, self.create_options.clone(), true)?;
+        Ok(())
+    }
+
+    fn delete<S: AsRef<str>>(&self, key: S) -> Result<(), Error> {
+        let path = self.get_path_for_key(key.as_ref())?;
+        std::fs::remove_file(path)?;
+        Ok(())
+    }
+
+    fn get<S: AsRef<str>>(&self, key: S) -> Result<Option<Value>, Error> {
+        let path = self.get_path_for_key(key.as_ref())?;
+
+        let value = if let Some(content) = proxmox_sys::fs::file_get_optional_contents(path)? {
+            let value: CachedItem = serde_json::from_slice(&content)?;
+
+            let now = self.time_provider.now();
+
+            if let Some(expires_in) = value.expires_in {
+                // Check if value is not expired yet. Also do not allow
+                // values from the future, in case we have clock jumps
+                if value.added_at + expires_in > now && value.added_at <= now {
+                    Some(value.value)
+                } else {
+                    None
+                }
+            } else {
+                Some(value.value)
+            }
+        } else {
+            None
+        };
+
+        Ok(value)
+    }
+}
+
+impl SharedCache {
+    pub fn new<P: AsRef<Path>>(base_path: P, options: CreateOptions) -> Result<Self, Error> {
+        proxmox_sys::fs::create_path(
+            base_path.as_ref(),
+            Some(options.clone()),
+            Some(options.clone()),
+        )?;
+
+        Ok(SharedCache {
+            base_path: base_path.as_ref().to_owned(),
+            time_provider: Box::new(DefaultTimeProvider),
+            create_options: options,
+        })
+    }
+
+    fn enforce_safe_key(key: &str) -> Result<(), Error> {
+        let safe_id_regex = SAFE_ID_FORMAT.unwrap_pattern_format();
+        if safe_id_regex.is_match(key) {
+            Ok(())
+        } else {
+            bail!("invalid key format")
+        }
+    }
+
+    fn get_path_for_key(&self, key: &str) -> Result<PathBuf, Error> {
+        Self::enforce_safe_key(key)?;
+        let mut path = self.base_path.join(key);
+        path.set_extension("json");
+        Ok(path)
+    }
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize)]
+struct CachedItem {
+    value: Value,
+    added_at: i64,
+    expires_in: Option<i64>,
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use std::cell::Cell;
+    use std::sync::Arc;
+
+    #[test]
+    fn test_basic_set_and_get() {
+        let cache = TestCache::new();
+        cache
+            .cache
+            .set("foo", Value::String("bar".into()), None)
+            .unwrap();
+
+        assert_eq!(
+            cache.cache.get("foo").unwrap(),
+            Some(Value::String("bar".into()))
+        );
+        assert!(cache.cache.get("notthere").unwrap().is_none());
+    }
+
+    #[derive(Clone)]
+    struct MockTimeProvider {
+        current_time: Arc<Cell<i64>>,
+    }
+
+    impl TimeProvider for MockTimeProvider {
+        fn now(&self) -> i64 {
+            self.current_time.get()
+        }
+    }
+
+    impl MockTimeProvider {
+        fn elapse_time(&self, duration: i64) {
+            let now = self.current_time.get();
+            self.current_time.set(now + duration);
+        }
+    }
+
+    impl Default for MockTimeProvider {
+        fn default() -> Self {
+            Self {
+                current_time: Arc::new(Cell::new(0)),
+            }
+        }
+    }
+
+    struct TestCache {
+        cache: SharedCache,
+        time: MockTimeProvider,
+        path: PathBuf,
+    }
+
+    impl TestCache {
+        fn new() -> Self {
+            let path = proxmox_sys::fs::make_tmp_dir("/tmp/", None).unwrap();
+
+            let options = CreateOptions::new()
+                .owner(nix::unistd::Uid::effective())
+                .group(nix::unistd::Gid::effective())
+                .perm(nix::sys::stat::Mode::from_bits_truncate(0o600));
+
+            let mut cache = SharedCache::new(&path, options).unwrap();
+            let time = MockTimeProvider::default();
+
+            cache.time_provider = Box::new(time.clone());
+
+            Self { cache, time, path }
+        }
+    }
+
+    impl Drop for TestCache {
+        fn drop(&mut self) {
+            let _ = std::fs::remove_dir_all(&self.path);
+        }
+    }
+
+    #[test]
+    fn test_expiry() {
+        let cache = TestCache::new();
+
+        cache
+            .cache
+            .set("expiring", Value::String("bar".into()), Some(10))
+            .unwrap();
+        assert!(cache.cache.get("expiring").unwrap().is_some());
+
+        cache.time.elapse_time(9);
+        assert!(cache.cache.get("expiring").unwrap().is_some());
+        cache.time.elapse_time(2);
+        assert!(cache.cache.get("expiring").unwrap().is_none());
+    }
+
+    #[test]
+    fn test_backwards_time_jump() {
+        let cache = TestCache::new();
+
+        cache.time.elapse_time(50);
+        cache
+            .cache
+            .set("future", Value::String("bar".into()), Some(10))
+            .unwrap();
+        cache.time.elapse_time(-20);
+        assert!(cache.cache.get("future").unwrap().is_none());
+    }
+
+    #[test]
+    fn test_invalid_keys() {
+        let cache = TestCache::new();
+
+        assert!(cache
+            .cache
+            .set("../escape_base", Value::Null, None)
+            .is_err());
+        assert!(cache
+            .cache
+            .set("bjørnen drikker øl", Value::Null, None)
+            .is_err());
+        assert!(cache.cache.set("test space", Value::Null, None).is_err());
+        assert!(cache.cache.set("~/foo", Value::Null, None).is_err());
+    }
+}
-- 
2.39.2





^ permalink raw reply	[flat|nested] 19+ messages in thread

* [pve-devel] [RFC proxmox 5/7] cache: add debian packaging
  2023-08-21 13:44 [pve-devel] [RFC storage/proxmox{, -perl-rs} 0/7] cache storage plugin status for pvestatd/API status update calls Lukas Wagner
                   ` (3 preceding siblings ...)
  2023-08-21 13:44 ` [pve-devel] [RFC proxmox 4/7] cache: add new crate 'proxmox-cache' Lukas Wagner
@ 2023-08-21 13:44 ` Lukas Wagner
  2023-08-21 13:44 ` [pve-devel] [RFC proxmox-perl-rs 6/7] cache: add bindings for `SharedCache` from `proxmox-cache` Lukas Wagner
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Lukas Wagner @ 2023-08-21 13:44 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
---
 proxmox-cache/debian/changelog     |  5 ++++
 proxmox-cache/debian/control       | 47 ++++++++++++++++++++++++++++++
 proxmox-cache/debian/copyright     | 18 ++++++++++++
 proxmox-cache/debian/debcargo.toml |  7 +++++
 4 files changed, 77 insertions(+)
 create mode 100644 proxmox-cache/debian/changelog
 create mode 100644 proxmox-cache/debian/control
 create mode 100644 proxmox-cache/debian/copyright
 create mode 100644 proxmox-cache/debian/debcargo.toml

diff --git a/proxmox-cache/debian/changelog b/proxmox-cache/debian/changelog
new file mode 100644
index 0000000..695750d
--- /dev/null
+++ b/proxmox-cache/debian/changelog
@@ -0,0 +1,5 @@
+rust-proxmox-cache (0.1.0-1) unstable; urgency=medium
+
+  * initial Debian package
+
+ -- Proxmox Support Team <support@proxmox.com>  Thu, 04 May 2023 08:40:38 +0200
diff --git a/proxmox-cache/debian/control b/proxmox-cache/debian/control
new file mode 100644
index 0000000..b01dd4a
--- /dev/null
+++ b/proxmox-cache/debian/control
@@ -0,0 +1,47 @@
+Source: rust-proxmox-cache
+Section: rust
+Priority: optional
+Build-Depends: debhelper (>= 12),
+ dh-cargo (>= 25),
+ cargo:native <!nocheck>,
+ rustc:native <!nocheck>,
+ libstd-rust-dev <!nocheck>,
+ librust-anyhow-1+default-dev <!nocheck>,
+ librust-proxmox-schema-2+api-types-dev <!nocheck>,
+ librust-proxmox-schema-2+default-dev <!nocheck>,
+ librust-proxmox-sys-0.5+default-dev <!nocheck>,
+ librust-proxmox-time-1+default-dev (>= 1.1.4-~~) <!nocheck>,
+ librust-serde-1+default-dev <!nocheck>,
+ librust-serde-1+derive-dev <!nocheck>,
+ librust-serde-json-1+default-dev <!nocheck>
+Maintainer: Proxmox Support Team <support@proxmox.com>
+Standards-Version: 4.6.1
+Vcs-Git: https://salsa.debian.org/rust-team/debcargo-conf.git [src/proxmox-cache]
+Vcs-Browser: https://salsa.debian.org/rust-team/debcargo-conf/tree/master/src/proxmox-cache
+X-Cargo-Crate: proxmox-cache
+Rules-Requires-Root: no
+
+Package: librust-proxmox-cache-dev
+Architecture: any
+Multi-Arch: same
+Depends:
+ ${misc:Depends},
+ librust-anyhow-1+default-dev,
+ librust-proxmox-schema-2+api-types-dev,
+ librust-proxmox-schema-2+default-dev,
+ librust-proxmox-sys-0.5+default-dev,
+ librust-proxmox-time-1+default-dev (>= 1.1.4-~~),
+ librust-serde-1+default-dev,
+ librust-serde-1+derive-dev,
+ librust-serde-json-1+default-dev
+Provides:
+ librust-proxmox-cache+default-dev (= ${binary:Version}),
+ librust-proxmox-cache-0-dev (= ${binary:Version}),
+ librust-proxmox-cache-0+default-dev (= ${binary:Version}),
+ librust-proxmox-cache-0.1-dev (= ${binary:Version}),
+ librust-proxmox-cache-0.1+default-dev (= ${binary:Version}),
+ librust-proxmox-cache-0.1.0-dev (= ${binary:Version}),
+ librust-proxmox-cache-0.1.0+default-dev (= ${binary:Version})
+Description: Cache implementations - Rust source code
+ This package contains the source for the Rust proxmox-cache crate, packaged by
+ debcargo for use with cargo and dh-cargo.
diff --git a/proxmox-cache/debian/copyright b/proxmox-cache/debian/copyright
new file mode 100644
index 0000000..0d9eab3
--- /dev/null
+++ b/proxmox-cache/debian/copyright
@@ -0,0 +1,18 @@
+Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
+
+Files:
+ *
+Copyright: 2019 - 2023 Proxmox Server Solutions GmbH <support@proxmox.com>
+License: AGPL-3.0-or-later
+ This program is free software: you can redistribute it and/or modify it under
+ the terms of the GNU Affero General Public License as published by the Free
+ Software Foundation, either version 3 of the License, or (at your option) any
+ later version.
+ .
+ This program is distributed in the hope that it will be useful, but WITHOUT
+ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+ FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more
+ details.
+ .
+ You should have received a copy of the GNU Affero General Public License along
+ with this program. If not, see <https://www.gnu.org/licenses/>.
diff --git a/proxmox-cache/debian/debcargo.toml b/proxmox-cache/debian/debcargo.toml
new file mode 100644
index 0000000..14ad800
--- /dev/null
+++ b/proxmox-cache/debian/debcargo.toml
@@ -0,0 +1,7 @@
+overlay = "."
+crate_src_path = ".."
+maintainer = "Proxmox Support Team <support@proxmox.com>"
+
+[source]
+#vcs_git = "git://git.proxmox.com/git/proxmox.git"
+#vcs_browser = "https://git.proxmox.com/?p=proxmox.git"
-- 
2.39.2





^ permalink raw reply	[flat|nested] 19+ messages in thread

* [pve-devel] [RFC proxmox-perl-rs 6/7] cache: add bindings for `SharedCache` from `proxmox-cache`
  2023-08-21 13:44 [pve-devel] [RFC storage/proxmox{, -perl-rs} 0/7] cache storage plugin status for pvestatd/API status update calls Lukas Wagner
                   ` (4 preceding siblings ...)
  2023-08-21 13:44 ` [pve-devel] [RFC proxmox 5/7] cache: add debian packaging Lukas Wagner
@ 2023-08-21 13:44 ` Lukas Wagner
  2023-08-21 13:44 ` [pve-devel] [RFC pve-storage 7/7] stats: api: cache storage plugin status Lukas Wagner
  2023-08-22  9:17 ` [pve-devel] [RFC storage/proxmox{, -perl-rs} 0/7] cache storage plugin status for pvestatd/API status update calls Fiona Ebner
  7 siblings, 0 replies; 19+ messages in thread
From: Lukas Wagner @ 2023-08-21 13:44 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
---
 common/pkg/Makefile |  1 +
 common/src/cache.rs | 59 +++++++++++++++++++++++++++++++++++++++++++++
 common/src/mod.rs   |  1 +
 pve-rs/Cargo.toml   |  5 ++++
 pve-rs/src/lib.rs   |  1 +
 5 files changed, 67 insertions(+)
 create mode 100644 common/src/cache.rs

diff --git a/common/pkg/Makefile b/common/pkg/Makefile
index 7bf669f..e714570 100644
--- a/common/pkg/Makefile
+++ b/common/pkg/Makefile
@@ -26,6 +26,7 @@ Proxmox/RS/CalendarEvent.pm:
 	  Proxmox::RS::APT::Repositories \
 	  Proxmox::RS::CalendarEvent \
 	  Proxmox::RS::Notify \
+	  Proxmox::RS::SharedCache \
 	  Proxmox::RS::Subscription
 
 all: Proxmox/RS/CalendarEvent.pm
diff --git a/common/src/cache.rs b/common/src/cache.rs
new file mode 100644
index 0000000..e0ad1ea
--- /dev/null
+++ b/common/src/cache.rs
@@ -0,0 +1,59 @@
+#[perlmod::package(name = "Proxmox::RS::SharedCache")]
+mod export {
+    use anyhow::Error;
+    use nix::sys::stat::Mode;
+    use perlmod::Value;
+    use serde_json::Value as JSONValue;
+
+    use proxmox_cache::{Cache, SharedCache};
+    use proxmox_sys::fs::CreateOptions;
+
+    pub struct CacheWrapper {
+        cache: SharedCache,
+    }
+
+    perlmod::declare_magic!(Box<CacheWrapper> : &CacheWrapper as "Proxmox::RS::SharedCache");
+
+    #[export(raw_return)]
+    fn new(#[raw] class: Value, base_dir: &str) -> Result<Value, Error> {
+        let options = CreateOptions::new()
+            .root_only()
+            .perm(Mode::from_bits_truncate(0o700));
+
+        Ok(perlmod::instantiate_magic!(&class, MAGIC => Box::new(
+            CacheWrapper {
+                cache: SharedCache::new(base_dir, options)?
+            }
+        )))
+    }
+
+    #[export]
+    fn set(
+        #[try_from_ref] this: &CacheWrapper,
+        key: &str,
+        value: JSONValue,
+        expires_in: Option<i64>,
+    ) {
+        if let Err(err) = this.cache.set(key, value, expires_in) {
+            log::error!("could not set cache entry '{key}': {err}")
+        }
+    }
+
+    #[export]
+    fn get(#[try_from_ref] this: &CacheWrapper, key: &str) -> Option<JSONValue> {
+        match this.cache.get(key) {
+            Ok(val) => val,
+            Err(err) => {
+                log::error!("could not get cache entry '{key}': {err}");
+                None
+            }
+        }
+    }
+
+    #[export]
+    fn delete(#[try_from_ref] this: &CacheWrapper, key: &str) {
+        if let Err(err) = this.cache.delete(key) {
+            log::error!("could not delete cache entry '{key}': {err}");
+        }
+    }
+}
diff --git a/common/src/mod.rs b/common/src/mod.rs
index c3574f4..136f7ae 100644
--- a/common/src/mod.rs
+++ b/common/src/mod.rs
@@ -1,4 +1,5 @@
 pub mod apt;
+pub mod cache;
 mod calendar_event;
 pub mod logger;
 pub mod notify;
diff --git a/pve-rs/Cargo.toml b/pve-rs/Cargo.toml
index afd50f4..5b878b9 100644
--- a/pve-rs/Cargo.toml
+++ b/pve-rs/Cargo.toml
@@ -34,6 +34,7 @@ url = "2"
 perlmod = { version = "0.13", features = [ "exporter" ] }
 
 proxmox-apt = "0.10"
+proxmox-cache = "0.1.0"
 proxmox-http = { version = "0.9", features = ["client-sync", "client-trait"] }
 proxmox-http-error = "0.1.0"
 proxmox-notify = "0.2"
@@ -43,3 +44,7 @@ proxmox-subscription = "0.4"
 proxmox-sys = "0.5"
 proxmox-tfa = { version = "4.0.4", features = ["api"] }
 proxmox-time = "1.1.3"
+
+#[patch.crates-io]
+#proxmox-cache = { path = "../../proxmox/proxmox-cache" }
+#proxmox-sys = { path = "../../proxmox/proxmox-sys" }
diff --git a/pve-rs/src/lib.rs b/pve-rs/src/lib.rs
index d1915c9..7e39e85 100644
--- a/pve-rs/src/lib.rs
+++ b/pve-rs/src/lib.rs
@@ -4,6 +4,7 @@
 pub mod common;
 
 pub mod apt;
+pub mod cache;
 pub mod notify_context;
 pub mod openid;
 pub mod resource_scheduling;
-- 
2.39.2





^ permalink raw reply	[flat|nested] 19+ messages in thread

* [pve-devel] [RFC pve-storage 7/7] stats: api: cache storage plugin status
  2023-08-21 13:44 [pve-devel] [RFC storage/proxmox{, -perl-rs} 0/7] cache storage plugin status for pvestatd/API status update calls Lukas Wagner
                   ` (5 preceding siblings ...)
  2023-08-21 13:44 ` [pve-devel] [RFC proxmox-perl-rs 6/7] cache: add bindings for `SharedCache` from `proxmox-cache` Lukas Wagner
@ 2023-08-21 13:44 ` Lukas Wagner
  2023-08-22  8:51   ` Lukas Wagner
  2023-08-22  9:17 ` [pve-devel] [RFC storage/proxmox{, -perl-rs} 0/7] cache storage plugin status for pvestatd/API status update calls Fiona Ebner
  7 siblings, 1 reply; 19+ messages in thread
From: Lukas Wagner @ 2023-08-21 13:44 UTC (permalink / raw)
  To: pve-devel

Cache storage plugin status so that pvestatd and API calls can use the
cached results, without having to query all storage plugins again.

Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
---
 src/PVE/API2/Storage/Config.pm | 10 +++++++++
 src/PVE/Storage.pm             | 40 ++++++++++++++++++++++++----------
 2 files changed, 39 insertions(+), 11 deletions(-)

diff --git a/src/PVE/API2/Storage/Config.pm b/src/PVE/API2/Storage/Config.pm
index e04b6ab..8c3df99 100755
--- a/src/PVE/API2/Storage/Config.pm
+++ b/src/PVE/API2/Storage/Config.pm
@@ -16,6 +16,7 @@ use PVE::JSONSchema qw(get_standard_option);
 use PVE::RPCEnvironment;
 
 use PVE::RESTHandler;
+use PVE::RS::Cache;
 
 use base qw(PVE::RESTHandler);
 
@@ -274,6 +275,9 @@ __PACKAGE__->register_method ({
 		die $err;
 	    }
 
+	    # Remove cached plugin status on configuration changes.
+	    PVE::RS::Cache->pvestatd_cache()->delete("storage_plugin_status");
+
 	    PVE::Storage::write_config($cfg);
 
 	}, "create storage failed");
@@ -373,6 +377,9 @@ __PACKAGE__->register_method ({
 		    ." in Proxmox VE 9. Use 'create-base-path' or 'create-subdirs' instead.\n"
 	    }
 
+	    # Remove cached plugin status on configuration changes.
+	    PVE::RS::Cache->pvestatd_cache()->delete("storage_plugin_status");
+
 	    PVE::Storage::write_config($cfg);
 
 	}, "update storage failed");
@@ -422,6 +429,9 @@ __PACKAGE__->register_method ({
 
 	    delete $cfg->{ids}->{$storeid};
 
+	    # Remove cached plugin status on configuration changes.
+	    PVE::RS::Cache->pvestatd_cache()->delete("storage_plugin_status");
+
 	    PVE::Storage::write_config($cfg);
 
 	}, "delete storage failed");
diff --git a/src/PVE/Storage.pm b/src/PVE/Storage.pm
index a4d85e1..7aa3b2e 100755
--- a/src/PVE/Storage.pm
+++ b/src/PVE/Storage.pm
@@ -23,6 +23,7 @@ use PVE::INotify;
 use PVE::RPCEnvironment;
 use PVE::SSHInfo;
 use PVE::RESTEnvironment qw(log_warn);
+use PVE::RS::Cache;
 
 use PVE::Storage::Plugin;
 use PVE::Storage::DirPlugin;
@@ -1276,6 +1277,10 @@ sub storage_info {
 
     my $cache = {};
 
+    my $status_cache = PVE::RS::Cache->pvestatd_cache();
+    my $cached_status = $status_cache->get("storage_plugin_status");
+    my $refresh_status = !$cached_status;
+
     foreach my $storeid (keys %$ids) {
 	my $scfg = $ids->{$storeid};
 
@@ -1291,21 +1296,34 @@ sub storage_info {
 		if $pd->{select_existing};
 	}
 
-	eval { activate_storage($cfg, $storeid, $cache); };
-	if (my $err = $@) {
-	    warn $err;
-	    next;
+	if ($refresh_status) {
+	    eval { activate_storage($cfg, $storeid, $cache); };
+	    if (my $err = $@) {
+		warn $err;
+		next;
+	    }
+	    $cached_status->{$storeid} = eval { $plugin->status($storeid, $scfg, $cache); };
+
+	    my ($total, $avail, $used, $active) = eval { $plugin->status($storeid, $scfg, $cache); };
+	    warn $@ if $@;
+	    $cached_status->{$storeid} = {};
+
+	    $cached_status->{$storeid}->{total} = int($total);
+	    $cached_status->{$storeid}->{avail} = int($avail);
+	    $cached_status->{$storeid}->{used} = int($used);
+	    $cached_status->{$storeid}->{active} = $active;
+	    next if !$active;
 	}
 
-	my ($total, $avail, $used, $active) = eval { $plugin->status($storeid, $scfg, $cache); };
-	warn $@ if $@;
-	next if !$active;
-	$info->{$storeid}->{total} = int($total);
-	$info->{$storeid}->{avail} = int($avail);
-	$info->{$storeid}->{used} = int($used);
-	$info->{$storeid}->{active} = $active;
+	$info->{$storeid}->{total} = $cached_status->{$storeid}->{total};
+	$info->{$storeid}->{avail} = $cached_status->{$storeid}->{avail};
+	$info->{$storeid}->{used} = $cached_status->{$storeid}->{used};
+	$info->{$storeid}->{active} = $cached_status->{$storeid}->{active};
     }
 
+    # TODO: How long should status results be valid?
+    $status_cache->set('storage_plugin_status', $cached_status, 30) if $refresh_status;
+
     return $info;
 }
 
-- 
2.39.2





^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [pve-devel] [RFC proxmox 2/7] sys: add make_tmp_dir
  2023-08-21 13:44 ` [pve-devel] [RFC proxmox 2/7] sys: add make_tmp_dir Lukas Wagner
@ 2023-08-22  8:39   ` Wolfgang Bumiller
  0 siblings, 0 replies; 19+ messages in thread
From: Wolfgang Bumiller @ 2023-08-22  8:39 UTC (permalink / raw)
  To: Lukas Wagner; +Cc: pve-devel

On Mon, Aug 21, 2023 at 03:44:39PM +0200, Lukas Wagner wrote:
> Under the hood, this function calls `mkdtemp` from libc. Unfortunatly
> the nix crate did not provide bindings for this function, so we have
> to call into libc directly.
> 
> Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
> ---
>  proxmox-sys/src/fs/dir.rs | 73 +++++++++++++++++++++++++++++++++++++--
>  1 file changed, 71 insertions(+), 2 deletions(-)
> 
> diff --git a/proxmox-sys/src/fs/dir.rs b/proxmox-sys/src/fs/dir.rs
> index 6aee316..72bf1ad 100644
> --- a/proxmox-sys/src/fs/dir.rs
> +++ b/proxmox-sys/src/fs/dir.rs
> @@ -1,6 +1,8 @@
> -use std::ffi::CStr;
> +use std::ffi::{CStr, CString, OsStr};
> +use std::fs::File;
> +use std::os::unix::ffi::OsStrExt;
>  use std::os::unix::io::{AsRawFd, OwnedFd};
> -use std::path::Path;
> +use std::path::{Path, PathBuf};
>  
>  use anyhow::{bail, Error};
>  use nix::errno::Errno;
> @@ -8,6 +10,8 @@ use nix::fcntl::OFlag;
>  use nix::sys::stat;
>  use nix::unistd;
>  
> +use proxmox_lang::try_block;
> +
>  use crate::fs::{fchown, CreateOptions};
>  
>  /// Creates directory at the provided path with specified ownership.
> @@ -152,6 +156,54 @@ fn create_path_at_do(
>      }
>  }
>  
> +///  Create a temporary directory.
> +///
> +/// `prefix` determines where the temporary directory will be created. For instance, if
> +/// `prefix` is `/tmp`, on success the function will return a path in the style of
> +/// `/tmp/tmp_XXXXXX`, where X stands for a random string, ensuring that the path is unique.
> +///
> +/// By default, the created directory has `0o700` permissions. If this is not desired, custom
> +/// [`CreateOptions`] can be passed via the `option` parameter.
> +pub fn make_tmp_dir<P: AsRef<Path>>(
> +    prefix: P,
> +    options: Option<CreateOptions>,
> +) -> Result<PathBuf, Error> {
> +    let mut template = prefix.as_ref().to_owned();
> +    template = template.join("tmp_XXXXXX");
> +    let template = CString::new(template.into_os_string().as_bytes())?;
> +
> +    let raw_template_buffer = template.into_raw();

^ This might be shorter without going over the `CString` type with just
a `Vec<u8>` with an explicit `.push(0)` without temporarily giving up
ownership for the `mkdtemp` call.

> +
> +    let path = unsafe {
> +        let raw_returned_buffer = libc::mkdtemp(raw_template_buffer);
> +        if raw_returned_buffer.is_null() {

Need to add

    let err = io::Error::last_os_error();

as the very first thing you do in this branch.

Never give any external libraries a chance to mess with your
`errno` values before you use them, even `std` ;-)

> +            // The returned pointer points to the same buffer, so in case
> +            // of an error we need to make sure to claim it back to that
> +            // it is freed properly.
> +            drop(CString::from_raw(raw_template_buffer));

^ but I think we could avoid this - but as long as you fix up the
`errno` usage the CString code can also just stay this way, no strong
feelings there.

> +            return Err(std::io::Error::last_os_error().into());
> +        }
> +        CString::from_raw(raw_returned_buffer)
> +    };
> +
> +    let path = OsStr::from_bytes(path.as_bytes());
> +    let path = PathBuf::from(path);

^ This seems like there should be a cheap non-copying version:
    PathBuf::from(OsString::from_vec(path.into_bytes())) ?

> +
> +    if let Some(options) = options {
> +        if let Err(err) = try_block!({
> +            let fd = crate::fd::open(&path, OFlag::O_DIRECTORY, stat::Mode::empty())?;
> +            let mut file = File::from(fd);
> +            options.apply_to(&mut file, &path)?;

^ Huh, just noticing this weirdness, can we fix up the apply_to API to
take a `RawFd` or `&BorrowedFd` instead of a `File`? This is... not a
file... :-) And also `CreateOptions` doesn't really need it to be
mutable ;-)

> +            Ok::<(), Error>(())
> +        }) {
> +            let _ = unistd::unlink(&path);

^ This calls `unlink(2)` which does not remove directories. You need to
use either `std::fs::remove_dir()` or `unlinkat` with
`UnlinkatFlags::RemoveDir`.

Also, please also log the error if this fails.

> +            bail!("could not apply create options to new temporary directory: {err}");
> +        }
> +    }
> +
> +    Ok(path)
> +}
> +
>  #[cfg(test)]
>  mod tests {
>      use super::*;
> @@ -169,4 +221,21 @@ mod tests {
>          )
>          .expect("expected create_path to work");
>      }
> +
> +    #[test]
> +    fn test_make_tmp_dir() -> Result<(), Error> {
> +        let options = CreateOptions::new()
> +            .owner(unistd::Uid::effective())
> +            .group(unistd::Gid::effective())
> +            .perm(stat::Mode::from_bits_truncate(0o755));
> +
> +        let path = make_tmp_dir("/tmp", Some(options))?;
> +
> +        assert!(path.exists());
> +        assert!(path.is_dir());
> +
> +        std::fs::remove_dir_all(&path)?;
> +
> +        Ok(())
> +    }
>  }
> -- 
> 2.39.2




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [pve-devel] [RFC pve-storage 7/7] stats: api: cache storage plugin status
  2023-08-21 13:44 ` [pve-devel] [RFC pve-storage 7/7] stats: api: cache storage plugin status Lukas Wagner
@ 2023-08-22  8:51   ` Lukas Wagner
  0 siblings, 0 replies; 19+ messages in thread
From: Lukas Wagner @ 2023-08-22  8:51 UTC (permalink / raw)
  To: pve-devel


On 8/21/23 15:44, Lukas Wagner wrote:
> +    my $status_cache = PVE::RS::Cache->pvestatd_cache();

Urgh, just noticed that I seemingly have forgotten to commit a 
refactoring step in `pve-storage`.

Should be `Proxmox::RS::SharedCache->new("/run/pvestatd-cache")`

Everything else should be the same, and as this is just an RFC to get 
feedback on the approach, it shouldn't matter that much ;)


-- 
- Lukas




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [pve-devel] [RFC storage/proxmox{, -perl-rs} 0/7] cache storage plugin status for pvestatd/API status update calls
  2023-08-21 13:44 [pve-devel] [RFC storage/proxmox{, -perl-rs} 0/7] cache storage plugin status for pvestatd/API status update calls Lukas Wagner
                   ` (6 preceding siblings ...)
  2023-08-21 13:44 ` [pve-devel] [RFC pve-storage 7/7] stats: api: cache storage plugin status Lukas Wagner
@ 2023-08-22  9:17 ` Fiona Ebner
  2023-08-22 11:25   ` Wolfgang Bumiller
  2023-08-30 17:07   ` Wolf Noble
  7 siblings, 2 replies; 19+ messages in thread
From: Fiona Ebner @ 2023-08-22  9:17 UTC (permalink / raw)
  To: Proxmox VE development discussion, Lukas Wagner

Am 21.08.23 um 15:44 schrieb Lukas Wagner:
> Open questions:
>   - not sure what a good expiration time for cached entries is. For
>     now I picked 30s, but there was not much thought behind that value.
> 

If operations affecting the values like allocation, resize, etc. would
invalidate the cache, I think we could go for a bit more. But if they
don't, the limit can't be too high IMHO. Otherwise, users will wonder
why the usage on the storage doesn't change after their action.

And would it make sense to have the cache be opt-in? So that only
pvestatd would use it, but standalone API/CLI calls always get current
values? If there is invalidation like mentioned above, that might not be
needed, but otherwise, I'm a bit afraid that handing out (slightly)
outdated values might trip up some automated scripts doing batch work or
something.




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [pve-devel] [RFC proxmox 4/7] cache: add new crate 'proxmox-cache'
  2023-08-21 13:44 ` [pve-devel] [RFC proxmox 4/7] cache: add new crate 'proxmox-cache' Lukas Wagner
@ 2023-08-22 10:08   ` Max Carrara
  2023-08-22 11:33     ` Lukas Wagner
  2023-08-22 11:56     ` Wolfgang Bumiller
  0 siblings, 2 replies; 19+ messages in thread
From: Max Carrara @ 2023-08-22 10:08 UTC (permalink / raw)
  To: Proxmox VE development discussion, Lukas Wagner

On 8/21/23 15:44, Lukas Wagner wrote:
> For now, it contains a file-backed cache with expiration logic.
> The cache should be safe to be accessed from multiple processes at
> once.
> 

This seems pretty neat! The cache implementation seems straightforward
enough. I'll see if I can test it more thoroughly later.

However, in my opinion we should have a crate like
"proxmox-collections" (or something of the sort) with modules for each
data structure / collection similar to the standard library; I'm
curious what others think about that. imo it would be a great
opportunity to introduce that crate in this series, since you're
already introducing one for the cache anyway.

So, proxmox-collections would look something like this:

  proxmox-collections
  └── src
      ├── cache
      │   ├── mod.rs
      │   └── shared_cache.rs
      └── lib.rs

Let me know what you think!

> The cache stores values in a directory, based on the key.
> E.g. key "foo" results in a file 'foo.json' in the given base
> directory. If a new value is set, the file is atomically replaced.
> The JSON file also contains some metadata, namely 'added_at' and
> 'expire_in' - they are used for cache expiration.
> 
> Note: This cache is not suited to applications that
>  - Might want to cache huge amounts of data, and/or access the cache
>    very frequently (due to the overhead of JSON de/serialization)
>  - Require arbitrary keys - right now, keys are limited by
>    SAFE_ID_REGEX
> 
> The cache was developed for the use in pvestatd, in order to cache
> e.g. storage plugin status. There, these limitations do not really
> play any role.
> 
> Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
> ---
>  Cargo.toml                            |   1 +
>  proxmox-cache/Cargo.toml              |  20 ++
>  proxmox-cache/examples/performance.rs |  82 ++++++++
>  proxmox-cache/src/lib.rs              |  40 ++++
>  proxmox-cache/src/shared_cache.rs     | 263 ++++++++++++++++++++++++++
>  5 files changed, 406 insertions(+)
>  create mode 100644 proxmox-cache/Cargo.toml
>  create mode 100644 proxmox-cache/examples/performance.rs
>  create mode 100644 proxmox-cache/src/lib.rs
>  create mode 100644 proxmox-cache/src/shared_cache.rs
> 
> diff --git a/Cargo.toml b/Cargo.toml
> index e334ac1..940e1d0 100644
> --- a/Cargo.toml
> +++ b/Cargo.toml
> @@ -5,6 +5,7 @@ members = [
>      "proxmox-async",
>      "proxmox-auth-api",
>      "proxmox-borrow",
> +    "proxmox-cache",
>      "proxmox-client",
>      "proxmox-compression",
>      "proxmox-http",
> diff --git a/proxmox-cache/Cargo.toml b/proxmox-cache/Cargo.toml
> new file mode 100644
> index 0000000..b20921f
> --- /dev/null
> +++ b/proxmox-cache/Cargo.toml
> @@ -0,0 +1,20 @@
> +[package]
> +name = "proxmox-cache"
> +version = "0.1.0"
> +authors.workspace = true
> +edition.workspace = true
> +license.workspace = true
> +repository.workspace = true
> +exclude.workspace = true
> +description = "Cache implementations"
> +
> +[dependencies]
> +anyhow.workspace = true
> +proxmox-sys.workspace = true
> +proxmox-time.workspace = true
> +proxmox-schema = { workspace = true, features = ["api-types"]}
> +serde_json.workspace = true
> +serde = { workspace = true, features = ["derive"]}
> +
> +[dev-dependencies]
> +nix.workspace = true
> diff --git a/proxmox-cache/examples/performance.rs b/proxmox-cache/examples/performance.rs
> new file mode 100644
> index 0000000..420f61c
> --- /dev/null
> +++ b/proxmox-cache/examples/performance.rs
> @@ -0,0 +1,82 @@
> +use proxmox_cache::Cache;
> +use proxmox_cache::SharedCache;
> +use proxmox_sys::fs::CreateOptions;
> +use std::time::Instant;
> +
> +fn main() {
> +    let options = CreateOptions::new()
> +        .owner(nix::unistd::Uid::effective())
> +        .group(nix::unistd::Gid::effective())
> +        .perm(nix::sys::stat::Mode::from_bits_truncate(0o755));
> +
> +    let cache = SharedCache::new("/tmp/pmx-cache", options).unwrap();
> +
> +    let mut keys = Vec::new();
> +
> +    for i in 0..100000 {
> +        keys.push(format!("key_{i}"));
> +    }
> +
> +    let data = serde_json::json!({
> +        "member1": "foo",
> +        "member2": "foo",
> +        "member3": "foo",
> +        "member4": "foo",
> +        "member5": "foo",
> +        "member5": "foo",
> +        "member6": "foo",
> +        "member7": "foo",
> +        "member8": "foo",
> +        "array": [10, 20, 30, 40, 50],
> +        "object": {
> +            "member1": "foo",
> +            "member2": "foo",
> +            "member3": "foo",
> +            "member4": "foo",
> +            "member5": "foo",
> +            "member5": "foo",
> +            "member6": "foo",
> +            "member7": "foo",
> +            "member8": "foo",
> +        }
> +    });
> +
> +    let before = Instant::now();
> +
> +    for key in &keys {
> +        cache
> +            .set(key, data.clone(), None)
> +            .expect("could not insert value");
> +    }
> +
> +    let time = Instant::now() - before;
> +    let time_per_op = time / keys.len() as u32;
> +    println!(
> +        "inserting {len} keys took {time:?} ({time_per_op:?} per key)",
> +        len = keys.len(),
> +    );
> +
> +    let before = Instant::now();
> +    for key in &keys {
> +        let _ = cache.get(key).expect("could not get value");
> +    }
> +
> +    let time = Instant::now() - before;
> +    let time_per_op = time / keys.len() as u32;
> +    println!(
> +        "getting {len} keys took {time:?} ({time_per_op:?} per key)",
> +        len = keys.len(),
> +    );
> +
> +    let before = Instant::now();
> +    for key in &keys {
> +        cache.delete(key).expect("could not delete value");
> +    }
> +
> +    let time = Instant::now() - before;
> +    let time_per_op = time / keys.len() as u32;
> +    println!(
> +        "deleting {len} keys took {time:?} ({time_per_op:?} per key)",
> +        len = keys.len(),
> +    );
> +}
> diff --git a/proxmox-cache/src/lib.rs b/proxmox-cache/src/lib.rs
> new file mode 100644
> index 0000000..d496dc7
> --- /dev/null
> +++ b/proxmox-cache/src/lib.rs
> @@ -0,0 +1,40 @@
> +use anyhow::Error;
> +use serde_json::Value;
> +
> +pub mod shared_cache;
> +
> +pub use shared_cache::SharedCache;
> +
> +trait TimeProvider {
> +    /// Returns the current time as a UNIX epoch (second resolution)
> +    fn now(&self) -> i64;
> +}
> +
> +struct DefaultTimeProvider;
> +
> +impl TimeProvider for DefaultTimeProvider {
> +    fn now(&self) -> i64 {
> +        proxmox_time::epoch_i64()
> +    }
> +}
> +
> +pub trait Cache {
> +    /// Set or insert a cache entry.
> +    ///
> +    /// If `expires_in` is set, this entry will expire in the desired number of
> +    /// seconds.
> +    fn set<S: AsRef<str>>(
> +        &self,
> +        key: S,
> +        value: Value,
> +        expires_in: Option<i64>,
> +    ) -> Result<(), Error>;
> +
> +    /// Delete a cache entry.
> +    fn delete<S: AsRef<str>>(&self, key: S) -> Result<(), Error>;
> +
> +    /// Get a value from the cache.
> +    ///
> +    /// Expired entries will *not* be returned.
> +    fn get<S: AsRef<str>>(&self, key: S) -> Result<Option<Value>, Error>;
> +}

I don't necessarily think that a trait would be necessary in this
case, as there's not really any other structure (that can be used as
caching mechanism) that you're abstracting over. (more below)

> diff --git a/proxmox-cache/src/shared_cache.rs b/proxmox-cache/src/shared_cache.rs
> new file mode 100644
> index 0000000..be6212c
> --- /dev/null
> +++ b/proxmox-cache/src/shared_cache.rs
> @@ -0,0 +1,263 @@
> +use std::path::{Path, PathBuf};
> +
> +use anyhow::{bail, Error};
> +use serde::{Deserialize, Serialize};
> +use serde_json::Value;
> +
> +use proxmox_schema::api_types::SAFE_ID_FORMAT;
> +use proxmox_sys::fs::CreateOptions;
> +
> +use crate::{Cache, DefaultTimeProvider, TimeProvider};
> +
> +/// A simple, file-backed cache that can be used from multiple processes concurrently.
> +///
> +/// Cache entries are stored as individual files inside a base directory. For instance,
> +/// a cache entry with the key 'disk_stats' will result in a file 'disk_stats.json' inside
> +/// the base directory. As the extension implies, the cached data will be stored as a JSON
> +/// string.
> +///
> +/// For optimal performance, `SharedCache` should have its base directory in a `tmpfs`.
> +///
> +/// ## Key Space
> +/// Due to the fact that cache keys are being directly used as filenames, they have to match the
> +/// following regular expression: `[A-Za-z0-9_][A-Za-z0-9._\-]*`
> +///
> +/// ## Concurrency
> +/// All cache operations are based on atomic file operations, thus accessing/updating the cache from
> +/// multiple processes at the same time is safe.
> +///
> +/// ## Performance
> +/// On a tmpfs:
> +/// ```sh
> +///   $ cargo run --release --example=performance
> +///   inserting 100000 keys took 896.609758ms (8.966µs per key)
> +///   getting 100000 keys took 584.874842ms (5.848µs per key)
> +///   deleting 100000 keys took 247.742702ms (2.477µs per key)
> +///
> +/// Inserting/getting large objects might of course result in lower performance due to the cost
> +/// of serialization.
> +/// ```
> +///
> +pub struct SharedCache {
> +    base_path: PathBuf,
> +    time_provider: Box<dyn TimeProvider>,
> +    create_options: CreateOptions,
> +}

Instead, this should be generic:

pub struct SharedCache<K, V> { ... }

.. and maybe rename it to SharedFileCache to make it explicit that this
operates on a file. (but that's more dependent on one's taste tbh)

.. and the impl block below ...

> +
> +impl Cache for SharedCache {
> +    fn set<S: AsRef<str>>(
> +        &self,
> +        key: S,
> +        value: Value,
> +        expires_in: Option<i64>,
> +    ) -> Result<(), Error> {
> +        let path = self.get_path_for_key(key.as_ref())?;
> +        let added_at = self.time_provider.now();
> +
> +        let item = CachedItem {
> +            value,
> +            added_at,
> +            expires_in,
> +        };
> +
> +        let serialized = serde_json::to_vec_pretty(&item)?;
> +
> +        // Atomically replace file
> +        proxmox_sys::fs::replace_file(path, &serialized, self.create_options.clone(), true)?;
> +        Ok(())
> +    }
> +
> +    fn delete<S: AsRef<str>>(&self, key: S) -> Result<(), Error> {
> +        let path = self.get_path_for_key(key.as_ref())?;
> +        std::fs::remove_file(path)?;
> +        Ok(())
> +    }
> +
> +    fn get<S: AsRef<str>>(&self, key: S) -> Result<Option<Value>, Error> {
> +        let path = self.get_path_for_key(key.as_ref())?;
> +
> +        let value = if let Some(content) = proxmox_sys::fs::file_get_optional_contents(path)? {
> +            let value: CachedItem = serde_json::from_slice(&content)?;
> +
> +            let now = self.time_provider.now();
> +
> +            if let Some(expires_in) = value.expires_in {
> +                // Check if value is not expired yet. Also do not allow
> +                // values from the future, in case we have clock jumps
> +                if value.added_at + expires_in > now && value.added_at <= now {
> +                    Some(value.value)
> +                } else {
> +                    None
> +                }
> +            } else {
> +                Some(value.value)
> +            }
> +        } else {
> +            None
> +        };
> +
> +        Ok(value)
> +    }
> +}

... can be replaced as follows, in order to make it similar to
std::collections::{HashMap, BTreeMap}:

impl<K: AsRef<str>> for SharedCache<K, Value> {
    // Returns old value on successful insert, if given
    fn insert(&self, k: K, v: Value) -> Result<Option<Value>, Error> {
        // ...
    }

    fn get(&self, k: K) -> Result<Option<Value>, Error> {
        // ...
    }

    fn remove(&self, k: K) -> Result<Option<Value>, Error> {
        // ...
    }
}

If necessary / sensible, other methods (inspired by {HashMap, BTreeMap} can
be added as well, such as remove_entry, retain, clear, etc.


> +
> +impl SharedCache {
> +    pub fn new<P: AsRef<Path>>(base_path: P, options: CreateOptions) -> Result<Self, Error> {
> +        proxmox_sys::fs::create_path(
> +            base_path.as_ref(),
> +            Some(options.clone()),
> +            Some(options.clone()),
> +        )?;
> +
> +        Ok(SharedCache {
> +            base_path: base_path.as_ref().to_owned(),
> +            time_provider: Box::new(DefaultTimeProvider),
> +            create_options: options,
> +        })
> +    }
> +
> +    fn enforce_safe_key(key: &str) -> Result<(), Error> {
> +        let safe_id_regex = SAFE_ID_FORMAT.unwrap_pattern_format();
> +        if safe_id_regex.is_match(key) {
> +            Ok(())
> +        } else {
> +            bail!("invalid key format")
> +        }
> +    }
> +
> +    fn get_path_for_key(&self, key: &str) -> Result<PathBuf, Error> {
> +        Self::enforce_safe_key(key)?;
> +        let mut path = self.base_path.join(key);
> +        path.set_extension("json");
> +        Ok(path)
> +    }
> +}
> +
> +#[derive(Debug, Clone, Serialize, Deserialize)]
> +struct CachedItem {
> +    value: Value,
> +    added_at: i64,
> +    expires_in: Option<i64>,
> +}
> +

... and for completion's sake: This can stay, as it's specific to the
alternative implementation I've written above.

All in all, I think this would make your implementation more flexible.
Let me know what you think!

> +#[cfg(test)]
> +mod tests {
> +    use super::*;
> +    use std::cell::Cell;
> +    use std::sync::Arc;
> +
> +    #[test]
> +    fn test_basic_set_and_get() {
> +        let cache = TestCache::new();
> +        cache
> +            .cache
> +            .set("foo", Value::String("bar".into()), None)
> +            .unwrap();
> +
> +        assert_eq!(
> +            cache.cache.get("foo").unwrap(),
> +            Some(Value::String("bar".into()))
> +        );
> +        assert!(cache.cache.get("notthere").unwrap().is_none());
> +    }
> +
> +    #[derive(Clone)]
> +    struct MockTimeProvider {
> +        current_time: Arc<Cell<i64>>,
> +    }
> +
> +    impl TimeProvider for MockTimeProvider {
> +        fn now(&self) -> i64 {
> +            self.current_time.get()
> +        }
> +    }
> +
> +    impl MockTimeProvider {
> +        fn elapse_time(&self, duration: i64) {
> +            let now = self.current_time.get();
> +            self.current_time.set(now + duration);
> +        }
> +    }
> +
> +    impl Default for MockTimeProvider {
> +        fn default() -> Self {
> +            Self {
> +                current_time: Arc::new(Cell::new(0)),
> +            }
> +        }
> +    }
> +
> +    struct TestCache {
> +        cache: SharedCache,
> +        time: MockTimeProvider,
> +        path: PathBuf,
> +    }
> +
> +    impl TestCache {
> +        fn new() -> Self {
> +            let path = proxmox_sys::fs::make_tmp_dir("/tmp/", None).unwrap();
> +
> +            let options = CreateOptions::new()
> +                .owner(nix::unistd::Uid::effective())
> +                .group(nix::unistd::Gid::effective())
> +                .perm(nix::sys::stat::Mode::from_bits_truncate(0o600));
> +
> +            let mut cache = SharedCache::new(&path, options).unwrap();
> +            let time = MockTimeProvider::default();
> +
> +            cache.time_provider = Box::new(time.clone());
> +
> +            Self { cache, time, path }
> +        }
> +    }
> +
> +    impl Drop for TestCache {
> +        fn drop(&mut self) {
> +            let _ = std::fs::remove_dir_all(&self.path);
> +        }
> +    }
> +
> +    #[test]
> +    fn test_expiry() {
> +        let cache = TestCache::new();
> +
> +        cache
> +            .cache
> +            .set("expiring", Value::String("bar".into()), Some(10))
> +            .unwrap();
> +        assert!(cache.cache.get("expiring").unwrap().is_some());
> +
> +        cache.time.elapse_time(9);
> +        assert!(cache.cache.get("expiring").unwrap().is_some());
> +        cache.time.elapse_time(2);
> +        assert!(cache.cache.get("expiring").unwrap().is_none());
> +    }
> +
> +    #[test]
> +    fn test_backwards_time_jump() {
> +        let cache = TestCache::new();
> +
> +        cache.time.elapse_time(50);
> +        cache
> +            .cache
> +            .set("future", Value::String("bar".into()), Some(10))
> +            .unwrap();
> +        cache.time.elapse_time(-20);
> +        assert!(cache.cache.get("future").unwrap().is_none());
> +    }
> +
> +    #[test]
> +    fn test_invalid_keys() {
> +        let cache = TestCache::new();
> +
> +        assert!(cache
> +            .cache
> +            .set("../escape_base", Value::Null, None)
> +            .is_err());
> +        assert!(cache
> +            .cache
> +            .set("bjørnen drikker øl", Value::Null, None)
> +            .is_err());
> +        assert!(cache.cache.set("test space", Value::Null, None).is_err());
> +        assert!(cache.cache.set("~/foo", Value::Null, None).is_err());
> +    }
> +}

[0]: https://doc.rust-lang.org/reference/items/traits.html#object-safety





^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [pve-devel] [RFC storage/proxmox{, -perl-rs} 0/7] cache storage plugin status for pvestatd/API status update calls
  2023-08-22  9:17 ` [pve-devel] [RFC storage/proxmox{, -perl-rs} 0/7] cache storage plugin status for pvestatd/API status update calls Fiona Ebner
@ 2023-08-22 11:25   ` Wolfgang Bumiller
  2023-08-30 17:07   ` Wolf Noble
  1 sibling, 0 replies; 19+ messages in thread
From: Wolfgang Bumiller @ 2023-08-22 11:25 UTC (permalink / raw)
  To: Fiona Ebner; +Cc: Proxmox VE development discussion, Lukas Wagner

On Tue, Aug 22, 2023 at 11:17:14AM +0200, Fiona Ebner wrote:
> Am 21.08.23 um 15:44 schrieb Lukas Wagner:
> > Open questions:
> >   - not sure what a good expiration time for cached entries is. For
> >     now I picked 30s, but there was not much thought behind that value.
> > 
> 
> If operations affecting the values like allocation, resize, etc. would
> invalidate the cache, I think we could go for a bit more. But if they
> don't, the limit can't be too high IMHO. Otherwise, users will wonder
> why the usage on the storage doesn't change after their action.
> 
> And would it make sense to have the cache be opt-in? So that only
> pvestatd would use it, but standalone API/CLI calls always get current
> values? If there is invalidation like mentioned above, that might not be
> needed, but otherwise, I'm a bit afraid that handing out (slightly)
> outdated values might trip up some automated scripts doing batch work or
> something.

CLI tools should definitely have an explicit cache-opt-out switch (or -
for some of them - that might even be a sensible default, but ideally
you'd also have a way to quickly check for temporary discrepancies
between the cache and the current value via the CLI).
But I the API should stick to to using the cache by default. (But can of
course also have an explicit cache-opt-out or force-cache-refresh option
- whichever makes more sense.)
That's kind of the whole idea, to not have 5 daemons query the exact
same data simultaneously ;-)




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [pve-devel] [RFC proxmox 4/7] cache: add new crate 'proxmox-cache'
  2023-08-22 10:08   ` Max Carrara
@ 2023-08-22 11:33     ` Lukas Wagner
  2023-08-22 12:01       ` Wolfgang Bumiller
  2023-08-22 11:56     ` Wolfgang Bumiller
  1 sibling, 1 reply; 19+ messages in thread
From: Lukas Wagner @ 2023-08-22 11:33 UTC (permalink / raw)
  To: Max Carrara, Proxmox VE development discussion

Thanks for the review! Comments inline.

On 8/22/23 12:08, Max Carrara wrote:
> On 8/21/23 15:44, Lukas Wagner wrote:
>> For now, it contains a file-backed cache with expiration logic.
>> The cache should be safe to be accessed from multiple processes at
>> once.
>>
> 
> This seems pretty neat! The cache implementation seems straightforward
> enough. I'll see if I can test it more thoroughly later.
> 
> However, in my opinion we should have a crate like
> "proxmox-collections" (or something of the sort) with modules for each
> data structure / collection similar to the standard library; I'm
> curious what others think about that. imo it would be a great
> opportunity to introduce that crate in this series, since you're
> already introducing one for the cache anyway.
> 
> So, proxmox-collections would look something like this:
> 
>    proxmox-collections
>    └── src
>        ├── cache
>        │   ├── mod.rs
>        │   └── shared_cache.rs
>        └── lib.rs
> 
> Let me know what you think!
> 

I guess this would sense. Not sure if this will gain any other data 
structures soon, but I think going in that direction makes sense.


(...)

>> +    ///
>> +    /// Expired entries will *not* be returned.
>> +    fn get<S: AsRef<str>>(&self, key: S) -> Result<Option<Value>, Error>;
>> +}
> 
> I don't necessarily think that a trait would be necessary in this
> case, as there's not really any other structure (that can be used as
> caching mechanism) that you're abstracting over. (more below)
> 

Yes, you are right. Clear case of premature optimi... refactoring ;)

>> diff --git a/proxmox-cache/src/shared_cache.rs b/proxmox-cache/src/shared_cache.rs
>> new file mode 100644
>> index 0000000..be6212c
>> --- /dev/null
>> +++ b/proxmox-cache/src/shared_cache.rs
>> @@ -0,0 +1,263 @@
>> +use std::path::{Path, PathBuf};
>> +
>> +use anyhow::{bail, Error};
>> +use serde::{Deserialize, Serialize};
>> +use serde_json::Value;
>> +
>> +use proxmox_schema::api_types::SAFE_ID_FORMAT;
>> +use proxmox_sys::fs::CreateOptions;
>> +
>> +use crate::{Cache, DefaultTimeProvider, TimeProvider};
>> +
>> +/// A simple, file-backed cache that can be used from multiple processes concurrently.
>> +///
>> +/// Cache entries are stored as individual files inside a base directory. For instance,
>> +/// a cache entry with the key 'disk_stats' will result in a file 'disk_stats.json' inside
>> +/// the base directory. As the extension implies, the cached data will be stored as a JSON
>> +/// string.
>> +///
>> +/// For optimal performance, `SharedCache` should have its base directory in a `tmpfs`.
>> +///
>> +/// ## Key Space
>> +/// Due to the fact that cache keys are being directly used as filenames, they have to match the
>> +/// following regular expression: `[A-Za-z0-9_][A-Za-z0-9._\-]*`
>> +///
>> +/// ## Concurrency
>> +/// All cache operations are based on atomic file operations, thus accessing/updating the cache from
>> +/// multiple processes at the same time is safe.
>> +///
>> +/// ## Performance
>> +/// On a tmpfs:
>> +/// ```sh
>> +///   $ cargo run --release --example=performance
>> +///   inserting 100000 keys took 896.609758ms (8.966µs per key)
>> +///   getting 100000 keys took 584.874842ms (5.848µs per key)
>> +///   deleting 100000 keys took 247.742702ms (2.477µs per key)
>> +///
>> +/// Inserting/getting large objects might of course result in lower performance due to the cost
>> +/// of serialization.
>> +/// ```
>> +///
>> +pub struct SharedCache {
>> +    base_path: PathBuf,
>> +    time_provider: Box<dyn TimeProvider>,
>> +    create_options: CreateOptions,
>> +}
> 
> Instead, this should be generic:
> 
> pub struct SharedCache<K, V> { ... }

True, I could: K: AsRef<str> and V: Serialize + Deserialize

But yeah, as this is just an RFC to get feedback for the whole concept,
some of the implementation details are not completely fleshed out, 
partly on purpose and partly due to oversight.

> 
> .. and maybe rename it to SharedFileCache to make it explicit that this
> operates on a file. (but that's more dependent on one's taste tbh)
> 
Actually I originally named it `SharedFileCache`, but then changed it to 
changed it to `SharedCache`, because the former sounds a bit like it 
caches *files* rather than values - at least in my head.

(...)
> ... can be replaced as follows, in order to make it similar to
> std::collections::{HashMap, BTreeMap}:
> 
> impl<K: AsRef<str>> for SharedCache<K, Value> {
>      // Returns old value on successful insert, if given
>      fn insert(&self, k: K, v: Value) -> Result<Option<Value>, Error> {
>          // ...
>      }
> 
>      fn get(&self, k: K) -> Result<Option<Value>, Error> {
>          // ...
>      }
> 
>      fn remove(&self, k: K) -> Result<Option<Value>, Error> {
>          // ...
>      }
> }
> 
> If necessary / sensible, other methods (inspired by {HashMap, BTreeMap} can
> be added as well, such as remove_entry, retain, clear, etc.
> 

I don't have any hard feelings regarding the naming, but not returning a 
Value from `delete` was a conscious decision - we simply don't need it 
right now. I don't want to deserialize to just throw away the value.
Also, reading *and* deleting at the same time *might* introduce the need 
for file locking - although I'm not completely sure about that yet.

If we ever need a `remove` that also returns the value, we could just 
introduce a second method, e.g. `take`.
> 
>> +
>> +impl SharedCache {
>> +    pub fn new<P: AsRef<Path>>(base_path: P, options: CreateOptions) -> Result<Self, Error> {
>> +        proxmox_sys::fs::create_path(
>> +            base_path.as_ref(),
>> +            Some(options.clone()),
>> +            Some(options.clone()),
>> +        )?;
>> +
>> +        Ok(SharedCache {
>> +            base_path: base_path.as_ref().to_owned(),
>> +            time_provider: Box::new(DefaultTimeProvider),
>> +            create_options: options,
>> +        })
>> +    }
>> +
>> +    fn enforce_safe_key(key: &str) -> Result<(), Error> {
>> +        let safe_id_regex = SAFE_ID_FORMAT.unwrap_pattern_format();
>> +        if safe_id_regex.is_match(key) {
>> +            Ok(())
>> +        } else {
>> +            bail!("invalid key format")
>> +        }
>> +    }
>> +
>> +    fn get_path_for_key(&self, key: &str) -> Result<PathBuf, Error> {
>> +        Self::enforce_safe_key(key)?;
>> +        let mut path = self.base_path.join(key);
>> +        path.set_extension("json");
>> +        Ok(path)
>> +    }
>> +}
>> +
>> +#[derive(Debug, Clone, Serialize, Deserialize)]
>> +struct CachedItem {
>> +    value: Value,
>> +    added_at: i64,
>> +    expires_in: Option<i64>,
>> +}
>> +
> 
> ... and for completion's sake: This can stay, as it's specific to the
> alternative implementation I've written above.
> 
> All in all, I think this would make your implementation more flexible.
> Let me know what you think!
> 


-- 
- Lukas




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [pve-devel] [RFC proxmox 4/7] cache: add new crate 'proxmox-cache'
  2023-08-22 10:08   ` Max Carrara
  2023-08-22 11:33     ` Lukas Wagner
@ 2023-08-22 11:56     ` Wolfgang Bumiller
  2023-08-22 13:52       ` Max Carrara
  1 sibling, 1 reply; 19+ messages in thread
From: Wolfgang Bumiller @ 2023-08-22 11:56 UTC (permalink / raw)
  To: Max Carrara; +Cc: Proxmox VE development discussion, Lukas Wagner

On Tue, Aug 22, 2023 at 12:08:38PM +0200, Max Carrara wrote:
> On 8/21/23 15:44, Lukas Wagner wrote:
> > For now, it contains a file-backed cache with expiration logic.
> > The cache should be safe to be accessed from multiple processes at
> > once.
> > 
> 
> This seems pretty neat! The cache implementation seems straightforward
> enough. I'll see if I can test it more thoroughly later.
> 
> However, in my opinion we should have a crate like
> "proxmox-collections" (or something of the sort) with modules for each
> data structure / collection similar to the standard library; I'm
> curious what others think about that. imo it would be a great
> opportunity to introduce that crate in this series, since you're
> already introducing one for the cache anyway.
> 
> So, proxmox-collections would look something like this:
> 
>   proxmox-collections
>   └── src
>       ├── cache
>       │   ├── mod.rs
>       │   └── shared_cache.rs
>       └── lib.rs
> 
> Let me know what you think!

Smaller crates help with compile time, I don't want to pull in a big
collections crate if I need one single type of collection from it ;-)

> 
> > The cache stores values in a directory, based on the key.
> > E.g. key "foo" results in a file 'foo.json' in the given base
> > directory. If a new value is set, the file is atomically replaced.
> > The JSON file also contains some metadata, namely 'added_at' and
> > 'expire_in' - they are used for cache expiration.
> > 
> > Note: This cache is not suited to applications that
> >  - Might want to cache huge amounts of data, and/or access the cache
> >    very frequently (due to the overhead of JSON de/serialization)
> >  - Require arbitrary keys - right now, keys are limited by
> >    SAFE_ID_REGEX
> > 
> > The cache was developed for the use in pvestatd, in order to cache
> > e.g. storage plugin status. There, these limitations do not really
> > play any role.
> > 
> > Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
> > ---
> >  Cargo.toml                            |   1 +
> >  proxmox-cache/Cargo.toml              |  20 ++
> >  proxmox-cache/examples/performance.rs |  82 ++++++++
> >  proxmox-cache/src/lib.rs              |  40 ++++
> >  proxmox-cache/src/shared_cache.rs     | 263 ++++++++++++++++++++++++++
> >  5 files changed, 406 insertions(+)
> >  create mode 100644 proxmox-cache/Cargo.toml
> >  create mode 100644 proxmox-cache/examples/performance.rs
> >  create mode 100644 proxmox-cache/src/lib.rs
> >  create mode 100644 proxmox-cache/src/shared_cache.rs
> > 
> > diff --git a/Cargo.toml b/Cargo.toml
> > index e334ac1..940e1d0 100644
> > --- a/Cargo.toml
> > +++ b/Cargo.toml
> > @@ -5,6 +5,7 @@ members = [
> >      "proxmox-async",
> >      "proxmox-auth-api",
> >      "proxmox-borrow",
> > +    "proxmox-cache",
> >      "proxmox-client",
> >      "proxmox-compression",
> >      "proxmox-http",
> > diff --git a/proxmox-cache/src/lib.rs b/proxmox-cache/src/lib.rs
> > new file mode 100644
> > index 0000000..d496dc7
> > --- /dev/null
> > +++ b/proxmox-cache/src/lib.rs
> > @@ -0,0 +1,40 @@
> > +use anyhow::Error;
> > +use serde_json::Value;
> > +
> > +pub mod shared_cache;
> > +
> > +pub use shared_cache::SharedCache;
> > +
> > +trait TimeProvider {

Also mentioned off list: this trait will be dropped.
For tests, a `#[cfg(test)]` alternative will be used for fetching the
time.

We just never need a different provider outside of tests, and storing it
as `Box<dyn TimeProvider>` immediately drops all the auto traits from
the cache struct, since trait objects *only* ever get the auto traits
you *explicitly* define.

> > +    /// Returns the current time as a UNIX epoch (second resolution)
> > +    fn now(&self) -> i64;
> > +}
> > +
> > +struct DefaultTimeProvider;
> > +
> > +impl TimeProvider for DefaultTimeProvider {
> > +    fn now(&self) -> i64 {
> > +        proxmox_time::epoch_i64()
> > +    }
> > +}
> > +
> > +pub trait Cache {
> > +    /// Set or insert a cache entry.
> > +    ///
> > +    /// If `expires_in` is set, this entry will expire in the desired number of
> > +    /// seconds.
> > +    fn set<S: AsRef<str>>(
> > +        &self,
> > +        key: S,
> > +        value: Value,
> > +        expires_in: Option<i64>,
> > +    ) -> Result<(), Error>;
> > +
> > +    /// Delete a cache entry.
> > +    fn delete<S: AsRef<str>>(&self, key: S) -> Result<(), Error>;
> > +
> > +    /// Get a value from the cache.
> > +    ///
> > +    /// Expired entries will *not* be returned.
> > +    fn get<S: AsRef<str>>(&self, key: S) -> Result<Option<Value>, Error>;
> > +}
> 
> I don't necessarily think that a trait would be necessary in this
> case, as there's not really any other structure (that can be used as
> caching mechanism) that you're abstracting over. (more below)

Already somewhat mentioned this off list. This should not be a trait for
now. If we ever need one we can still add one.

> 
> > diff --git a/proxmox-cache/src/shared_cache.rs b/proxmox-cache/src/shared_cache.rs
> > new file mode 100644
> > index 0000000..be6212c
> > --- /dev/null
> > +++ b/proxmox-cache/src/shared_cache.rs
> > @@ -0,0 +1,263 @@
> > +use std::path::{Path, PathBuf};
> > +
> > +use anyhow::{bail, Error};
> > +use serde::{Deserialize, Serialize};
> > +use serde_json::Value;
> > +
> > +use proxmox_schema::api_types::SAFE_ID_FORMAT;
> > +use proxmox_sys::fs::CreateOptions;
> > +
> > +use crate::{Cache, DefaultTimeProvider, TimeProvider};
> > +
> > +/// A simple, file-backed cache that can be used from multiple processes concurrently.
> > +///
> > +/// Cache entries are stored as individual files inside a base directory. For instance,
> > +/// a cache entry with the key 'disk_stats' will result in a file 'disk_stats.json' inside
> > +/// the base directory. As the extension implies, the cached data will be stored as a JSON
> > +/// string.
> > +///
> > +/// For optimal performance, `SharedCache` should have its base directory in a `tmpfs`.
> > +///
> > +/// ## Key Space
> > +/// Due to the fact that cache keys are being directly used as filenames, they have to match the
> > +/// following regular expression: `[A-Za-z0-9_][A-Za-z0-9._\-]*`
> > +///
> > +/// ## Concurrency
> > +/// All cache operations are based on atomic file operations, thus accessing/updating the cache from
> > +/// multiple processes at the same time is safe.
> > +///
> > +/// ## Performance
> > +/// On a tmpfs:
> > +/// ```sh
> > +///   $ cargo run --release --example=performance
> > +///   inserting 100000 keys took 896.609758ms (8.966µs per key)
> > +///   getting 100000 keys took 584.874842ms (5.848µs per key)
> > +///   deleting 100000 keys took 247.742702ms (2.477µs per key)
> > +///
> > +/// Inserting/getting large objects might of course result in lower performance due to the cost
> > +/// of serialization.
> > +/// ```
> > +///
> > +pub struct SharedCache {
> > +    base_path: PathBuf,
> > +    time_provider: Box<dyn TimeProvider>,
> > +    create_options: CreateOptions,
> > +}
> 
> Instead, this should be generic:
> 
> pub struct SharedCache<K, V> { ... }
> 
> .. and maybe rename it to SharedFileCache to make it explicit that this
> operates on a file. (but that's more dependent on one's taste tbh)
> 
> .. and the impl block below ...

The main issue here is that it'll all boil down to one general "json
thingy" because it'll be used from out of perl where if we have multiple
types here to instantiate we'll need to create separate `perlmod
#[export]`s for each and every type we need in our perl code.

OTOH, we can still have the crate side here generic and have the perl-rs
side just instantiate a single `SharedCache<String, serde_json::Value>`...

A generic `K` might make it annoying to work with paths, though, so IMO
that can just stay a string ;-)

> 
> > +
> > +impl Cache for SharedCache {
> > +    fn set<S: AsRef<str>>(
> > +        &self,
> > +        key: S,
> > +        value: Value,

In any case, I'd definitely argue against `Value` here and just go with
a generic `V: Serialize` at least in the methods.

Because why serialize twice? (Once from some struct to Value, then from
Value to json.)

Otherwise we could also just store `&[u8]` and have this be nothing more
than a thing to attach timeouts to bytes in files ;-)

> > +        expires_in: Option<i64>,
> > +    ) -> Result<(), Error> {
> > +        let path = self.get_path_for_key(key.as_ref())?;
> > +        let added_at = self.time_provider.now();
> > +
> > +        let item = CachedItem {
> > +            value,
> > +            added_at,
> > +            expires_in,
> > +        };
> > +
> > +        let serialized = serde_json::to_vec_pretty(&item)?;
> > +
> > +        // Atomically replace file
> > +        proxmox_sys::fs::replace_file(path, &serialized, self.create_options.clone(), true)?;
> > +        Ok(())
> > +    }
> > +
> > +    fn delete<S: AsRef<str>>(&self, key: S) -> Result<(), Error> {
> > +        let path = self.get_path_for_key(key.as_ref())?;
> > +        std::fs::remove_file(path)?;
> > +        Ok(())
> > +    }
> > +
> > +    fn get<S: AsRef<str>>(&self, key: S) -> Result<Option<Value>, Error> {

Same here of course.

> > +        let path = self.get_path_for_key(key.as_ref())?;
> > +
> > +        let value = if let Some(content) = proxmox_sys::fs::file_get_optional_contents(path)? {
> > +            let value: CachedItem = serde_json::from_slice(&content)?;

`CachedItem` would need to be generic, and here you could use
`CachedItem<&'_ serde_json::RawValue>` which *should* just refer to a
subslice in `content` here to defer the payload parsing to later.

This way you avoid at least *some* overhead of the intermediate `Value`
while still being able to access the value's time.

Alternatively we could think of a more efficient format where the added
and expires fields are in a fixed-sized binary header... but it
shouldn't matter much.

> > +
> > +            let now = self.time_provider.now();
> > +
> > +            if let Some(expires_in) = value.expires_in {
> > +                // Check if value is not expired yet. Also do not allow
> > +                // values from the future, in case we have clock jumps
> > +                if value.added_at + expires_in > now && value.added_at <= now {
> > +                    Some(value.value)
> > +                } else {
> > +                    None
> > +                }
> > +            } else {
> > +                Some(value.value)
> > +            }
> > +        } else {
> > +            None
> > +        };
> > +
> > +        Ok(value)
> > +    }
> > +}
> 
> ... can be replaced as follows, in order to make it similar to
> std::collections::{HashMap, BTreeMap}:
> 
> impl<K: AsRef<str>> for SharedCache<K, Value> {
>     // Returns old value on successful insert, if given
>     fn insert(&self, k: K, v: Value) -> Result<Option<Value>, Error> {
>         // ...
>     }
> 
>     fn get(&self, k: K) -> Result<Option<Value>, Error> {
>         // ...
>     }
> 
>     fn remove(&self, k: K) -> Result<Option<Value>, Error> {
>         // ...
>     }
> }
> 
> If necessary / sensible, other methods (inspired by {HashMap, BTreeMap} can
> be added as well, such as remove_entry, retain, clear, etc.

Using the naming found in std sure makes sense.

But please hold off on the `Entry` API for now, that one will bite ya ;-)




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [pve-devel] [RFC proxmox 4/7] cache: add new crate 'proxmox-cache'
  2023-08-22 11:33     ` Lukas Wagner
@ 2023-08-22 12:01       ` Wolfgang Bumiller
  0 siblings, 0 replies; 19+ messages in thread
From: Wolfgang Bumiller @ 2023-08-22 12:01 UTC (permalink / raw)
  To: Lukas Wagner; +Cc: Max Carrara, Proxmox VE development discussion

On Tue, Aug 22, 2023 at 01:33:44PM +0200, Lukas Wagner wrote:
(...)
> > ... can be replaced as follows, in order to make it similar to
> > std::collections::{HashMap, BTreeMap}:
> > 
> > impl<K: AsRef<str>> for SharedCache<K, Value> {
> >      // Returns old value on successful insert, if given
> >      fn insert(&self, k: K, v: Value) -> Result<Option<Value>, Error> {
> >          // ...
> >      }
> > 
> >      fn get(&self, k: K) -> Result<Option<Value>, Error> {
> >          // ...
> >      }
> > 
> >      fn remove(&self, k: K) -> Result<Option<Value>, Error> {
> >          // ...
> >      }
> > }
> > 
> > If necessary / sensible, other methods (inspired by {HashMap, BTreeMap} can
> > be added as well, such as remove_entry, retain, clear, etc.
> > 
> 
> I don't have any hard feelings regarding the naming, but not returning a
> Value from `delete` was a conscious decision - we simply don't need it right
> now. I don't want to deserialize to just throw away the value.
> Also, reading *and* deleting at the same time *might* introduce the need for
> file locking - although I'm not completely sure about that yet.
> 
> If we ever need a `remove` that also returns the value, we could just
> introduce a second method, e.g. `take`.

Locking might make sense though.

If we want to refresh data, it may make sense to first lock the cache
entry, then re-check the expiration time, and only then do the actual
refresh.
That way, 5 concurrent accesses will cause only a single refresh, since
afterwards they all see the new value.

As for `take` vs locking: You *could* have the first operation be a
`rename` to a random string, then open,delete,read (but then you need
something that cleans up after left-overs when a process gets killed
between the rename & delete.
But I don't quite see the value in this functionality here for now :-)




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [pve-devel] [RFC proxmox 4/7] cache: add new crate 'proxmox-cache'
  2023-08-22 11:56     ` Wolfgang Bumiller
@ 2023-08-22 13:52       ` Max Carrara
  0 siblings, 0 replies; 19+ messages in thread
From: Max Carrara @ 2023-08-22 13:52 UTC (permalink / raw)
  To: Wolfgang Bumiller; +Cc: Proxmox VE development discussion, Lukas Wagner

On 8/22/23 13:56, Wolfgang Bumiller wrote:
> On Tue, Aug 22, 2023 at 12:08:38PM +0200, Max Carrara wrote:
>> On 8/21/23 15:44, Lukas Wagner wrote:
>>> For now, it contains a file-backed cache with expiration logic.
>>> The cache should be safe to be accessed from multiple processes at
>>> once.
>>>
>>
>> This seems pretty neat! The cache implementation seems straightforward
>> enough. I'll see if I can test it more thoroughly later.
>>
>> However, in my opinion we should have a crate like
>> "proxmox-collections" (or something of the sort) with modules for each
>> data structure / collection similar to the standard library; I'm
>> curious what others think about that. imo it would be a great
>> opportunity to introduce that crate in this series, since you're
>> already introducing one for the cache anyway.
>>
>> So, proxmox-collections would look something like this:
>>
>>   proxmox-collections
>>   └── src
>>       ├── cache
>>       │   ├── mod.rs
>>       │   └── shared_cache.rs
>>       └── lib.rs
>>
>> Let me know what you think!
> 
> Smaller crates help with compile time, I don't want to pull in a big
> collections crate if I need one single type of collection from it ;-)
> 

They do indeed, but in this case I'd prefer a single crate that guards
each collection behind a feature, which has essentially the same effect.

Or as an alternative, let proxmox-collections be a meta-crate that
re-exports smaller crates in its own namespace; let those smaller
crates then be optional dependencies which are only pulled in and compiled
when their respective features are specified[0].

I'm not sure how the latter works in conjunction with deb packaging, however.

That being said, I'm otherwise not opposed against proxmox-cache becoming
its own crate, but I do think that we should group our code logically in some way.

>>
>>> The cache stores values in a directory, based on the key.
>>> E.g. key "foo" results in a file 'foo.json' in the given base
>>> directory. If a new value is set, the file is atomically replaced.
>>> The JSON file also contains some metadata, namely 'added_at' and
>>> 'expire_in' - they are used for cache expiration.
>>>
>>> Note: This cache is not suited to applications that
>>>  - Might want to cache huge amounts of data, and/or access the cache
>>>    very frequently (due to the overhead of JSON de/serialization)
>>>  - Require arbitrary keys - right now, keys are limited by
>>>    SAFE_ID_REGEX
>>>
>>> The cache was developed for the use in pvestatd, in order to cache
>>> e.g. storage plugin status. There, these limitations do not really
>>> play any role.
>>>
>>> Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
>>> ---
>>>  Cargo.toml                            |   1 +
>>>  proxmox-cache/Cargo.toml              |  20 ++
>>>  proxmox-cache/examples/performance.rs |  82 ++++++++
>>>  proxmox-cache/src/lib.rs              |  40 ++++
>>>  proxmox-cache/src/shared_cache.rs     | 263 ++++++++++++++++++++++++++
>>>  5 files changed, 406 insertions(+)
>>>  create mode 100644 proxmox-cache/Cargo.toml
>>>  create mode 100644 proxmox-cache/examples/performance.rs
>>>  create mode 100644 proxmox-cache/src/lib.rs
>>>  create mode 100644 proxmox-cache/src/shared_cache.rs
>>>
>>> diff --git a/Cargo.toml b/Cargo.toml
>>> index e334ac1..940e1d0 100644
>>> --- a/Cargo.toml
>>> +++ b/Cargo.toml
>>> @@ -5,6 +5,7 @@ members = [
>>>      "proxmox-async",
>>>      "proxmox-auth-api",
>>>      "proxmox-borrow",
>>> +    "proxmox-cache",
>>>      "proxmox-client",
>>>      "proxmox-compression",
>>>      "proxmox-http",
>>> diff --git a/proxmox-cache/src/lib.rs b/proxmox-cache/src/lib.rs
>>> new file mode 100644
>>> index 0000000..d496dc7
>>> --- /dev/null
>>> +++ b/proxmox-cache/src/lib.rs
>>> @@ -0,0 +1,40 @@
>>> +use anyhow::Error;
>>> +use serde_json::Value;
>>> +
>>> +pub mod shared_cache;
>>> +
>>> +pub use shared_cache::SharedCache;
>>> +
>>> +trait TimeProvider {
> 
> Also mentioned off list: this trait will be dropped.
> For tests, a `#[cfg(test)]` alternative will be used for fetching the
> time.
> 
> We just never need a different provider outside of tests, and storing it
> as `Box<dyn TimeProvider>` immediately drops all the auto traits from
> the cache struct, since trait objects *only* ever get the auto traits
> you *explicitly* define.
> 
>>> +    /// Returns the current time as a UNIX epoch (second resolution)
>>> +    fn now(&self) -> i64;
>>> +}
>>> +
>>> +struct DefaultTimeProvider;
>>> +
>>> +impl TimeProvider for DefaultTimeProvider {
>>> +    fn now(&self) -> i64 {
>>> +        proxmox_time::epoch_i64()
>>> +    }
>>> +}
>>> +
>>> +pub trait Cache {
>>> +    /// Set or insert a cache entry.
>>> +    ///
>>> +    /// If `expires_in` is set, this entry will expire in the desired number of
>>> +    /// seconds.
>>> +    fn set<S: AsRef<str>>(
>>> +        &self,
>>> +        key: S,
>>> +        value: Value,
>>> +        expires_in: Option<i64>,
>>> +    ) -> Result<(), Error>;
>>> +
>>> +    /// Delete a cache entry.
>>> +    fn delete<S: AsRef<str>>(&self, key: S) -> Result<(), Error>;
>>> +
>>> +    /// Get a value from the cache.
>>> +    ///
>>> +    /// Expired entries will *not* be returned.
>>> +    fn get<S: AsRef<str>>(&self, key: S) -> Result<Option<Value>, Error>;
>>> +}
>>
>> I don't necessarily think that a trait would be necessary in this
>> case, as there's not really any other structure (that can be used as
>> caching mechanism) that you're abstracting over. (more below)
> 
> Already somewhat mentioned this off list. This should not be a trait for
> now. If we ever need one we can still add one.
> 
>>
>>> diff --git a/proxmox-cache/src/shared_cache.rs b/proxmox-cache/src/shared_cache.rs
>>> new file mode 100644
>>> index 0000000..be6212c
>>> --- /dev/null
>>> +++ b/proxmox-cache/src/shared_cache.rs
>>> @@ -0,0 +1,263 @@
>>> +use std::path::{Path, PathBuf};
>>> +
>>> +use anyhow::{bail, Error};
>>> +use serde::{Deserialize, Serialize};
>>> +use serde_json::Value;
>>> +
>>> +use proxmox_schema::api_types::SAFE_ID_FORMAT;
>>> +use proxmox_sys::fs::CreateOptions;
>>> +
>>> +use crate::{Cache, DefaultTimeProvider, TimeProvider};
>>> +
>>> +/// A simple, file-backed cache that can be used from multiple processes concurrently.
>>> +///
>>> +/// Cache entries are stored as individual files inside a base directory. For instance,
>>> +/// a cache entry with the key 'disk_stats' will result in a file 'disk_stats.json' inside
>>> +/// the base directory. As the extension implies, the cached data will be stored as a JSON
>>> +/// string.
>>> +///
>>> +/// For optimal performance, `SharedCache` should have its base directory in a `tmpfs`.
>>> +///
>>> +/// ## Key Space
>>> +/// Due to the fact that cache keys are being directly used as filenames, they have to match the
>>> +/// following regular expression: `[A-Za-z0-9_][A-Za-z0-9._\-]*`
>>> +///
>>> +/// ## Concurrency
>>> +/// All cache operations are based on atomic file operations, thus accessing/updating the cache from
>>> +/// multiple processes at the same time is safe.
>>> +///
>>> +/// ## Performance
>>> +/// On a tmpfs:
>>> +/// ```sh
>>> +///   $ cargo run --release --example=performance
>>> +///   inserting 100000 keys took 896.609758ms (8.966µs per key)
>>> +///   getting 100000 keys took 584.874842ms (5.848µs per key)
>>> +///   deleting 100000 keys took 247.742702ms (2.477µs per key)
>>> +///
>>> +/// Inserting/getting large objects might of course result in lower performance due to the cost
>>> +/// of serialization.
>>> +/// ```
>>> +///
>>> +pub struct SharedCache {
>>> +    base_path: PathBuf,
>>> +    time_provider: Box<dyn TimeProvider>,
>>> +    create_options: CreateOptions,
>>> +}
>>
>> Instead, this should be generic:
>>
>> pub struct SharedCache<K, V> { ... }
>>
>> .. and maybe rename it to SharedFileCache to make it explicit that this
>> operates on a file. (but that's more dependent on one's taste tbh)
>>
>> .. and the impl block below ...
> 
> The main issue here is that it'll all boil down to one general "json
> thingy" because it'll be used from out of perl where if we have multiple
> types here to instantiate we'll need to create separate `perlmod
> #[export]`s for each and every type we need in our perl code.
> 
> OTOH, we can still have the crate side here generic and have the perl-rs
> side just instantiate a single `SharedCache<String, serde_json::Value>`...
> 

Or perhaps have the perl-rs side wrap `SharedCache<String, Value>` and expose
only the methods that are needed; I don't see the issue here ;-)

> A generic `K` might make it annoying to work with paths, though, so IMO
> that can just stay a string ;-)
> 
>>
>>> +
>>> +impl Cache for SharedCache {
>>> +    fn set<S: AsRef<str>>(
>>> +        &self,
>>> +        key: S,
>>> +        value: Value,
> 
> In any case, I'd definitely argue against `Value` here and just go with
> a generic `V: Serialize` at least in the methods.
> 
> Because why serialize twice? (Once from some struct to Value, then from
> Value to json.)
> 

ACK, that one I hadn't considered. Good point!

> Otherwise we could also just store `&[u8]` and have this be nothing more
> than a thing to attach timeouts to bytes in files ;-)
> 
>>> +        expires_in: Option<i64>,
>>> +    ) -> Result<(), Error> {
>>> +        let path = self.get_path_for_key(key.as_ref())?;
>>> +        let added_at = self.time_provider.now();
>>> +
>>> +        let item = CachedItem {
>>> +            value,
>>> +            added_at,
>>> +            expires_in,
>>> +        };
>>> +
>>> +        let serialized = serde_json::to_vec_pretty(&item)?;
>>> +
>>> +        // Atomically replace file
>>> +        proxmox_sys::fs::replace_file(path, &serialized, self.create_options.clone(), true)?;
>>> +        Ok(())
>>> +    }
>>> +
>>> +    fn delete<S: AsRef<str>>(&self, key: S) -> Result<(), Error> {
>>> +        let path = self.get_path_for_key(key.as_ref())?;
>>> +        std::fs::remove_file(path)?;
>>> +        Ok(())
>>> +    }
>>> +
>>> +    fn get<S: AsRef<str>>(&self, key: S) -> Result<Option<Value>, Error> {
> 
> Same here of course.
> 
>>> +        let path = self.get_path_for_key(key.as_ref())?;
>>> +
>>> +        let value = if let Some(content) = proxmox_sys::fs::file_get_optional_contents(path)? {
>>> +            let value: CachedItem = serde_json::from_slice(&content)?;
> 
> `CachedItem` would need to be generic, and here you could use
> `CachedItem<&'_ serde_json::RawValue>` which *should* just refer to a
> subslice in `content` here to defer the payload parsing to later.
> 
> This way you avoid at least *some* overhead of the intermediate `Value`
> while still being able to access the value's time.
> 
> Alternatively we could think of a more efficient format where the added
> and expires fields are in a fixed-sized binary header... but it
> shouldn't matter much.
> 
>>> +
>>> +            let now = self.time_provider.now();
>>> +
>>> +            if let Some(expires_in) = value.expires_in {
>>> +                // Check if value is not expired yet. Also do not allow
>>> +                // values from the future, in case we have clock jumps
>>> +                if value.added_at + expires_in > now && value.added_at <= now {
>>> +                    Some(value.value)
>>> +                } else {
>>> +                    None
>>> +                }
>>> +            } else {
>>> +                Some(value.value)
>>> +            }
>>> +        } else {
>>> +            None
>>> +        };
>>> +
>>> +        Ok(value)
>>> +    }
>>> +}
>>
>> ... can be replaced as follows, in order to make it similar to
>> std::collections::{HashMap, BTreeMap}:
>>
>> impl<K: AsRef<str>> for SharedCache<K, Value> {
>>     // Returns old value on successful insert, if given
>>     fn insert(&self, k: K, v: Value) -> Result<Option<Value>, Error> {
>>         // ...
>>     }
>>
>>     fn get(&self, k: K) -> Result<Option<Value>, Error> {
>>         // ...
>>     }
>>
>>     fn remove(&self, k: K) -> Result<Option<Value>, Error> {
>>         // ...
>>     }
>> }
>>
>> If necessary / sensible, other methods (inspired by {HashMap, BTreeMap} can
>> be added as well, such as remove_entry, retain, clear, etc.
> 
> Using the naming found in std sure makes sense.
> 
> But please hold off on the `Entry` API for now, that one will bite ya ;-)

[0]: https://doc.rust-lang.org/cargo/reference/features.html#optional-dependencies





^ permalink raw reply	[flat|nested] 19+ messages in thread

* [pve-devel] applied: [RFC proxmox 1/7] sys: fs: move tests to a sub-module
  2023-08-21 13:44 ` [pve-devel] [RFC proxmox 1/7] sys: fs: move tests to a sub-module Lukas Wagner
@ 2023-08-30 15:38   ` Thomas Lamprecht
  0 siblings, 0 replies; 19+ messages in thread
From: Thomas Lamprecht @ 2023-08-30 15:38 UTC (permalink / raw)
  To: Proxmox VE development discussion, Lukas Wagner

Am 21/08/2023 um 15:44 schrieb Lukas Wagner:
> This ensures that test code is not compiled in regular builds
> 
> Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
> ---
>  proxmox-sys/src/fs/dir.rs | 29 +++++++++++++++++------------
>  1 file changed, 17 insertions(+), 12 deletions(-)
> 
>

applied this one already, thanks!




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [pve-devel] [RFC storage/proxmox{, -perl-rs} 0/7] cache storage plugin status for pvestatd/API status update calls
  2023-08-22  9:17 ` [pve-devel] [RFC storage/proxmox{, -perl-rs} 0/7] cache storage plugin status for pvestatd/API status update calls Fiona Ebner
  2023-08-22 11:25   ` Wolfgang Bumiller
@ 2023-08-30 17:07   ` Wolf Noble
  1 sibling, 0 replies; 19+ messages in thread
From: Wolf Noble @ 2023-08-30 17:07 UTC (permalink / raw)
  To: Proxmox VE development discussion


depends, i suppose, on  ( juice > squeeze ) evaluation…
but I could see ultimately adding flags to the cli / attributes to the api to request fresh/tolerate stale / tolerate stale if howStale < time… allowing for choice… 

one could also imagine implementing mechanism to assert that a cached piece of information is now definitively no longer accurate…   however that’d never catch all the ways the cached data becomes bad data 
but how deep that rabbit hole goes…  is… well… 
i bet there’s likely a few yaks to shave somewhere in the rabbit den too…

hasn’t it been said frequently that there are two really hard problems in programming?
naming things , caching, and off-by-one errors?



[= The contents of this message have been written, read, processed, erased, sorted, sniffed, compressed, rewritten, misspelled, overcompensated, lost, found, and most importantly delivered entirely with recycled electrons =]

> On Aug 22, 2023, at 04:17, Fiona Ebner <f.ebner@proxmox.com> wrote:
> 
> If operations affecting the values like allocation, resize, etc. would
> invalidate the cache, I think we could go for a bit more. But if they
> don't, the limit can't be too high IMHO. Otherwise, users will wonder
> why the usage on the storage doesn't change after their action.
> 
> And would it make sense to have the cache be opt-in? So that only
> pvestatd would use it, but standalone API/CLI calls always get current
> values? If there is invalidation like mentioned above, that might not be
> needed, but otherwise, I'm a bit afraid that handing out (slightly)
> outdated values might trip up some automated scripts doing batch work or
> something


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2023-08-30 17:08 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-21 13:44 [pve-devel] [RFC storage/proxmox{, -perl-rs} 0/7] cache storage plugin status for pvestatd/API status update calls Lukas Wagner
2023-08-21 13:44 ` [pve-devel] [RFC proxmox 1/7] sys: fs: move tests to a sub-module Lukas Wagner
2023-08-30 15:38   ` [pve-devel] applied: " Thomas Lamprecht
2023-08-21 13:44 ` [pve-devel] [RFC proxmox 2/7] sys: add make_tmp_dir Lukas Wagner
2023-08-22  8:39   ` Wolfgang Bumiller
2023-08-21 13:44 ` [pve-devel] [RFC proxmox 3/7] sys: fs: remove unnecessary clippy allow directive Lukas Wagner
2023-08-21 13:44 ` [pve-devel] [RFC proxmox 4/7] cache: add new crate 'proxmox-cache' Lukas Wagner
2023-08-22 10:08   ` Max Carrara
2023-08-22 11:33     ` Lukas Wagner
2023-08-22 12:01       ` Wolfgang Bumiller
2023-08-22 11:56     ` Wolfgang Bumiller
2023-08-22 13:52       ` Max Carrara
2023-08-21 13:44 ` [pve-devel] [RFC proxmox 5/7] cache: add debian packaging Lukas Wagner
2023-08-21 13:44 ` [pve-devel] [RFC proxmox-perl-rs 6/7] cache: add bindings for `SharedCache` from `proxmox-cache` Lukas Wagner
2023-08-21 13:44 ` [pve-devel] [RFC pve-storage 7/7] stats: api: cache storage plugin status Lukas Wagner
2023-08-22  8:51   ` Lukas Wagner
2023-08-22  9:17 ` [pve-devel] [RFC storage/proxmox{, -perl-rs} 0/7] cache storage plugin status for pvestatd/API status update calls Fiona Ebner
2023-08-22 11:25   ` Wolfgang Bumiller
2023-08-30 17:07   ` Wolf Noble

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal