* [pdm-devel] [PATCH proxmox-datacenter-manager v7 0/7] remote task cache fetching task / better cache backend
@ 2025-08-20 12:43 Lukas Wagner
2025-08-20 12:43 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 1/7] remote tasks: implement improved cache for remote tasks Lukas Wagner
` (7 more replies)
0 siblings, 8 replies; 9+ messages in thread
From: Lukas Wagner @ 2025-08-20 12:43 UTC (permalink / raw)
To: pdm-devel
The aim of this patch series is to greatly improve the performance of the
remote task cache for big PDM setups.
The initial cache implementation had the following problems:
1.) cache was populated as part of the `get_tasks` API, leading to hanging
API calls while fetching task data from remotes
2.) all tasks were stored in a single file, which was completely rewritten
for any change to the cache's contents
3.) The caching mechanism was pretty simple, using only a max-age mechanism,
re-requesting all task data if max-age was exceeded
Now, these characteristics are not really problematic for *small* PDM setups
with only a couple of remotes. However, for big setups (e.g. 100 remotes, each
remote being a PVE cluster with 10 nodes), this completely falls apart:
1.) fetching remote tasks takes a considerable amount of time, especially
on connections with a high latency. Since the data is requested
from *within* the `get_tasks` function, which is called by the
`remote-tasks/list` API handler, the API call is blocked until
*all* task data is requested.
2.) The single file approach leads to significant writes to the disk
3.) The max-age mechanism leads to unnecessary network IO, as we re-request data
that we already have locally.
To rectify the situation, this series performs the following changes:
- `get_tasks` never does any fetching; it only reads the most recent
data from the cache (see the rough usage sketch after this list)
- There is a new background task which periodically fetches tasks
from all remotes (every 10mins at the moment). Only the latest
missing tasks are requested, not the full task history as before
- The new background task also takes over the 'tracked task' polling
duty, where we fetch the status for any task started by PDM on
a remote (short polling interval, 10s at the moment).
- The task cache storage implementation has been completely overhauled
and is now optimized for the most common accesses to the cache.
It is also more storage efficient, occupying roughly 50% of the disk
space for the same number of tasks (achieved by avoiding duplicate
information in the files)
- The size of the task cache is 'limited' by doing file rotation.
We keep 7 days of task history.
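Roughly, consuming the new cache (from patch 1) looks like the following sketch.
The parameter values here are made up for illustration; only the type and
function names are taken from the patch, and the real setup is done by the new
background task:

    use proxmox_sys::fs::CreateOptions;

    fn dump_cached_tasks() -> Result<(), anyhow::Error> {
        let cache = TaskCache::new(
            "/var/cache/proxmox-datacenter-manager/remote-tasks",
            CreateOptions::new(),
            7,            // max_files: keep 7 archive files -> ~7 days of history
            2,            // number of uncompressed (most recent) archive files
            24 * 60 * 60, // rotate_after: rotate roughly once per day
            10_000,       // journal_max_size in bytes
        )?;

        // `get_tasks` on the API side only ever reads from the cache:
        for task in cache.read()?.get_tasks(GetTasks::All)? {
            println!("{:?} started at {}", task.upid, task.starttime);
        }
        Ok(())
    }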
For details on *how* the cache itself works, please refer to the full
commit message of
'remote tasks: implement improved cache for remote tasks'
# Benchmarks
Finally, some concrete data to back up the claimed performance improvements. The
times were measured *inside* the `get_tasks` function and not at the API level,
so the times do not include JSON serialization and data transfer.
Benchmarking was done using the 'fake-remote' feature. There were 100 remotes,
10 PVE nodes per remote. The task cache contained about 1.5 million tasks.
                                            before    v5         v6 (journal, zstd)
list of active tasks (*):                   ~1.3s     ~300µs     ~300µs
list of 500 tasks, offset 0 (**):           ~1.3s     ~1.45ms    ~1.5ms
list of 500 tasks, offset 1 million (***):  ~1.3s     ~175ms     ~200ms
list of 500 tasks, offset 0,
  2000 tasks in journal (****):                                  ~4.5ms
Size on disk:                               ~500MB    ~200MB     ~40MB
(*): Requested by the UI every 3s
(**): Requested by the UI when visiting Remotes > Tasks
(***): E.g. when scrolling towards the bottom of 'Remotes > Tasks'
(****): e.g. when the journal has not been applied for a while. Reading tasks
from the journal is a bit less efficient than from the task archive, since
we have to fully load it into memory so that we can sort the tasks and
also remove potential duplicates
In the old implementation, the archive file was *always* fully deserialized and
loaded into RAM; this is the reason why the time needed is pretty much identical
for all scenarios.
The new implementation reads the archive files only line by line, and only 500
tasks are loaded into RAM at a time. The higher the offset, the more
archive lines/files we have to scan, which increases the time needed to access
the data. The tasks are sorted descending by starttime; as a result, the
requests get slower the further you go back in history.
The 'before' times do NOT include the time needed for actually fetching the
task data.
This series was preceded by [1], however almost all of the code has changed,
which is the reason why I am sending this as a new series.
[1] https://lore.proxmox.com/pdm-devel/20250128122520.167796-1-l.wagner@proxmox.com/
Changes since v6:
- Incorporate additional review feedback from @Dominik:
- Log error in case of task panic
- create `active` file when `init` is called; this avoids an error in the
logs when calling the tasks API before the first task fetching round
- minor style suggestions/fixes
Changes since v5:
- Incorporate review feedback from @Dominik:
- Poll tracked tasks individually instead of doing a full task refresh with the
oldest running task as cutoff. This should be much more efficient
for long-running tasks.
- Change state-file representation
- Improved some doc comments
- Use timestamps instead of cycle counter for the fetching task
- make total connection semaphore allocation more efficient
- Use dedicated types for (read/write)-locked task cache, encoding
the locking requirements in Rust's type system. Neat!
- Keep track of cut-off timestamps per node, not per remote.
- This makes sure that we don't refetch tasks that we already have
if one node in a cluster is offline for a longer period of time
- Instead of writing new tasks directly into the archive files, append
them to a journal/write-ahead-log file, which is then applied at regular intervals.
This should reduce disk writes, since every single time an archive file is
changed, it has to be completely rewritten (tasks might arrive out-of-order and
the contents of the archive are sorted by the task's starttime). The journal allows
us to write more tasks at once.
- Compress older archive files using zstd - this greatly reduces disk usage
of task data
Changes since v4:
- Rebased onto latest master, adapting to Gabriel's section config changes
Changes since v3:
- Include benchmark results in commit message
- Remove unneeded and potentially unsafe `pub` (thx Wolfgang)
Changes since v2:
- Change locking approach as suggested by Wolfgang
- Incorporated feedback from Wolfgang
- see patch notes for details
- Added some .context/.with_context for better error messages
Changes since v1:
- Drop already applied patches
- Some code style improvements, see individual patch changelogs
- Move task fetching task to bin/proxmox-datacenter-api/tasks/remote_task.rs
- Make sure that remote_tasks::get_tasks does not block the async executor
proxmox-datacenter-manager:
Lukas Wagner (7):
remote tasks: implement improved cache for remote tasks
remote tasks: add background task for task polling, use new task cache
pdm-api-types: remote tasks: add new_from_str constructor for
TaskStateType
fake remote: make the fake_remote feature compile again
fake remote: clippy fixes
remote tasks: task cache: create `active` file in init
remote tasks: log error in case of task panic, instead of cancelling
all tasks
Cargo.toml | 2 +-
lib/pdm-api-types/src/lib.rs | 15 +
server/Cargo.toml | 1 +
server/src/api/pve/lxc.rs | 10 +-
server/src/api/pve/mod.rs | 4 +-
server/src/api/pve/qemu.rs | 6 +-
server/src/api/remote_tasks.rs | 11 +-
server/src/bin/proxmox-datacenter-api/main.rs | 1 +
.../bin/proxmox-datacenter-api/tasks/mod.rs | 1 +
.../tasks/remote_tasks.rs | 560 +++++++
server/src/remote_tasks/mod.rs | 632 ++-----
server/src/remote_tasks/task_cache.rs | 1491 +++++++++++++++++
server/src/test_support/fake_remote.rs | 39 +-
13 files changed, 2234 insertions(+), 539 deletions(-)
create mode 100644 server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs
create mode 100644 server/src/remote_tasks/task_cache.rs
Summary over all repositories:
13 files changed, 2234 insertions(+), 539 deletions(-)
--
Generated by murpp 0.9.0
_______________________________________________
pdm-devel mailing list
pdm-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pdm-devel
^ permalink raw reply [flat|nested] 9+ messages in thread
* [pdm-devel] [PATCH proxmox-datacenter-manager v7 1/7] remote tasks: implement improved cache for remote tasks
2025-08-20 12:43 [pdm-devel] [PATCH proxmox-datacenter-manager v7 0/7] remote task cache fetching task / better cache backend Lukas Wagner
@ 2025-08-20 12:43 ` Lukas Wagner
2025-08-20 12:43 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 2/7] remote tasks: add background task for task polling, use new task cache Lukas Wagner
` (6 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Lukas Wagner @ 2025-08-20 12:43 UTC (permalink / raw)
To: pdm-devel
This commit adds a new implementation for a cache for remote tasks, one
that should improve performance characteristics for pretty much all
use cases.
In general, storage works pretty similarly to the task archive we already
have for (local) PDM tasks.
root@pdm-dev:/var/cache/proxmox-datacenter-manager/remote-tasks# ls -l
total 40
-rw-r--r-- 1 www-data www-data 0 Mar 13 13:18 active
-rw-r--r-- 1 www-data www-data 1676 Mar 11 14:51 archive.1741355462.zst
-rw-r--r-- 1 www-data www-data 0 Mar 11 14:51 archive.1741441862.zst
-rw-r--r-- 1 www-data www-data 2538 Mar 11 14:51 archive.1741528262.zst
-rw-r--r-- 1 www-data www-data 8428 Mar 11 15:07 archive.1741614662.zst
-rw-r--r-- 1 www-data www-data 11740 Mar 13 10:18 archive.1741701062
-rw-r--r-- 1 www-data www-data 3364 Mar 13 13:18 archive.1741788270
-rw-r--r-- 1 www-data www-data 0 Mar 13 13:18 journal
-rw-r--r-- 1 www-data www-data 287 Mar 13 13:18 state
Tasks are stored in the 'active' and multiple 'archive' files. Running
tasks are placed into the 'active' file, tasks that are finished are
persisted into one of the archive files. The archive files are suffixed
with a UNIX timestamp which serves as a lower-bound for start times for
tasks stored in this file. Encoding this lower-bound in the file name
instead of using a more traditional increasing file index (.1, .2, .3)
gives us the benefit that we can decide in which file a newly arrived
task belongs from a single readdir call, without even reading the file
itself.
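To illustrate the idea (a minimal sketch, not the code from this patch; the
function name and the slice-based representation are made up), picking the
target archive file boils down to finding the newest lower bound that does not
exceed the task's start time:

    // `lower_bounds` holds the timestamps parsed from the archive file names,
    // sorted ascending, e.g. [1741355462, 1741441862, 1741528262].
    // Returns the index of the file the task belongs to, or `None` if the task
    // is older than the oldest archive file (and would be discarded).
    fn pick_archive_file(lower_bounds: &[i64], task_starttime: i64) -> Option<usize> {
        lower_bounds.iter().rposition(|&bound| bound <= task_starttime)
    }

    // pick_archive_file(&[1000, 2000, 3000], 2500) == Some(1)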
Older archive files are additionally compressed using zstd.
The size of the entire archive can be controlled by doing file
'rotation', where the archive file with the oldest timestamp is deleted
and a new file with the current timestamp is added. If 'rotation'
happens at fixed intervals (e.g. once daily), this gives us a
configurable number of days of task history. There is no direct cap for
the total number of tasks, but this could be easily added by checking
the size of the most recent archive file and by rotating early if a
threshold is exceeded.
The format inside the files is also similar to the local task archive,
with one task corresponding to one line in the file. One key difference
is that here each line is a JSON object; this makes it easier to
add additional data later, if needed. The JSON object contains the task's
UPID, status, endtime and starttime (the starttime is technically also
encoded in the UPID, but having it as a separate value simplifies a
couple of things). Each file is sorted by the tasks' start times, the
youngest task coming first.
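For illustration, this is roughly what two entries look like on disk (the UPIDs
and timestamps below are made up; the field names follow the TaskCacheItem
struct added in this patch). Finished tasks carry status and endtime, while a
running task, as stored in the 'active' file, omits them:

    {"upid":"pve-remote!UPID:pve:00039E4D:002638B8:67B4A9D1:stopall::root@pam:","starttime":1739893201,"status":"OK","endtime":1739893211}
    {"upid":"pve-remote!UPID:pve:0000A1B2:00C3D4E5:67B4AA00:qmstart:101:root@pam:","starttime":1739893248}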
One key difference between this task cache and the local task archive is
that we need to handle tasks coming in out-of-order, e.g. if remotes
were not reachable for a certain time. To maintain the ordering in the
file, we have to merge the newly arrived tasks into the existing task
file. This was implemented in a way that avoids reading the entire file
into memory at once, exploiting the fact that the contents of the
existing file are already sorted. This allows us to use a zipper/merge-sort-
like approach (see MergeTaskIterator for details). The existing file is
only read line-by-line and finally replaced atomically.
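The zipper idea itself, reduced to plain start times, looks like the following
minimal sketch; the real, generic implementation is the MergeTaskIterator in
the diff below, which works lazily on task iterators instead of slices:

    // Merge two slices that are already sorted descending, dropping duplicates.
    fn zipper_merge(mut a: &[i64], mut b: &[i64]) -> Vec<i64> {
        let mut out = Vec::new();
        while let (Some(&x), Some(&y)) = (a.first(), b.first()) {
            if x > y {
                out.push(x);
                a = &a[1..];
            } else if y > x {
                out.push(y);
                b = &b[1..];
            } else {
                // equal entries are emitted only once (deduplication)
                out.push(x);
                a = &a[1..];
                b = &b[1..];
            }
        }
        out.extend_from_slice(a);
        out.extend_from_slice(b);
        out
    }

    // zipper_merge(&[9, 7, 3], &[8, 7, 2]) == vec![9, 8, 7, 3, 2]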
Since we always need to rewrite the entire archive file, even if there
is only a single new task added, there is also a journal/WAL file.
When adding new tasks, they will be first appended to the journal file
and then later applied to the actual archive files. The journal is
applied after a certain time or if the journal file itself grows too
large.
The cache also has a separate state file, containing additional
information, e.g. cut-off timestamps for the polling task. Some of the
data in the statefile is technically also contained in the archive
files, but reading the state file is much faster.
This commit also adds an elaborate suite of unit tests for this new
cache. While they added some additional work up front, the tests paid off
quite quickly during development, since the overall approach
for the cache changed a couple of times. The test suite gave me
confidence that the design changes didn't screw anything up, catching a
couple of bugs in the process.
Finally, some concrete numbers about the performance improvements.
Benchmarking was done using the 'fake-remote' feature. There were 100
remotes, 10 PVE nodes per remote. The task cache contained
about 1.5 million tasks.
                                            before    after
list of active tasks (*):                   ~1.3s     ~300µs
list of 500 tasks, offset 0 (**):           ~1.3s     ~1.5ms
list of 500 tasks, offset 1 million (***):  ~1.3s     ~200ms
list of 500 tasks, offset 0,
  2000 tasks in journal (****):                       ~4.5ms
Size on disk:                               ~500MB    ~40MB
(*): Requested by the UI every 3s
(**): Requested by the UI when visiting Remotes > Tasks
(***): E.g. when scrolling towards the bottom of 'Remotes > Tasks'
(****): e.g. when the journal has not been applied for a while. Reading tasks
from the journal is a bit less efficient than from the task archive, since
we have to fully load it into memory so that we can sort the tasks and
also remove potential duplicates
In the old implementation, the archive file was *always* fully
deserialized and loaded into RAM; this is the reason why the time needed
is pretty much identical for all scenarios. The new implementation reads the
archive files only line by line, and only 500 tasks are loaded into RAM
at a time. The higher the offset, the more archive lines/files we
have to scan, which increases the time needed to access the data. The
tasks are sorted descending by starttime; as a result, the requests get
slower the further you go back in history.
Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
Reviewed-by: Dominik Csapak <d.csapak@proxmox.com>
---
Notes:
Changes since v6:
- Remove superfluous ZSTD_EXTENSION const
- Refactor common code into start_new_file
- Make compare_tasks_reverse (maybe) (slightly) more efficient by
not using .reverse, but just switching both params
Changes since v5:
- Change format of state file, include a per-node cutoff time
- Change locking approach, using separate types to represent
a locked cached (ReadableTaskCache, WritableTaskCache)
(as suggested by Wolfgang and Dominik independently)
- Rename 'add_tasks' to 'update', since it also drops
tracked tasks from the state file, moves tasks from
the active file to the archive, etc.
- Instead of adding new tasks directly to the archive, they
are added to an append-only journal at first. This allows us to
better recover from a crash/shutdown while we are adding tasks,
as well as reduce disk writes. The latter is due to the fact that
we have to rewrite the archive file in its entirety, even if
only a single task is added.
- Use zstd compression for older archive files, greatly reducing
space consumption
- Some general refactoring to make the code a bit nicer
Changes since v4:
- Rebased
Changes since v3:
- Included benchmark results in the commit message
- Drop `pub` for TaskCache::write_state
Changes since v2:
- Incorporate feedback from Wolfgang (thx!)
- eliminate one .clone()
- get_tasks_with_lock: directly create iterator over PathBuf
instead of first creating ArchiveFile vec and the mapping
- Fixed TOCTOU race condition (checked Path::exists instead of
trying the actual operation and reacting to ErrorKind::NotFound)
- handle error when replacing 'active' file
- unlink temp file when replacing the archive file did not work
- avoid UTF8 decoding where not really necessary
- Added a couple of .context()/.with_context() for better error
messages
- Extracted the file names into consts
- Fixed some clippy issues (.clone() for CreateOptions - when
this series was written, CreateOptions were not Copy yet)
Cargo.toml | 2 +-
server/Cargo.toml | 1 +
server/src/remote_tasks/mod.rs | 2 +
server/src/remote_tasks/task_cache.rs | 1483 +++++++++++++++++++++++++
4 files changed, 1487 insertions(+), 1 deletion(-)
create mode 100644 server/src/remote_tasks/task_cache.rs
diff --git a/Cargo.toml b/Cargo.toml
index 658c53a7..08b93737 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -128,7 +128,7 @@ url = "2.1"
walkdir = "2"
webauthn-rs-core = "0.5"
xdg = "2.2"
-zstd = { version = "0.12", features = [ "bindgen" ] }
+zstd = { version = "0.13" }
# Local path overrides
# NOTE: You must run `cargo update` after changing this for it to take effect!
diff --git a/server/Cargo.toml b/server/Cargo.toml
index 2d1fa5d2..24a2e40c 100644
--- a/server/Cargo.toml
+++ b/server/Cargo.toml
@@ -32,6 +32,7 @@ tokio = { workspace = true, features = [ "fs", "io-util", "io-std", "macros", "n
tokio-stream.workspace = true
tracing.workspace = true
url.workspace = true
+zstd.workspace = true
proxmox-access-control = { workspace = true, features = [ "impl" ] }
proxmox-async.workspace = true
diff --git a/server/src/remote_tasks/mod.rs b/server/src/remote_tasks/mod.rs
index 234ffa76..7c8e31ef 100644
--- a/server/src/remote_tasks/mod.rs
+++ b/server/src/remote_tasks/mod.rs
@@ -18,6 +18,8 @@ use tokio::task::JoinHandle;
use crate::{api::pve, task_utils};
+mod task_cache;
+
/// Get tasks for all remotes
// FIXME: filter for privileges
pub async fn get_tasks(max_age: i64, filters: TaskFilters) -> Result<Vec<TaskListItem>, Error> {
diff --git a/server/src/remote_tasks/task_cache.rs b/server/src/remote_tasks/task_cache.rs
new file mode 100644
index 00000000..9e6a65cd
--- /dev/null
+++ b/server/src/remote_tasks/task_cache.rs
@@ -0,0 +1,1483 @@
+//! Task cache implementation, based on rotating files.
+use std::{
+ cmp::Ordering,
+ collections::{HashMap, HashSet},
+ fs::{File, OpenOptions},
+ io::{BufRead, BufReader, BufWriter, ErrorKind, Lines, Write},
+ iter::Peekable,
+ os::unix::fs::MetadataExt,
+ path::{Path, PathBuf},
+ time::{Duration, Instant},
+};
+
+use anyhow::{Context, Error};
+use serde::{Deserialize, Serialize};
+
+use proxmox_sys::fs::CreateOptions;
+
+use pdm_api_types::RemoteUpid;
+use pve_api_types::PveUpid;
+
+/// Filename for the file containing running tasks.
+const ACTIVE_FILENAME: &str = "active";
+/// Filename prefix for archive files.
+const ARCHIVE_FILENAME_PREFIX: &str = "archive.";
+/// Filename for the state file.
+const STATE_FILENAME: &str = "state";
+/// Filename of the archive lockfile.
+const LOCKFILE_FILENAME: &str = ".lock";
+/// Write-ahead log.
+const WAL_FILENAME: &str = "journal";
+
+/// File name extension for zstd compressed archive files
+const ZSTD_EXTENSION_WITH_DOT: &str = ".zst";
+
+/// Item which can be put into the task cache.
+#[derive(Clone, Debug, Serialize, Deserialize, Hash, PartialEq, Eq)]
+#[serde(rename_all = "kebab-case")]
+pub struct TaskCacheItem {
+ /// The task's UPID
+ pub upid: RemoteUpid,
+ /// The time at which the task was started (seconds since the UNIX epoch).
+ /// Technically this is also contained within the UPID, duplicating it here
+ /// allows us to directly access it when sorting in new tasks, without having
+ /// to parse the UPID.
+ pub starttime: i64,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ /// The task's status.
+ pub status: Option<String>,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ /// The task's endtime (seconds since the UNIX epoch).
+ pub endtime: Option<i64>,
+}
+
+#[derive(Serialize, Deserialize, Default)]
+#[serde(rename_all = "kebab-case")]
+/// Per remote state.
+struct RemoteState {
+ /// Per-node state for this remote.
+ node_state: HashMap<String, NodeState>,
+}
+
+#[derive(Serialize, Deserialize, Default)]
+#[serde(rename_all = "kebab-case")]
+struct NodeState {
+ /// Cutoff timestamp for this node when fetching archived tasks.
+ cutoff: i64,
+}
+
+/// State needed for task polling.
+#[derive(Serialize, Deserialize, Default)]
+#[serde(rename_all = "kebab-case")]
+pub struct State {
+ /// Map of remote -> most recent task starttime (UNIX epoch) in the archive.
+ /// This can be used as a cut-off when requesting new task data.
+ #[serde(default)]
+ remote_state: HashMap<String, RemoteState>,
+ /// Tracked tasks which are polled in short intervals.
+ #[serde(default)]
+ tracked_tasks: HashSet<RemoteUpid>,
+}
+
+impl State {
+ /// Get tracked tasks.
+ pub fn tracked_tasks(&self) -> impl Iterator<Item = &RemoteUpid> {
+ self.tracked_tasks.iter()
+ }
+
+ /// Get the cutoff timestamp for a node of a remote.
+ pub fn cutoff_timestamp(&self, remote_id: &str, node: &str) -> Option<i64> {
+ self.remote_state
+ .get(remote_id)
+ .and_then(|remote_state| remote_state.node_state.get(node))
+ .map(|state| state.cutoff)
+ }
+
+ /// Add a new tracked task.
+ fn add_tracked_task(&mut self, upid: RemoteUpid) {
+ self.tracked_tasks.insert(upid);
+ }
+
+ /// Remove a tracked task.
+ fn remove_tracked_task(&mut self, upid: &RemoteUpid) {
+ self.tracked_tasks.remove(upid);
+ }
+
+ /// Update the per-node cutoff timestamp if it is higher than the current one.
+ fn update_cutoff_timestamp(&mut self, remote_id: &str, node: &str, starttime: i64) {
+ match self.remote_state.get_mut(remote_id) {
+ Some(remote_state) => match remote_state.node_state.get_mut(node) {
+ Some(node_state) => {
+ node_state.cutoff = node_state.cutoff.max(starttime);
+ }
+ None => {
+ remote_state
+ .node_state
+ .insert(node.to_string(), NodeState { cutoff: starttime });
+ }
+ },
+ None => {
+ let node_state =
+ HashMap::from_iter([(node.to_string(), NodeState { cutoff: starttime })]);
+
+ self.remote_state
+ .insert(remote_id.to_string(), RemoteState { node_state });
+ }
+ }
+ }
+}
+
+/// Cache for remote tasks.
+#[derive(Clone)]
+pub struct TaskCache {
+ /// Path where the cache's files should be placed.
+ base_path: PathBuf,
+ /// File permissions for the cache's files.
+ create_options: CreateOptions,
+
+ /// Maximum size of the journal. If it grows larger than size after
+ /// tasks have been added, it will be applied immediately.
+ journal_max_size: u64,
+
+ /// Maximum number of archive files. If the archive is rotated and `max_files` is exceeded, the
+ /// oldest file will be dropped.
+ max_files: u32,
+
+ /// Number of uncompressed archive files to keep. These will be the most recent ones.
+ uncompressed_files: u32,
+
+ /// Rotate archive file if it is older than this number of seconds.
+ rotate_after: u64,
+}
+
+/// A [`TaskCache`] locked for writing.
+pub struct WritableTaskCache {
+ cache: TaskCache,
+ lock: TaskCacheLock,
+}
+
+/// A [`TaskCache`] locked for reading.
+pub struct ReadableTaskCache {
+ cache: TaskCache,
+ lock: TaskCacheLock,
+}
+
+/// Lock for the cache.
+#[allow(dead_code)]
+struct TaskCacheLock(File);
+
+/// Which tasks to fetch from the archive.
+pub enum GetTasks {
+ /// Get all tasks, finished and running.
+ All,
+ /// Only get running (active) tasks.
+ Active,
+ #[cfg(test)] // Used by tests, might be used by production code in the future
+ /// Only get finished (archived) tasks.
+ Archived,
+}
+
+/// Map that stores whether a remote node's tasks were successfully
+/// fetched.
+#[derive(Default)]
+pub struct NodeFetchSuccessMap(HashMap<(String, String), bool>);
+
+impl NodeFetchSuccessMap {
+ /// Mark a node of a given remote as successful.
+ pub fn set_node_success(&mut self, remote: String, node: String) {
+ self.0.insert((remote, node), true);
+ }
+
+ /// Mark a node of a given remote as failed.
+ pub fn set_node_failure(&mut self, remote: String, node: String) {
+ self.0.insert((remote, node), false);
+ }
+
+ /// Returns whether tasks from a given node of a remote were successfully fetched.
+ pub fn node_successful(&self, remote: &str, node: &str) -> bool {
+ matches!(self.0.get(&(remote.into(), node.into())), Some(true))
+ }
+
+ /// Merge this map with another.
+ pub fn merge(&mut self, other: Self) {
+ self.0.extend(other.0);
+ }
+}
+
+impl ReadableTaskCache {
+ /// Iterate over cached tasks.
+ pub fn get_tasks(&self, mode: GetTasks) -> Result<TaskArchiveIterator<'_>, Error> {
+ self.cache
+ .get_tasks_impl(mode, &self.lock)
+ .context("failed to create task archive iterator")
+ }
+}
+
+impl WritableTaskCache {
+ /// Create initial task archives that can be backfilled with the
+ /// recent task history from a remote.
+ ///
+ /// This function only has an effect if there are no archive files yet.
+ pub fn init(&self, now: i64) -> Result<(), Error> {
+ if self.cache.archive_files(&self.lock)?.is_empty() {
+ for i in 0..self.cache.max_files {
+ self.new_file(
+ now - (i as u64 * self.cache.rotate_after) as i64,
+ i >= self.cache.uncompressed_files,
+ )?;
+ }
+ }
+
+ Ok(())
+ }
+
+ /// Start a new archive file with a given timestamp.
+ /// `now` is supposed to be a UNIX timestamp (seconds).
+ fn new_file(&self, now: i64, compress: bool) -> Result<ArchiveFile, Error> {
+ let suffix = if compress {
+ ZSTD_EXTENSION_WITH_DOT
+ } else {
+ ""
+ };
+
+ let new_path = self
+ .cache
+ .base_path
+ .join(format!("{ARCHIVE_FILENAME_PREFIX}{now}{suffix}"));
+
+ let mut file = File::create(&new_path)?;
+ self.cache.create_options.apply_to(&mut file, &new_path)?;
+
+ if compress {
+ let encoder = zstd::stream::write::Encoder::new(file, zstd::DEFAULT_COMPRESSION_LEVEL)?;
+ encoder.finish()?;
+ }
+
+ Ok(ArchiveFile {
+ path: new_path,
+ compressed: compress,
+ starttime: now,
+ })
+ }
+
+ /// Rotate task archive if the newest archive file is older than `rotate_after`.
+ ///
+ /// The oldest archive files are removed if the total number of archive files exceeds
+ /// `max_files`. `now` is supposed to be a UNIX timestamp (seconds).
+ pub fn rotate(&self, now: i64) -> Result<bool, Error> {
+ let mut did_rotate = false;
+ let mut archive_files = self.cache.archive_files(&self.lock)?;
+
+ let mut start_new_file = |files: &mut Vec<ArchiveFile>| -> Result<(), Error> {
+ let new_file = self.new_file(now, self.cache.uncompressed_files == 0)?;
+ files.insert(0, new_file);
+ self.apply_journal()?;
+
+ did_rotate = true;
+ Ok(())
+ };
+
+ match archive_files.first() {
+ Some(bound) => {
+ if now > bound.starttime && now - bound.starttime > self.cache.rotate_after as i64 {
+ start_new_file(&mut archive_files)?;
+ }
+ }
+ None => start_new_file(&mut archive_files)?,
+ }
+
+ while archive_files.len() > self.cache.max_files as usize {
+ // Unwrap is safe because of the length check above
+ let to_remove = archive_files.pop().unwrap();
+ std::fs::remove_file(&to_remove.path)
+ .with_context(|| format!("failed to remove {}", to_remove.path.display()))?;
+ }
+
+ for file in archive_files
+ .iter_mut()
+ .skip(self.cache.uncompressed_files as usize)
+ {
+ if !file.compressed {
+ file.compress(self.cache.create_options)
+ .with_context(|| format!("failed to compress {}", file.path.display()))?;
+ }
+ }
+
+ Ok(did_rotate)
+ }
+
+ /// Iterate over cached tasks.
+ pub fn get_tasks(&self, mode: GetTasks) -> Result<TaskArchiveIterator<'_>, Error> {
+ self.cache
+ .get_tasks_impl(mode, &self.lock)
+ .context("failed to create task archive iterator")
+ }
+
+ /// Update task cache contents.
+ ///
+ /// This is mostly used for adding new tasks to the cache, but
+ /// will also handle dropping finished/failed tracked tasks from the
+ /// state file and active file. This is done so that we don't have to update
+ /// these files multiple times.
+ ///
+ /// Running tasks (tasks without an endtime) are placed into the 'active' file in the
+ /// task cache base directory. Finished tasks are sorted into `archive.<starttime>` archive
+ /// files, where `<starttime>` denotes the lowest permitted start time timestamp for a given
+ /// archive file. If a task which was added as running previously is added again, this time in
+ /// a finished state, it will be removed from the `active` file and also sorted into
+ /// one of the archive files.
+ /// Same goes for the list of tracked tasks; the entry in the state file will be removed.
+ ///
+ /// Crash consistency:
+ ///
+ /// The state file, which contains the cut-off timestamps for future task fetching, is updated at the
+ /// end after all tasks have been added into the archive. Adding tasks is an idempotent
+ /// operation; adding the *same* task multiple times does not lead to duplicated entries in the
+ /// task archive. Individual archive files are updated atomically, but since
+ /// adding tasks can involve updating multiple archive files, the archive could end up
+ /// in a partially-updated, inconsistent state in case of a crash.
+ /// However, since the state file with the cut-off timestamps is updated last,
+ /// the consistency of the archive should be restored at the next update cycle of the archive.
+ pub fn update(
+ &self,
+ new_tasks: Vec<TaskCacheItem>,
+ update_state_for_remote: &NodeFetchSuccessMap,
+ drop_tracked: HashSet<RemoteUpid>,
+ ) -> Result<(), Error> {
+ let task_iter = self
+ .get_tasks(GetTasks::Active)
+ .context("failed to create archive iterator for active tasks")?;
+
+ let mut active_tasks = HashMap::from_iter(task_iter.filter_map(|task| {
+ if !drop_tracked.contains(&task.upid) {
+ Some((task.upid.clone(), task))
+ } else {
+ None
+ }
+ }));
+
+ let mut new_finished_tasks = Vec::new();
+
+ for task in new_tasks {
+ if task.endtime.is_none() {
+ active_tasks.insert(task.upid.clone(), task);
+ } else {
+ new_finished_tasks.push(task);
+ }
+ }
+
+ let mut state = self.read_state();
+
+ for upid in drop_tracked {
+ state.remove_tracked_task(&upid);
+ }
+
+ self.write_tasks_to_journal(
+ new_finished_tasks,
+ &mut active_tasks,
+ update_state_for_remote,
+ &mut state,
+ )?;
+
+ let mut active: Vec<TaskCacheItem> = active_tasks.into_values().collect();
+
+ active.sort_by(compare_tasks_reverse);
+ self.write_active_tasks(active.into_iter())
+ .context("failed to write active task file when adding tasks")?;
+ self.write_state(state)
+ .context("failed to update task archive state file when adding tasks")?;
+
+ self.apply_journal_if_too_large()
+ .context("could not apply journal early")?;
+
+ Ok(())
+ }
+
+ fn write_tasks_to_journal(
+ &self,
+ tasks: Vec<TaskCacheItem>,
+ active_tasks: &mut HashMap<RemoteUpid, TaskCacheItem>,
+ node_success_map: &NodeFetchSuccessMap,
+ state: &mut State,
+ ) -> Result<(), Error> {
+ let filename = self.cache.base_path.join(WAL_FILENAME);
+ let mut file = OpenOptions::new()
+ .append(true)
+ .create(true)
+ .open(filename)?;
+
+ for task in tasks {
+ // Remove this finished task from our set of active tasks.
+ active_tasks.remove(&task.upid);
+
+ // TODO:: Handle PBS tasks correctly.
+ // TODO: This is awkward, maybe overhaul RemoteUpid type to make this easier
+ match task.upid.upid.parse::<PveUpid>() {
+ Ok(upid) => {
+ let node = &upid.node;
+ let remote = task.upid.remote();
+
+ if node_success_map.node_successful(remote, node) {
+ state.update_cutoff_timestamp(task.upid.remote(), node, task.starttime);
+ }
+ }
+ Err(error) => {
+ log::error!("could not parse PVE UPID - not saving to task cache: {error:#}");
+ continue;
+ }
+ }
+
+ serde_json::to_writer(&mut file, &task)?;
+ writeln!(&file)?;
+ }
+
+ file.sync_all()?;
+
+ Ok(())
+ }
+
+ /// Returns the current size of the journal file.
+ fn journal_size(&self) -> Result<u64, Error> {
+ let metadata = self
+ .cache
+ .base_path
+ .join(WAL_FILENAME)
+ .metadata()
+ .context("failed to read metadata of journal file")?;
+
+ Ok(metadata.size())
+ }
+
+ /// Apply the journal early if it has grown larger than the maximum allowed size.
+ fn apply_journal_if_too_large(&self) -> Result<(), Error> {
+ let size = self.journal_size()?;
+
+ if size > self.cache.journal_max_size {
+ log::info!("task cache journal size {size} bytes, applying early");
+ self.apply_journal()?;
+ }
+
+ Ok(())
+ }
+
+ /// Apply the task journal.
+ ///
+ /// This will merge all tasks in the journal file into the task archive.
+ pub fn apply_journal(&self) -> Result<(), Error> {
+ let start = Instant::now();
+ let filename = self.cache.base_path.join(WAL_FILENAME);
+
+ let file = match File::open(&filename) {
+ Ok(file) => Box::new(BufReader::new(file)),
+ Err(err) if err.kind() == ErrorKind::NotFound => return Ok(()),
+ Err(err) => return Err(err.into()),
+ };
+
+ log::info!("applying task cache journal");
+ let iterator = ArchiveIterator::new(file);
+
+ let mut tasks: Vec<TaskCacheItem> = iterator
+ .filter_map(|task| match task {
+ Ok(task) => Some(task),
+ Err(err) => {
+ log::error!("could not read task from journal file: {err:#}");
+ None
+ }
+ })
+ .collect();
+
+ // The WAL contains tasks in arbitrary order since we always append.
+ tasks.sort_by(compare_tasks_reverse);
+ tasks.dedup();
+
+ let count = tasks.len();
+
+ self.merge_tasks_into_archive(tasks)?;
+
+ // truncate the journal file
+ OpenOptions::new()
+ .write(true)
+ .truncate(true)
+ .open(filename)
+ .context("failed to truncate journal file")?;
+
+ log::info!(
+ "commited {count} tasks in {:.3}.s to task cache archive",
+ start.elapsed().as_secs_f32()
+ );
+
+ Ok(())
+ }
+
+ /// Merge a list of *finished* tasks into the remote task archive files.
+ /// The list of task in `tasks` *must* be sorted by their timestamp and UPID (descending by
+ /// timestamp, ascending by UPID).
+ fn merge_tasks_into_archive(&self, tasks: Vec<TaskCacheItem>) -> Result<(), Error> {
+ debug_assert!(tasks
+ .iter()
+ .is_sorted_by(|a, b| compare_tasks(a, b).is_ge()));
+
+ let files = self
+ .cache
+ .archive_files(&self.lock)
+ .context("failed to read achive files")?;
+
+ let mut files = files.iter().peekable();
+
+ let mut current = files.next();
+ let mut next = files.peek();
+
+ let mut tasks_for_current_file = Vec::new();
+
+ // Tasks are sorted youngest to oldest (biggest start time first)
+ for task in tasks {
+ // Skip ahead until we have found the correct file.
+ while next.is_some() {
+ if let Some(current) = current {
+ if task.starttime >= current.starttime {
+ break;
+ }
+ // The next entry's cut-off is larger than the task's start time, which means
+ // we want to finalize the current file by merging all tasks that
+ // should be stored in it...
+ self.merge_single_archive_file(
+ std::mem::take(&mut tasks_for_current_file),
+ current,
+ )
+ .with_context(|| {
+ format!("failed to merge archive file {}", current.path.display())
+ })?;
+ }
+
+ // ... and then advance `current` to the next entry.
+ current = files.next();
+ next = files.peek();
+ }
+
+ if let Some(current) = current {
+ if task.starttime < current.starttime {
+ continue;
+ }
+ }
+ tasks_for_current_file.push(task);
+ }
+
+ // Merge tasks for the last file.
+ if let Some(current) = current {
+ self.merge_single_archive_file(tasks_for_current_file, current)
+ .with_context(|| {
+ format!("failed to merge archive file {}", current.path.display())
+ })?;
+ }
+
+ Ok(())
+ }
+
+ /// Add a new tracked task.
+ ///
+ /// This will insert the task in the list of tracked tasks in the state file,
+ /// as well as create an entry in the `active` file.
+ pub fn add_tracked_task(&self, task: TaskCacheItem) -> Result<(), Error> {
+ let mut state = self.read_state();
+
+ let mut tasks: Vec<TaskCacheItem> = self
+ .get_tasks(GetTasks::Active)
+ .context("failed to create active task iterator")?
+ .collect();
+
+ tasks.push(task.clone());
+ tasks.sort_by(compare_tasks_reverse);
+
+ state.add_tracked_task(task.upid);
+
+ self.write_active_tasks(tasks.into_iter())
+ .context("failed to write active tasks file when adding tracked task")?;
+
+ self.write_state(state)
+ .context("failed to write state when adding tracked task")?;
+
+ Ok(())
+ }
+
+ /// Read the state file.
+ /// If the state file could not be read or does not exist, the default (empty) state
+ /// is returned.
+ pub fn read_state(&self) -> State {
+ self.cache.read_state()
+ }
+
+ /// Write the state file.
+ fn write_state(&self, state: State) -> Result<(), Error> {
+ let path = self.cache.base_path.join(STATE_FILENAME);
+
+ let data = serde_json::to_vec_pretty(&state)?;
+
+ proxmox_sys::fs::replace_file(path, &data, self.cache.create_options, true)?;
+
+ Ok(())
+ }
+
+ /// Write the provided tasks to the 'active' file.
+ ///
+ /// The tasks are first written to a temporary file, which is then used
+ /// to atomically replace the original.
+ fn write_active_tasks(&self, tasks: impl Iterator<Item = TaskCacheItem>) -> Result<(), Error> {
+ let (fd, path) = proxmox_sys::fs::make_tmp_file(
+ self.cache.base_path.join(ACTIVE_FILENAME),
+ self.cache.create_options,
+ )?;
+ let mut fd = BufWriter::new(fd);
+
+ Self::write_tasks(&mut fd, tasks)?;
+
+ if let Err(err) = fd.flush() {
+ log::error!("could not flush 'active' file: {err:#}");
+ }
+ drop(fd);
+
+ let target = self.cache.base_path.join(ACTIVE_FILENAME);
+
+ let res = std::fs::rename(&path, &target).with_context(|| {
+ format!(
+ "failed to replace {} with {}",
+ target.display(),
+ path.display(),
+ )
+ });
+
+ if let Err(err) = res {
+ if let Err(err) = std::fs::remove_file(&path) {
+ log::error!(
+ "failed to cleanup temporary file {}: {err:#}",
+ path.display()
+ );
+ }
+
+ return Err(err);
+ }
+
+ Ok(())
+ }
+
+ /// Merge `tasks` with an existing archive file.
+ /// This function assumes that `tasks` and the pre-existing contents of the archive
+ /// file are both sorted descending by starttime (most recent tasks come first).
+ /// The task archive must be locked when calling this function.
+ fn merge_single_archive_file(
+ &self,
+ tasks: Vec<TaskCacheItem>,
+ file: &ArchiveFile,
+ ) -> Result<(), Error> {
+ if tasks.is_empty() {
+ return Ok(());
+ }
+
+ // TODO: Might be nice to also move this to ArchiveFile
+ let (temp_file, temp_file_path) =
+ proxmox_sys::fs::make_tmp_file(&file.path, self.cache.create_options)?;
+ let mut writer = if file.compressed {
+ let encoder =
+ zstd::stream::write::Encoder::new(temp_file, zstd::DEFAULT_COMPRESSION_LEVEL)?
+ .auto_finish();
+ Box::new(BufWriter::new(encoder)) as Box<dyn Write>
+ } else {
+ Box::new(BufWriter::new(temp_file)) as Box<dyn Write>
+ };
+
+ let archive_iter = file
+ .iter()?
+ .flat_map(|item| match item {
+ Ok(item) => Some(item),
+ Err(err) => {
+ log::error!("could not read task cache item while merging: {err:#}");
+ None
+ }
+ })
+ .peekable();
+ let task_iter = tasks.into_iter().peekable();
+
+ Self::write_tasks(&mut writer, MergeTaskIterator::new(archive_iter, task_iter))?;
+
+ if let Err(err) = writer.flush() {
+ log::error!("could not flush BufWriter for {file:?}: {err:#}");
+ }
+ drop(writer);
+
+ if let Err(err) = std::fs::rename(&temp_file_path, &file.path).with_context(|| {
+ format!(
+ "failed to replace {} with {}",
+ file.path.display(),
+ temp_file_path.display()
+ )
+ }) {
+ if let Err(err) = std::fs::remove_file(&temp_file_path) {
+ log::error!(
+ "failed to clean up temporary file {}: {err:#}",
+ temp_file_path.display()
+ );
+ }
+
+ return Err(err);
+ }
+
+ Ok(())
+ }
+
+ /// Write an iterator of [`TaskCacheItem`] to something that implements [`Write`].
+ /// The individual items are encoded as JSON followed by a newline.
+ fn write_tasks(
+ writer: &mut impl Write,
+ tasks: impl Iterator<Item = TaskCacheItem>,
+ ) -> Result<(), Error> {
+ for task in tasks {
+ serde_json::to_writer(&mut *writer, &task)?;
+ writeln!(writer)?;
+ }
+
+ Ok(())
+ }
+}
+
+impl TaskCache {
+ /// Create a new task cache instance.
+ ///
+ /// Remember to call `init` or `new_file` on a locked, writable TaskCache
+ /// to create the initial archive files.
+ pub fn new<P: AsRef<Path>>(
+ path: P,
+ create_options: CreateOptions,
+ max_files: u32,
+ uncompressed: u32,
+ rotate_after: u64,
+ journal_max_size: u64,
+ ) -> Result<Self, Error> {
+ Ok(Self {
+ base_path: path.as_ref().into(),
+ create_options,
+ journal_max_size,
+ max_files,
+ rotate_after,
+ uncompressed_files: uncompressed,
+ })
+ }
+
+ /// Lock the cache for reading.
+ pub fn read(self) -> Result<ReadableTaskCache, Error> {
+ let lock = self.lock_impl(false)?;
+
+ Ok(ReadableTaskCache { cache: self, lock })
+ }
+
+ /// Lock the cache for writing.
+ pub fn write(self) -> Result<WritableTaskCache, Error> {
+ let lock = self.lock_impl(true)?;
+
+ Ok(WritableTaskCache { cache: self, lock })
+ }
+
+ fn lock_impl(&self, exclusive: bool) -> Result<TaskCacheLock, Error> {
+ let lockfile = self.base_path.join(LOCKFILE_FILENAME);
+
+ Ok(TaskCacheLock(proxmox_sys::fs::open_file_locked(
+ lockfile,
+ Duration::from_secs(10),
+ exclusive,
+ self.create_options,
+ )?))
+ }
+
+ /// Read the state file.
+ /// If the state file could not be read or does not exist, the default (empty) state
+ /// is returned.
+ pub fn read_state(&self) -> State {
+ fn do_read_state(path: &Path) -> Result<State, Error> {
+ match std::fs::read(path) {
+ Ok(content) => serde_json::from_slice(&content).map_err(|err| err.into()),
+ Err(err) if err.kind() == ErrorKind::NotFound => Ok(Default::default()),
+ Err(err) => Err(err.into()),
+ }
+ }
+
+ let path = self.base_path.join(STATE_FILENAME);
+ do_read_state(&path).unwrap_or_else(|err| {
+ log::error!("could not read state file: {err:#}");
+ Default::default()
+ })
+ }
+
+ fn get_tasks_impl<'a>(
+ &self,
+ mode: GetTasks,
+ lock: &'a TaskCacheLock,
+ ) -> Result<TaskArchiveIterator<'a>, Error> {
+ let journal_file = self.base_path.join(WAL_FILENAME);
+ let active_path = self.base_path.join(ACTIVE_FILENAME);
+
+ match mode {
+ GetTasks::All => {
+ let mut archive_files = self.archive_files(lock)?;
+ archive_files.reverse();
+
+ if active_path.exists() {
+ archive_files.push(ArchiveFile {
+ path: self.base_path.join(ACTIVE_FILENAME),
+ compressed: false,
+ starttime: 0,
+ });
+ }
+
+ TaskArchiveIterator::new(Some(journal_file), archive_files, lock)
+ }
+ GetTasks::Active => {
+ let mut archive_files = Vec::new();
+
+ if active_path.exists() {
+ archive_files.push(ArchiveFile {
+ path: self.base_path.join(ACTIVE_FILENAME),
+ compressed: false,
+ starttime: 0,
+ });
+ }
+
+ TaskArchiveIterator::new(None, archive_files, lock)
+ }
+ #[cfg(test)]
+ GetTasks::Archived => {
+ let mut files = self.archive_files(lock)?;
+ files.reverse();
+
+ TaskArchiveIterator::new(Some(journal_file), files, lock)
+ }
+ }
+ }
+
+ /// Returns a list of existing archive files, together with their respective
+ /// cut-off timestamp. The result is sorted descending by cut-off timestamp (most recent one
+ /// first).
+ /// The task archive should be locked for reading when calling this function.
+ fn archive_files(&self, _lock: &TaskCacheLock) -> Result<Vec<ArchiveFile>, Error> {
+ let mut names = Vec::new();
+
+ for entry in std::fs::read_dir(&self.base_path)? {
+ let entry = entry?;
+
+ let path = entry.path();
+
+ if let Some(file) = Self::parse_archive_filename(&path) {
+ names.push(file);
+ }
+ }
+
+ names.sort_by_key(|e| -e.starttime);
+
+ Ok(names)
+ }
+
+ fn parse_archive_filename(path: &Path) -> Option<ArchiveFile> {
+ let filename = path.file_name()?.to_str()?;
+ let filename = filename.strip_prefix(ARCHIVE_FILENAME_PREFIX)?;
+
+ if let Some(starttime) = filename.strip_suffix(ZSTD_EXTENSION_WITH_DOT) {
+ let starttime: i64 = starttime.parse().ok()?;
+
+ Some(ArchiveFile {
+ path: path.to_path_buf(),
+ compressed: true,
+ starttime,
+ })
+ } else {
+ let starttime: i64 = filename.parse().ok()?;
+
+ Some(ArchiveFile {
+ path: path.to_path_buf(),
+ compressed: false,
+ starttime,
+ })
+ }
+ }
+}
+
+/// Comparison function for sorting tasks.
+/// The tasks are compared based on the task's start time, falling
+/// back to the task's UPID as a secondary criterion in case the
+/// start times are equal.
+fn compare_tasks(a: &TaskCacheItem, b: &TaskCacheItem) -> Ordering {
+ a.starttime
+ .cmp(&b.starttime)
+ .then_with(|| b.upid.to_string().cmp(&a.upid.to_string()))
+}
+
+/// Comparison function for sorting tasks, reversed
+/// The tasks are compared based on the task's start time, falling
+/// back to the task's UPID as a secondary criterion in case the
+/// start times are equal.
+fn compare_tasks_reverse(a: &TaskCacheItem, b: &TaskCacheItem) -> Ordering {
+ compare_tasks(b, a)
+}
+
+/// Iterator over the task archive.
+pub struct TaskArchiveIterator<'a> {
+ inner: Box<dyn Iterator<Item = TaskCacheItem>>,
+
+ /// Lock for this archive. This contains the lock in case we
+ /// need to keep the archive locked while iterating over it.
+ _lock: &'a TaskCacheLock,
+}
+
+impl<'a> TaskArchiveIterator<'a> {
+ /// Create a new task archive iterator.
+ ///
+ /// `files` should be sorted with the most recent archive file *last*.
+ fn new(
+ journal: Option<PathBuf>,
+ files: Vec<ArchiveFile>,
+ lock: &'a TaskCacheLock,
+ ) -> Result<Self, Error> {
+ let inner = InnerTaskArchiveIterator::new(files)
+ .filter_map(|res| match res {
+ Ok(task) => Some(task),
+ Err(err) => {
+ log::error!("could not read task from archive file: {err:#}");
+ None
+ }
+ })
+ .peekable();
+
+ if let Some(journal) = journal {
+ let journal_reader = Box::new(BufReader::new(File::open(journal)?));
+ let journal_task_iterator = JournalIterator::new(journal_reader).peekable();
+ let merge_task_iter = MergeTaskIterator::new(journal_task_iterator, inner);
+
+ Ok(Self {
+ inner: Box::new(merge_task_iter),
+ _lock: lock,
+ })
+ } else {
+ Ok(Self {
+ inner: Box::new(inner),
+ _lock: lock,
+ })
+ }
+ }
+}
+
+impl Iterator for TaskArchiveIterator<'_> {
+ type Item = TaskCacheItem;
+
+ fn next(&mut self) -> Option<Self::Item> {
+ self.inner.next()
+ }
+}
+
+struct InnerTaskArchiveIterator {
+ /// Archive files to read.
+ files: Vec<ArchiveFile>,
+ /// Archive iterator we are currently using, if any
+ current: Option<ArchiveIterator>,
+}
+
+impl InnerTaskArchiveIterator {
+ /// Create a new task archive iterator.
+ pub fn new(files: Vec<ArchiveFile>) -> Self {
+ Self {
+ files,
+ current: None,
+ }
+ }
+}
+
+impl Iterator for InnerTaskArchiveIterator {
+ type Item = Result<TaskCacheItem, Error>;
+
+ fn next(&mut self) -> Option<Self::Item> {
+ loop {
+ match &mut self.current {
+ Some(current) => {
+ let next = current.next();
+ if next.is_some() {
+ return next;
+ } else {
+ self.current = None;
+ }
+ }
+ None => 'inner: loop {
+ // Returns `None` if no more files are available, stopping iteration.
+ let next_file = self.files.pop()?;
+
+ match next_file.iter() {
+ Ok(iter) => {
+ self.current = Some(iter);
+ break 'inner;
+ }
+ Err(err) => {
+ log::error!("could not create archive iterator while iteration over task archive files, skipping: {err:#}")
+ }
+ }
+ },
+ }
+ }
+ }
+}
+
+/// Archive file.
+#[derive(Clone, Debug)]
+struct ArchiveFile {
+ /// The path to the archive file.
+ path: PathBuf,
+ /// This archive file is compressed using zstd.
+ compressed: bool,
+ /// The archive's lowest permitted starttime (seconds since UNIX epoch).
+ starttime: i64,
+}
+
+impl ArchiveFile {
+ /// Create an [`ArchiveIterator`] for this file.
+ fn iter(&self) -> Result<ArchiveIterator, Error> {
+ let fd = File::open(&self.path)
+ .with_context(|| format!("failed to open archive file {}", self.path.display()))?;
+
+ let iter = if self.compressed {
+ let reader = zstd::stream::read::Decoder::new(fd).with_context(|| {
+ format!(
+ "failed to create zstd decoder for archive file {}",
+ self.path.display()
+ )
+ })?;
+ ArchiveIterator::new(Box::new(BufReader::new(reader)))
+ } else {
+ ArchiveIterator::new(Box::new(BufReader::new(fd)))
+ };
+
+ Ok(iter)
+ }
+
+ fn compress(&mut self, options: CreateOptions) -> Result<(), Error> {
+ let uncompressed_file_path = &self.path;
+
+ let (temp_file, temp_file_path) =
+ proxmox_sys::fs::make_tmp_file(uncompressed_file_path, options)
+ .context("failed to create temporary file")?;
+
+ let uncompressed_file =
+ File::open(uncompressed_file_path).context("failed to open uncompressed file")?;
+
+ zstd::stream::copy_encode(
+ uncompressed_file,
+ temp_file,
+ zstd::DEFAULT_COMPRESSION_LEVEL,
+ )
+ .context("zstd::stream::copy_encode failed")?;
+
+ let mut new_path_for_compressed = uncompressed_file_path.clone();
+ new_path_for_compressed
+ .set_extension(format!("{}{ZSTD_EXTENSION_WITH_DOT}", self.starttime));
+
+ std::fs::rename(&temp_file_path, &new_path_for_compressed)
+ .context("failed to move compressed task achive file")?;
+ std::fs::remove_file(uncompressed_file_path)
+ .context("failed to remove uncompressed archive file")?;
+
+ self.path = new_path_for_compressed;
+ self.compressed = true;
+
+ Ok(())
+ }
+}
+
+/// Iterator that merges two _sorted_ `Iterator<Item = TaskCacheItem>`, returning the items
+/// from both iterators sorted.
+/// The two iterators are expected to be sorted descendingly based on the task's starttime and
+/// ascendingly based on the task's UPID's string representation. This can be
+/// achieved by using the [`compare_tasks_reverse`] function when sorting an array of tasks.
+struct MergeTaskIterator<T: Iterator<Item = TaskCacheItem>, U: Iterator<Item = TaskCacheItem>> {
+ left: Peekable<T>,
+ right: Peekable<U>,
+}
+
+impl<T, U> MergeTaskIterator<T, U>
+where
+ T: Iterator<Item = TaskCacheItem>,
+ U: Iterator<Item = TaskCacheItem>,
+{
+ /// Create a new merging iterator.
+ fn new(left: Peekable<T>, right: Peekable<U>) -> Self {
+ Self { left, right }
+ }
+}
+
+impl<T, U> Iterator for MergeTaskIterator<T, U>
+where
+ T: Iterator<Item = TaskCacheItem>,
+ U: Iterator<Item = TaskCacheItem>,
+{
+ type Item = T::Item;
+
+ fn next(&mut self) -> Option<T::Item> {
+ let order = match (self.left.peek(), self.right.peek()) {
+ (Some(l), Some(r)) => Some(compare_tasks(l, r)),
+ (Some(_), None) => Some(Ordering::Greater),
+ (None, Some(_)) => Some(Ordering::Less),
+ (None, None) => None,
+ };
+
+ match order {
+ Some(Ordering::Greater) => self.left.next(),
+ Some(Ordering::Less) => self.right.next(),
+ Some(Ordering::Equal) => {
+ // Dedup by consuming the other iterator as well
+ let _ = self.right.next();
+ self.left.next()
+ }
+ None => None,
+ }
+ }
+}
+
+/// Iterator for a single task archive file.
+///
+/// This iterator implements `Iterator<Item = Result<TaskCacheItem, Error>>`. When iterating,
+/// tasks are read line by line, without loading the entire archive file into memory.
+struct ArchiveIterator {
+ iter: Lines<Box<dyn BufRead>>,
+}
+
+impl ArchiveIterator {
+ /// Create a new iterator.
+ pub fn new(reader: Box<dyn BufRead>) -> Self {
+ let lines = reader.lines();
+
+ Self { iter: lines }
+ }
+}
+
+impl Iterator for ArchiveIterator {
+ type Item = Result<TaskCacheItem, Error>;
+
+ fn next(&mut self) -> Option<Self::Item> {
+ self.iter.next().map(|result| {
+ result
+ .and_then(|line| Ok(serde_json::from_str(&line)?))
+ .map_err(Into::into)
+ })
+ }
+}
+
+/// Iterator for journal files. This iterator uses [`ArchiveIterator`] internally, but will eagerly
+/// load all tasks into memory to sort and deduplicate them.
+struct JournalIterator {
+ inner: Box<dyn Iterator<Item = TaskCacheItem>>,
+}
+
+impl JournalIterator {
+ fn new(file: Box<dyn BufRead>) -> Self {
+ let iter = ArchiveIterator::new(file);
+
+ let mut tasks: Vec<TaskCacheItem> = iter
+ .flat_map(|task| match task {
+ Ok(task) => Some(task),
+ Err(err) => {
+ log::error!("could not read task while iterating over archive file: {err:#}");
+ None
+ }
+ })
+ .collect();
+
+ tasks.sort_by(compare_tasks_reverse);
+ tasks.dedup();
+
+ Self {
+ inner: Box::new(tasks.into_iter()),
+ }
+ }
+}
+
+impl Iterator for JournalIterator {
+ type Item = TaskCacheItem;
+
+ fn next(&mut self) -> Option<Self::Item> {
+ self.inner.next()
+ }
+}
+
+#[cfg(test)]
+mod tests {
+ use std::io::Cursor;
+
+ use crate::test_support::temp::NamedTempDir;
+
+ use super::*;
+
+ #[test]
+ fn archive_filename() {
+ let a = TaskCache::parse_archive_filename(&PathBuf::from("/tmp/archive.10000")).unwrap();
+
+ assert_eq!(a.path, PathBuf::from("/tmp/archive.10000"));
+ assert_eq!(a.starttime, 10000);
+ assert!(!a.compressed);
+
+ let a = TaskCache::parse_archive_filename(&PathBuf::from("/tmp/archive.1234.zst")).unwrap();
+
+ assert_eq!(a.path, PathBuf::from("/tmp/archive.1234.zst"));
+ assert_eq!(a.starttime, 1234);
+ assert!(a.compressed);
+ }
+
+ #[test]
+ fn archive_iterator() -> Result<(), Error> {
+ let file = r#"
+ {"upid":"pve-remote!UPID:pve:00039E4D:002638B8:67B4A9D1:stopall::root@pam:","status":"OK","endtime":12345, "starttime": 1234}
+ {"upid":"pbs-remote!UPID:pbs:000002B2:00000158:00000000:674D828C:logrotate::root@pam:","status":"OK","endtime":12345, "starttime": 1234}
+ invalid"#
+ .trim_start();
+
+ let cursor = Box::new(Cursor::new(file.as_bytes()));
+ let mut iter = ArchiveIterator::new(cursor);
+
+ assert_eq!(iter.next().unwrap().unwrap().upid.remote(), "pve-remote");
+ assert_eq!(iter.next().unwrap().unwrap().upid.remote(), "pbs-remote");
+ assert!(iter.next().unwrap().is_err());
+ assert!(iter.next().is_none());
+
+ Ok(())
+ }
+
+ fn task(starttime: i64, ended: bool) -> TaskCacheItem {
+ let (status, endtime) = if ended {
+ (Some("OK".into()), Some(starttime + 10))
+ } else {
+ (None, None)
+ };
+
+ TaskCacheItem {
+ upid: format!(
+ "pve-remote!UPID:pve:00039E4D:002638B8:{starttime:08X}:stopall::root@pam:"
+ )
+ .parse()
+ .unwrap(),
+ starttime,
+ status,
+ endtime,
+ }
+ }
+
+ fn assert_starttimes(cache: &WritableTaskCache, starttimes: &[i64]) {
+ let tasks: Vec<i64> = cache
+ .get_tasks(GetTasks::All)
+ .unwrap()
+ .map(|task| task.starttime)
+ .collect();
+
+ assert_eq!(&tasks, starttimes);
+ }
+
+ fn add_tasks(cache: &WritableTaskCache, tasks: Vec<TaskCacheItem>) -> Result<(), Error> {
+ let mut node_map = NodeFetchSuccessMap::default();
+ node_map.set_node_success("pve-remote".to_string(), "pve".to_string());
+
+ cache.update(tasks, &node_map, HashSet::new())
+ }
+
+ const DEFAULT_MAX_SIZE: u64 = 10000;
+
+ #[test]
+ fn test_add_tasks() -> Result<(), Error> {
+ let tmp_dir = NamedTempDir::new()?;
+ let cache = TaskCache::new(
+ tmp_dir.path(),
+ CreateOptions::new(),
+ 3,
+ 1,
+ 0,
+ DEFAULT_MAX_SIZE,
+ )
+ .unwrap()
+ .write()?;
+
+ cache.new_file(1000, false)?;
+ assert_eq!(cache.cache.archive_files(&cache.lock)?.len(), 1);
+
+ add_tasks(&cache, vec![task(1000, true), task(1001, true)])?;
+
+ assert_eq!(
+ cache.read_state().cutoff_timestamp("pve-remote", "pve"),
+ Some(1001)
+ );
+
+ cache.rotate(1500)?;
+
+ assert_eq!(cache.cache.archive_files(&cache.lock)?.len(), 2);
+
+ add_tasks(&cache, vec![task(1500, true), task(1501, true)])?;
+ add_tasks(&cache, vec![task(1200, true), task(1300, true)])?;
+
+ assert_eq!(
+ cache.read_state().cutoff_timestamp("pve-remote", "pve"),
+ Some(1501),
+ );
+
+ cache.rotate(2000)?;
+ assert_eq!(cache.cache.archive_files(&cache.lock)?.len(), 3);
+
+ add_tasks(&cache, vec![task(2000, true)])?;
+ add_tasks(&cache, vec![task(1502, true)])?;
+ add_tasks(&cache, vec![task(1002, true)])?;
+
+ // These are before the cut-off of 1000, so they will be discarded.
+ // add_tasks(&cache, vec![task(800, true), task(900, true)])?;
+
+ // This one should be deduped
+ add_tasks(&cache, vec![task(1000, true)])?;
+
+ assert_starttimes(
+ &cache,
+ &[2000, 1502, 1501, 1500, 1300, 1200, 1002, 1001, 1000],
+ );
+
+ cache.rotate(2500)?;
+
+ assert_eq!(cache.cache.archive_files(&cache.lock)?.len(), 3);
+
+ assert_starttimes(&cache, &[2000, 1502, 1501, 1500]);
+
+ cache.rotate(3000)?;
+ assert_eq!(cache.cache.archive_files(&cache.lock)?.len(), 3);
+
+ assert_starttimes(&cache, &[2000]);
+
+ Ok(())
+ }
+
+ #[test]
+ fn test_active_tasks_are_migrated_to_archive() -> Result<(), Error> {
+ let tmp_dir = NamedTempDir::new()?;
+ let cache = TaskCache::new(
+ tmp_dir.path(),
+ CreateOptions::new(),
+ 3,
+ 1,
+ 0,
+ DEFAULT_MAX_SIZE,
+ )
+ .unwrap()
+ .write()?;
+
+ cache.new_file(1000, false)?;
+ add_tasks(&cache, vec![task(1000, false), task(1001, false)])?;
+ assert_eq!(cache.get_tasks(GetTasks::Active)?.count(), 2);
+
+ add_tasks(&cache, vec![task(1000, true), task(1001, true)])?;
+
+ assert_starttimes(&cache, &[1001, 1000]);
+
+ assert_eq!(cache.get_tasks(GetTasks::Active)?.count(), 0);
+
+ Ok(())
+ }
+
+ #[test]
+ fn test_init() -> Result<(), Error> {
+ let tmp_dir = NamedTempDir::new()?;
+ let cache = TaskCache::new(
+ tmp_dir.path(),
+ CreateOptions::new(),
+ 3,
+ 1,
+ 100,
+ DEFAULT_MAX_SIZE,
+ )
+ .unwrap()
+ .write()?;
+
+ cache.init(1000)?;
+ assert_eq!(cache.cache.archive_files(&cache.lock)?.len(), 3);
+
+ add_tasks(
+ &cache,
+ vec![task(1050, true), task(950, true), task(850, true)],
+ )?;
+
+ assert_eq!(cache.get_tasks(GetTasks::Archived)?.count(), 3);
+
+ Ok(())
+ }
+
+ fn add_finished_tracked(cache: &WritableTaskCache, starttime: i64) -> Result<(), Error> {
+ let t = task(starttime, true);
+ let upid = t.upid.clone();
+
+ let mut node_map = NodeFetchSuccessMap::default();
+ node_map.set_node_success("pve-remote".to_string(), "pve".to_string());
+
+ cache.update(vec![t], &node_map, HashSet::from_iter([upid]))
+ }
+
+ #[test]
+ fn test_tracking_tasks() -> Result<(), Error> {
+ let tmp_dir = NamedTempDir::new()?;
+ let cache = TaskCache::new(
+ tmp_dir.path(),
+ CreateOptions::new(),
+ 3,
+ 1,
+ 100,
+ DEFAULT_MAX_SIZE,
+ )
+ .unwrap()
+ .write()?;
+
+ cache.init(1000)?;
+
+ cache.add_tracked_task(task(1050, false))?;
+
+ assert_eq!(cache.get_tasks(GetTasks::Active)?.count(), 1);
+ cache.add_tracked_task(task(1060, false))?;
+ assert_eq!(cache.get_tasks(GetTasks::Active)?.count(), 2);
+
+ assert_eq!(cache.read_state().tracked_tasks().count(), 2);
+
+ // Mark first task as finished
+ add_finished_tracked(&cache, 1050)?;
+
+ assert_eq!(cache.get_tasks(GetTasks::Active)?.count(), 1);
+ assert_eq!(cache.get_tasks(GetTasks::Archived)?.count(), 1);
+ assert_eq!(cache.read_state().tracked_tasks().count(), 1);
+
+ // Mark second task as finished
+ add_finished_tracked(&cache, 1060)?;
+
+ assert_eq!(cache.get_tasks(GetTasks::Active)?.count(), 0);
+ assert_eq!(cache.get_tasks(GetTasks::Archived)?.count(), 2);
+ assert_eq!(cache.read_state().tracked_tasks().count(), 0);
+
+ Ok(())
+ }
+
+ #[test]
+ fn journal_is_applied_if_max_size_exceeded() -> Result<(), Error> {
+ let tmp_dir = NamedTempDir::new()?;
+
+ // Should be *just* enough to fit a single task, which means that we apply the journal
+ // after adding a second one.
+ const ENOUGH_FOR_SINGLE_TASK: u64 = 200;
+
+ let cache = TaskCache::new(
+ tmp_dir.path(),
+ CreateOptions::new(),
+ 3,
+ 1,
+ 100,
+ ENOUGH_FOR_SINGLE_TASK,
+ )
+ .unwrap()
+ .write()?;
+
+ add_tasks(&cache, vec![task(1000, true)])?;
+ assert!(cache.journal_size()? > 0);
+
+ add_tasks(&cache, vec![task(1000, true)])?;
+
+ assert_eq!(cache.journal_size()?, 0);
+
+ Ok(())
+ }
+}
--
2.47.2
* [pdm-devel] [PATCH proxmox-datacenter-manager v7 2/7] remote tasks: add background task for task polling, use new task cache
2025-08-20 12:43 [pdm-devel] [PATCH proxmox-datacenter-manager v7 0/7] remote task cache fetching task / better cache backend Lukas Wagner
2025-08-20 12:43 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 1/7] remote tasks: implement improved cache for remote tasks Lukas Wagner
@ 2025-08-20 12:43 ` Lukas Wagner
2025-08-20 12:43 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 3/7] pdm-api-types: remote tasks: add new_from_str constructor for TaskStateType Lukas Wagner
` (5 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Lukas Wagner @ 2025-08-20 12:43 UTC (permalink / raw)
To: pdm-devel
This commit changes the remote task module as follows:
- Add a new background task for regular polling of task data
Instead of triggering fetching of task data from the `get_tasks` function,
which is usually called by an API handler, we move the fetching to a
new background task. The task fetches the latest tasks from all remotes
and stores them in the task cache at regular intervals (10 minutes).
The `get_tasks` function itself only reads from the cache.
The main rationale for this change is that for large setups, fetching
tasks from all remotes can take a *long* time (e.g. hundreds of remotes,
each with a >100ms connection - adds up to minutes quickly).
If we do this from within `get_tasks`, the API handler calling the
function is also blocked for the entire time.
The `get_tasks` API is called every couple of seconds by the UI to get
a list of running remote tasks, so this *must* be quick.
- Tracked tasks are also polled in the same background task, but with
a short polling delay (10 seconds). If a tracked task finishes,
an out-of-order fetch of tasks for a remote is performed to update
the cache with all task data from the finished task.
- Only finished tasks are requested from the remotes. This avoids a
foreign (as in, not started by PDM) running task appearing stuck in
the running state until the next regular task cache refresh.
The tracked task polling could be extended to also poll running foreign
tasks, but this is an easy addition for the future.
- Tasks are now stored in the new improved task cache implementation.
This should make retrieving tasks much quicker and avoid
unneeded disk IO.
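To illustrate the scheduling described above, here is a minimal sketch of the
polling loop (assuming tokio; `poll_tracked_tasks`, `fetch_all_remotes` and
`polling_loop` are placeholder names, the actual implementation lives in the
new tasks/remote_tasks.rs further down):

    use std::time::{Duration, Instant};

    const POLL_INTERVAL: Duration = Duration::from_secs(10);
    const TASK_FETCH_INTERVAL: Duration = Duration::from_secs(600);

    async fn poll_tracked_tasks() { /* placeholder: check tracked tasks */ }
    async fn fetch_all_remotes() { /* placeholder: full task fetch */ }

    async fn polling_loop() {
        let mut interval = tokio::time::interval(POLL_INTERVAL);
        // Start out "due" so the first tick performs a full fetch right away.
        let mut last_fetch = Instant::now() - TASK_FETCH_INTERVAL;

        loop {
            interval.tick().await;

            // Tracked tasks (started by PDM) are polled on every tick (10s).
            poll_tracked_tasks().await;

            // A full fetch from all remotes only happens every 10 minutes.
            if last_fetch.elapsed() >= TASK_FETCH_INTERVAL {
                fetch_all_remotes().await;
                last_fetch = Instant::now();
            }
        }
    }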
Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
Reviewed-by: Dominik Csapak <d.csapak@proxmox.com>
---
Notes:
Changes since v5:
- Incorporate review feedback from @Dominik (thx!)
- Change task tracking approach:
- Instead of using the oldest running task as a cutoff and
switching to a lower fetching interval if there is a tracked task,
we poll tracked tasks directly with a 10 second interval.
Once a tracked task finishes, we do a regular task fetch once
to get full task data (endtime, status).
This is a nicer approach for long running tasks, since we do
not repeatedly request the same tasks over and over again.
- Use proxmox_product_config to get CreateOptions where
it makes sense.
- Use timestamps instead of cycle counts to keep track
of when we want to rotate the task archive or do a full
task fetch
- Be more clever about how we request the semaphores. Instead
of requesting all semaphores that we could potentially need to poll
multiple nodes of a remote in parallel, request them
on demand.
- Keep track of per-node failures while fetching tasks and
feed this information to the cache implementation
so that it can maintain the per-node cutoff timestamp.
- Make documentation of public constants a bit easier
to understand.
Changes since v4:
- Rebase onto latest master, adapting to changes in
the section config type
Changes since v2:
- Adapt to new locking approach (only drops a `mut`)
Changes since v1:
- use const Duration instead of u64s for durations, using
Duration::as_secs() where needed
- Move the remote_task fetching task functions to
src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs
- remote_tasks::get_tasks: wrap function body in a
tokio::task::spawn_blocking. Using the TaskCache::get_tasks
iterator does disk IO and could block the executor
- Added some doc strings to make the purpose/workings of
some functions clearer
- Couple of variables have been renamed for more clarity
server/src/api/pve/lxc.rs | 10 +-
server/src/api/pve/mod.rs | 4 +-
server/src/api/pve/qemu.rs | 6 +-
server/src/api/remote_tasks.rs | 11 +-
server/src/bin/proxmox-datacenter-api/main.rs | 1 +
.../bin/proxmox-datacenter-api/tasks/mod.rs | 1 +
.../tasks/remote_tasks.rs | 559 ++++++++++++++++
server/src/remote_tasks/mod.rs | 625 ++++--------------
8 files changed, 706 insertions(+), 511 deletions(-)
create mode 100644 server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs
diff --git a/server/src/api/pve/lxc.rs b/server/src/api/pve/lxc.rs
index f1c31425..83f9f4aa 100644
--- a/server/src/api/pve/lxc.rs
+++ b/server/src/api/pve/lxc.rs
@@ -209,7 +209,7 @@ pub async fn lxc_start(
let upid = pve.start_lxc_async(&node, vmid, Default::default()).await?;
- new_remote_upid(remote, upid)
+ new_remote_upid(remote, upid).await
}
#[api(
@@ -242,7 +242,7 @@ pub async fn lxc_stop(
let upid = pve.stop_lxc_async(&node, vmid, Default::default()).await?;
- new_remote_upid(remote, upid)
+ new_remote_upid(remote, upid).await
}
#[api(
@@ -277,7 +277,7 @@ pub async fn lxc_shutdown(
.shutdown_lxc_async(&node, vmid, Default::default())
.await?;
- new_remote_upid(remote, upid)
+ new_remote_upid(remote, upid).await
}
#[api(
@@ -357,7 +357,7 @@ pub async fn lxc_migrate(
};
let upid = pve.migrate_lxc(&node, vmid, params).await?;
- new_remote_upid(remote, upid)
+ new_remote_upid(remote, upid).await
}
#[api(
@@ -518,5 +518,5 @@ pub async fn lxc_remote_migrate(
log::info!("migrating vm {vmid} of node {node:?}");
let upid = source_conn.remote_migrate_lxc(&node, vmid, params).await?;
- new_remote_upid(source, upid)
+ new_remote_upid(source, upid).await
}
diff --git a/server/src/api/pve/mod.rs b/server/src/api/pve/mod.rs
index dd7cf382..d472cf58 100644
--- a/server/src/api/pve/mod.rs
+++ b/server/src/api/pve/mod.rs
@@ -76,9 +76,9 @@ const RESOURCES_ROUTER: Router = Router::new().get(&API_METHOD_CLUSTER_RESOURCES
const STATUS_ROUTER: Router = Router::new().get(&API_METHOD_CLUSTER_STATUS);
// converts a remote + PveUpid into a RemoteUpid and starts tracking it
-fn new_remote_upid(remote: String, upid: PveUpid) -> Result<RemoteUpid, Error> {
+async fn new_remote_upid(remote: String, upid: PveUpid) -> Result<RemoteUpid, Error> {
let remote_upid: RemoteUpid = (remote, upid.to_string()).try_into()?;
- remote_tasks::track_running_task(remote_upid.clone());
+ remote_tasks::track_running_task(remote_upid.clone()).await?;
Ok(remote_upid)
}
diff --git a/server/src/api/pve/qemu.rs b/server/src/api/pve/qemu.rs
index 5a41a69e..54ede112 100644
--- a/server/src/api/pve/qemu.rs
+++ b/server/src/api/pve/qemu.rs
@@ -216,7 +216,7 @@ pub async fn qemu_start(
.start_qemu_async(&node, vmid, Default::default())
.await?;
- new_remote_upid(remote, upid)
+ new_remote_upid(remote, upid).await
}
#[api(
@@ -377,7 +377,7 @@ pub async fn qemu_migrate(
};
let upid = pve.migrate_qemu(&node, vmid, params).await?;
- new_remote_upid(remote, upid)
+ new_remote_upid(remote, upid).await
}
#[api(
@@ -564,5 +564,5 @@ pub async fn qemu_remote_migrate(
log::info!("migrating vm {vmid} of node {node:?}");
let upid = source_conn.remote_migrate_qemu(&node, vmid, params).await?;
- new_remote_upid(source, upid)
+ new_remote_upid(source, upid).await
}
diff --git a/server/src/api/remote_tasks.rs b/server/src/api/remote_tasks.rs
index e629000c..05ce3666 100644
--- a/server/src/api/remote_tasks.rs
+++ b/server/src/api/remote_tasks.rs
@@ -21,13 +21,6 @@ const SUBDIRS: SubdirMap = &sorted!([("list", &Router::new().get(&API_METHOD_LIS
},
input: {
properties: {
- "max-age": {
- type: Integer,
- optional: true,
- // TODO: sensible default max-age
- default: 300,
- description: "Maximum age of cached task data",
- },
filters: {
type: TaskFilters,
flatten: true,
@@ -36,8 +29,8 @@ const SUBDIRS: SubdirMap = &sorted!([("list", &Router::new().get(&API_METHOD_LIS
},
)]
/// Get the list of tasks for all remotes.
-async fn list_tasks(max_age: i64, filters: TaskFilters) -> Result<Vec<TaskListItem>, Error> {
- let tasks = remote_tasks::get_tasks(max_age, filters).await?;
+async fn list_tasks(filters: TaskFilters) -> Result<Vec<TaskListItem>, Error> {
+ let tasks = remote_tasks::get_tasks(filters).await?;
Ok(tasks)
}
diff --git a/server/src/bin/proxmox-datacenter-api/main.rs b/server/src/bin/proxmox-datacenter-api/main.rs
index db6b2585..42bc0e1e 100644
--- a/server/src/bin/proxmox-datacenter-api/main.rs
+++ b/server/src/bin/proxmox-datacenter-api/main.rs
@@ -376,6 +376,7 @@ async fn run(debug: bool) -> Result<(), Error> {
metric_collection::start_task();
tasks::remote_node_mapping::start_task();
resource_cache::start_task();
+ tasks::remote_tasks::start_task()?;
server.await?;
log::info!("server shutting down, waiting for active workers to complete");
diff --git a/server/src/bin/proxmox-datacenter-api/tasks/mod.rs b/server/src/bin/proxmox-datacenter-api/tasks/mod.rs
index e6ead882..a6b1f439 100644
--- a/server/src/bin/proxmox-datacenter-api/tasks/mod.rs
+++ b/server/src/bin/proxmox-datacenter-api/tasks/mod.rs
@@ -1 +1,2 @@
pub mod remote_node_mapping;
+pub mod remote_tasks;
diff --git a/server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs b/server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs
new file mode 100644
index 00000000..4701a935
--- /dev/null
+++ b/server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs
@@ -0,0 +1,559 @@
+use std::{
+ collections::{HashMap, HashSet},
+ sync::Arc,
+ time::{Duration, Instant},
+};
+
+use anyhow::{format_err, Error};
+use nix::sys::stat::Mode;
+use tokio::{sync::Semaphore, task::JoinSet};
+
+use pdm_api_types::{
+ remotes::{Remote, RemoteType},
+ RemoteUpid,
+};
+use proxmox_section_config::typed::SectionConfigData;
+use pve_api_types::{ListTasks, ListTasksResponse, ListTasksSource};
+
+use server::{
+ api::pve,
+ remote_tasks::{
+ self,
+ task_cache::{NodeFetchSuccessMap, State, TaskCache, TaskCacheItem},
+ KEEP_OLD_FILES, REMOTE_TASKS_DIR, ROTATE_AFTER,
+ },
+ task_utils,
+};
+
+/// Tick interval for the remote task fetching task.
+/// This is also the rate at which we check on tracked tasks.
+const POLL_INTERVAL: Duration = Duration::from_secs(10);
+
+/// Interval at which to fetch the newest tasks from remotes (if there is no tracked
+/// task for this remote).
+const TASK_FETCH_INTERVAL: Duration = Duration::from_secs(600);
+
+/// Interval at which to check for task cache rotation.
+const CHECK_ROTATE_INTERVAL: Duration = Duration::from_secs(3600);
+
+/// Interval at which the task cache journal should be applied.
+///
+/// Choosing a value here is a trade-off between performance and avoiding unnecessary writes.
+/// Letting the journal grow large avoids writes, but since the journal is not sorted, accessing
+/// it will be slower than the task archive itself, as the entire journal must be loaded into
+/// memory and then sorted by task starttime. Applying the journal more often might
+/// lead to more writes, but should yield better read performance.
+const APPLY_JOURNAL_INTERVAL: Duration = Duration::from_secs(3600);
+
+/// Maximum number of concurrent connections per remote.
+const CONNECTIONS_PER_PVE_REMOTE: usize = 5;
+
+/// Maximum number of total concurrent connections.
+const MAX_CONNECTIONS: usize = 20;
+
+/// Maximum number of tasks to fetch from a single remote in one API call.
+const MAX_TASKS_TO_FETCH: u64 = 5000;
+
+/// (Ephemeral) Remote task fetching task state.
+struct TaskState {
+ /// Time at which we last checked for archive rotation.
+ last_rotate_check: Instant,
+ /// Time at which we last fetched tasks.
+ last_fetch: Instant,
+ /// Time at which we last applied the journal.
+ last_journal_apply: Instant,
+}
+
+impl TaskState {
+ fn new() -> Self {
+ let now = Instant::now();
+
+ Self {
+ last_rotate_check: now - CHECK_ROTATE_INTERVAL,
+ last_fetch: now - TASK_FETCH_INTERVAL,
+ last_journal_apply: now - APPLY_JOURNAL_INTERVAL,
+ }
+ }
+
+ /// Reset the task archive rotation timestamp.
+ fn reset_rotate_check(&mut self) {
+ self.last_rotate_check = Instant::now();
+ }
+
+ /// Reset the task fetch timestamp.
+ fn reset_fetch(&mut self) {
+ self.last_fetch = Instant::now();
+ }
+
+ /// Reset the journal apply timestamp.
+ fn reset_journal_apply(&mut self) {
+ self.last_journal_apply = Instant::now();
+ }
+
+ /// Should we check for archive rotation?
+ fn is_due_for_rotate_check(&self) -> bool {
+ Instant::now().duration_since(self.last_rotate_check) > CHECK_ROTATE_INTERVAL
+ }
+
+ /// Should we fetch tasks?
+ fn is_due_for_fetch(&self) -> bool {
+ Instant::now().duration_since(self.last_fetch) > TASK_FETCH_INTERVAL
+ }
+
+ /// Should we apply the task archive's journal?
+ fn is_due_for_journal_apply(&self) -> bool {
+ Instant::now().duration_since(self.last_journal_apply) > APPLY_JOURNAL_INTERVAL
+ }
+}
+
+/// Start the remote task fetching task
+pub fn start_task() -> Result<(), Error> {
+ let dir_options =
+ proxmox_product_config::default_create_options().perm(Mode::from_bits_truncate(0o0750));
+
+ proxmox_sys::fs::create_path(REMOTE_TASKS_DIR, None, Some(dir_options))?;
+
+ tokio::spawn(async move {
+ let task_scheduler = std::pin::pin!(remote_task_fetching_task());
+ let abort_future = std::pin::pin!(proxmox_daemon::shutdown_future());
+ futures::future::select(task_scheduler, abort_future).await;
+ });
+
+ Ok(())
+}
+
+/// Task which handles fetching remote tasks and task archive rotation.
+/// This function never returns.
+async fn remote_task_fetching_task() -> ! {
+ let mut task_state = TaskState::new();
+
+ let mut interval = tokio::time::interval(POLL_INTERVAL);
+ interval.reset_at(task_utils::next_aligned_instant(POLL_INTERVAL.as_secs()).into());
+
+ // We don't really care about catching up on missed ticks, we just want
+ // a steady tick rate.
+ interval.set_missed_tick_behavior(tokio::time::MissedTickBehavior::Skip);
+
+ if let Err(err) = init_cache().await {
+ log::error!("error when initialized task cache: {err:#}");
+ }
+
+ loop {
+ interval.tick().await;
+ if let Err(err) = do_tick(&mut task_state).await {
+ log::error!("error when fetching remote tasks: {err:#}");
+ }
+ }
+}
+
+/// Handle a single timer tick.
+/// Will handle archive file rotation, polling of tracked tasks and fetching of remote tasks.
+async fn do_tick(task_state: &mut TaskState) -> Result<(), Error> {
+ let cache = remote_tasks::get_cache()?;
+
+ if task_state.is_due_for_rotate_check() {
+ log::debug!("checking if remote task archive should be rotated");
+ if rotate_cache(cache.clone()).await? {
+ log::info!("rotated remote task archive");
+ }
+
+ task_state.reset_rotate_check();
+ }
+
+ if task_state.is_due_for_journal_apply() {
+ apply_journal(cache.clone()).await?;
+ task_state.reset_journal_apply();
+ }
+
+ let (remote_config, _) = tokio::task::spawn_blocking(pdm_config::remotes::config).await??;
+
+ let total_connections_semaphore = Arc::new(Semaphore::new(MAX_CONNECTIONS));
+
+ let cache_state = cache.read_state();
+ let poll_results = poll_tracked_tasks(
+ &remote_config,
+ cache_state.tracked_tasks(),
+ Arc::clone(&total_connections_semaphore),
+ )
+ .await?;
+
+ // Get a list of remotes that we should poll in this cycle.
+ let remotes = if task_state.is_due_for_fetch() {
+ task_state.reset_fetch();
+ get_all_remotes(&remote_config)
+ } else {
+ get_remotes_with_finished_tasks(&remote_config, &poll_results)
+ };
+
+ let (all_tasks, update_state_for_remote) = fetch_remotes(
+ remotes,
+ Arc::new(cache_state),
+ Arc::clone(&total_connections_semaphore),
+ )
+ .await;
+
+ if !all_tasks.is_empty() {
+ update_task_cache(cache, all_tasks, update_state_for_remote, poll_results).await?;
+ }
+
+ Ok(())
+}
+
+/// Initialize the remote task cache with initial archive files, in case there are not
+/// any archive files yet.
+///
+/// This allows us to immediately backfill remote task history when setting up a new PDM instance
+/// without any prior task archive rotation.
+async fn init_cache() -> Result<(), Error> {
+ tokio::task::spawn_blocking(|| {
+ let cache = remote_tasks::get_cache()?;
+ cache.write()?.init(proxmox_time::epoch_i64())?;
+ Ok(())
+ })
+ .await?
+}
+
+/// Fetch tasks from a list of remotes.
+///
+/// Returns a list of tasks and a map that shows whether we want to update the
+/// cutoff timestamp in the statefile. We don't want to update the cutoff if
+/// the connection to one remote failed or if we could not reach all nodes in a cluster.
+async fn fetch_remotes(
+ remotes: Vec<Remote>,
+ cache_state: Arc<State>,
+ total_connections_semaphore: Arc<Semaphore>,
+) -> (Vec<TaskCacheItem>, NodeFetchSuccessMap) {
+ let mut join_set = JoinSet::new();
+
+ for remote in remotes {
+ let semaphore = Arc::clone(&total_connections_semaphore);
+ let state_clone = Arc::clone(&cache_state);
+
+ join_set.spawn(async move {
+ log::debug!("fetching remote tasks for '{}'", remote.id);
+ fetch_tasks(&remote, state_clone, semaphore)
+ .await
+ .map_err(|err| {
+ format_err!("could not fetch tasks from remote '{}': {err}", remote.id)
+ })
+ });
+ }
+
+ let mut all_tasks = Vec::new();
+ let mut update_state_for_remote = NodeFetchSuccessMap::default();
+
+ while let Some(res) = join_set.join_next().await {
+ match res {
+ Ok(Ok(FetchedTasks {
+ tasks,
+ node_results,
+ })) => {
+ all_tasks.extend(tasks);
+ update_state_for_remote.merge(node_results);
+ }
+ Ok(Err(err)) => log::error!("{err:#}"),
+ Err(err) => log::error!("could not join task fetching future: {err:#}"),
+ }
+ }
+
+ (all_tasks, update_state_for_remote)
+}
+
+/// Return all remotes from the given config.
+fn get_all_remotes(remote_config: &SectionConfigData<Remote>) -> Vec<Remote> {
+ remote_config
+ .into_iter()
+ .map(|(_, section)| section)
+ .cloned()
+ .collect()
+}
+
+/// Return all remotes that correspond to a list of finished tasks.
+fn get_remotes_with_finished_tasks(
+ remote_config: &SectionConfigData<Remote>,
+ poll_results: &HashMap<RemoteUpid, PollResult>,
+) -> Vec<Remote> {
+ let remotes_with_finished_tasks: HashSet<&str> = poll_results
+ .iter()
+ .filter_map(|(upid, status)| (*status == PollResult::Finished).then_some(upid.remote()))
+ .collect();
+
+ remote_config
+ .into_iter()
+ .filter_map(|(name, remote)| {
+ remotes_with_finished_tasks
+ .contains(&name)
+ .then_some(remote)
+ })
+ .cloned()
+ .collect()
+}
+
+/// Rotate the task cache if necessary.
+///
+/// Returns Ok(true) if the cache's files were rotated.
+async fn rotate_cache(cache: TaskCache) -> Result<bool, Error> {
+ tokio::task::spawn_blocking(move || cache.write()?.rotate(proxmox_time::epoch_i64())).await?
+}
+
+/// Apply the task cache journal.
+async fn apply_journal(cache: TaskCache) -> Result<(), Error> {
+ tokio::task::spawn_blocking(move || cache.write()?.apply_journal()).await?
+}
+
+/// Fetched tasks from a single remote.
+struct FetchedTasks {
+ /// List of tasks.
+ tasks: Vec<TaskCacheItem>,
+ /// Contains whether a cluster node was fetched successfully.
+ node_results: NodeFetchSuccessMap,
+}
+
+/// Fetch tasks (active and finished) from a remote.
+async fn fetch_tasks(
+ remote: &Remote,
+ state: Arc<State>,
+ total_connections_semaphore: Arc<Semaphore>,
+) -> Result<FetchedTasks, Error> {
+ let mut tasks = Vec::new();
+
+ let mut node_results = NodeFetchSuccessMap::default();
+
+ match remote.ty {
+ RemoteType::Pve => {
+ let client = pve::connect(remote)?;
+
+ let nodes = {
+ // This permit *must* be dropped before we acquire the permits for the
+ // per-node connections - otherwise we risk a deadlock.
+ let _permit = total_connections_semaphore.acquire().await.unwrap();
+ client.list_nodes().await?
+ };
+
+ // This second semaphore is used to limit the number of concurrent connections
+ // *per remote*, not in total.
+ let per_remote_semaphore = Arc::new(Semaphore::new(CONNECTIONS_PER_PVE_REMOTE));
+ let mut join_set = JoinSet::new();
+
+ for node in nodes {
+ let node_name = node.node.to_string();
+
+ let since = state
+ .cutoff_timestamp(&remote.id, &node_name)
+ .unwrap_or_else(|| {
+ proxmox_time::epoch_i64() - (KEEP_OLD_FILES as u64 * ROTATE_AFTER) as i64
+ });
+
+ let params = ListTasks {
+ source: Some(ListTasksSource::Archive),
+ since: Some(since),
+ // If `limit` is not provided, we only receive 50 tasks
+ limit: Some(MAX_TASKS_TO_FETCH),
+ ..Default::default()
+ };
+
+ let per_remote_permit = Arc::clone(&per_remote_semaphore)
+ .acquire_owned()
+ .await
+ .unwrap();
+
+ let total_connections_permit = Arc::clone(&total_connections_semaphore)
+ .acquire_owned()
+ .await
+ .unwrap();
+
+ let remote_clone = remote.clone();
+
+ join_set.spawn(async move {
+ let res = async {
+ let client = pve::connect(&remote_clone)?;
+ let task_list =
+ client
+ .get_task_list(&node.node, params)
+ .await
+ .map_err(|err| {
+ format_err!(
+ "remote '{}', node '{}': {err}",
+ remote_clone.id,
+ node.node
+ )
+ })?;
+ Ok::<Vec<_>, Error>(task_list)
+ }
+ .await;
+
+ drop(total_connections_permit);
+ drop(per_remote_permit);
+
+ (node_name, res)
+ });
+ }
+
+ while let Some(result) = join_set.join_next().await {
+ match result {
+ Ok((node_name, result)) => match result {
+ Ok(task_list) => {
+ let mapped =
+ task_list.into_iter().filter_map(|task| {
+ match map_pve_task(task, &remote.id) {
+ Ok(task) => Some(task),
+ Err(err) => {
+ log::error!(
+ "could not map task data, skipping: {err:#}"
+ );
+ None
+ }
+ }
+ });
+
+ tasks.extend(mapped);
+ node_results.set_node_success(remote.id.clone(), node_name);
+ }
+ Err(error) => {
+ log::error!("could not fetch tasks: {error:#}");
+ node_results.set_node_failure(remote.id.clone(), node_name);
+ }
+ },
+ Err(err) => return Err(err.into()),
+ }
+ }
+ }
+ RemoteType::Pbs => {
+ // TODO: Add code for PBS
+ }
+ }
+
+ Ok(FetchedTasks {
+ tasks,
+ node_results,
+ })
+}
+
+#[derive(PartialEq, Debug)]
+/// Outcome from polling a tracked task.
+enum PollResult {
+ /// Task is still running.
+ Running,
+ /// Task is finished, poll remote tasks to get final status/endtime.
+ Finished,
+ /// Should be dropped from the active file.
+ RequestError,
+ /// Remote does not exist any more -> remove immediately from tracked task list.
+ RemoteGone,
+}
+
+/// Poll all tracked tasks.
+async fn poll_tracked_tasks(
+ remote_config: &SectionConfigData<Remote>,
+ tracked_tasks: impl Iterator<Item = &RemoteUpid>,
+ total_connections_semaphore: Arc<Semaphore>,
+) -> Result<HashMap<RemoteUpid, PollResult>, Error> {
+ let mut join_set = JoinSet::new();
+
+ for task in tracked_tasks.cloned() {
+ let permit = Arc::clone(&total_connections_semaphore)
+ .acquire_owned()
+ .await
+ .unwrap();
+
+ let remote = remote_config.get(task.remote()).cloned();
+
+ join_set.spawn(async move {
+ // Move permit into this async block.
+ let _permit = permit;
+
+ match remote {
+ Some(remote) => poll_single_tracked_task(remote, task).await,
+ None => {
+ log::info!(
+ "remote {} does not exist any more, dropping tracked task",
+ task.remote()
+ );
+ (task, PollResult::RemoteGone)
+ }
+ }
+ });
+ }
+
+ let mut results = HashMap::new();
+ while let Some(task_result) = join_set.join_next().await {
+ let (upid, result) = task_result?;
+ results.insert(upid, result);
+ }
+
+ Ok(results)
+}
+
+/// Poll a single tracked task.
+async fn poll_single_tracked_task(remote: Remote, task: RemoteUpid) -> (RemoteUpid, PollResult) {
+ match remote.ty {
+ RemoteType::Pve => {
+ log::debug!("polling tracked task {}", task);
+
+ let status = match server::api::pve::tasks::get_task_status(
+ remote.id.clone(),
+ task.clone(),
+ false,
+ )
+ .await
+ {
+ Ok(status) => status,
+ Err(err) => {
+ log::error!("could not get status from remote: {err:#}");
+ return (task, PollResult::RequestError);
+ }
+ };
+
+ let result = if status.exitstatus.is_some() {
+ PollResult::Finished
+ } else {
+ PollResult::Running
+ };
+
+ (task, result)
+ }
+ RemoteType::Pbs => {
+ // TODO: Implement for PBS
+ (task, PollResult::RequestError)
+ }
+ }
+}
+
+/// Map a `ListTasksResponse` to `TaskCacheItem`
+fn map_pve_task(task: ListTasksResponse, remote: &str) -> Result<TaskCacheItem, Error> {
+ let remote_upid: RemoteUpid = (remote.to_string(), task.upid.to_string()).try_into()?;
+
+ Ok(TaskCacheItem {
+ upid: remote_upid,
+ starttime: task.starttime,
+ endtime: task.endtime,
+ status: task.status,
+ })
+}
+
+/// Update task cache with results from tracked task polling & regular task fetching.
+async fn update_task_cache(
+ cache: TaskCache,
+ new_tasks: Vec<TaskCacheItem>,
+ update_state_for_remote: NodeFetchSuccessMap,
+ poll_results: HashMap<RemoteUpid, PollResult>,
+) -> Result<(), Error> {
+ tokio::task::spawn_blocking(move || {
+ let drop_tracked = poll_results
+ .into_iter()
+ .filter_map(|(upid, result)| match result {
+ PollResult::Running => None,
+ PollResult::Finished | PollResult::RequestError | PollResult::RemoteGone => {
+ Some(upid)
+ }
+ })
+ .collect();
+
+ cache
+ .write()?
+ .update(new_tasks, &update_state_for_remote, drop_tracked)?;
+
+ Ok(())
+ })
+ .await?
+}
diff --git a/server/src/remote_tasks/mod.rs b/server/src/remote_tasks/mod.rs
index 7c8e31ef..cec2cc1e 100644
--- a/server/src/remote_tasks/mod.rs
+++ b/server/src/remote_tasks/mod.rs
@@ -1,515 +1,156 @@
-use std::{
- collections::{HashMap, HashSet},
- fs::File,
- path::{Path, PathBuf},
- sync::{LazyLock, RwLock},
- time::Duration,
-};
+use std::path::Path;
use anyhow::Error;
-use pdm_api_types::{
- remotes::{Remote, RemoteType},
- RemoteUpid, TaskFilters, TaskListItem, TaskStateType,
-};
-use proxmox_sys::fs::CreateOptions;
-use pve_api_types::{ListTasks, ListTasksResponse, ListTasksSource, PveUpid};
-use serde::{Deserialize, Serialize};
-use tokio::task::JoinHandle;
-use crate::{api::pve, task_utils};
+use pdm_api_types::{RemoteUpid, TaskFilters, TaskListItem, TaskStateType};
+use pve_api_types::PveUpid;
-mod task_cache;
+pub mod task_cache;
+
+use task_cache::{GetTasks, TaskCache, TaskCacheItem};
+
+/// Base directory for the remote task cache.
+pub const REMOTE_TASKS_DIR: &str = concat!(pdm_buildcfg::PDM_CACHE_DIR_M!(), "/remote-tasks");
+
+/// Maximum size at which the journal will be applied early when adding new tasks.
+const JOURNAL_MAX_SIZE: u64 = 5 * 1024 * 1024;
+
+/// Rotate once the most recent archive file is at least 24 hours old.
+pub const ROTATE_AFTER: u64 = 24 * 3600;
+
+/// Keep 7 days worth of tasks.
+pub const KEEP_OLD_FILES: u32 = 7;
+
+/// Number of uncompressed archive files. These will be the most recent ones.
+const NUMBER_OF_UNCOMPRESSED_FILES: u32 = 2;
/// Get tasks for all remotes
// FIXME: filter for privileges
-pub async fn get_tasks(max_age: i64, filters: TaskFilters) -> Result<Vec<TaskListItem>, Error> {
- let (remotes, _) = pdm_config::remotes::config()?;
+pub async fn get_tasks(filters: TaskFilters) -> Result<Vec<TaskListItem>, Error> {
+ tokio::task::spawn_blocking(move || {
+ let cache = get_cache()?.read()?;
- let mut all_tasks = Vec::new();
-
- let cache_path = Path::new(pdm_buildcfg::PDM_CACHE_DIR).join("taskcache.json");
- let mut cache = TaskCache::new(cache_path)?;
-
- // Force a refresh for all tasks of a remote if a task is finished.
- // Not super nice, but saves us from persisting finished tasks. Also,
- // the /nodes/<node>/tasks/<upid>/status endpoint does not return
- // a task's endtime, which is only returned by
- // /nodes/<node>/tasks...
- // Room for improvements in the future.
- invalidate_cache_for_finished_tasks(&mut cache);
-
- for (remote_name, remote) in remotes.iter() {
- let now = proxmox_time::epoch_i64();
-
- if let Some(tasks) = cache.get_tasks(remote_name, now, max_age) {
- // Data in cache is recent enough and has not been invalidated.
- all_tasks.extend(tasks);
+ let which = if filters.running {
+ GetTasks::Active
} else {
- let tasks = match fetch_tasks(remote).await {
- Ok(tasks) => tasks,
- Err(err) => {
- // ignore errors for not reachable remotes
- continue;
+ GetTasks::All
+ };
+
+ let returned_tasks = cache
+ .get_tasks(which)?
+ .skip(filters.start as usize)
+ .take(filters.limit as usize)
+ .filter_map(|task| {
+ // TODO: Handle PBS tasks
+ let pve_upid: Result<PveUpid, Error> = task.upid.upid.parse();
+ match pve_upid {
+ Ok(pve_upid) => Some(TaskListItem {
+ upid: task.upid.to_string(),
+ node: pve_upid.node,
+ pid: pve_upid.pid as i64,
+ pstart: pve_upid.pstart,
+ starttime: pve_upid.starttime,
+ worker_type: pve_upid.worker_type,
+ worker_id: None,
+ user: pve_upid.auth_id,
+ endtime: task.endtime,
+ status: task.status,
+ }),
+ Err(err) => {
+ log::error!("could not parse UPID: {err:#}");
+ None
+ }
}
- };
- cache.set_tasks(remote_name, tasks.clone(), now);
-
- all_tasks.extend(tasks);
- }
- }
-
- let mut returned_tasks = add_running_tasks(all_tasks)?;
- returned_tasks.sort_by(|a, b| b.starttime.cmp(&a.starttime));
- let returned_tasks = returned_tasks
- .into_iter()
- .filter(|item| {
- if filters.running && item.endtime.is_some() {
- return false;
- }
-
- if let Some(until) = filters.until {
- if item.starttime > until {
+ })
+ .filter(|item| {
+ if filters.running && item.endtime.is_some() {
return false;
}
- }
- if let Some(since) = filters.since {
- if item.starttime < since {
- return false;
- }
- }
-
- if let Some(needle) = &filters.userfilter {
- if !item.user.contains(needle) {
- return false;
- }
- }
-
- if let Some(typefilter) = &filters.typefilter {
- if !item.worker_type.contains(typefilter) {
- return false;
- }
- }
-
- let state = item.status.as_ref().map(|status| tasktype(status));
-
- match (state, &filters.statusfilter) {
- (Some(TaskStateType::OK), _) if filters.errors => return false,
- (Some(state), Some(filters)) => {
- if !filters.contains(&state) {
+ if let Some(until) = filters.until {
+ if item.starttime > until {
return false;
}
}
- (None, Some(_)) => return false,
- _ => {}
- }
- true
- })
- .skip(filters.start as usize)
- .take(filters.limit as usize)
- .collect();
-
- // We don't need to wait for this task to finish
- tokio::task::spawn_blocking(move || {
- if let Err(e) = cache.save() {
- log::error!("could not save task cache: {e}");
- }
- });
-
- Ok(returned_tasks)
-}
-
-/// Fetch tasks (active and finished) from a remote
-async fn fetch_tasks(remote: &Remote) -> Result<Vec<TaskListItem>, Error> {
- let mut tasks = Vec::new();
-
- match remote.ty {
- RemoteType::Pve => {
- let client = pve::connect(remote)?;
-
- // N+1 requests - we could use /cluster/tasks, but that one
- // only gives a limited task history
- for node in client.list_nodes().await? {
- let params = ListTasks {
- // Include running tasks
- source: Some(ListTasksSource::All),
- // TODO: How much task history do we want? Right now we just hard-code it
- // to 7 days.
- since: Some(proxmox_time::epoch_i64() - 7 * 24 * 60 * 60),
- ..Default::default()
- };
-
- let list = client.get_task_list(&node.node, params).await?;
- let mapped = map_tasks(list, &remote.id)?;
-
- tasks.extend(mapped);
- }
- }
- RemoteType::Pbs => {
- // TODO: Add code for PBS
- }
- }
-
- Ok(tasks)
-}
-
-/// Convert a `Vec<ListTaskResponce>` to `Vec<TaskListItem>`
-fn map_tasks(tasks: Vec<ListTasksResponse>, remote: &str) -> Result<Vec<TaskListItem>, Error> {
- let mut mapped = Vec::new();
-
- for task in tasks {
- let remote_upid: RemoteUpid = (remote.to_string(), task.upid.to_string()).try_into()?;
-
- mapped.push(TaskListItem {
- upid: remote_upid.to_string(),
- node: task.node,
- pid: task.pid,
- pstart: task.pstart as u64,
- starttime: task.starttime,
- worker_type: task.ty,
- worker_id: Some(task.id),
- user: task.user,
- endtime: task.endtime,
- status: task.status,
- })
- }
-
- Ok(mapped)
-}
-
-/// Drops the cached task list of a remote for all finished tasks.
-///
-/// We use this to force a refresh so that we get the full task
-/// info (including `endtime`) in the next API call.
-fn invalidate_cache_for_finished_tasks(cache: &mut TaskCache) {
- let mut finished = FINISHED_FOREIGN_TASKS.write().expect("mutex poisoned");
-
- // If a task is finished, we force a refresh for the remote - otherwise
- // we don't get the 'endtime' for the task.
- for task in finished.drain() {
- cache.invalidate_cache_for_remote(task.remote());
- }
-}
-
-/// Supplement the list of tasks that we received from the remote with
-/// the tasks that were started by PDM and are currently running.
-fn add_running_tasks(cached_tasks: Vec<TaskListItem>) -> Result<Vec<TaskListItem>, Error> {
- let mut returned_tasks = Vec::new();
-
- let mut running_tasks = RUNNING_FOREIGN_TASKS.write().expect("mutex poisoned");
- for task in cached_tasks {
- let remote_upid = task.upid.parse()?;
-
- if running_tasks.contains(&remote_upid) {
- if task.endtime.is_some() {
- // Task is finished but we still think it is running ->
- // Drop it from RUNNING_FOREIGN_TASKS
- running_tasks.remove(&remote_upid);
-
- // No need to put it in FINISHED_TASKS, since we already
- // got its state recently enough (we know the status and endtime)
- }
- } else {
- returned_tasks.push(task);
- }
- }
-
- for task in running_tasks.iter() {
- let upid: PveUpid = task.upid.parse()?;
- returned_tasks.push(TaskListItem {
- upid: task.to_string(),
- node: upid.node,
- pid: upid.pid as i64,
- pstart: upid.pstart,
- starttime: upid.starttime,
- worker_type: upid.worker_type,
- worker_id: upid.worker_id,
- user: upid.auth_id,
- endtime: None,
- status: None,
- });
- }
-
- Ok(returned_tasks)
-}
-
-/// A cache for fetched remote tasks.
-struct TaskCache {
- /// Cache entries
- content: TaskCacheContent,
-
- /// Entries that were added or updated - these will be persistet
- /// when `save` is called.
- new_or_updated: TaskCacheContent,
-
- /// Cache entries were changed/removed.
- dirty: bool,
-
- /// File-location at which the cached tasks are stored.
- cachefile_path: PathBuf,
-}
-
-impl TaskCache {
- /// Create a new tasks cache instance by loading
- /// the cache from disk.
- fn new(cachefile_path: PathBuf) -> Result<Self, Error> {
- Ok(Self {
- content: Self::load_content()?,
- new_or_updated: Default::default(),
- dirty: false,
- cachefile_path,
- })
- }
-
- /// Load the task cache contents from disk.
- fn load_content() -> Result<TaskCacheContent, Error> {
- let taskcache_path = Path::new(pdm_buildcfg::PDM_CACHE_DIR).join("taskcache.json");
- let content = proxmox_sys::fs::file_read_optional_string(taskcache_path)?;
-
- let content = if let Some(content) = content {
- serde_json::from_str(&content)?
- } else {
- Default::default()
- };
-
- Ok(content)
- }
-
- /// Get path for the cache's lockfile.
- fn lockfile_path(&self) -> PathBuf {
- let mut path = self.cachefile_path.clone();
- path.set_extension("lock");
- path
- }
-
- /// Persist the task cache
- ///
- /// This method requests an exclusive lock for the task cache lockfile.
- fn save(&mut self) -> Result<(), Error> {
- // if we have not updated anything, we don't have to update the cache file
- if !self.dirty {
- return Ok(());
- }
-
- let _guard = self.lock(Duration::from_secs(5))?;
-
- // Read content again, in case somebody has changed it in the meanwhile
- let mut content = Self::load_content()?;
-
- for (remote_name, entry) in self.new_or_updated.remote_tasks.drain() {
- if let Some(existing_entry) = content.remote_tasks.get_mut(&remote_name) {
- // Only update entry if nobody else has updated it in the meanwhile
- if existing_entry.timestamp < entry.timestamp {
- *existing_entry = entry;
- }
- } else {
- content.remote_tasks.insert(remote_name, entry);
- }
- }
-
- let bytes = serde_json::to_vec_pretty(&content)?;
-
- let api_uid = pdm_config::api_user()?.uid;
- let api_gid = pdm_config::api_group()?.gid;
-
- let file_options = CreateOptions::new().owner(api_uid).group(api_gid);
-
- proxmox_sys::fs::replace_file(&self.cachefile_path, &bytes, file_options, true)?;
-
- self.dirty = false;
-
- Ok(())
- }
-
- // Update task data for a given remote.
- fn set_tasks(&mut self, remote: &str, tasks: Vec<TaskListItem>, timestamp: i64) {
- self.dirty = true;
- self.new_or_updated
- .remote_tasks
- .insert(remote.to_string(), TaskCacheEntry { timestamp, tasks });
- }
-
- // Get task data for a given remote.
- fn get_tasks(&self, remote: &str, now: i64, max_age: i64) -> Option<Vec<TaskListItem>> {
- if let Some(entry) = self.content.remote_tasks.get(remote) {
- if (entry.timestamp + max_age) < now {
- return None;
- }
-
- Some(entry.tasks.clone())
- } else if let Some(entry) = self.new_or_updated.remote_tasks.get(remote) {
- if (entry.timestamp + max_age) < now {
- return None;
- }
- Some(entry.tasks.clone())
- } else {
- None
- }
- }
-
- // Invalidate cache for a given remote.
- fn invalidate_cache_for_remote(&mut self, remote: &str) {
- self.dirty = true;
- self.content.remote_tasks.remove(remote);
- }
-
- // Lock the cache for modification.
- //
- // While the cache is locked, other users can still read the cache
- // without a lock, since the cache file is replaced atomically
- // when updating.
- fn lock(&self, duration: Duration) -> Result<File, Error> {
- let api_uid = pdm_config::api_user()?.uid;
- let api_gid = pdm_config::api_group()?.gid;
-
- let file_options = CreateOptions::new().owner(api_uid).group(api_gid);
- proxmox_sys::fs::open_file_locked(self.lockfile_path(), duration, true, file_options)
- }
-}
-
-#[derive(Serialize, Deserialize)]
-/// Per-remote entry in the task cache.
-struct TaskCacheEntry {
- timestamp: i64,
- tasks: Vec<TaskListItem>,
-}
-
-#[derive(Default, Serialize, Deserialize)]
-/// Content of the task cache file.
-struct TaskCacheContent {
- remote_tasks: HashMap<String, TaskCacheEntry>,
-}
-
-/// Interval at which tracked tasks are polled
-const RUNNING_CHECK_INTERVAL_S: u64 = 10;
-
-/// Tasks which were started by PDM and are still running
-static RUNNING_FOREIGN_TASKS: LazyLock<RwLock<HashSet<RemoteUpid>>> = LazyLock::new(init);
-/// Tasks which were started by PDM and w
-static FINISHED_FOREIGN_TASKS: LazyLock<RwLock<HashSet<RemoteUpid>>> = LazyLock::new(init);
-
-fn init() -> RwLock<HashSet<RemoteUpid>> {
- RwLock::new(HashSet::new())
-}
-
-/// Insert a remote UPID into the running list
-///
-/// If it is the first entry in the list, a background task is started to track its state
-///
-/// Returns the [`JoinHandle`] if a task was started.
-///
-/// panics on a poisoned mutex
-pub fn track_running_task(task: RemoteUpid) -> Option<JoinHandle<()>> {
- let mut tasks = RUNNING_FOREIGN_TASKS.write().unwrap();
-
- // the call inserting the first task in the list needs to start the checking task
- let need_start_task = tasks.is_empty();
- tasks.insert(task);
-
- if !need_start_task {
- return None;
- }
- drop(tasks);
-
- Some(tokio::spawn(async move {
- loop {
- let delay_target = task_utils::next_aligned_instant(RUNNING_CHECK_INTERVAL_S);
- tokio::time::sleep_until(tokio::time::Instant::from_std(delay_target)).await;
-
- let finished_tasks = get_finished_tasks().await;
-
- // skip iteration if we still have tasks, just not finished ones
- if finished_tasks.is_empty() && !RUNNING_FOREIGN_TASKS.read().unwrap().is_empty() {
- continue;
- }
-
- let mut finished = FINISHED_FOREIGN_TASKS.write().unwrap();
- // we either have finished tasks, or the running task list was empty
- let mut set = RUNNING_FOREIGN_TASKS.write().unwrap();
-
- for (upid, _status) in finished_tasks {
- if set.remove(&upid) {
- finished.insert(upid);
- } else {
- // someone else removed & persisted the task in the meantime
- }
- }
-
- // if no task remains, end the current task
- // it will be restarted by the next caller that inserts one
- if set.is_empty() {
- return;
- }
- }
- }))
-}
-
-/// Get a list of running foreign tasks
-///
-/// panics on a poisoned mutex
-pub fn get_running_tasks() -> Vec<RemoteUpid> {
- RUNNING_FOREIGN_TASKS
- .read()
- .unwrap()
- .iter()
- .cloned()
- .collect()
-}
-
-/// Checks all current saved UPIDs if they're still running, and if not,
-/// returns their upids + status
-///
-/// panics on a poisoned mutex
-pub async fn get_finished_tasks() -> Vec<(RemoteUpid, String)> {
- let mut finished = Vec::new();
- let config = match pdm_config::remotes::config() {
- Ok((config, _)) => config,
- Err(err) => {
- log::error!("could not open remotes config: {err}");
- return Vec::new();
- }
- };
- for task in get_running_tasks() {
- match config.get(task.remote()) {
- Some(remote) => match remote.ty {
- RemoteType::Pve => {
- let status = match crate::api::pve::tasks::get_task_status(
- remote.id.clone(),
- task.clone(),
- false,
- )
- .await
- {
- Ok(status) => status,
- Err(err) => {
- log::error!("could not get status from remote: {err}");
- finished.push((task, "could not get status".to_string()));
- continue;
- }
- };
- if let Some(status) = status.exitstatus {
- finished.push((task, status.to_string()));
+ if let Some(since) = filters.since {
+ if item.starttime < since {
+ return false;
}
}
- RemoteType::Pbs => {
- let _client = match crate::pbs_client::connect(remote) {
- Ok(client) => client,
- Err(err) => {
- log::error!("could not get status from remote: {err}");
- finished.push((task, "could not get status".to_string()));
- continue;
- }
- };
- // FIXME implement get task status
- finished.push((task, "unknown state".to_string()));
- }
- },
- None => finished.push((task, "unknown remote".to_string())),
- }
- }
- finished
+ if let Some(needle) = &filters.userfilter {
+ if !item.user.contains(needle) {
+ return false;
+ }
+ }
+
+ if let Some(typefilter) = &filters.typefilter {
+ if !item.worker_type.contains(typefilter) {
+ return false;
+ }
+ }
+
+ let state = item.status.as_ref().map(|status| tasktype(status));
+
+ match (state, &filters.statusfilter) {
+ (Some(TaskStateType::OK), _) if filters.errors => return false,
+ (Some(state), Some(filters)) => {
+ if !filters.contains(&state) {
+ return false;
+ }
+ }
+ (None, Some(_)) => return false,
+ _ => {}
+ }
+
+ true
+ })
+ .collect();
+
+ Ok(returned_tasks)
+ })
+ .await?
+}
+
+/// Insert a newly created task into the list of tracked tasks.
+///
+/// Any tracked task will be polled with a short interval until the task
+/// has finished.
+pub async fn track_running_task(task: RemoteUpid) -> Result<(), Error> {
+ tokio::task::spawn_blocking(move || {
+ let cache = get_cache()?.write()?;
+ // TODO: Handle PBS tasks correctly.
+ let pve_upid: pve_api_types::PveUpid = task.upid.parse()?;
+ let task = TaskCacheItem {
+ upid: task.clone(),
+ starttime: pve_upid.starttime,
+ status: None,
+ endtime: None,
+ };
+ cache.add_tracked_task(task)
+ })
+ .await?
+}
+
+/// Get a new [`TaskCache`] instance.
+///
+/// No heavy-weight operations are done here; it's fine to call this regularly as part of the
+/// update loop.
+pub fn get_cache() -> Result<TaskCache, Error> {
+ let file_options = proxmox_product_config::default_create_options();
+
+ let cache_path = Path::new(REMOTE_TASKS_DIR);
+ let cache = TaskCache::new(
+ cache_path,
+ file_options,
+ KEEP_OLD_FILES,
+ NUMBER_OF_UNCOMPRESSED_FILES,
+ ROTATE_AFTER,
+ JOURNAL_MAX_SIZE,
+ )?;
+
+ Ok(cache)
}
/// Parses a task status string into a TaskStateType
--
2.47.2
* [pdm-devel] [PATCH proxmox-datacenter-manager v7 3/7] pdm-api-types: remote tasks: add new_from_str constructor for TaskStateType
2025-08-20 12:43 [pdm-devel] [PATCH proxmox-datacenter-manager v7 0/7] remote task cache fetching task / better cache backend Lukas Wagner
2025-08-20 12:43 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 1/7] remote tasks: implement improved cache for remote tasks Lukas Wagner
2025-08-20 12:43 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 2/7] remote tasks: add background task for task polling, use new task cache Lukas Wagner
@ 2025-08-20 12:43 ` Lukas Wagner
2025-08-20 12:43 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 4/7] fake remote: make the fake_remote feature compile again Lukas Wagner
` (4 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Lukas Wagner @ 2025-08-20 12:43 UTC (permalink / raw)
To: pdm-devel
This allows us to get rid of the `tasktype` helper in
server::remote_tasks.
We don't impl `From<&str>` because those should be value-preserving,
lossless and obvious. Also, the function is not called from_str to
avoid any confusion with FromStr.
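For illustration, a short usage sketch (not part of the patch; assumes
`pdm_api_types::TaskStateType` is in scope):

    use pdm_api_types::TaskStateType;

    assert!(matches!(TaskStateType::new_from_str("OK"), TaskStateType::OK));
    assert!(matches!(
        TaskStateType::new_from_str("WARNINGS: 2"),
        TaskStateType::Warning
    ));
    assert!(matches!(TaskStateType::new_from_str(""), TaskStateType::Unknown));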
Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
Reviewed-by: Dominik Csapak <d.csapak@proxmox.com>
---
Notes:
No changes in v6
No changes in v5
No changes in v4
Changes in v3:
- move the function to TaskStateType::new_from_str instead
of From<&str>
lib/pdm-api-types/src/lib.rs | 15 +++++++++++++++
server/src/remote_tasks/mod.rs | 15 +--------------
2 files changed, 16 insertions(+), 14 deletions(-)
diff --git a/lib/pdm-api-types/src/lib.rs b/lib/pdm-api-types/src/lib.rs
index 38449071..9373725c 100644
--- a/lib/pdm-api-types/src/lib.rs
+++ b/lib/pdm-api-types/src/lib.rs
@@ -232,6 +232,21 @@ pub enum TaskStateType {
Unknown,
}
+impl TaskStateType {
+ /// Construct a new instance from a `&str`.
+ pub fn new_from_str(status: &str) -> Self {
+ if status == "unknown" || status.is_empty() {
+ TaskStateType::Unknown
+ } else if status == "OK" {
+ TaskStateType::OK
+ } else if status.starts_with("WARNINGS: ") {
+ TaskStateType::Warning
+ } else {
+ TaskStateType::Error
+ }
+ }
+}
+
#[api(
properties: {
upid: { schema: UPID::API_SCHEMA },
diff --git a/server/src/remote_tasks/mod.rs b/server/src/remote_tasks/mod.rs
index cec2cc1e..8638ebd8 100644
--- a/server/src/remote_tasks/mod.rs
+++ b/server/src/remote_tasks/mod.rs
@@ -91,7 +91,7 @@ pub async fn get_tasks(filters: TaskFilters) -> Result<Vec<TaskListItem>, Error>
}
}
- let state = item.status.as_ref().map(|status| tasktype(status));
+ let state = item.status.as_deref().map(TaskStateType::new_from_str);
match (state, &filters.statusfilter) {
(Some(TaskStateType::OK), _) if filters.errors => return false,
@@ -152,16 +152,3 @@ pub fn get_cache() -> Result<TaskCache, Error> {
Ok(cache)
}
-
-/// Parses a task status string into a TaskStateType
-pub fn tasktype(status: &str) -> TaskStateType {
- if status == "unknown" || status.is_empty() {
- TaskStateType::Unknown
- } else if status == "OK" {
- TaskStateType::OK
- } else if status.starts_with("WARNINGS: ") {
- TaskStateType::Warning
- } else {
- TaskStateType::Error
- }
-}
--
2.47.2
* [pdm-devel] [PATCH proxmox-datacenter-manager v7 4/7] fake remote: make the fake_remote feature compile again
2025-08-20 12:43 [pdm-devel] [PATCH proxmox-datacenter-manager v7 0/7] remote task cache fetching task / better cache backend Lukas Wagner
` (2 preceding siblings ...)
2025-08-20 12:43 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 3/7] pdm-api-types: remote tasks: add new_from_str constructor for TaskStateType Lukas Wagner
@ 2025-08-20 12:43 ` Lukas Wagner
2025-08-20 12:43 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 5/7] fake remote: clippy fixes Lukas Wagner
` (3 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Lukas Wagner @ 2025-08-20 12:43 UTC (permalink / raw)
To: pdm-devel
The ClientFactory trait was changed in Wolfgang's multi-client patches;
this commit adapts the fake remote feature to the changes.
Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
Reviewed-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-by: Dominik Csapak <d.csapak@proxmox.com>
---
Notes:
Changes in v6:
- add missing memhost value for ClusterResource
No changes in v5
No changes in v4
No changes in v3
new in v2
server/src/test_support/fake_remote.rs | 33 +++++++++++++++-----------
1 file changed, 19 insertions(+), 14 deletions(-)
diff --git a/server/src/test_support/fake_remote.rs b/server/src/test_support/fake_remote.rs
index 0161d8e6..02d11f88 100644
--- a/server/src/test_support/fake_remote.rs
+++ b/server/src/test_support/fake_remote.rs
@@ -1,18 +1,22 @@
-use std::{collections::HashMap, time::Duration};
+use std::{collections::HashMap, sync::Arc, time::Duration};
use anyhow::{bail, Error};
+use serde::Deserialize;
+
use pdm_api_types::{remotes::Remote, Authid, ConfigDigest};
use pdm_config::remotes::RemoteConfig;
use proxmox_product_config::ApiLockGuard;
use proxmox_section_config::typed::SectionConfigData;
use pve_api_types::{
- client::PveClient, ClusterMetrics, ClusterMetricsData, ClusterNodeIndexResponse,
- ClusterNodeIndexResponseStatus, ClusterResource, ClusterResourceKind, ClusterResourceType,
- ListTasks, ListTasksResponse, PveUpid, StorageContent,
+ ClusterMetrics, ClusterMetricsData, ClusterNodeIndexResponse, ClusterNodeIndexResponseStatus,
+ ClusterResource, ClusterResourceKind, ClusterResourceType, ListTasks, ListTasksResponse,
+ PveUpid, StorageContent,
};
-use serde::Deserialize;
-use crate::{connection::ClientFactory, pbs_client::PbsClient};
+use crate::{
+ connection::{ClientFactory, PveClient},
+ pbs_client::PbsClient,
+};
#[derive(Deserialize, Clone)]
#[serde(rename_all = "kebab-case")]
@@ -74,8 +78,8 @@ impl FakeRemoteConfig {
#[async_trait::async_trait]
impl ClientFactory for FakeClientFactory {
- fn make_pve_client(&self, _remote: &Remote) -> Result<Box<dyn PveClient + Send + Sync>, Error> {
- Ok(Box::new(FakePveClient {
+ fn make_pve_client(&self, _remote: &Remote) -> Result<Arc<PveClient>, Error> {
+ Ok(Arc::new(FakePveClient {
nr_of_vms: self.config.vms_per_pve_remote,
nr_of_cts: self.config.cts_per_pve_remote,
nr_of_nodes: self.config.nodes_per_pve_remote,
@@ -88,7 +92,7 @@ impl ClientFactory for FakeClientFactory {
&self,
_remote: &Remote,
_target_endpoint: Option<&str>,
- ) -> Result<Box<dyn PveClient + Send + Sync>, Error> {
+ ) -> Result<Arc<PveClient>, Error> {
bail!("not implemented")
}
@@ -96,10 +100,7 @@ impl ClientFactory for FakeClientFactory {
bail!("not implemented")
}
- async fn make_pve_client_and_login(
- &self,
- _remote: &Remote,
- ) -> Result<Box<dyn PveClient + Send + Sync>, Error> {
+ async fn make_pve_client_and_login(&self, _remote: &Remote) -> Result<Arc<PveClient>, Error> {
bail!("not implemented")
}
@@ -118,7 +119,7 @@ struct FakePveClient {
}
#[async_trait::async_trait]
-impl PveClient for FakePveClient {
+impl pve_api_types::client::PveClient for FakePveClient {
async fn cluster_resources(
&self,
_ty: Option<ClusterResourceKind>,
@@ -143,6 +144,7 @@ impl PveClient for FakePveClient {
maxdisk: Some(100 * 1024 * 1024),
maxmem: Some(8 * 1024 * 1024 * 1024),
mem: Some(3 * 1024 * 1024 * 1024),
+ memhost: Some(4 * 1024 * 1024),
name: Some(format!("vm-{vmid}")),
netin: Some(1034),
netout: Some(1034),
@@ -175,6 +177,7 @@ impl PveClient for FakePveClient {
maxcpu: Some(4.),
maxdisk: Some(100 * 1024 * 1024),
maxmem: Some(8 * 1024 * 1024 * 1024),
+ memhost: Some(4 * 1024 * 1024),
mem: Some(3 * 1024 * 1024 * 1024),
name: Some(format!("ct-{vmid}")),
netin: Some(1034),
@@ -208,6 +211,7 @@ impl PveClient for FakePveClient {
maxdisk: Some(100 * 1024 * 1024),
maxmem: Some(8 * 1024 * 1024 * 1024),
mem: Some(3 * 1024 * 1024 * 1024),
+ memhost: None,
name: None,
netin: None,
netout: None,
@@ -240,6 +244,7 @@ impl PveClient for FakePveClient {
maxdisk: Some(100 * 1024 * 1024),
maxmem: None,
mem: None,
+ memhost: None,
name: None,
netin: None,
netout: None,
--
2.47.2
* [pdm-devel] [PATCH proxmox-datacenter-manager v7 5/7] fake remote: clippy fixes
2025-08-20 12:43 [pdm-devel] [PATCH proxmox-datacenter-manager v7 0/7] remote task cache fetching task / better cache backend Lukas Wagner
` (3 preceding siblings ...)
2025-08-20 12:43 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 4/7] fake remote: make the fake_remote feature compile again Lukas Wagner
@ 2025-08-20 12:43 ` Lukas Wagner
2025-08-20 12:43 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 6/7] remote tasks: task cache: create `active` file in init Lukas Wagner
` (2 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Lukas Wagner @ 2025-08-20 12:43 UTC (permalink / raw)
To: pdm-devel
Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
Reviewed-by: Dominik Csapak <d.csapak@proxmox.com>
---
Notes:
No changes in v6
No changes in v5
No changes in v4
Added in v3
server/src/test_support/fake_remote.rs | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/server/src/test_support/fake_remote.rs b/server/src/test_support/fake_remote.rs
index 02d11f88..e32a88a1 100644
--- a/server/src/test_support/fake_remote.rs
+++ b/server/src/test_support/fake_remote.rs
@@ -1,4 +1,4 @@
-use std::{collections::HashMap, sync::Arc, time::Duration};
+use std::{sync::Arc, time::Duration};
use anyhow::{bail, Error};
use serde::Deserialize;
@@ -129,7 +129,7 @@ impl pve_api_types::client::PveClient for FakePveClient {
let mut vmid = 100;
for _ in 0..self.nr_of_vms {
- vmid = vmid + 1;
+ vmid += 1;
result.push(ClusterResource {
cgroup_mode: None,
content: None,
@@ -163,7 +163,7 @@ impl pve_api_types::client::PveClient for FakePveClient {
}
for _ in 0..self.nr_of_cts {
- vmid = vmid + 1;
+ vmid += 1;
result.push(ClusterResource {
cgroup_mode: None,
content: None,
@@ -358,7 +358,6 @@ impl pve_api_types::client::PveClient for FakePveClient {
async fn list_nodes(&self) -> Result<Vec<ClusterNodeIndexResponse>, proxmox_client::Error> {
tokio::time::sleep(Duration::from_millis(self.api_delay_ms as u64)).await;
Ok((0..self.nr_of_nodes)
- .into_iter()
.map(|i| ClusterNodeIndexResponse {
node: format!("pve-{i}"),
cpu: None,
@@ -415,7 +414,6 @@ impl pve_api_types::client::PveClient for FakePveClient {
let number_of_tasks = limit.min(number_of_tasks as u64);
Ok((0..number_of_tasks)
- .into_iter()
.map(|i| make_task(now - i as i64 * NEW_TASK_EVERY * 60))
.collect())
}
--
2.47.2
* [pdm-devel] [PATCH proxmox-datacenter-manager v7 6/7] remote tasks: task cache: create `active` file in init
2025-08-20 12:43 [pdm-devel] [PATCH proxmox-datacenter-manager v7 0/7] remote task cache fetching task / better cache backend Lukas Wagner
` (4 preceding siblings ...)
2025-08-20 12:43 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 5/7] fake remote: clippy fixes Lukas Wagner
@ 2025-08-20 12:43 ` Lukas Wagner
2025-08-20 12:43 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 7/7] remote tasks: log error in case of task panic, instead of cancelling all tasks Lukas Wagner
2025-08-21 9:20 ` [pdm-devel] applied: [PATCH proxmox-datacenter-manager v7 0/7] remote task cache fetching task / better cache backend Dominik Csapak
7 siblings, 0 replies; 9+ messages in thread
From: Lukas Wagner @ 2025-08-20 12:43 UTC (permalink / raw)
To: pdm-devel
This avoids a 'could not create task archive iterator' error before the
first round of task fetching.
Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
---
Notes:
New in v7
server/src/remote_tasks/task_cache.rs | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/server/src/remote_tasks/task_cache.rs b/server/src/remote_tasks/task_cache.rs
index 9e6a65cd..1afeaee4 100644
--- a/server/src/remote_tasks/task_cache.rs
+++ b/server/src/remote_tasks/task_cache.rs
@@ -214,11 +214,24 @@ impl ReadableTaskCache {
}
impl WritableTaskCache {
- /// Create initial task archives that can be backfilled with the
+ /// Create initial task archive files that can be backfilled with the
/// recent task history from a remote.
///
/// This function only has an effect if there are no archive files yet.
pub fn init(&self, now: i64) -> Result<(), Error> {
+ let active_filename = self.cache.base_path.join(ACTIVE_FILENAME);
+
+ if !active_filename.exists() {
+ let mut file = OpenOptions::new()
+ .create(true)
+ .write(true)
+ .open(&active_filename)?;
+
+ self.cache
+ .create_options
+ .apply_to(&mut file, &active_filename)?;
+ }
+
if self.cache.archive_files(&self.lock)?.is_empty() {
for i in 0..self.cache.max_files {
self.new_file(
--
2.47.2
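A hypothetical usage sketch, assuming `WritableTaskCache` from server/src/remote_tasks/task_cache.rs is in scope and that the background fetching task calls `init()` once at startup; the helper name and the `proxmox_time::epoch_i64()` call are assumptions, not taken from this patch:
    // Initialize the cache once before the first fetch round so readers can
    // build the archive iterator even when no tasks have been stored yet.
    // With this patch, init() also creates the `active` file; it is
    // effectively a no-op on subsequent runs.
    fn prepare_cache(cache: &WritableTaskCache) -> Result<(), anyhow::Error> {
        cache.init(proxmox_time::epoch_i64())?;
        Ok(())
    }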
* [pdm-devel] [PATCH proxmox-datacenter-manager v7 7/7] remote tasks: log error in case of task panic, instead of cancelling all tasks
2025-08-20 12:43 [pdm-devel] [PATCH proxmox-datacenter-manager v7 0/7] remote task cache fetching task / better cache backend Lukas Wagner
` (5 preceding siblings ...)
2025-08-20 12:43 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 6/7] remote tasks: task cache: create `active` file in init Lukas Wagner
@ 2025-08-20 12:43 ` Lukas Wagner
2025-08-21 9:20 ` [pdm-devel] applied: [PATCH proxmox-datacenter-manager v7 0/7] remote task cache fetching task / better cache backend Dominik Csapak
7 siblings, 0 replies; 9+ messages in thread
From: Lukas Wagner @ 2025-08-20 12:43 UTC (permalink / raw)
To: pdm-devel
This should hopefully never happen, but for cases like these we still want
to persist the tasks from other nodes instead of failing completely.
Also remove the `set_node_failure` function from NodeFetchSuccessMap. For
panicked tasks we don't have a straightforward way to get the node name,
and we only ever check for success anyway, not for failure.
Suggested-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
---
Notes:
New in v7.
server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs | 5 +++--
server/src/remote_tasks/task_cache.rs | 5 -----
2 files changed, 3 insertions(+), 7 deletions(-)
diff --git a/server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs b/server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs
index 4701a935..04c51dac 100644
--- a/server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs
+++ b/server/src/bin/proxmox-datacenter-api/tasks/remote_tasks.rs
@@ -411,10 +411,11 @@ async fn fetch_tasks(
}
Err(error) => {
log::error!("could not fetch tasks: {error:#}");
- node_results.set_node_failure(remote.id.clone(), node_name);
}
},
- Err(err) => return Err(err.into()),
+ Err(error) => {
+ log::error!("could not join task fetching task: {error:#}");
+ }
}
}
}
diff --git a/server/src/remote_tasks/task_cache.rs b/server/src/remote_tasks/task_cache.rs
index 1afeaee4..e9e708e4 100644
--- a/server/src/remote_tasks/task_cache.rs
+++ b/server/src/remote_tasks/task_cache.rs
@@ -188,11 +188,6 @@ impl NodeFetchSuccessMap {
self.0.insert((remote, node), true);
}
- /// Mark a node of a given remote as failed.
- pub fn set_node_failure(&mut self, remote: String, node: String) {
- self.0.insert((remote, node), false);
- }
-
/// Returns whether tasks from a given node of a remote were successfully fetched.
pub fn node_successful(&self, remote: &str, node: &str) -> bool {
matches!(self.0.get(&(remote.into(), node.into())), Some(true))
--
2.47.2
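A hedged sketch of the resulting join-error handling pattern, simplified and with assumed types (the real fetch loop and its result type differ; only the match structure mirrors the hunk above):
    use tokio::task::JoinSet;
    // Collect per-node fetch results; log fetch errors and join errors
    // (i.e. panicked tasks) instead of aborting the whole fetch run.
    async fn collect_results(mut set: JoinSet<Result<Vec<String>, anyhow::Error>>) -> Vec<String> {
        let mut all = Vec::new();
        while let Some(joined) = set.join_next().await {
            match joined {
                Ok(Ok(tasks)) => all.extend(tasks),
                Ok(Err(error)) => log::error!("could not fetch tasks: {error:#}"),
                Err(error) => log::error!("could not join task fetching task: {error:#}"),
            }
        }
        all
    }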
* [pdm-devel] applied: [PATCH proxmox-datacenter-manager v7 0/7] remote task cache fetching task / better cache backend
2025-08-20 12:43 [pdm-devel] [PATCH proxmox-datacenter-manager v7 0/7] remote task cache fetching task / better cache backend Lukas Wagner
` (6 preceding siblings ...)
2025-08-20 12:43 ` [pdm-devel] [PATCH proxmox-datacenter-manager v7 7/7] remote tasks: log error in case of task panic, instead of cancelling all tasks Lukas Wagner
@ 2025-08-21 9:20 ` Dominik Csapak
7 siblings, 0 replies; 9+ messages in thread
From: Dominik Csapak @ 2025-08-21 9:20 UTC (permalink / raw)
To: Proxmox Datacenter Manager development discussion, Lukas Wagner
applied the whole series, thanks!