all lists on lists.proxmox.com
 help / color / mirror / Atom feed
From: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [pbs-devel] [PATCH proxmox-backup 0/3] datastore: remove config reload on hot path
Date: Tue, 11 Nov 2025 13:29:38 +0100	[thread overview]
Message-ID: <20251111122941.110412-1-s.rufinatscha@proxmox.com> (raw)

Hi,

this series reduces CPU time in datastore lookups by avoiding repeated
datastore.cfg reads/parses in both `lookup_datastore()` and
`DataStore::Drop`. It also adds a TTL so manual config edits are
noticed without reintroducing hashing on every request.

While investigating #6049 [1], cargo-flamegraph [2] showed hotspots during
repeated `/status` calls in `lookup_datastore()` and in `Drop`,
dominated by `pbs_config::datastore::config()` (config parse).

The parsing cost itself should eventually be investigated in a future
effort. Furthermore, cargo-flamegraph showed that when using a
token-based auth method to access the API, a significant amount of time
is spent in validation on every request, likely related to bcrypt.
Also this should be eventually revisited in a future effort.

## Approach

[PATCH 1/3] Fast path for datastore lookups
Use the shared-memory `ConfigVersionCache` generation for `datastore.cfg`.
Tag each cached `DataStoreImpl` with the last seen generation; when it
matches, reuse the cached instance. Fall back to the existing slow path
on mismatch or when the cache is unavailable.

[PATCH 2/3] Fast path for `Drop`
Reuse the maintenance mode eviction decision captured at lookup time,
removing the config reload from `Drop`.

[PATCH 3/3] TTL to catch manual edits
If a cached entry is older than `DATASTORE_CONFIG_CACHE_TTL_SECS`
(default 60s), the next lookup refreshes it via the slow path. This
detects manual file edits without hashing on every request.

## Results

End-to-end `/status?verbose=0` (1000 stores, 5 req/store, parallel=16):

  Metric                 Baseline   [1/3]    [2/3]
  ------------------------------------------------
  Total time             13s        11s      10s
  Throughput (all)       384.62     454.55   500.00
  Cold RPS (round #1)    76.92      90.91    100.00
  Warm RPS (2..N)        307.69     363.64   400.00

Patch 1 improves overall throughput by ~18% (−15% total time). Patch 2
adds ~10% on top. Patch 3 is a robustness feature; a 0.1 s probe shows
periodic latency spikes at TTL expiry and flat latencies otherwise.

## Reproduction steps

VM: 4 vCPU, ~8 GiB RAM, VirtIO-SCSI; disks:
  - scsi0 32G (OS)
  - scsi1 1000G (datastores)

Install PBS from ISO on the VM.

Set up ZFS on /dev/sdb (adjust if different):

    zpool create -f -o ashift=12 pbsbench /dev/sdb
    zfs set mountpoint=/pbsbench pbsbench
    zfs create pbsbench/pbs-bench

Raise file-descriptor limit:

    sudo systemctl edit proxmox-backup-proxy.service

Add the following lines:

    [Service]
    LimitNOFILE=1048576

Reload systemd and restart the proxy:

    sudo systemctl daemon-reload
    sudo systemctl restart proxmox-backup-proxy.service

Verify the limit:

    systemctl show proxmox-backup-proxy.service | grep LimitNOFILE

Create 1000 ZFS-backed datastores (as used in #6049 [1]):

    seq -w 001 1000 | xargs -n1 -P1 bash -c '
      id=$0
      name="ds${id}"
      dataset="pbsbench/pbs-bench/${name}"
      path="/pbsbench/pbs-bench/${name}"
      zfs create -o mountpoint="$path" "$dataset"
      proxmox-backup-manager datastore create "$name" "$path" \
        --comment "ZFS dataset-based datastore"
    '

Build PBS from this series, then run the server under manually
under flamegraph:

    systemctl stop proxmox-backup-proxy
    cargo flamegraph --release --bin proxmox-backup-proxy

Benchmark script (`bench.sh`) used for the numbers above:

    #!/usr/bin/env bash
    set -euo pipefail

    # --- Config ---------------------------------------------------------------
    HOST='https://localhost:8007'
    USER='root@pam'
    PASS="$(cat passfile)"

    DATASTORE_PATH="/pbsbench/pbs-bench"
    MAX_STORES=1000      # how many stores to include
    PARALLEL=16         # concurrent workers
    REPEAT=5            # requests per store (1 cold + REPEAT-1 warm)

    PRINT_FIRST=false   # true => log first request's HTTP code per store

    # --- Helpers --------------------------------------------------------------
    fmt_rps () {
      local n="$1" t="$2"
      awk -v n="$n" -v t="$t" 'BEGIN { if (t > 0) printf("%.2f\n", n/t); else print "0.00" }'
    }

    # --- Login ---------------------------------------------------------------
    auth=$(curl -ks -X POST "$HOST/api2/json/access/ticket" \
      -d "username=$USER" -d "password=$PASS")
    ticket=$(echo "$auth" | jq -r '.data.ticket')

    if [[ -z "${ticket:-}" || "$ticket" == "null" ]]; then
      echo "[ERROR] Login failed (no ticket)"
      exit 1
    fi

    # --- Collect stores (deterministic order) --------------------------------
    mapfile -t STORES < <(
      find "$DATASTORE_PATH" -mindepth 1 -maxdepth 1 -type d -printf '%f\n' \
      | sort | head -n "$MAX_STORES"
    )

    USED_STORES=${#STORES[@]}
    if (( USED_STORES == 0 )); then
      echo "[ERROR] No datastore dirs under $DATASTORE_PATH"
      exit 1
    fi

    echo "[INFO] Running with stores=$USED_STORES, repeat=$REPEAT, parallel=$PARALLEL"

    # --- Temp counters --------------------------------------------------------
    SUCCESS_ALL="$(mktemp)"
    FAIL_ALL="$(mktemp)"
    COLD_OK="$(mktemp)"
    WARM_OK="$(mktemp)"
    trap 'rm -f "$SUCCESS_ALL" "$FAIL_ALL" "$COLD_OK" "$WARM_OK"' EXIT

    export HOST ticket REPEAT SUCCESS_ALL FAIL_ALL COLD_OK WARM_OK PRINT_FIRST

    SECONDS=0

    # --- Fire requests --------------------------------------------------------
    printf "%s\n" "${STORES[@]}" \
    | xargs -P"$PARALLEL" -I{} bash -c '
      store="$1"
      url="$HOST/api2/json/admin/datastore/$store/status?verbose=0"

      for ((i=1;i<=REPEAT;i++)); do
        code=$(curl -ks -o /dev/null -w "%{http_code}" -b "PBSAuthCookie=$ticket" "$url" || echo 000)

        if [[ "$code" == "200" ]]; then
          echo 1 >> "$SUCCESS_ALL"
          if (( i == 1 )); then
            echo 1 >> "$COLD_OK"
          else
            echo 1 >> "$WARM_OK"
          fi
          if [[ "$PRINT_FIRST" == "true" && $i -eq 1 ]]; then
            ts=$(date +%H:%M:%S)
            echo "[$ts] $store #$i HTTP:200"
          fi
        else
          echo 1 >> "$FAIL_ALL"
          if [[ "$PRINT_FIRST" == "true" && $i -eq 1 ]]; then
            ts=$(date +%H:%M:%S)
            echo "[$ts] $store #$i HTTP:$code (FAIL)"
          fi
        fi
      done
    ' _ {}

    # --- Summary --------------------------------------------------------------
    elapsed=$SECONDS
    ok=$(wc -l < "$SUCCESS_ALL" 2>/dev/null || echo 0)
    fail=$(wc -l < "$FAIL_ALL" 2>/dev/null || echo 0)
    cold_ok=$(wc -l < "$COLD_OK" 2>/dev/null || echo 0)
    warm_ok=$(wc -l < "$WARM_OK" 2>/dev/null || echo 0)

    expected=$(( USED_STORES * REPEAT ))
    total=$(( ok + fail ))

    rps_all=$(fmt_rps "$ok" "$elapsed")
    rps_cold=$(fmt_rps "$cold_ok" "$elapsed")
    rps_warm=$(fmt_rps "$warm_ok" "$elapsed")

    echo "===== Summary ====="
    echo "Stores used:         $USED_STORES"
    echo "Expected requests:   $expected"
    echo "Executed requests:   $total"
    echo "OK (HTTP 200):       $ok"
    echo "Failed:              $fail"
    printf "Total time:          %dm %ds\n" $((elapsed/60)) $((elapsed%60))
    echo "Throughput all RPS:  $rps_all"
    echo "Cold RPS (round #1): $rps_cold"
    echo "Warm RPS (#2..N):    $rps_warm"

## Maintainer notes

- No dependency bumps, no API changes, no breaking changes in this
  series.

## Patch summary

[PATCH 1/3] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
[PATCH 2/3] partial fix #6049: datastore: use config fast-path in Drop
[PATCH 3/3] datastore: add TTL fallback to catch manual config edits

Thanks for reviewing!

[1] Bugzilla #6049: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
[2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph

Samuel Rufinatscha (3):
  partial fix #6049: datastore: impl ConfigVersionCache fast path for
    lookups
  partial fix #6049: datastore: use config fast-path in Drop
  datastore: add TTL fallback to catch manual config edits

 pbs-config/src/config_version_cache.rs |  10 ++-
 pbs-datastore/src/datastore.rs         | 119 ++++++++++++++++++-------
 2 files changed, 96 insertions(+), 33 deletions(-)

-- 
2.47.3



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

             reply	other threads:[~2025-11-11 12:29 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-11 12:29 Samuel Rufinatscha [this message]
2025-11-11 12:29 ` [pbs-devel] [PATCH proxmox-backup 1/3] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
2025-11-12 13:24   ` Fabian Grünbichler
2025-11-13 12:59     ` Samuel Rufinatscha
2025-11-11 12:29 ` [pbs-devel] [PATCH proxmox-backup 2/3] partial fix #6049: datastore: use config fast-path in Drop Samuel Rufinatscha
2025-11-12 11:24   ` Fabian Grünbichler
2025-11-12 15:20     ` Samuel Rufinatscha
2025-11-11 12:29 ` [pbs-devel] [PATCH proxmox-backup 3/3] datastore: add TTL fallback to catch manual config edits Samuel Rufinatscha
2025-11-12 11:27 ` [pbs-devel] [PATCH proxmox-backup 0/3] datastore: remove config reload on hot path Fabian Grünbichler
2025-11-12 17:27   ` Samuel Rufinatscha
2025-11-14 15:08 ` [pbs-devel] superseded: " Samuel Rufinatscha

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251111122941.110412-1-s.rufinatscha@proxmox.com \
    --to=s.rufinatscha@proxmox.com \
    --cc=pbs-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal