public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [pbs-devel] superseded: [PATCH proxmox-backup 0/3] datastore: remove config reload on hot path
Date: Fri, 14 Nov 2025 16:08:30 +0100	[thread overview]
Message-ID: <84a63b00-9964-45a9-bb07-590f98239a21@proxmox.com> (raw)
In-Reply-To: <20251111122941.110412-1-s.rufinatscha@proxmox.com>

https://lore.proxmox.com/pbs-devel/20251114150544.224839-1-s.rufinatscha@proxmox.com/T/#t

On 11/11/25 1:29 PM, Samuel Rufinatscha wrote:
> Hi,
> 
> this series reduces CPU time in datastore lookups by avoiding repeated
> datastore.cfg reads/parses in both `lookup_datastore()` and
> `DataStore::Drop`. It also adds a TTL so manual config edits are
> noticed without reintroducing hashing on every request.
> 
> While investigating #6049 [1], cargo-flamegraph [2] showed hotspots during
> repeated `/status` calls in `lookup_datastore()` and in `Drop`,
> dominated by `pbs_config::datastore::config()` (config parse).
> 
> The parsing cost itself should eventually be investigated in a future
> effort. Furthermore, cargo-flamegraph showed that when using a
> token-based auth method to access the API, a significant amount of time
> is spent in validation on every request, likely related to bcrypt.
> Also this should be eventually revisited in a future effort.
> 
> ## Approach
> 
> [PATCH 1/3] Fast path for datastore lookups
> Use the shared-memory `ConfigVersionCache` generation for `datastore.cfg`.
> Tag each cached `DataStoreImpl` with the last seen generation; when it
> matches, reuse the cached instance. Fall back to the existing slow path
> on mismatch or when the cache is unavailable.
> 
> [PATCH 2/3] Fast path for `Drop`
> Reuse the maintenance mode eviction decision captured at lookup time,
> removing the config reload from `Drop`.
> 
> [PATCH 3/3] TTL to catch manual edits
> If a cached entry is older than `DATASTORE_CONFIG_CACHE_TTL_SECS`
> (default 60s), the next lookup refreshes it via the slow path. This
> detects manual file edits without hashing on every request.
> 
> ## Results
> 
> End-to-end `/status?verbose=0` (1000 stores, 5 req/store, parallel=16):
> 
>    Metric                 Baseline   [1/3]    [2/3]
>    ------------------------------------------------
>    Total time             13s        11s      10s
>    Throughput (all)       384.62     454.55   500.00
>    Cold RPS (round #1)    76.92      90.91    100.00
>    Warm RPS (2..N)        307.69     363.64   400.00
> 
> Patch 1 improves overall throughput by ~18% (−15% total time). Patch 2
> adds ~10% on top. Patch 3 is a robustness feature; a 0.1 s probe shows
> periodic latency spikes at TTL expiry and flat latencies otherwise.
> 
> ## Reproduction steps
> 
> VM: 4 vCPU, ~8 GiB RAM, VirtIO-SCSI; disks:
>    - scsi0 32G (OS)
>    - scsi1 1000G (datastores)
> 
> Install PBS from ISO on the VM.
> 
> Set up ZFS on /dev/sdb (adjust if different):
> 
>      zpool create -f -o ashift=12 pbsbench /dev/sdb
>      zfs set mountpoint=/pbsbench pbsbench
>      zfs create pbsbench/pbs-bench
> 
> Raise file-descriptor limit:
> 
>      sudo systemctl edit proxmox-backup-proxy.service
> 
> Add the following lines:
> 
>      [Service]
>      LimitNOFILE=1048576
> 
> Reload systemd and restart the proxy:
> 
>      sudo systemctl daemon-reload
>      sudo systemctl restart proxmox-backup-proxy.service
> 
> Verify the limit:
> 
>      systemctl show proxmox-backup-proxy.service | grep LimitNOFILE
> 
> Create 1000 ZFS-backed datastores (as used in #6049 [1]):
> 
>      seq -w 001 1000 | xargs -n1 -P1 bash -c '
>        id=$0
>        name="ds${id}"
>        dataset="pbsbench/pbs-bench/${name}"
>        path="/pbsbench/pbs-bench/${name}"
>        zfs create -o mountpoint="$path" "$dataset"
>        proxmox-backup-manager datastore create "$name" "$path" \
>          --comment "ZFS dataset-based datastore"
>      '
> 
> Build PBS from this series, then run the server under manually
> under flamegraph:
> 
>      systemctl stop proxmox-backup-proxy
>      cargo flamegraph --release --bin proxmox-backup-proxy
> 
> Benchmark script (`bench.sh`) used for the numbers above:
> 
>      #!/usr/bin/env bash
>      set -euo pipefail
> 
>      # --- Config ---------------------------------------------------------------
>      HOST='https://localhost:8007'
>      USER='root@pam'
>      PASS="$(cat passfile)"
> 
>      DATASTORE_PATH="/pbsbench/pbs-bench"
>      MAX_STORES=1000      # how many stores to include
>      PARALLEL=16         # concurrent workers
>      REPEAT=5            # requests per store (1 cold + REPEAT-1 warm)
> 
>      PRINT_FIRST=false   # true => log first request's HTTP code per store
> 
>      # --- Helpers --------------------------------------------------------------
>      fmt_rps () {
>        local n="$1" t="$2"
>        awk -v n="$n" -v t="$t" 'BEGIN { if (t > 0) printf("%.2f\n", n/t); else print "0.00" }'
>      }
> 
>      # --- Login ---------------------------------------------------------------
>      auth=$(curl -ks -X POST "$HOST/api2/json/access/ticket" \
>        -d "username=$USER" -d "password=$PASS")
>      ticket=$(echo "$auth" | jq -r '.data.ticket')
> 
>      if [[ -z "${ticket:-}" || "$ticket" == "null" ]]; then
>        echo "[ERROR] Login failed (no ticket)"
>        exit 1
>      fi
> 
>      # --- Collect stores (deterministic order) --------------------------------
>      mapfile -t STORES < <(
>        find "$DATASTORE_PATH" -mindepth 1 -maxdepth 1 -type d -printf '%f\n' \
>        | sort | head -n "$MAX_STORES"
>      )
> 
>      USED_STORES=${#STORES[@]}
>      if (( USED_STORES == 0 )); then
>        echo "[ERROR] No datastore dirs under $DATASTORE_PATH"
>        exit 1
>      fi
> 
>      echo "[INFO] Running with stores=$USED_STORES, repeat=$REPEAT, parallel=$PARALLEL"
> 
>      # --- Temp counters --------------------------------------------------------
>      SUCCESS_ALL="$(mktemp)"
>      FAIL_ALL="$(mktemp)"
>      COLD_OK="$(mktemp)"
>      WARM_OK="$(mktemp)"
>      trap 'rm -f "$SUCCESS_ALL" "$FAIL_ALL" "$COLD_OK" "$WARM_OK"' EXIT
> 
>      export HOST ticket REPEAT SUCCESS_ALL FAIL_ALL COLD_OK WARM_OK PRINT_FIRST
> 
>      SECONDS=0
> 
>      # --- Fire requests --------------------------------------------------------
>      printf "%s\n" "${STORES[@]}" \
>      | xargs -P"$PARALLEL" -I{} bash -c '
>        store="$1"
>        url="$HOST/api2/json/admin/datastore/$store/status?verbose=0"
> 
>        for ((i=1;i<=REPEAT;i++)); do
>          code=$(curl -ks -o /dev/null -w "%{http_code}" -b "PBSAuthCookie=$ticket" "$url" || echo 000)
> 
>          if [[ "$code" == "200" ]]; then
>            echo 1 >> "$SUCCESS_ALL"
>            if (( i == 1 )); then
>              echo 1 >> "$COLD_OK"
>            else
>              echo 1 >> "$WARM_OK"
>            fi
>            if [[ "$PRINT_FIRST" == "true" && $i -eq 1 ]]; then
>              ts=$(date +%H:%M:%S)
>              echo "[$ts] $store #$i HTTP:200"
>            fi
>          else
>            echo 1 >> "$FAIL_ALL"
>            if [[ "$PRINT_FIRST" == "true" && $i -eq 1 ]]; then
>              ts=$(date +%H:%M:%S)
>              echo "[$ts] $store #$i HTTP:$code (FAIL)"
>            fi
>          fi
>        done
>      ' _ {}
> 
>      # --- Summary --------------------------------------------------------------
>      elapsed=$SECONDS
>      ok=$(wc -l < "$SUCCESS_ALL" 2>/dev/null || echo 0)
>      fail=$(wc -l < "$FAIL_ALL" 2>/dev/null || echo 0)
>      cold_ok=$(wc -l < "$COLD_OK" 2>/dev/null || echo 0)
>      warm_ok=$(wc -l < "$WARM_OK" 2>/dev/null || echo 0)
> 
>      expected=$(( USED_STORES * REPEAT ))
>      total=$(( ok + fail ))
> 
>      rps_all=$(fmt_rps "$ok" "$elapsed")
>      rps_cold=$(fmt_rps "$cold_ok" "$elapsed")
>      rps_warm=$(fmt_rps "$warm_ok" "$elapsed")
> 
>      echo "===== Summary ====="
>      echo "Stores used:         $USED_STORES"
>      echo "Expected requests:   $expected"
>      echo "Executed requests:   $total"
>      echo "OK (HTTP 200):       $ok"
>      echo "Failed:              $fail"
>      printf "Total time:          %dm %ds\n" $((elapsed/60)) $((elapsed%60))
>      echo "Throughput all RPS:  $rps_all"
>      echo "Cold RPS (round #1): $rps_cold"
>      echo "Warm RPS (#2..N):    $rps_warm"
> 
> ## Maintainer notes
> 
> - No dependency bumps, no API changes, no breaking changes in this
>    series.
> 
> ## Patch summary
> 
> [PATCH 1/3] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
> [PATCH 2/3] partial fix #6049: datastore: use config fast-path in Drop
> [PATCH 3/3] datastore: add TTL fallback to catch manual config edits
> 
> Thanks for reviewing!
> 
> [1] Bugzilla #6049: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
> 
> Samuel Rufinatscha (3):
>    partial fix #6049: datastore: impl ConfigVersionCache fast path for
>      lookups
>    partial fix #6049: datastore: use config fast-path in Drop
>    datastore: add TTL fallback to catch manual config edits
> 
>   pbs-config/src/config_version_cache.rs |  10 ++-
>   pbs-datastore/src/datastore.rs         | 119 ++++++++++++++++++-------
>   2 files changed, 96 insertions(+), 33 deletions(-)
> 



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

      parent reply	other threads:[~2025-11-14 15:08 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-11 12:29 [pbs-devel] " Samuel Rufinatscha
2025-11-11 12:29 ` [pbs-devel] [PATCH proxmox-backup 1/3] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
2025-11-12 13:24   ` Fabian Grünbichler
2025-11-13 12:59     ` Samuel Rufinatscha
2025-11-11 12:29 ` [pbs-devel] [PATCH proxmox-backup 2/3] partial fix #6049: datastore: use config fast-path in Drop Samuel Rufinatscha
2025-11-12 11:24   ` Fabian Grünbichler
2025-11-12 15:20     ` Samuel Rufinatscha
2025-11-11 12:29 ` [pbs-devel] [PATCH proxmox-backup 3/3] datastore: add TTL fallback to catch manual config edits Samuel Rufinatscha
2025-11-12 11:27 ` [pbs-devel] [PATCH proxmox-backup 0/3] datastore: remove config reload on hot path Fabian Grünbichler
2025-11-12 17:27   ` Samuel Rufinatscha
2025-11-14 15:08 ` Samuel Rufinatscha [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=84a63b00-9964-45a9-bb07-590f98239a21@proxmox.com \
    --to=s.rufinatscha@proxmox.com \
    --cc=pbs-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal