From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: Proxmox Backup Server development discussion
<pbs-devel@lists.proxmox.com>,
Dominik Csapak <d.csapak@proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup 6/6] api: admin: datastore: implement streaming content api call
Date: Fri, 3 Oct 2025 13:55:41 +0200
Message-ID: <df3e0980-4e3c-43eb-ab66-3865821bc872@proxmox.com>
In-Reply-To: <20251003085045.1346864-8-d.csapak@proxmox.com>
On 03.10.25 at 10:51, Dominik Csapak wrote:
> this is a new api call that utilizes `async-stream` together with
> `proxmox_router::Stream` to provide a streaming interface for querying
> the datastore content.
>
> This can be done when a client requests this api call with the
> `application/json-seq` Accept header.
>
> In contrast to the existing api calls, this one
> * returns all types of content items (namespaces, groups, snapshots; can
> be filtered with a parameter)
> * iterates over them recursively (within the range given by the
> parameter)
>
> The api call returns the data in the following order:
> * first all visible namespaces
> * then, for each ns in order:
>   * each group
>   * each snapshot
>
> This is done so that we can have a good way of building a tree view in
> the ui.
>
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> The permission checks should be thoroughly reviewed. I did them to the
> best of my ability, but of course some bug/issue could have crept in.
>
> interesting side note: in my rather large setup with ~600 groups and ~1000
> snapshots per group, streaming this is faster than using the current
> `snapshot` api (by a lot):
> * `snapshot` api -> ~3 min
> * `content` api with streaming -> ~2:11 min
> * `content` api without streaming -> ~3 min
>
> It seems that collecting such a 'large' api response (~200MiB)
> is expensive. My guesses as to what happens here are either:
> * frequent (re)allocation of the resulting vec
> * or serde's serializing code
You could compare the peak (RSS) memory usage of the daemon as a side effect,
and/or use bpftrace to log bigger allocations. While I have used bpftrace
lots of times, I have not tried it specifically with Rust, but I found a
short'ish article that describes doing just that for Rust, and it looks like
it would not be _that_ much work (and could be a nice tool to have in the
belt in the future):
https://readyset.io/blog/tracing-large-memory-allocations-in-rust-with-bpftrace
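For the RSS part, a quick check could be as simple as polling procfs around
the request; a rough sketch (VmHWM is the kernel's peak-RSS high-water mark,
everything else here is made up for illustration):

use std::fs;

// read the peak resident set size (VmHWM, in kiB) of a process from procfs
fn peak_rss_kib(pid: u32) -> Option<u64> {
    let status = fs::read_to_string(format!("/proc/{pid}/status")).ok()?;
    status
        .lines()
        .find(|line| line.starts_with("VmHWM:"))
        .and_then(|line| line.split_whitespace().nth(1))
        .and_then(|value| value.parse().ok())
}

fn main() {
    // e.g. point this at the proxy daemon's PID and compare the value
    // before/after issuing the streaming vs. non-streaming request (note
    // that VmHWM only ever grows, so only increases are visible)
    if let Some(kib) = peak_rss_kib(std::process::id()) {
        println!("peak RSS: {kib} kiB");
    }
}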
> but the cost still seems pretty high for that.
> LMK if I should further investigate this.
tbh. if this holds up in an in-depth review, especially regarding the priv
checking you mentioned, I'm fine with taking it as is; I mostly mentioned the
above because it would be interesting for a deeper understanding, and in my
experience bpftrace in particular is quite widely applicable, so it can be
worth spending some time playing around with it during "calmer times" ^^
On 03.10.25 at 10:51, Dominik Csapak wrote:
> diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
> index 2252dcfa4..bf94f6400 100644
> --- a/src/api2/admin/datastore.rs
> +++ b/src/api2/admin/datastore.rs
> @@ -23,7 +23,7 @@ use proxmox_compression::zstd::ZstdEncoder;
> use proxmox_log::LogContext;
> use proxmox_router::{
> http_err, list_subdirs_api_method, ApiHandler, ApiMethod, ApiResponseFuture, Permission,
> - Router, RpcEnvironment, RpcEnvironmentType, SubdirMap,
> + Record, Router, RpcEnvironment, RpcEnvironmentType, SubdirMap,
> };
> use proxmox_rrd_api_types::{RrdMode, RrdTimeframe};
> use proxmox_schema::*;
> @@ -39,15 +39,16 @@ use pxar::EntryKind;
>
> use pbs_api_types::{
> print_ns_and_snapshot, print_store_and_ns, ArchiveType, Authid, BackupArchiveName,
> - BackupContent, BackupGroupDeleteStats, BackupNamespace, BackupType, Counts, CryptMode,
> - DataStoreConfig, DataStoreListItem, DataStoreMountStatus, DataStoreStatus,
> - GarbageCollectionJobStatus, GroupListItem, JobScheduleStatus, KeepOptions, MaintenanceMode,
> - MaintenanceType, Operation, PruneJobOptions, SnapshotListItem, SyncJobConfig,
> - BACKUP_ARCHIVE_NAME_SCHEMA, BACKUP_ID_SCHEMA, BACKUP_NAMESPACE_SCHEMA, BACKUP_TIME_SCHEMA,
> - BACKUP_TYPE_SCHEMA, CATALOG_NAME, CLIENT_LOG_BLOB_NAME, DATASTORE_SCHEMA,
> - IGNORE_VERIFIED_BACKUPS_SCHEMA, MAX_NAMESPACE_DEPTH, NS_MAX_DEPTH_SCHEMA, PRIV_DATASTORE_AUDIT,
> - PRIV_DATASTORE_BACKUP, PRIV_DATASTORE_MODIFY, PRIV_DATASTORE_PRUNE, PRIV_DATASTORE_READ,
> - PRIV_DATASTORE_VERIFY, PRIV_SYS_MODIFY, UPID, UPID_SCHEMA, VERIFICATION_OUTDATED_AFTER_SCHEMA,
> + BackupContent, BackupGroupDeleteStats, BackupNamespace, BackupType, ContentListItem,
> + ContentType, Counts, CryptMode, DataStoreConfig, DataStoreListItem, DataStoreMountStatus,
> + DataStoreStatus, GarbageCollectionJobStatus, GroupListItem, JobScheduleStatus, KeepOptions,
> + MaintenanceMode, MaintenanceType, NamespaceListItem, Operation, PruneJobOptions,
> + SnapshotListItem, SyncJobConfig, BACKUP_ARCHIVE_NAME_SCHEMA, BACKUP_ID_SCHEMA,
> + BACKUP_NAMESPACE_SCHEMA, BACKUP_TIME_SCHEMA, BACKUP_TYPE_SCHEMA, CATALOG_NAME,
> + CLIENT_LOG_BLOB_NAME, DATASTORE_SCHEMA, IGNORE_VERIFIED_BACKUPS_SCHEMA, MAX_NAMESPACE_DEPTH,
> + NS_MAX_DEPTH_SCHEMA, PRIV_DATASTORE_AUDIT, PRIV_DATASTORE_BACKUP, PRIV_DATASTORE_MODIFY,
> + PRIV_DATASTORE_PRUNE, PRIV_DATASTORE_READ, PRIV_DATASTORE_VERIFY, PRIV_SYS_MODIFY, UPID,
> + UPID_SCHEMA, VERIFICATION_OUTDATED_AFTER_SCHEMA,
oof. It would probably be good to split this use statement into multiple
ones, e.g. one for PRIV_*, one for the other const UPPERCASE thingies, and
then maybe one for the Backup* types, one for DataStore*, and the rest.
While one can diff per word to see what's going on, this still causes lots
of churn for applying/merging if anything happened in between, and for
history (blame).
But that doesn't have to be your job or this patch series', I'm just venting.
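i.e. roughly something along these lines (just to illustrate the grouping,
not meant verbatim):

use pbs_api_types::{
    BackupContent, BackupGroupDeleteStats, BackupNamespace, BackupType,
    ContentListItem, ContentType, Counts, CryptMode, GroupListItem,
    NamespaceListItem, SnapshotListItem, UPID,
};
use pbs_api_types::{
    DataStoreConfig, DataStoreListItem, DataStoreMountStatus, DataStoreStatus,
    GarbageCollectionJobStatus, JobScheduleStatus, KeepOptions,
    MaintenanceMode, MaintenanceType, Operation, PruneJobOptions,
    SyncJobConfig,
};
use pbs_api_types::{
    PRIV_DATASTORE_AUDIT, PRIV_DATASTORE_BACKUP, PRIV_DATASTORE_MODIFY,
    PRIV_DATASTORE_PRUNE, PRIV_DATASTORE_READ, PRIV_DATASTORE_VERIFY,
    PRIV_SYS_MODIFY,
};
use pbs_api_types::{
    BACKUP_ARCHIVE_NAME_SCHEMA, BACKUP_ID_SCHEMA, BACKUP_NAMESPACE_SCHEMA,
    BACKUP_TIME_SCHEMA, BACKUP_TYPE_SCHEMA, CATALOG_NAME, CLIENT_LOG_BLOB_NAME,
    DATASTORE_SCHEMA, IGNORE_VERIFIED_BACKUPS_SCHEMA, MAX_NAMESPACE_DEPTH,
    NS_MAX_DEPTH_SCHEMA, UPID_SCHEMA, VERIFICATION_OUTDATED_AFTER_SCHEMA,
};

That way a patch adding a single name only touches the group it belongs to.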
> };
> use pbs_client::pxar::{create_tar, create_zip};
> use pbs_config::CachedUserInfo;
> @@ -70,7 +71,10 @@ use proxmox_rest_server::{formatter, worker_is_active, WorkerTask};
>
> use crate::api2::backup::optional_ns_param;
> use crate::api2::node::rrd::create_value_from_rrd;
> -use crate::backup::{check_ns_privs_full, ListAccessibleBackupGroups, VerifyWorker, NS_PRIVS_OK};
> +use crate::backup::{
> + can_access_any_namespace_in_range, check_ns_privs, check_ns_privs_full,
> + ListAccessibleBackupGroups, VerifyWorker, NS_PRIVS_OK,
> +};
> use crate::server::jobstate::{compute_schedule_status, Job, JobState};
> use crate::tools::{backup_info_to_snapshot_list_item, get_all_snapshot_files, read_backup_index};
>
> @@ -396,7 +400,7 @@ pub async fn delete_snapshot(
> }
>
> #[api(
> - serializing: true,
> + stream: true,
> input: {
> properties: {
> store: { schema: DATASTORE_SCHEMA },
> @@ -404,40 +408,137 @@ pub async fn delete_snapshot(
> type: BackupNamespace,
> optional: true,
> },
> - "backup-type": {
> + "max-depth": {
> + schema: NS_MAX_DEPTH_SCHEMA,
> optional: true,
> - type: BackupType,
> },
> - "backup-id": {
> + "content-type": {
> optional: true,
> - schema: BACKUP_ID_SCHEMA,
> + type: ContentType,
> },
> },
> },
> - returns: pbs_api_types::ADMIN_DATASTORE_LIST_SNAPSHOTS_RETURN_TYPE,
> access: {
> permission: &Permission::Anybody,
> description: "Requires on /datastore/{store}[/{namespace}] either DATASTORE_AUDIT for any \
> or DATASTORE_BACKUP and being the owner of the group",
> },
> )]
> -/// List backup snapshots.
> -pub async fn list_snapshots(
> +/// List datastore content, recursively through all namespaces.
> +pub async fn list_content(
> store: String,
> ns: Option<BackupNamespace>,
> - backup_type: Option<BackupType>,
> - backup_id: Option<String>,
> + max_depth: Option<usize>,
> + content_type: Option<ContentType>,
> _param: Value,
> _info: &ApiMethod,
> rpcenv: &mut dyn RpcEnvironment,
> -) -> Result<Vec<SnapshotListItem>, Error> {
> +) -> Result<proxmox_router::Stream, Error> {
> + let (sender, mut receiver) = tokio::sync::mpsc::channel(128);
> +
> let auth_id: Authid = rpcenv.get_auth_id().unwrap().parse()?;
> + let user_info = CachedUserInfo::new()?;
>
> - tokio::task::spawn_blocking(move || unsafe {
> - list_snapshots_blocking(store, ns, backup_type, backup_id, auth_id)
> - })
> - .await
> - .map_err(|err| format_err!("failed to await blocking task: {err}"))?
> + let datastore = DataStore::lookup_datastore(&store, Some(Operation::Read))?;
> + if !can_access_any_namespace_in_range(
> + datastore.clone(),
> + &auth_id,
> + &user_info,
> + ns.clone(),
> + max_depth,
> + ) {
> + proxmox_router::http_bail!(FORBIDDEN, "permission check failed");
> + }
> +
> + let ns = ns.unwrap_or_default();
> +
> + let (list_ns, list_group, list_snapshots) = match content_type {
> + Some(ContentType::Namespace) => (true, false, false),
> + Some(ContentType::Group) => (false, true, false),
> + Some(ContentType::Snapshot) => (false, false, true),
> + None => (true, true, true),
> + };
Hmm, might it make sense to have a filter param with a flag per type, so
that one can choose to include groups and snapshots, but not namespaces?
Albeit, here it's not really a filter in the classical sense: besides
skipping snapshots, it basically only affects namespaces or groups that are
empty, otherwise the info is there indirectly anyway.
OTOH, most use cases might just use max-depth to return everything from a
single level and load the rest on demand/select.
So it might be an option to skip this param for now, but maybe someone else
has better input or arguments here.
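Just to spell out the idea, roughly something like this (struct and field
names made up, not a concrete proposal):

use proxmox_schema::api;
use serde::{Deserialize, Serialize};

#[api]
#[derive(Serialize, Deserialize, Default)]
#[serde(rename_all = "kebab-case")]
/// Hypothetical per-type include flags for the content listing.
pub struct ContentFilter {
    /// Include namespace entries (defaults to true).
    #[serde(skip_serializing_if = "Option::is_none")]
    pub namespaces: Option<bool>,
    /// Include backup group entries (defaults to true).
    #[serde(skip_serializing_if = "Option::is_none")]
    pub groups: Option<bool>,
    /// Include snapshot entries (defaults to true).
    #[serde(skip_serializing_if = "Option::is_none")]
    pub snapshots: Option<bool>,
}

The three booleans in list_content would then just become something like
`filter.namespaces.unwrap_or(true)` etc., instead of the match above.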
> +
> + tokio::spawn(async move {
Is this really needed? The spawn_blocking below already moves the closure to
a thread dedicated to stuff that can block, so this seems like a needless
indirection, or am I overlooking something?
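i.e. I would have expected something along these lines to be enough
(untested sketch with dummy record types, not our actual code):

use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    let (sender, mut receiver) = mpsc::channel::<u64>(128);

    // spawn the blocking walk directly; blocking_send() is fine here
    // because we already run on a dedicated blocking thread
    tokio::task::spawn_blocking(move || {
        for record in 0..10u64 {
            if sender.blocking_send(record).is_err() {
                break; // receiver side dropped, stop producing
            }
        }
    });

    while let Some(record) = receiver.recv().await {
        println!("got record {record}");
    }
}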
> + tokio::task::spawn_blocking(move || {
I looked at the rest below only shallowly and nothing stuck out, but it
would indeed warrant a more in-depth review.