Re: [pve-devel] [PATCH manager v2] api: implement node-independent bulk actions

From: Dominik Csapak <d.csapak@proxmox.com>
To: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>,
	pve-devel@lists.proxmox.com
Subject: Re: [pve-devel] [PATCH manager v2] api: implement node-independent bulk actions
Date: Fri, 14 Nov 2025 10:16:20 +0100	[thread overview]
Message-ID: <c9ff0657-de5f-431a-b493-ea138bd36ad0@proxmox.com> (raw)
In-Reply-To: <176310914957.64802.13109309341568645230@yuna.proxmox.com>

thanks for the review!

just a few questions inline

On 11/14/25 9:32 AM, Fabian Grünbichler wrote:
> Quoting Dominik Csapak (2025-08-14 13:26:59)
[snip]
>> +sub create_client {
>> +    my ($authuser, $request_timeout) = @_;
>> +    my ($user, undef) = PVE::AccessControl::split_tokenid($authuser, 1);
>> +
>> +    # TODO: How to handle Tokens?
> 
> not like below for sure ;) we'd need to make it queriable using the
> RPCEnvironment (and store it there) I guess? maybe opt-in so the storing only
> happens for certain API handlers (e.g., these ones here for a start)?
> 
> this basically escalates from the token to a ticket of the user, which is a
> nogo even if you duplicate the current set of privilege checks here, as that is
> just waiting to get out of sync

Just to clarify: what's the worst that could happen with this?

the API call checks if the token has the correct permissions for all 
resources, and the user can't have less permissions at that point.
So 'upgrading' the token to a full user ticket for the context
of these changes does not inherently break anything?
(Except I'm missing something?)

There is a TOCTOU issue of course, but IMHO it would not that big
of a deal since this can only happen if the permissions change
during the running task, and one could argue that the token
had permission to do these changes when starting the api call,
so it should be able to finish it.

I modeled this after we do the metrics export api call
in PVE/API2/Cluster/MetricServer.pm where we do the same
(or similar), so we probably should change it there as well then?

Anyway, I'll see that we extend the rpcenv to be able to do that,
but how should we mark the api call? in a special way like we do
with 'protected' ? (so e.g. '"save-token-info" => 1,' in the api
call (with a better name ofc))

> 
>> +    my $ticket = PVE::AccessControl::assemble_ticket($user || $authuser);
>> +    my $csrf_token = PVE::AccessControl::assemble_csrf_prevention_token($user || $authuser);
>> +
>> +    my $node = PVE::INotify::nodename();
>> +    my $fingerprint = PVE::Cluster::get_node_fingerprint($node);
>> +
>> +    my $conn_args = {
>> +        protocol => 'https',
>> +        host => 'localhost', # always call the api locally, let pveproxy handle the proxying
>> +        port => 8006,
>> +        ticket => $ticket,
>> +        timeout => $request_timeout // 25, # default slightly shorter than the proxy->daemon timeout
>> +        cached_fingerprints => {
>> +            $fingerprint => 1,
>> +        },
>> +    };
>> +
>> +    my $api_client = PVE::APIClient::LWP->new($conn_args->%*);
>> +    if (defined($csrf_token)) {
>> +        $api_client->update_csrftoken($csrf_token);
>> +    }
>> +
>> +    return $api_client;
> 
> this client doesn't automatically refresh the ticket before it expires, so
> bigger sets of bulk actions that take more than 2h will always fail..
> 

true, so i'd abstract away a 'make_request' function that checks
if we should renew the ticket and always use that?

>> +}
>> +
>> +# starts and awaits a task for each guest given via $startlist.
>> +#
>> +# takes a vm list in the form of
>> +# {
>> +#     0 => {
>> +#         100 => { .. guest info ..},
>> +#         101 => { .. guest info ..},
>> +#     },
>> +#     1 => {
>> +#         102 => { .. guest info ..},
>> +#         103 => { .. guest info ..},
>> +#     },
>> +# }
>> +#
>> +# max_workers: how many parallel tasks should be started.
>> +# start_task: a sub that returns eiter a upid or 1 (undef means failure)
>> +# check_task: if start_task returned a upid, will wait for that to finish and
>> +#    call check_task with the resulting task status
>> +sub handle_task_foreach_guest {
>> +    my ($startlist, $max_workers, $start_task, $check_task) = @_;
>> +
>> +    my $rpcenv = PVE::RPCEnvironment::get();
>> +    my $authuser = $rpcenv->get_user();
>> +    my $api_client = create_client($authuser);
>> +
>> +    my $failed = [];
>> +    for my $order (sort { $a <=> $b } keys $startlist->%*) {
>> +        my $vmlist = $startlist->{$order};
>> +        my $workers = {};
>> +
>> +        for my $vmid (sort { $a <=> $b } keys $vmlist->%*) {
>> +
>> +            # wait until at least one slot is free
>> +            while (scalar(keys($workers->%*)) >= $max_workers) {
>> +                for my $upid (keys($workers->%*)) {
>> +                    my $worker = $workers->{$upid};
>> +                    my $node = $worker->{guest}->{node};
>> +
>> +                    my $task = eval { $api_client->get("/nodes/$node/tasks/$upid/status") };
> 
> this could easily fail for reasons other than the task having exited? should we
> maybe retry a few times to avoid accidents, before giving up?
> 

yep, i'd include such functionality (opt-in just for these calls) in the 
'make_request' abstraction?

>> +                    if (my $err = $@) {
>> +                        push $failed->@*, $worker->{vmid};
>> +
>> +                        $check_task->($api_client, $worker->{vmid}, $worker->{guest}, 1, undef);
>> +
>> +                        delete $workers->{$upid};
>> +                    } elsif ($task->{status} ne 'running') {
>> +                        my $is_error = PVE::Tools::upid_status_is_error($task->{exitstatus});
>> +                        push $failed->@*, $worker->{vmid} if $is_error;
>> +
>> +                        $check_task->(
>> +                            $api_client, $worker->{vmid}, $worker->{guest}, $is_error, $task,
>> +                        );
>> +
>> +                        delete $workers->{$upid};
>> +                    }
>> +                }
>> +                sleep(1); # How much?
>> +            }
>> +
>> +            my $guest = $vmlist->{$vmid};
>> +            my $upid = eval { $start_task->($api_client, $vmid, $guest) };
>> +            warn $@ if $@;
> 
> A: here we use warn (see further similar nits below)

true, i'll use log_warn for those

> 

[snip]
>> +    code => sub {
>> +        my ($param) = @_;
>> +
>> +        my $rpcenv = PVE::RPCEnvironment::get();
>> +        my $authuser = $rpcenv->get_user();
>> +
>> +        my ($vmlist, $vmlist_string) = extract_vmlist($param);
>> +
>> +        check_guest_permissions($rpcenv, $authuser, $vmlist, ['VM.PowerMgmt']);
>> +
>> +        if ($param->{'to-disk'}) {
>> +            check_guest_permissions($rpcenv, $authuser, $vmlist, ['VM.Config.Disk']);
>> +        }
>> +
>> +        if (my $statestorage = $param->{statestorage}) {
>> +            $rpcenv->check($authuser, "/storage/$statestorage", ['Datastore.AllocateSpace']);
>> +        } else {
>> +            # storage access must be done in start task
>> +        }
> 
> this if should be nested in the other if?

we could, but since 'statestorage' requires 'to-disk' in the api anyway,
so changing this would just increase indentation?

i could put it int he if, but the api already does that for us

> 
>> +
>> +        my $code = sub {
>> +            my $startlist =
>> +                PVE::API2::Nodes::Nodeinfo::get_start_stop_list(undef, undef, $vmlist_string);
>> +
>> +            print_start_action($vmlist, "Suspending");
>> +
>> +            # reverse order for suspend
>> +            for my $order (keys $startlist->%*) {
>> +                my $list = delete $startlist->{$order};
>> +                $order = $order * -1;
>> +                $startlist->{$order} = $list;
>> +            }
>> +
>> +            my $start_task = sub {
>> +                my ($api_client, $vmid, $guest) = @_;
>> +                my $node = $guest->{node};
>> +
>> +                if ($guest->{type} ne 'qemu') {
>> +                    log_warn("skipping $vmid, only VMs can be suspended");
>> +                    return 1;
>> +                }
>> +
>> +                if (!$param->{statestorage}) {
> 
> this should again be nested inside a check for to-disk being set

again, same argument as above
> 
>> +                    my $conf = PVE::QemuConfig->load_config($vmid, $node);
>> +                    my $storecfg = PVE::Storage::config();
>> +                    my $statestorage = PVE::QemuServer::find_vmstate_storage($conf, $storecfg);
> 
> this does not exist, it's in QemuConfig

ah yes, this was moved betweeen my v1 and v2 IIRC and i forgot to change it

> 
>> +                    $rpcenv->check(
>> +                        $authuser,
>> +                        "/storage/$statestorage",
>> +                        ['Datastore.AllocateSpace'],
>> +                    );
>> +                }
>> +
>> +                my $status =
>> +                    eval { $api_client->get("/nodes/$node/qemu/$vmid/status/current") };
>> +                if (defined($status) && $status->{status} ne 'running') {
>> +                    print STDERR "Skipping VM $vmid, not running.\n";
>> +                    return 1;
>> +                }
>> +
>> +                my $params = {};
>> +                $params->{'todisk'} = $param->{'to-disk'} // 0;
>> +                $params->{statestorage} = $param->{statestorage}
>> +                    if defined($param->{statestorage});
> 
> statestorage only makes sense if you set to-disk, so it should be ordered like
> that here as well..

again same argument as above

if you insist, i'll change it, but IMHO it just increases the 
indentation level for not much gain

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel