public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Dominik Csapak <d.csapak@proxmox.com>, pve-devel@lists.proxmox.com
Subject: Re: [pve-devel] [PATCH manager v2] api: implement node-independent bulk actions
Date: Fri, 14 Nov 2025 09:32:29 +0100	[thread overview]
Message-ID: <176310914957.64802.13109309341568645230@yuna.proxmox.com> (raw)
In-Reply-To: <20250814112659.2584520-1-d.csapak@proxmox.com>

Quoting Dominik Csapak (2025-08-14 13:26:59)
> To achieve this, start a worker task and use our generic api client
> to start the tasks on the relevant nodes. The client always points
> to 'localhost' so we let the pveproxy worry about the proxying etc.
> 
> We reuse some logic from the startall/stopall/etc. calls, like getting
> the ordered guest info list. For that to work, we must convert some of
> the private subs into proper subs. We also fix handling loading configs
> from other nodes.
> 
> In each worker, for each task, we check if the target is in the desired
> state (e.g. stopped when wanting to start, etc.). If that is the case,
> start the task and put the UPID in a queue to check. This is done until
> the parallel count is at 'max_workers', at which point we wait until at
> least one task is finished before starting the next one.
> 
> Failures (e.g. task errors or failure to fetch the tasks) are printed,
> and the vmid is saved and they're collectively printed at the end for
> convenience.
> 
> Special handling is required for checking the permissions for suspend:
> We have to load the config of the VM and find the target state storage.
> We can then check the privileges for that storage.
> 
> Further improvements can be:
> * filters (I'd prefer starting out with front end filters)
> * failure mode resolution (I'd wait until someone requests that)
> * token handling (probably not necessary since we do check the
>   permissions upfront for the correct token.)

I disagree with that last point, see below.

> 
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> changes from v1:
> * rebased on master (perltidy changes)
> * added missing suspend to index
> * refactored more functionality to be reused
> 
>  PVE/API2/Cluster.pm                  |   7 +
>  PVE/API2/Cluster/BulkAction.pm       |  45 ++
>  PVE/API2/Cluster/BulkAction/Guest.pm | 753 +++++++++++++++++++++++++++
>  PVE/API2/Cluster/BulkAction/Makefile |  17 +
>  PVE/API2/Cluster/Makefile            |   4 +-
>  PVE/API2/Nodes.pm                    |  24 +-
>  6 files changed, 838 insertions(+), 12 deletions(-)
>  create mode 100644 PVE/API2/Cluster/BulkAction.pm
>  create mode 100644 PVE/API2/Cluster/BulkAction/Guest.pm
>  create mode 100644 PVE/API2/Cluster/BulkAction/Makefile
> 
> diff --git a/PVE/API2/Cluster.pm b/PVE/API2/Cluster.pm
> index 02a7ceff..f0dd88f3 100644
> --- a/PVE/API2/Cluster.pm
> +++ b/PVE/API2/Cluster.pm
> @@ -25,6 +25,7 @@ use PVE::API2::ACMEAccount;
>  use PVE::API2::ACMEPlugin;
>  use PVE::API2::Backup;
>  use PVE::API2::Cluster::BackupInfo;
> +use PVE::API2::Cluster::BulkAction;
>  use PVE::API2::Cluster::Ceph;
>  use PVE::API2::Cluster::Mapping;
>  use PVE::API2::Cluster::Jobs;
> @@ -103,6 +104,11 @@ __PACKAGE__->register_method({
>      path => 'mapping',
>  });
>  
> +__PACKAGE__->register_method({
> +    subclass => "PVE::API2::Cluster::BulkAction",
> +    path => 'bulk-action',
> +});
> +
>  if ($have_sdn) {
>      __PACKAGE__->register_method({
>          subclass => "PVE::API2::Network::SDN",
> @@ -163,6 +169,7 @@ __PACKAGE__->register_method({
>              { name => 'resources' },
>              { name => 'status' },
>              { name => 'tasks' },
> +            { name => 'bulk-action' },

sorting? ;)

>          ];
>  
>          if ($have_sdn) {
> diff --git a/PVE/API2/Cluster/BulkAction.pm b/PVE/API2/Cluster/BulkAction.pm
> new file mode 100644
> index 00000000..df650514
> --- /dev/null
> +++ b/PVE/API2/Cluster/BulkAction.pm
> @@ -0,0 +1,45 @@
> +package PVE::API2::Cluster::BulkAction;
> +
> +use strict;
> +use warnings;
> +
> +use PVE::API2::Cluster::BulkAction::Guest;
> +
> +use base qw(PVE::RESTHandler);
> +
> +__PACKAGE__->register_method({
> +    subclass => "PVE::API2::Cluster::BulkAction::Guest",
> +    path => 'guest',
> +});
> +
> +__PACKAGE__->register_method({
> +    name => 'index',
> +    path => '',
> +    method => 'GET',
> +    description => "List resource types.",
> +    permissions => {
> +        user => 'all',
> +    },
> +    parameters => {
> +        additionalProperties => 0,
> +        properties => {},
> +    },
> +    returns => {
> +        type => 'array',
> +        items => {
> +            type => "object",
> +        },
> +        links => [{ rel => 'child', href => "{name}" }],
> +    },
> +    code => sub {
> +        my ($param) = @_;
> +
> +        my $result = [
> +            { name => 'guest' },
> +        ];
> +
> +        return $result;
> +    },
> +});
> +
> +1;
> diff --git a/PVE/API2/Cluster/BulkAction/Guest.pm b/PVE/API2/Cluster/BulkAction/Guest.pm
> new file mode 100644
> index 00000000..785c9036
> --- /dev/null
> +++ b/PVE/API2/Cluster/BulkAction/Guest.pm
> @@ -0,0 +1,753 @@
> +package PVE::API2::Cluster::BulkAction::Guest;
> +
> +use strict;
> +use warnings;
> +
> +use PVE::APIClient::LWP;
> +use PVE::AccessControl;
> +use PVE::Cluster;
> +use PVE::Exception qw(raise raise_perm_exc raise_param_exc);
> +use PVE::INotify;
> +use PVE::JSONSchema qw(get_standard_option);
> +use PVE::QemuConfig;
> +use PVE::QemuServer;
> +use PVE::RESTEnvironment qw(log_warn);
> +use PVE::RESTHandler;
> +use PVE::RPCEnvironment;
> +use PVE::Storage;
> +use PVE::Tools qw();
> +
> +use PVE::API2::Nodes;
> +
> +use base qw(PVE::RESTHandler);
> +
> +__PACKAGE__->register_method({
> +    name => 'index',
> +    path => '',
> +    method => 'GET',
> +    description => "Bulk action index.",
> +    permissions => { user => 'all' },
> +    parameters => {
> +        additionalProperties => 0,
> +        properties => {},
> +    },
> +    returns => {
> +        type => 'array',
> +        items => {
> +            type => "object",
> +            properties => {},
> +        },
> +        links => [{ rel => 'child', href => "{name}" }],
> +    },
> +    code => sub {
> +        my ($param) = @_;
> +
> +        return [
> +            { name => 'start' },
> +            { name => 'shutdown' },
> +            { name => 'migrate' },
> +            { name => 'suspend' },
> +        ];
> +    },
> +});
> +
> +sub create_client {
> +    my ($authuser, $request_timeout) = @_;
> +    my ($user, undef) = PVE::AccessControl::split_tokenid($authuser, 1);
> +
> +    # TODO: How to handle Tokens?

not like below, for sure ;) we'd need to make the token queryable via the
RPCEnvironment (and store it there), I guess? maybe opt-in, so the storing only
happens for certain API handlers (e.g., these ones here for a start)?

this basically escalates from the token to a full ticket of the user, which is
a no-go even if you duplicate the current set of privilege checks here, as that
is just waiting to get out of sync

> +    my $ticket = PVE::AccessControl::assemble_ticket($user || $authuser);
> +    my $csrf_token = PVE::AccessControl::assemble_csrf_prevention_token($user || $authuser);
> +
> +    my $node = PVE::INotify::nodename();
> +    my $fingerprint = PVE::Cluster::get_node_fingerprint($node);
> +
> +    my $conn_args = {
> +        protocol => 'https',
> +        host => 'localhost', # always call the api locally, let pveproxy handle the proxying
> +        port => 8006,
> +        ticket => $ticket,
> +        timeout => $request_timeout // 25, # default slightly shorter than the proxy->daemon timeout
> +        cached_fingerprints => {
> +            $fingerprint => 1,
> +        },
> +    };
> +
> +    my $api_client = PVE::APIClient::LWP->new($conn_args->%*);
> +    if (defined($csrf_token)) {
> +        $api_client->update_csrftoken($csrf_token);
> +    }
> +
> +    return $api_client;

this client doesn't automatically refresh the ticket before it expires, so bulk
actions over bigger sets of guests that take more than 2h will always fail..
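
an untested sketch of what I mean (the helper name is made up, and it assumes
the ticket-updating setter on PVE::APIClient::LWP — adjust to whatever the
client actually offers):

```perl
# refresh the ticket after 1h, well below the 2h ticket lifetime
my $ticket_created = time();

my $refresh_ticket_if_needed = sub {
    my ($api_client, $username) = @_;
    return if (time() - $ticket_created) < 60 * 60;
    my $ticket = PVE::AccessControl::assemble_ticket($username);
    $api_client->update_ticket($ticket); # assumed setter, like update_csrftoken
    $ticket_created = time();
};
```

calling that before each request in the polling loops would keep long-running
bulk actions from hitting the expiry.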

> +}
> +
> +# starts and awaits a task for each guest given via $startlist.
> +#
> +# takes a vm list in the form of
> +# {
> +#     0 => {
> +#         100 => { .. guest info ..},
> +#         101 => { .. guest info ..},
> +#     },
> +#     1 => {
> +#         102 => { .. guest info ..},
> +#         103 => { .. guest info ..},
> +#     },
> +# }
> +#
> +# max_workers: how many parallel tasks should be started.
> +# start_task: a sub that returns eiter a upid or 1 (undef means failure)
> +# check_task: if start_task returned a upid, will wait for that to finish and
> +#    call check_task with the resulting task status
> +sub handle_task_foreach_guest {
> +    my ($startlist, $max_workers, $start_task, $check_task) = @_;
> +
> +    my $rpcenv = PVE::RPCEnvironment::get();
> +    my $authuser = $rpcenv->get_user();
> +    my $api_client = create_client($authuser);
> +
> +    my $failed = [];
> +    for my $order (sort { $a <=> $b } keys $startlist->%*) {
> +        my $vmlist = $startlist->{$order};
> +        my $workers = {};
> +
> +        for my $vmid (sort { $a <=> $b } keys $vmlist->%*) {
> +
> +            # wait until at least one slot is free
> +            while (scalar(keys($workers->%*)) >= $max_workers) {
> +                for my $upid (keys($workers->%*)) {
> +                    my $worker = $workers->{$upid};
> +                    my $node = $worker->{guest}->{node};
> +
> +                    my $task = eval { $api_client->get("/nodes/$node/tasks/$upid/status") };

this could easily fail for reasons other than the task having exited? should we
maybe retry a few times to avoid accidents before giving up?
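
something like this hypothetical helper (untested sketch, names made up) could
be shared by both polling sites:

```perl
# retry transient failures of the status query a few times before
# propagating the error to the caller
sub get_task_status_with_retry {
    my ($api_client, $node, $upid, $retries) = @_;
    $retries //= 3;

    my $last_err;
    for my $attempt (1 .. $retries) {
        my $task = eval { $api_client->get("/nodes/$node/tasks/$upid/status") };
        return $task if defined($task);
        $last_err = $@;
        sleep(1) if $attempt < $retries;
    }
    die $last_err; # still failing after all retries
}
```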

> +                    if (my $err = $@) {
> +                        push $failed->@*, $worker->{vmid};
> +
> +                        $check_task->($api_client, $worker->{vmid}, $worker->{guest}, 1, undef);
> +
> +                        delete $workers->{$upid};
> +                    } elsif ($task->{status} ne 'running') {
> +                        my $is_error = PVE::Tools::upid_status_is_error($task->{exitstatus});
> +                        push $failed->@*, $worker->{vmid} if $is_error;
> +
> +                        $check_task->(
> +                            $api_client, $worker->{vmid}, $worker->{guest}, $is_error, $task,
> +                        );
> +
> +                        delete $workers->{$upid};
> +                    }
> +                }
> +                sleep(1); # How much?
> +            }
> +
> +            my $guest = $vmlist->{$vmid};
> +            my $upid = eval { $start_task->($api_client, $vmid, $guest) };
> +            warn $@ if $@;

A: here we use warn (see further similar nits below)

> +
> +            # success but no task necessary
> +            next if defined($upid) && "$upid" eq "1";
> +
> +            if (!defined($upid)) {
> +                push $failed->@*, $vmid;
> +                next;
> +            }
> +
> +            $workers->{$upid} = {
> +                vmid => $vmid,
> +                guest => $guest,
> +            };
> +        }
> +
> +        # wait until current order is finished
> +        for my $upid (keys($workers->%*)) {
> +            my $worker = $workers->{$upid};
> +            my $node = $worker->{guest}->{node};
> +
> +            my $task = eval { wait_for_task_finished($api_client, $node, $upid) };
> +            my $err = $@;
> +            my $is_error = ($err || PVE::Tools::upid_status_is_error($task->{exitstatus})) ? 1 : 0;
> +            push $failed->@*, $worker->{vmid} if $is_error;
> +
> +            $check_task->($api_client, $worker->{vmid}, $worker->{guest}, $is_error, $task);
> +
> +            delete $workers->{$upid};
> +        }
> +    }
> +
> +    return $failed;
> +}
> +
> +sub get_type_text {
> +    my ($type) = @_;
> +
> +    if ($type eq 'lxc') {
> +        return 'CT';
> +    } elsif ($type eq 'qemu') {
> +        return 'VM';
> +    } else {
> +        die "unknown guest type $type\n";
> +    }
> +}
> +
> +sub wait_for_task_finished {
> +    my ($client, $node, $upid) = @_;
> +
> +    while (1) {
> +        my $task = $client->get("/nodes/$node/tasks/$upid/status");

same question as above - should we handle transient errors here as well?

> +        return $task if $task->{status} ne 'running';
> +        sleep(1); # How much time?
> +    }
> +}
> +
> +sub check_guest_permissions {
> +    my ($rpcenv, $authuser, $vmlist, $priv_list) = @_;
> +
> +    if (scalar($vmlist->@*) > 0) {
> +        $rpcenv->check($authuser, "/vms/$_", $priv_list) for $vmlist->@*;
> +    } elsif (!$rpcenv->check($authuser, "/", $priv_list, 1)) {
> +        raise_perm_exc("/, " . join(', ', $priv_list->@*));
> +    }
> +}
> +
> +sub extract_vmlist {
> +    my ($param) = @_;
> +
> +    if (my $vmlist = $param->{vms}) {
> +        my $vmlist_string = join(',', $vmlist->@*);
> +        return ($vmlist, $vmlist_string);
> +    }
> +    return ([], undef);
> +}
> +
> +sub print_start_action {
> +    my ($vmlist, $prefix, $suffix) = @_;
> +
> +    $suffix = defined($suffix) ? " $suffix" : "";
> +
> +    if (scalar($vmlist->@*)) {
> +        print STDERR "$prefix guests$suffix: " . join(', ', $vmlist->@*) . "\n";

A: here we use STDERR

> +    } else {
> +        print STDERR "$prefix all guests$suffix\n";
> +    }
> +}
> +
> +__PACKAGE__->register_method({
> +    name => 'start',
> +    path => 'start',
> +    method => 'POST',
> +    description => "Bulk start or resume all guests on the cluster.",
> +    permissions => {
> +        description => "The 'VM.PowerMgmt' permission is required on '/' or on '/vms/<ID>' for "
> +            . "each ID passed via the 'vms' parameter.",
> +        user => 'all',
> +    },
> +    protected => 1,
> +    parameters => {
> +        additionalProperties => 0,
> +        properties => {
> +            vms => {
> +                description => "Only consider guests from this list of VMIDs.",
> +                type => 'array',
> +                items => get_standard_option('pve-vmid'),
> +                optional => 1,
> +            },
> +            timeout => {
> +                description =>
> +                    "Default start timeout in seconds. Only valid for VMs. (default depends on the guest configuration).",
> +                type => 'integer',
> +                optional => 1,
> +            },
> +            'max-workers' => {
> +                description => "How many parallel tasks at maximum should be started.",
> +                optional => 1,
> +                default => 1,
> +                type => 'integer',
> +            },
> +            # TODO:
> +            # Failure resolution mode (fail, warn, retry?)
> +            # mode-limits (offline only, suspend only, ?)
> +            # filter (tags, name, ?)
> +        },
> +    },
> +    returns => {
> +        type => 'string',
> +        description => "UPID of the worker",
> +    },
> +    code => sub {
> +        my ($param) = @_;
> +
> +        my $rpcenv = PVE::RPCEnvironment::get();
> +        my $authuser = $rpcenv->get_user();
> +
> +        my ($vmlist, $vmlist_string) = extract_vmlist($param);
> +
> +        check_guest_permissions($rpcenv, $authuser, $vmlist, ['VM.PowerMgmt']);
> +
> +        my $code = sub {
> +            my $startlist =
> +                PVE::API2::Nodes::Nodeinfo::get_start_stop_list(undef, undef, $vmlist_string);
> +
> +            print_start_action($vmlist, "Starting");
> +
> +            my $start_task = sub {
> +                my ($api_client, $vmid, $guest) = @_;
> +                my $node = $guest->{node};
> +
> +                my $type = $guest->{type};
> +                my $type_text = get_type_text($type);
> +                my $operation = 'start';
> +                my $status =
> +                    eval { $api_client->get("/nodes/$node/$type/$vmid/status/current") };
> +                if (defined($status) && $status->{status} eq 'running') {
> +                    if (defined($status->{qmpstatus}) && $status->{qmpstatus} ne 'paused') {
> +                        print STDERR "Skipping $type_text $vmid, already running.\n";
> +                        return 1;
> +                    } else {
> +                        $operation = 'resume';
> +                    }
> +                }
> +
> +                my $params = {};
> +                if (defined($param->{timeout}) && $operation eq 'start' && $type eq 'qemu') {
> +                    $params->{timeout} = $param->{timeout};
> +                }
> +
> +                my $url = "/nodes/$node/$type/$vmid/status/$operation";
> +                print STDERR "Starting $type_text $vmid\n";
> +                return $api_client->post($url, $params);
> +            };
> +
> +            my $check_task = sub {
> +                my ($api_client, $vmid, $guest, $is_error, $task) = @_;
> +                my $node = $guest->{node};
> +
> +                my $default_delay = 0;
> +
> +                if (!$is_error) {
> +                    my $delay = defined($guest->{up}) ? int($guest->{up}) : $default_delay;
> +                    if ($delay > 0) {
> +                        print STDERR "Waiting for $delay seconds (startup delay)\n"
> +                            if $guest->{up};
> +                        for (my $i = 0; $i < $delay; $i++) {
> +                            sleep(1);
> +                        }
> +                    }
> +                } else {
> +                    my $err =
> +                        defined($task) ? $task->{exitstatus} : "could not query task status";
> +                    my $type_text = get_type_text($guest->{type});
> +                    print STDERR "Starting $type_text $vmid failed: $err\n";
> +                }
> +            };
> +
> +            my $max_workers = $param->{'max-workers'} // 1;
> +            my $failed =
> +                handle_task_foreach_guest($startlist, $max_workers, $start_task, $check_task);
> +
> +            if (scalar($failed->@*)) {
> +                die "Some guests failed to start: " . join(', ', $failed->@*) . "\n";
> +            }
> +        };
> +
> +        return $rpcenv->fork_worker('bulkstart', undef, $authuser, $code);
> +    },
> +});
> +
> +__PACKAGE__->register_method({
> +    name => 'shutdown',
> +    path => 'shutdown',
> +    method => 'POST',
> +    description => "Bulk shutdown all guests on the cluster.",
> +    permissions => {
> +        description => "The 'VM.PowerMgmt' permission is required on '/' or on '/vms/<ID>' for "
> +            . "each ID passed via the 'vms' parameter.",
> +        user => 'all',
> +    },
> +    protected => 1,
> +    parameters => {
> +        additionalProperties => 0,
> +        properties => {
> +            vms => {
> +                description => "Only consider guests from this list of VMIDs.",
> +                type => 'array',
> +                items => get_standard_option('pve-vmid'),
> +                optional => 1,
> +            },
> +            timeout => {
> +                description =>
> +                    "Default shutdown timeout in seconds if none is configured for the guest.",
> +                type => 'integer',
> +                default => 180,
> +                optional => 1,
> +            },
> +            'force-stop' => {
> +                description => "Makes sure the Guest stops after the timeout.",
> +                type => 'boolean',
> +                default => 1,
> +                optional => 1,
> +            },
> +            'max-workers' => {
> +                description => "How many parallel tasks at maximum should be started.",
> +                optional => 1,
> +                default => 1,
> +                type => 'integer',
> +            },
> +            # TODO:
> +            # Failure resolution mode (fail, warn, retry?)
> +            # mode-limits (offline only, suspend only, ?)
> +            # filter (tags, name, ?)
> +        },
> +    },
> +    returns => {
> +        type => 'string',
> +        description => "UPID of the worker",
> +    },
> +    code => sub {
> +        my ($param) = @_;
> +
> +        my $rpcenv = PVE::RPCEnvironment::get();
> +        my $authuser = $rpcenv->get_user();
> +
> +        my ($vmlist, $vmlist_string) = extract_vmlist($param);
> +
> +        check_guest_permissions($rpcenv, $authuser, $vmlist, ['VM.PowerMgmt']);
> +
> +        my $code = sub {
> +            my $startlist =
> +                PVE::API2::Nodes::Nodeinfo::get_start_stop_list(undef, undef, $vmlist_string);
> +
> +            print_start_action($vmlist, "Shutting down");
> +
> +            # reverse order for shutdown
> +            for my $order (keys $startlist->%*) {
> +                my $list = delete $startlist->{$order};
> +                $order = $order * -1;
> +                $startlist->{$order} = $list;
> +            }
> +
> +            my $start_task = sub {
> +                my ($api_client, $vmid, $guest) = @_;
> +                my $node = $guest->{node};
> +
> +                my $type = $guest->{type};
> +                my $type_text = get_type_text($type);
> +
> +                my $status =
> +                    eval { $api_client->get("/nodes/$node/$type/$vmid/status/current") };
> +                if (defined($status) && $status->{status} ne 'running') {
> +                    print STDERR "Skipping $type_text $vmid, not running.\n";
> +                    return 1;
> +                }
> +
> +                if (
> +                    defined($status)
> +                    && defined($status->{qmpstatus})
> +                    && $status->{qmpstatus} eq 'paused'
> +                    && !$param->{'force-stop'}
> +                ) {
> +                    log_warn("Skipping $type_text $vmid, resume paused VM before shutdown.\n");

A: here we use log_warn

> +                    return 1;
> +                }
> +
> +                my $timeout = int($guest->{down} // $param->{timeout} // 180);
> +                my $forceStop = $param->{'force-stop'} // 1;
> +
> +                my $params = {
> +                    forceStop => $forceStop,
> +                    timeout => $timeout,
> +                };
> +
> +                my $url = "/nodes/$node/$type/$vmid/status/shutdown";
> +                print STDERR "Shutting down $type_text $vmid (Timeout = $timeout seconds)\n";
> +                return $api_client->post($url, $params);
> +            };
> +
> +            my $check_task = sub {
> +                my ($api_client, $vmid, $guest, $is_error, $task) = @_;
> +                my $node = $guest->{node};
> +                if ($is_error) {
> +                    my $err =
> +                        defined($task) ? $task->{exitstatus} : "could not query task status";
> +                    my $type_text = get_type_text($guest->{type});
> +                    print STDERR "Stopping $type_text $vmid failed: $err\n";
> +                }
> +            };
> +
> +            my $max_workers = $param->{'max-workers'} // 1;
> +            my $failed =
> +                handle_task_foreach_guest($startlist, $max_workers, $start_task, $check_task);
> +
> +            if (scalar($failed->@*)) {
> +                die "Some guests failed to shutdown " . join(', ', $failed->@*) . "\n";
> +            }
> +        };
> +
> +        return $rpcenv->fork_worker('bulkshutdown', undef, $authuser, $code);
> +    },
> +});
> +
> +__PACKAGE__->register_method({
> +    name => 'suspend',
> +    path => 'suspend',
> +    method => 'POST',
> +    description => "Bulk suspend all guests on the cluster.",
> +    permissions => {
> +        description =>
> +            "The 'VM.PowerMgmt' permission is required on '/' or on '/vms/<ID>' for each"
> +            . " ID passed via the 'vms' parameter. Additionally, you need 'VM.Config.Disk' on the"
> +            . " '/vms/{vmid}' path and 'Datastore.AllocateSpace' for the configured state-storage(s)",
> +        user => 'all',
> +    },
> +    protected => 1,
> +    parameters => {
> +        additionalProperties => 0,
> +        properties => {
> +            vms => {
> +                description => "Only consider guests from this list of VMIDs.",
> +                type => 'array',
> +                items => get_standard_option('pve-vmid'),
> +                optional => 1,
> +            },
> +            statestorage => get_standard_option(
> +                'pve-storage-id',
> +                {
> +                    description => "The storage for the VM state.",
> +                    requires => 'to-disk',
> +                    optional => 1,
> +                    completion => \&PVE::Storage::complete_storage_enabled,
> +                },
> +            ),
> +            'to-disk' => {
> +                description =>
> +                    "If set, suspends the guests to disk. Will be resumed on next start.",
> +                type => 'boolean',
> +                default => 0,
> +                optional => 1,
> +            },
> +            'max-workers' => {
> +                description => "How many parallel tasks at maximum should be started.",
> +                optional => 1,
> +                default => 1,
> +                type => 'integer',
> +            },
> +            # TODO:
> +            # Failure resolution mode (fail, warn, retry?)
> +            # mode-limits (offline only, suspend only, ?)
> +            # filter (tags, name, ?)
> +        },
> +    },
> +    returns => {
> +        type => 'string',
> +        description => "UPID of the worker",
> +    },
> +    code => sub {
> +        my ($param) = @_;
> +
> +        my $rpcenv = PVE::RPCEnvironment::get();
> +        my $authuser = $rpcenv->get_user();
> +
> +        my ($vmlist, $vmlist_string) = extract_vmlist($param);
> +
> +        check_guest_permissions($rpcenv, $authuser, $vmlist, ['VM.PowerMgmt']);
> +
> +        if ($param->{'to-disk'}) {
> +            check_guest_permissions($rpcenv, $authuser, $vmlist, ['VM.Config.Disk']);
> +        }
> +
> +        if (my $statestorage = $param->{statestorage}) {
> +            $rpcenv->check($authuser, "/storage/$statestorage", ['Datastore.AllocateSpace']);
> +        } else {
> +            # storage access must be done in start task
> +        }

shouldn't this if be nested inside the 'to-disk' check above?
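
i.e., something like:

```perl
if ($param->{'to-disk'}) {
    check_guest_permissions($rpcenv, $authuser, $vmlist, ['VM.Config.Disk']);

    if (my $statestorage = $param->{statestorage}) {
        $rpcenv->check($authuser, "/storage/$statestorage", ['Datastore.AllocateSpace']);
    } else {
        # storage access must be checked in the start task
    }
}
```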

> +
> +        my $code = sub {
> +            my $startlist =
> +                PVE::API2::Nodes::Nodeinfo::get_start_stop_list(undef, undef, $vmlist_string);
> +
> +            print_start_action($vmlist, "Suspending");
> +
> +            # reverse order for suspend
> +            for my $order (keys $startlist->%*) {
> +                my $list = delete $startlist->{$order};
> +                $order = $order * -1;
> +                $startlist->{$order} = $list;
> +            }
> +
> +            my $start_task = sub {
> +                my ($api_client, $vmid, $guest) = @_;
> +                my $node = $guest->{node};
> +
> +                if ($guest->{type} ne 'qemu') {
> +                    log_warn("skipping $vmid, only VMs can be suspended");
> +                    return 1;
> +                }
> +
> +                if (!$param->{statestorage}) {

this should again be nested inside a check for to-disk being set

> +                    my $conf = PVE::QemuConfig->load_config($vmid, $node);
> +                    my $storecfg = PVE::Storage::config();
> +                    my $statestorage = PVE::QemuServer::find_vmstate_storage($conf, $storecfg);

this does not exist, it's in QemuConfig
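
i.e., this should probably be:

```perl
my $statestorage = PVE::QemuConfig::find_vmstate_storage($conf, $storecfg);
```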

> +                    $rpcenv->check(
> +                        $authuser,
> +                        "/storage/$statestorage",
> +                        ['Datastore.AllocateSpace'],
> +                    );
> +                }
> +
> +                my $status =
> +                    eval { $api_client->get("/nodes/$node/qemu/$vmid/status/current") };
> +                if (defined($status) && $status->{status} ne 'running') {
> +                    print STDERR "Skipping VM $vmid, not running.\n";
> +                    return 1;
> +                }
> +
> +                my $params = {};
> +                $params->{'todisk'} = $param->{'to-disk'} // 0;
> +                $params->{statestorage} = $param->{statestorage}
> +                    if defined($param->{statestorage});

statestorage only makes sense if 'to-disk' is set, so it should be guarded
accordingly here as well..
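
e.g.:

```perl
my $params = {};
$params->{todisk} = $param->{'to-disk'} // 0;
if ($params->{todisk}) {
    # statestorage is only meaningful for suspend-to-disk
    $params->{statestorage} = $param->{statestorage}
        if defined($param->{statestorage});
}
```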

> +
> +                my $url = "/nodes/$node/qemu/$vmid/status/suspend";
> +                print STDERR "Suspending VM $vmid\n";
> +                return $api_client->post($url, $params);
> +            };
> +
> +            my $check_task = sub {
> +                my ($api_client, $vmid, $guest, $is_error, $task) = @_;
> +                my $node = $guest->{node};
> +                if ($is_error) {
> +                    my $err =
> +                        defined($task) ? $task->{exitstatus} : "could not query task status";
> +                    my $type_text = get_type_text($guest->{type});
> +                    print STDERR "Stopping $type_text $vmid failed: $err\n";
> +                }
> +            };
> +
> +            my $max_workers = $param->{'max-workers'} // 1;
> +            my $failed =
> +                handle_task_foreach_guest($startlist, $max_workers, $start_task, $check_task);
> +
> +            if (scalar($failed->@*)) {
> +                die "Some guests failed to suspend " . join(', ', $failed->@*) . "\n";
> +            }
> +        };
> +
> +        return $rpcenv->fork_worker('bulksuspend', undef, $authuser, $code);
> +    },
> +});
> +
> +__PACKAGE__->register_method({
> +    name => 'migrate',
> +    path => 'migrate',
> +    method => 'POST',
> +    description => "Bulk migrate all guests on the cluster.",
> +    permissions => {
> +        description =>
> +            "The 'VM.Migrate' permission is required on '/' or on '/vms/<ID>' for each "
> +            . "ID passed via the 'vms' parameter.",
> +        user => 'all',
> +    },
> +    protected => 1,
> +    parameters => {
> +        additionalProperties => 0,
> +        properties => {
> +            vms => {
> +                description => "Only consider guests from this list of VMIDs.",
> +                type => 'array',
> +                items => get_standard_option('pve-vmid'),
> +                optional => 1,
> +            },
> +            'target-node' => get_standard_option('pve-node', { description => "Target node." }),
> +            online => {
> +                type => 'boolean',
> +                description => "Enable live migration for VMs and restart migration for CTs.",
> +                optional => 1,
> +            },
> +            "with-local-disks" => {
> +                type => 'boolean',
> +                description => "Enable live storage migration for local disk",
> +                optional => 1,
> +            },
> +            'max-workers' => {
> +                description => "How many parallel tasks at maximum should be started.",
> +                optional => 1,
> +                default => 1,
> +                type => 'integer',
> +            },
> +            # TODO:
> +            # Failure resolution mode (fail, warn, retry?)
> +            # mode-limits (offline only, suspend only, ?)
> +            # filter (tags, name, ?)
> +        },
> +    },
> +    returns => {
> +        type => 'string',
> +        description => "UPID of the worker",
> +    },
> +    code => sub {
> +        my ($param) = @_;
> +
> +        my $rpcenv = PVE::RPCEnvironment::get();
> +        my $authuser = $rpcenv->get_user();
> +
> +        my ($vmlist, $vmlist_string) = extract_vmlist($param);
> +
> +        check_guest_permissions($rpcenv, $authuser, $vmlist, ['VM.Migrate']);
> +
> +        my $code = sub {
> +            my $list =
> +                PVE::API2::Nodes::Nodeinfo::get_filtered_vmlist(undef, $vmlist_string, 1, 1);
> +
> +            print_start_action($vmlist, "Migrating", "to $param->{'target-node'}");
> +
> +            my $start_task = sub {
> +                my ($api_client, $vmid, $guest) = @_;
> +                my $node = $guest->{node};
> +
> +                my $type = $guest->{type};
> +                my $type_text = get_type_text($type);
> +
> +                if ($node eq $param->{'target-node'}) {
> +                    print STDERR "$type_text $vmid already on $node, skipping.\n";
> +                    return 1;
> +                }
> +
> +                my $params = {
> +                    target => $param->{'target-node'},
> +                };
> +
> +                if ($type eq 'lxc') {
> +                    $params->{restart} = $param->{online} if defined($param->{online});
> +                } elsif ($type eq 'qemu') {
> +                    $params->{online} = $param->{online} if defined($param->{online});
> +                    $params->{'with-local-disks'} = $param->{'with-local-disks'}
> +                        if defined($param->{'with-local-disks'});
> +                }
> +
> +                my $url = "/nodes/$node/$type/$vmid/migrate";
> +                print STDERR "Migrating $type_text $vmid\n";
> +                return $api_client->post($url, $params);
> +            };
> +
> +            my $check_task = sub {
> +                my ($api_client, $vmid, $guest, $is_error, $task) = @_;
> +                if ($is_error) {
> +                    my $err =
> +                        defined($task) ? $task->{exitstatus} : "could not query task status";
> +                    my $type_text = get_type_text($guest->{type});
> +                    print STDERR "Migrating $type_text $vmid failed: $err\n";
> +                }
> +            };
> +
> +            my $max_workers = $param->{'max-workers'} // 1;
> +            my $failed =
> +                handle_task_foreach_guest({ '0' => $list }, $max_workers, $start_task, $check_task);
> +
> +            if (scalar($failed->@*)) {
> +                die "Some guests failed to migrate: " . join(', ', $failed->@*) . "\n";
> +            }
> +        };
> +
> +        return $rpcenv->fork_worker('bulkmigrate', undef, $authuser, $code);
> +    },
> +});
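For reference, the scheduling described in the commit message (start tasks until 'max_workers' are in flight, then wait for one to finish before starting the next, collecting failed VMIDs at the end) can be sketched roughly like this. This is a language-neutral Python sketch, not the actual Perl implementation: the names and the None-means-skipped convention are assumptions, and the real handle_task_foreach_guest polls UPID status via the API client rather than calling a check callback synchronously:

```python
def run_bulk(guests, max_workers, start_task, check_task):
    """Bounded-parallelism sketch: at most max_workers tasks in flight."""
    running = {}  # vmid -> task handle (a UPID in the real code)
    failed = []

    def reap_one():
        # The real code waits for any running task to finish; here we
        # just pick one and ask check_task whether it succeeded.
        vmid, task = running.popitem()
        if not check_task(vmid, task):
            failed.append(vmid)

    for vmid in sorted(guests):
        if len(running) >= max_workers:
            reap_one()  # free a slot before starting another task
        task = start_task(vmid, guests[vmid])
        if task is not None:  # None = guest skipped (e.g. already on target)
            running[vmid] = task

    while running:  # drain the remaining in-flight tasks
        reap_one()

    return failed
```

With three guests, max_workers=2, and a check that fails for VMID 101, this returns [101] — matching the "collect failures, report them at the end" behaviour the commit message describes.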
> +
> +1;
> diff --git a/PVE/API2/Cluster/BulkAction/Makefile b/PVE/API2/Cluster/BulkAction/Makefile
> new file mode 100644
> index 00000000..822c1c15
> --- /dev/null
> +++ b/PVE/API2/Cluster/BulkAction/Makefile
> @@ -0,0 +1,17 @@
> +include ../../../../defines.mk
> +
> +# for node independent, cluster-wide applicable, API endpoints
> +# ensure we do not conflict with files shipped by pve-cluster!!
> +PERLSOURCE=    \
> +       Guest.pm
> +
> +all:
> +
> +.PHONY: clean
> +clean:
> +       rm -rf *~
> +
> +.PHONY: install
> +install: ${PERLSOURCE}
> +       install -d ${PERLLIBDIR}/PVE/API2/Cluster/BulkAction
> +       install -m 0644 ${PERLSOURCE} ${PERLLIBDIR}/PVE/API2/Cluster/BulkAction
> diff --git a/PVE/API2/Cluster/Makefile b/PVE/API2/Cluster/Makefile
> index b109e5cb..6cffe4c9 100644
> --- a/PVE/API2/Cluster/Makefile
> +++ b/PVE/API2/Cluster/Makefile
> @@ -1,11 +1,13 @@
>  include ../../../defines.mk
>  
> -SUBDIRS=Mapping
> +SUBDIRS=Mapping                \
> +       BulkAction
>  
>  # for node independent, cluster-wide applicable, API endpoints
>  # ensure we do not conflict with files shipped by pve-cluster!!
>  PERLSOURCE=                    \
>         BackupInfo.pm           \
> +       BulkAction.pm           \
>         MetricServer.pm         \
>         Mapping.pm              \
>         Notifications.pm                \
> diff --git a/PVE/API2/Nodes.pm b/PVE/API2/Nodes.pm
> index ce7eecaf..0c43f5c7 100644
> --- a/PVE/API2/Nodes.pm
> +++ b/PVE/API2/Nodes.pm
> @@ -1908,7 +1908,7 @@ __PACKAGE__->register_method({
>  # * vmid whitelist
>  # * guest is a template (default: skip)
>  # * guest is HA managed (default: skip)
> -my $get_filtered_vmlist = sub {
> +sub get_filtered_vmlist {
>      my ($nodename, $vmfilter, $templates, $ha_managed) = @_;
>  
>      my $vmlist = PVE::Cluster::get_vmlist();
> @@ -1935,28 +1935,29 @@ my $get_filtered_vmlist = sub {
>                  die "unknown virtual guest type '$d->{type}'\n";
>              }
>  
> -            my $conf = $class->load_config($vmid);
> +            my $conf = $class->load_config($vmid, $d->{node});
>              return if !$templates && $class->is_template($conf);
>              return if !$ha_managed && PVE::HA::Config::vm_is_ha_managed($vmid);
>  
>              $res->{$vmid}->{conf} = $conf;
>              $res->{$vmid}->{type} = $d->{type};
>              $res->{$vmid}->{class} = $class;
> +            $res->{$vmid}->{node} = $d->{node};
>          };
>          warn $@ if $@;
>      }
>  
>      return $res;
> -};
> +}
>  
>  # return all VMs which should get started/stopped on power up/down
> -my $get_start_stop_list = sub {
> +sub get_start_stop_list {
>      my ($nodename, $autostart, $vmfilter) = @_;
>  
>      # do not skip HA vms on force or if a specific VMID set is wanted
>      my $include_ha_managed = defined($vmfilter) ? 1 : 0;
>  
> -    my $vmlist = $get_filtered_vmlist->($nodename, $vmfilter, undef, $include_ha_managed);
> +    my $vmlist = get_filtered_vmlist($nodename, $vmfilter, undef, $include_ha_managed);
>  
>      my $resList = {};
>      foreach my $vmid (keys %$vmlist) {
> @@ -1969,15 +1970,16 @@ my $get_start_stop_list = sub {
>  
>          $resList->{$order}->{$vmid} = $startup;
>          $resList->{$order}->{$vmid}->{type} = $vmlist->{$vmid}->{type};
> +        $resList->{$order}->{$vmid}->{node} = $vmlist->{$vmid}->{node};
>      }
>  
>      return $resList;
> -};
> +}
>  
>  my $remove_locks_on_startup = sub {
>      my ($nodename) = @_;
>  
> -    my $vmlist = &$get_filtered_vmlist($nodename, undef, undef, 1);
> +    my $vmlist = get_filtered_vmlist($nodename, undef, undef, 1);
>  
>      foreach my $vmid (keys %$vmlist) {
>          my $conf = $vmlist->{$vmid}->{conf};
> @@ -2069,7 +2071,7 @@ __PACKAGE__->register_method({
>              warn $@ if $@;
>  
>              my $autostart = $force ? undef : 1;
> -            my $startList = $get_start_stop_list->($nodename, $autostart, $param->{vms});
> +            my $startList = get_start_stop_list($nodename, $autostart, $param->{vms});
>  
>              # Note: use numeric sorting with <=>
>              for my $order (sort { $a <=> $b } keys %$startList) {
> @@ -2215,7 +2217,7 @@ __PACKAGE__->register_method({
>  
>              $rpcenv->{type} = 'priv'; # to start tasks in background
>  
> -            my $stopList = $get_start_stop_list->($nodename, undef, $param->{vms});
> +            my $stopList = get_start_stop_list($nodename, undef, $param->{vms});
>  
>              my $cpuinfo = PVE::ProcFSTools::read_cpuinfo();
>              my $datacenterconfig = cfs_read_file('datacenter.cfg');
> @@ -2344,7 +2346,7 @@ __PACKAGE__->register_method({
>  
>              $rpcenv->{type} = 'priv'; # to start tasks in background
>  
> -            my $toSuspendList = $get_start_stop_list->($nodename, undef, $param->{vms});
> +            my $toSuspendList = get_start_stop_list($nodename, undef, $param->{vms});
>  
>              my $cpuinfo = PVE::ProcFSTools::read_cpuinfo();
>              my $datacenterconfig = cfs_read_file('datacenter.cfg');
> @@ -2549,7 +2551,7 @@ __PACKAGE__->register_method({
>          my $code = sub {
>              $rpcenv->{type} = 'priv'; # to start tasks in background
>  
> -            my $vmlist = &$get_filtered_vmlist($nodename, $param->{vms}, 1, 1);
> +            my $vmlist = get_filtered_vmlist($nodename, $param->{vms}, 1, 1);
>              if (!scalar(keys %$vmlist)) {
>                  warn "no virtual guests matched, nothing to do..\n";
>                  return;
> -- 
> 2.39.5
> 
> 
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> 
>


Thread overview: 6+ messages
2025-08-14 11:26 Dominik Csapak
2025-11-14  8:32 ` Fabian Grünbichler [this message]
2025-11-14  9:08   ` Thomas Lamprecht
2025-11-14  9:16   ` Dominik Csapak
2025-11-14  9:32     ` Fabian Grünbichler
2025-11-14 15:00 ` [pve-devel] superseded: " Dominik Csapak
