* [pve-devel] [PATCH qemu-server v2 0/2] fix #3258: check for in-use pci devices on vm start
@ 2021-10-07 9:37 Dominik Csapak
2021-10-07 9:37 ` [pve-devel] [PATCH qemu-server v2 1/2] pci: add helpers to (un)reserve pciids for a vm Dominik Csapak
2021-10-07 9:37 ` [pve-devel] [PATCH qemu-server v2 2/2] fix #3258: block vm start when pci device is already in use Dominik Csapak
0 siblings, 2 replies; 4+ messages in thread
From: Dominik Csapak @ 2021-10-07 9:37 UTC (permalink / raw)
To: pve-devel
by having a vmid <-> pciid mapping in /var/run
i did not check if the vm has the pci device really in the config,
but we should not need that, since we remove the reservation again
in the cleanup step and check the running pid anyway.
changes from v1:
* use time-based reservation before starting (current + start timeout +5s)
and reserve again after we have the pid
Dominik Csapak (2):
pci: add helpers to (un)reserve pciids for a vm
fix #3258: block vm start when pci device is already in use
PVE/QemuServer.pm | 27 +++++++++++-
PVE/QemuServer/PCI.pm | 99 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 125 insertions(+), 1 deletion(-)
--
2.30.2
^ permalink raw reply [flat|nested] 4+ messages in thread
* [pve-devel] [PATCH qemu-server v2 1/2] pci: add helpers to (un)reserve pciids for a vm
2021-10-07 9:37 [pve-devel] [PATCH qemu-server v2 0/2] fix #3258: check for in-use pci devices on vm start Dominik Csapak
@ 2021-10-07 9:37 ` Dominik Csapak
2021-10-07 11:59 ` Thomas Lamprecht
2021-10-07 9:37 ` [pve-devel] [PATCH qemu-server v2 2/2] fix #3258: block vm start when pci device is already in use Dominik Csapak
1 sibling, 1 reply; 4+ messages in thread
From: Dominik Csapak @ 2021-10-07 9:37 UTC (permalink / raw)
To: pve-devel
saves a list of pciid <-> vmid mappings in /var/run
that we can check when we start a vm
if we're not given a pid but a timeout, we save the time when the
reservation will run out (current time + timeout + 5s) since each
vm start (until we can save the pid) varies from config to config
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
PVE/QemuServer/PCI.pm | 99 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 99 insertions(+)
diff --git a/PVE/QemuServer/PCI.pm b/PVE/QemuServer/PCI.pm
index 5608207..364ceb7 100644
--- a/PVE/QemuServer/PCI.pm
+++ b/PVE/QemuServer/PCI.pm
@@ -5,6 +5,7 @@ use strict;
use PVE::JSONSchema;
use PVE::SysFSTools;
+use PVE::Tools;
use base 'Exporter';
@@ -461,4 +462,102 @@ sub print_hostpci_devices {
return ($kvm_off, $gpu_passthrough, $legacy_igd);
}
+my $PCIID_RESERVATION_FILE = "/var/run/pve-reserved-pciids";
+my $PCIID_RESERVATION_LCK = "/var/lock/pve-reserved-pciids.lck";
+
+my $parse_pci_reservation = sub {
+ my $pciids = {};
+
+ if (my $fh = IO::File->new ($PCIID_RESERVATION_FILE, "r")) {
+ while (my $line = <$fh>) {
+ if ($line =~ m/^($PCIRE)\s(\d+)\stime\:(\d+)$/) {
+ $pciids->{$1} = {
+ vmid => $2,
+ timestamp => $3,
+ };
+ } elsif ($line =~ m/^($PCIRE)\s(\d+)\spid\:(\d+)$/s) {
+ $pciids->{$1} = {
+ vmid => $2,
+ pid => $3,
+ };
+ }
+ }
+ }
+
+ return $pciids;
+};
+
+my $write_pci_reservation = sub {
+ my ($pciids) = @_;
+
+ my $data = "";
+ foreach my $p (keys %$pciids) {
+ my $entry = $pciids->{$p};
+ if (defined($entry->{pid})) {
+ $data .= "$p $entry->{vmid} pid:$entry->{pid}\n";
+ } else {
+ $data .= "$p $entry->{vmid} time:$entry->{timestamp}\n";
+ }
+ }
+
+ PVE::Tools::file_set_contents($PCIID_RESERVATION_FILE, $data);
+};
+
+sub remove_pci_reservation {
+ my ($id) = @_;
+
+ my $code = sub {
+ my $pciids = $parse_pci_reservation->();
+
+ delete $pciids->{$id};
+
+ $write_pci_reservation->($pciids);
+ };
+
+ PVE::Tools::lock_file($PCIID_RESERVATION_LCK, 10, $code);
+ die $@ if $@;
+
+ return;
+}
+
+sub reserve_pci_usage {
+ my ($id, $vmid, $timeout, $pid) = @_;
+
+ PVE::Tools::lock_file($PCIID_RESERVATION_LCK, 10, sub {
+
+ my $ctime = time();
+ my $errmsg = "PCI device '$id' already in use\n";
+
+ my $pciids = $parse_pci_reservation->();
+
+ if (my $pciid = $pciids->{$id}) {
+ if ($pciid->{vmid} != $vmid) {
+ # check time based reservation
+ die $errmsg if defined($pciid->{timestamp}) && $pciid->{timestamp} > $ctime;
+
+ # check running vm
+ my $pid = PVE::QemuServer::Helpers::vm_running_locally($pciid->{vmid});
+ die $errmsg if defined($pciid->{pid}) && $pid == $pciid->{pid};
+ }
+ }
+
+ $pciids->{$id} = {
+ vmid => $vmid,
+ };
+ if (defined($timeout)) {
+ # we save the timestamp when the reservation gets invalid (+5s),
+ # since the start timeout depends on the config
+ $pciids->{$id}->{timestamp} = $ctime + $timeout + 5;
+ }
+ $pciids->{$id}->{pid} = $pid if defined($pid);
+
+ $write_pci_reservation->($pciids);
+
+ return;
+ });
+ die $@ if $@;
+
+ return;
+}
+
1;
--
2.30.2
^ permalink raw reply [flat|nested] 4+ messages in thread
* [pve-devel] [PATCH qemu-server v2 2/2] fix #3258: block vm start when pci device is already in use
2021-10-07 9:37 [pve-devel] [PATCH qemu-server v2 0/2] fix #3258: check for in-use pci devices on vm start Dominik Csapak
2021-10-07 9:37 ` [pve-devel] [PATCH qemu-server v2 1/2] pci: add helpers to (un)reserve pciids for a vm Dominik Csapak
@ 2021-10-07 9:37 ` Dominik Csapak
1 sibling, 0 replies; 4+ messages in thread
From: Dominik Csapak @ 2021-10-07 9:37 UTC (permalink / raw)
To: pve-devel
on vm start, we reserve all pciids that we use, and
remove the reservation again in vm_stop_cleanup
first with only a time-based reservation but after the vm is started,
we reserve again but with the pid.
for this, we have to move the start_timeout calculation above the
hostpci handling.
this way, when a vm starts with a pci device that is already configured
for a different running vm, will not be started and the user gets
the error that the device is already in use
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
PVE/QemuServer.pm | 27 ++++++++++++++++++++++++++-
1 file changed, 26 insertions(+), 1 deletion(-)
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index e5175b3..f05d858 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -5381,11 +5381,23 @@ sub vm_start_nolock {
push @$cmd, '-S';
}
+ my $start_timeout = $params->{timeout} // config_aware_timeout($conf, $resume);
+ my $pciids_to_reserve = [];
+
# host pci devices
for (my $i = 0; $i < $PVE::QemuServer::PCI::MAX_HOSTPCI_DEVICES; $i++) {
my $d = parse_hostpci($conf->{"hostpci$i"});
next if !$d;
my $pcidevices = $d->{pciid};
+
+ # reserve all pciids before actually doing anything with them
+ foreach my $pcidevice (@$pcidevices) {
+ my $pciid = $pcidevice->{id};
+ PVE::QemuServer::PCI::reserve_pci_usage($pciid, $vmid, $start_timeout);
+ # we need to update the reservation later
+ push @$pciids_to_reserve, $pciid;
+ }
+
foreach my $pcidevice (@$pcidevices) {
my $pciid = $pcidevice->{id};
@@ -5417,7 +5429,6 @@ sub vm_start_nolock {
my $cpuunits = get_cpuunits($conf);
- my $start_timeout = $params->{timeout} // config_aware_timeout($conf, $resume);
my %run_params = (
timeout => $statefile ? undef : $start_timeout,
umask => 0077,
@@ -5497,9 +5508,22 @@ sub vm_start_nolock {
if (my $err = $@) {
# deactivate volumes if start fails
eval { PVE::Storage::deactivate_volumes($storecfg, $vollist); };
+ foreach my $id (@$pciids_to_reserve) {
+ eval { PVE::QemuServer::PCI::remove_pci_reservation($id) };
+ warn $@ if $@;
+ }
+
die "start failed: $err";
}
+ # reserve all pciids again with the pid
+ # the vm is already started, we can only warn on error here
+ my $pid = PVE::QemuServer::Helpers::vm_running_locally($vmid);
+ foreach my $id (@$pciids_to_reserve) {
+ eval { PVE::QemuServer::PCI::reserve_pci_usage($id, $vmid, undef, $pid) };
+ warn $@ if $@;
+ }
+
print "migration listens on $migrate_uri\n" if $migrate_uri;
$res->{migrate_uri} = $migrate_uri;
@@ -5697,6 +5721,7 @@ sub vm_stop_cleanup {
foreach my $pci (@{$d->{pciid}}) {
my $pciid = $pci->{id};
PVE::SysFSTools::pci_cleanup_mdev_device($pciid, $uuid);
+ PVE::QemuServer::PCI::remove_pci_reservation($pciid);
}
}
--
2.30.2
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [pve-devel] [PATCH qemu-server v2 1/2] pci: add helpers to (un)reserve pciids for a vm
2021-10-07 9:37 ` [pve-devel] [PATCH qemu-server v2 1/2] pci: add helpers to (un)reserve pciids for a vm Dominik Csapak
@ 2021-10-07 11:59 ` Thomas Lamprecht
0 siblings, 0 replies; 4+ messages in thread
From: Thomas Lamprecht @ 2021-10-07 11:59 UTC (permalink / raw)
To: Proxmox VE development discussion, Dominik Csapak
On 07.10.21 11:37, Dominik Csapak wrote:
> saves a list of pciid <-> vmid mappings in /var/run
> that we can check when we start a vm
>
> if we're not given a pid but a timeout, we save the time when the
> reservation will run out (current time + timeout + 5s) since each
> vm start (until we can save the pid) varies from config to config
>
a few comments, some of them could have happend for v1 so sorry for any
churn..
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
> PVE/QemuServer/PCI.pm | 99 +++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 99 insertions(+)
>
> diff --git a/PVE/QemuServer/PCI.pm b/PVE/QemuServer/PCI.pm
> index 5608207..364ceb7 100644
> --- a/PVE/QemuServer/PCI.pm
> +++ b/PVE/QemuServer/PCI.pm
> @@ -5,6 +5,7 @@ use strict;
>
> use PVE::JSONSchema;
> use PVE::SysFSTools;
> +use PVE::Tools;
>
> use base 'Exporter';
>
> @@ -461,4 +462,102 @@ sub print_hostpci_devices {
> return ($kvm_off, $gpu_passthrough, $legacy_igd);
> }
>
> +my $PCIID_RESERVATION_FILE = "/var/run/pve-reserved-pciids";
> +my $PCIID_RESERVATION_LCK = "/var/lock/pve-reserved-pciids.lck";
> +
> +my $parse_pci_reservation = sub {
> + my $pciids = {};
> +
> + if (my $fh = IO::File->new ($PCIID_RESERVATION_FILE, "r")) {
> + while (my $line = <$fh>) {
> + if ($line =~ m/^($PCIRE)\s(\d+)\stime\:(\d+)$/) {
> + $pciids->{$1} = {
> + vmid => $2,
> + timestamp => $3,
> + };
> + } elsif ($line =~ m/^($PCIRE)\s(\d+)\spid\:(\d+)$/s) {
> + $pciids->{$1} = {
> + vmid => $2,
> + pid => $3,
> + };
> + }
could probably be (untested):
if ($line =~ m/^($PCIRE)\s(\d+)\s(time|pid)\:(\d+)$/) {
$pciids->{$1} = {
vmid => $2,
"$3" => $4,
};
}
Alternative: use a format like "$vmid $timestamp [$pid]"
we'd keep the time of initial reservation info that way, not sure how much use that
has though ^^
> + }
> + }
> +
> + return $pciids;
> +};
> +
> +my $write_pci_reservation = sub {
> + my ($pciids) = @_;
> +
> + my $data = "";
> + foreach my $p (keys %$pciids) {
> + my $entry = $pciids->{$p};
> + if (defined($entry->{pid})) {
> + $data .= "$p $entry->{vmid} pid:$entry->{pid}\n";
> + } else {
> + $data .= "$p $entry->{vmid} time:$entry->{timestamp}\n";
> + }
> + }
> +
> + PVE::Tools::file_set_contents($PCIID_RESERVATION_FILE, $data);
> +};
> +
> +sub remove_pci_reservation {
> + my ($id) = @_;
it could be an optimization to allow passing a array ref of IDs, less open+writes+rename,
while thanks to /run being a tmpfs no IO is happening but syscalls are expensive, so
when they're as easy to reduce as here I'd opt for that. Could do the same in a similar
spirit for registering then.
> +
> + my $code = sub {
> + my $pciids = $parse_pci_reservation->();
> +
> + delete $pciids->{$id};
> +
> + $write_pci_reservation->($pciids);
> + };
> +
> + PVE::Tools::lock_file($PCIID_RESERVATION_LCK, 10, $code);
> + die $@ if $@;
> +
> + return;
> +}
> +
> +sub reserve_pci_usage {
> + my ($id, $vmid, $timeout, $pid) = @_;
> +
> + PVE::Tools::lock_file($PCIID_RESERVATION_LCK, 10, sub {
> +
> + my $ctime = time();
> + my $errmsg = "PCI device '$id' already in use\n";
> +
> + my $pciids = $parse_pci_reservation->();
> +
> + if (my $pciid = $pciids->{$id}) {
> + if ($pciid->{vmid} != $vmid) {
> + # check time based reservation
> + die $errmsg if defined($pciid->{timestamp}) && $pciid->{timestamp} > $ctime;
> +
> + # check running vm
> + my $pid = PVE::QemuServer::Helpers::vm_running_locally($pciid->{vmid});
> + die $errmsg if defined($pciid->{pid}) && $pid == $pciid->{pid};
Maybe we should warn in the != case, as that would mean a registration was not
cleaned up nicely and recording that in the task log should not hurt.
> + }
> + }
> +
> + $pciids->{$id} = {
> + vmid => $vmid,
> + };
> + if (defined($timeout)) {
> + # we save the timestamp when the reservation gets invalid (+5s),
> + # since the start timeout depends on the config
> + $pciids->{$id}->{timestamp} = $ctime + $timeout + 5;
> + }
> + $pciids->{$id}->{pid} = $pid if defined($pid);
> +
> + $write_pci_reservation->($pciids);
> +
> + return;
> + });
> + die $@ if $@;
> +
> + return;
> +}
> +
> 1;
>
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2021-10-07 12:01 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-07 9:37 [pve-devel] [PATCH qemu-server v2 0/2] fix #3258: check for in-use pci devices on vm start Dominik Csapak
2021-10-07 9:37 ` [pve-devel] [PATCH qemu-server v2 1/2] pci: add helpers to (un)reserve pciids for a vm Dominik Csapak
2021-10-07 11:59 ` Thomas Lamprecht
2021-10-07 9:37 ` [pve-devel] [PATCH qemu-server v2 2/2] fix #3258: block vm start when pci device is already in use Dominik Csapak
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox