Date: Wed, 19 Jan 2022 15:31:49 +0100
From: Fabian Grünbichler
To: Fabian Ebner, pve-devel@lists.proxmox.com
Subject: Re: [pve-devel] [PATCH v3 guest-common 3/3] add storage tunnel module

On January 3, 2022 3:30 pm, Fabian Ebner wrote:
> On 12/22/21 at 14:52, Fabian Grünbichler wrote:
>> encapsulating storage-related tunnel methods, currently
>> - source-side storage-migrate helper
>> - target-side disk-import handler
>> - target-side query-disk-import handler
>>
>> to be extended further with replication-related handlers and helpers.
>>
>> Signed-off-by: Fabian Grünbichler
>> ---
>>
>> Notes:
>
> Needs bump for pve-storage (for storage_migration_snapshot,
> volume_import_start, etc.)
>
>> new in v3, includes code previously in qemu-server
>>
>>  src/Makefile             |   1 +
>>  src/PVE/StorageTunnel.pm | 231 +++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 232 insertions(+)
>>  create mode 100644 src/PVE/StorageTunnel.pm
>>
>> diff --git a/src/Makefile b/src/Makefile
>> index d82162c..baa2688 100644
>> --- a/src/Makefile
>> +++ b/src/Makefile
>> @@ -12,6 +12,7 @@ install: PVE
>>  	install -m 0644 PVE/ReplicationConfig.pm ${PERL5DIR}/PVE/
>>  	install -m 0644 PVE/ReplicationState.pm ${PERL5DIR}/PVE/
>>  	install -m 0644 PVE/Replication.pm ${PERL5DIR}/PVE/
>> +	install -m 0644 PVE/StorageTunnel.pm ${PERL5DIR}/PVE/
>>  	install -m 0644 PVE/Tunnel.pm ${PERL5DIR}/PVE/
>>  	install -d ${PERL5DIR}/PVE/VZDump
>>  	install -m 0644 PVE/VZDump/Plugin.pm ${PERL5DIR}/PVE/VZDump/
>> diff --git a/src/PVE/StorageTunnel.pm b/src/PVE/StorageTunnel.pm
>> new file mode 100644
>> index 0000000..06902ef
>> --- /dev/null
>> +++ b/src/PVE/StorageTunnel.pm
>> @@ -0,0 +1,231 @@
>> +package PVE::StorageTunnel;
>> +
>> +use strict;
>> +use warnings;
>> +
>> +use IO::Socket::UNIX;
>> +use POSIX qw(WNOHANG);
>> +use Socket qw(SOCK_STREAM);
>> +
>> +use PVE::Tools;
>> +use PVE::Tunnel;
>> +use PVE::Storage;
>
> Nit: not ordered alphabetically
>
>> +
>> +sub storage_migrate {
>> +    my ($tunnel, $storecfg, $volid, $local_vmid, $remote_vmid, $opts, $log) = @_;
>> +
>> +    my $bwlimit = $opts->{bwlimit};
>> +    my $targetsid = $opts->{targetsid};
>> +
>> +    # use 'migrate' limit for transfer to other node
>> +    my $bwlimit_opts = {
>> +        storage => $targetsid,
>> +        bwlimit => $bwlimit,
>> +    };
>> +    my $remote_bwlimit = PVE::Tunnel::write_tunnel($tunnel, 10, 'bwlimit', $bwlimit_opts);
>
> Should this be done on the call side instead? For re-using the helper
> with replication, the 'bwlimit' command shouldn't happen, as there, the
> only relevant limit is the one from the job configuration (even the
> 'default' storage limit isn't used AFAICT).

actually I am not so sure for remote replication - there it might make
sense to have bwlimit configurable on either side (although of course,
since the limiting happens on the client side, the remote still needs to
trust it to actually honor the limit ;))

>
>> +    $remote_bwlimit = $remote_bwlimit->{bwlimit};
>> +    if (defined($remote_bwlimit)) {
>> +        $bwlimit = $remote_bwlimit if !defined($bwlimit);
>> +        $bwlimit = $remote_bwlimit if $remote_bwlimit < $bwlimit;
>
> Wrong result if $remote_bwlimit = 0 and $bwlimit > 0, but maybe that
> situation is not even possible currently. That aside, isn't the content
> of the if block superfluous, because the 'bwlimit'
> handler/get_bandwidth_limit already determines the minimum from the
> passed-along parameter and the storage limit?

hmm, no, but on a closer look this is actually wrong, as
get_bandwidth_limit keeps the explicit limit in some situations (based
on privilege checks, the explicit limit can take precedence over the
configured ones even if higher).
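to make that concrete, the behaviour is roughly this (illustration only,
not the actual PVE::Storage code, sub name made up):

# toy model of get_bandwidth_limit's precedence rules: a caller that
# passes the privilege check keeps its explicitly requested limit,
# everybody else gets the minimum of the defined limits
sub effective_bwlimit {
    my ($explicit, $configured, $may_override) = @_;

    # privileged -> explicit limit wins, even if higher than configured
    return $explicit if defined($explicit) && $may_override;

    my $limit;
    for my $l ($explicit, $configured) {
        next if !defined($l);
        $limit = $l if !defined($limit) || $l < $limit;
    }
    return $limit;
}

so just taking the minimum of local and remote limit here does not match
those semantics.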
I'll take a closer look for the next iteration - maybe we need to split
the option here into a source-side bwlimit without override and an
explicit override bwlimit, pass neither to the remote side, and then
combine source, remote and override here similar to what
get_bandwidth_limit does (or by extending it) :-/

>
>> +    }
>> +
>> +    # JSONSchema and get_bandwidth_limit use kbps - storage_migrate bps
>> +    $bwlimit = $bwlimit * 1024 if defined($bwlimit);
>> +
>> +    # adapt volume name for import call
>> +    my ($sid, undef) = PVE::Storage::parse_volume_id($volid);
>> +    my (undef, $name, undef, undef, undef, undef, $format) = PVE::Storage::parse_volname($storecfg, $volid);
>> +    my $scfg = PVE::Storage::storage_config($storecfg, $sid);
>> +    PVE::Storage::activate_volumes($storecfg, [$volid]);
>> +    if ($local_vmid != $remote_vmid) {
>
> There can be guests where the owner's VM ID and the disk's VM ID don't
> match (e.g. manual reassign via 'qm set'), but I'm not sure it's worth
> bothering about.

true, we could detect this here and either abort, or warn and override
$local_vmid..

>
>> +        $name =~ s/-$local_vmid-/-$remote_vmid-/g;
>> +        $name =~ s/^$local_vmid\///; # re-added on target if dir-based storage
>> +    }
>> +
>> +    my $with_snapshots = $opts->{snapshots} ? 1 : 0;
>> +    my $snapshot;
>> +    my $migration_snapshot = PVE::Storage::storage_migrate_snapshot($storecfg, $sid);
>> +    if ($migration_snapshot) {
>> +        $snapshot = '__migration__';
>> +        $with_snapshots = 1;
>> +    }
>> +
>> +    my @export_formats = PVE::Storage::volume_export_formats($storecfg, $volid, undef, undef, $with_snapshots);
>> +
>> +    my $disk_import_opts = {
>> +        format => $format,
>> +        storage => $targetsid,
>> +        snapshot => $snapshot,
>> +        migration_snapshot => $migration_snapshot,
>> +        'with-snapshots' => $with_snapshots,
>> +        'allow-rename' => !$opts->{is_vmstate},
>> +        'export-formats' => join(",", @export_formats),
>> +        volname => $name,
>> +    };
>> +    my $res = PVE::Tunnel::write_tunnel($tunnel, 600, 'disk-import', $disk_import_opts);
>> +    my $local = "/run/qemu-server/$local_vmid.storage";
>> +    if (!$tunnel->{forwarded}->{$local}) {
>> +        PVE::Tunnel::forward_unix_socket($tunnel, $local, $res->{socket});
>> +    }
>> +    my $socket = IO::Socket::UNIX->new(Peer => $local, Type => SOCK_STREAM())
>> +        or die "failed to connect to websocket tunnel at $local\n";
>> +    # we won't be reading from the socket
>> +    shutdown($socket, 0);
>> +
>> +    my $disk_export_opts = {
>> +        snapshot => $snapshot,
>> +        migration_snapshot => $migration_snapshot,
>> +        'with-snapshots' => $with_snapshots,
>> +        ratelimit_bps => $bwlimit,
>> +        cmd => {
>> +            output => '>&'.fileno($socket),
>> +        },
>> +    };
>> +
>> +    eval {
>> +        PVE::Storage::volume_export_start(
>> +            $storecfg,
>> +            $volid,
>> +            $res->{format},
>> +            $disk_export_opts,
>> +        );
>> +    };
>> +    my $send_error = $@;
>> +    warn "$send_error\n" if $send_error;
>> +
>> +    # don't close the connection entirely, otherwise the
>> +    # receiving end might not get all buffered data (and
>> +    # fails with 'connection reset by peer')
>> +    shutdown($socket, 1);
>> +
>> +    # wait for the remote process to finish
>> +    my $new_volid;
>> +    while ($res = PVE::Tunnel::write_tunnel($tunnel, 10, 'query-disk-import')) {
>> +        if ($res->{status} eq 'pending') {
>> +            if (my $msg = $res->{msg}) {
>> +                $log->("disk-import: $msg\n");
>> +            } else {
>> +                $log->("waiting for disk import to finish..\n");
>> +            }
>> +            sleep(1);
>> +        } elsif ($res->{status} eq 'complete') {
>> +            $new_volid = $res->{volid};
>> +            last;
>> +        } else {
>> +            warn "unknown query-disk-import result: $res->{status}\n";
>> +            last;
>> +        }
>> +    }
>> +
>> +    # now close the socket
>> +    close($socket);
>> +    if ($snapshot) {
>> +        eval { PVE::Storage::volume_snapshot_delete($storecfg, $volid, $snapshot, 0) };
>> +        warn "could not remove source snapshot: $@\n" if $@;
>> +    }
>> +    die $send_error if $send_error;
>> +    die "disk import failed - see log above\n" if !$new_volid;
>> +
>> +    return $new_volid;
>> +}
>> +
>> +our $cmd_schema = {
>
> Maybe 'bwlimit' should also live here (and for future re-usability
> become agnostic towards the operation for the limit, rather than
> hard-coding 'migration')?

yeah, we could do that.

>
> And maybe we should add
>     'query-disk-import' => {},
> for completeness?

doesn't hurt either.

>
>> +    'disk-import' => {
>> +        volname => {
>> +            type => 'string',
>> +            description => 'volume name to use as preferred target volume name',
>> +        },
>> +        format => PVE::JSONSchema::get_standard_option('pve-qm-image-format'),
>> +        'export-formats' => {
>> +            type => 'string',
>> +            description => 'list of supported export formats',
>> +        },
>> +        storage => {
>> +            type => 'string',
>> +            format => 'pve-storage-id',
>> +        },
>> +        'with-snapshots' => {
>> +            description =>
>> +                "Whether the stream includes intermediate snapshots",
>> +            type => 'boolean',
>> +            optional => 1,
>> +            default => 0,
>> +        },
>> +        'allow-rename' => {
>> +            description => "Choose a new volume ID if the requested " .
>> +                "volume ID already exists, instead of throwing an error.",
>> +            type => 'boolean',
>> +            optional => 1,
>> +            default => 0,
>> +        },
>> +    },
>> +};
>> +
>> +sub handle_disk_import {
>> +    my ($state, $params) = @_;
>> +
>> +    die "disk import already running as PID '$state->{disk_import}->{pid}'\n"
>> +        if $state->{disk_import}->{pid};
>> +
>> +    my $storage = delete $params->{storage};
>> +    my $format = delete $params->{format};
>> +    my $volname = delete $params->{volname};
>> +
>> +    my $import = PVE::Storage::volume_import_start($state->{storecfg}, $storage, $volname, $format, $state->{vmid}, $params);
>> +
>> +    my $socket = $import->{socket};
>> +    $format = delete $import->{format};
>> +
>> +    $state->{sockets}->{$socket} = 1;
>> +    $state->{disk_import} = $import;
>> +
>> +    chown $state->{socket_uid}, -1, $socket;
>> +
>> +    return {
>> +        socket => $socket,
>> +        format => $format,
>> +    };
>> +}
>> +
>> +sub handle_query_disk_import {
>> +    my ($state, $params) = @_;
>> +
>> +    die "no disk import running\n"
>> +        if !$state->{disk_import}->{pid};
>> +
>> +    my $pattern = PVE::Storage::volume_imported_message(undef, 1);
>> +    my $result;
>> +    eval {
>> +        my $fh = $state->{disk_import}->{fh};
>> +        PVE::Tools::run_with_timeout(5, sub { $result = <$fh>; });
>> +        print "disk-import: $result\n" if $result;
>> +    };
>
> What if the disk import finishes after here...
>
>> +    if ($result && $result =~ $pattern) {
>> +        my $volid = $1;
>> +        waitpid($state->{disk_import}->{pid}, 0);
>> +
>> +        my $unix = $state->{disk_import}->{socket};
>> +        unlink $unix;
>> +        delete $state->{sockets}->{$unix};
>> +        delete $state->{disk_import};
>> +        $state->{cleanup}->{volumes}->{$volid} = 1;
>> +        return {
>> +            status => "complete",
>> +            volid => $volid,
>> +        };
>> +    } elsif (!$result && waitpid($state->{disk_import}->{pid}, WNOHANG)) {
>
> ...but before the call to waitpid? Or is it guaranteed to keep running
> until we read from the file handle?

good catch! no it's not - there is a (small) race window here
(`/tmp/sleep` sleeps 20s and then prints a line):

use strict;
use warnings;

use IO::File; # explicit, in case it isn't pulled in transitively
use IPC::Open3;
use POSIX qw(WNOHANG);

use PVE::Tools;

my $input = IO::File->new();
my $info = IO::File->new();
my $cpid = open3($input, $info, $info, "/tmp/sleep");

my $result;
while (!$result) {
    eval { PVE::Tools::run_with_timeout(5, sub { $result = <$info>; }) };
    if ($result) {
        print "res: $result\n";
        waitpid($cpid, 0);
        print "child exited\n";
    } else {
        sleep 30; # trigger race
        if (waitpid($cpid, WNOHANG)) {
            print "child exited without result\n";
            last;
        } else {
            print "still pending\n";
        }
    }
}

prints:

  child exited without result

with the sleep between the attempted read and the waitpid in the else
branch removed:

  still pending
  still pending
  still pending
  still pending
  res: OK
  child exited

I guess we can just attempt to read from $fh after the waitpid to get
all the remaining output, and then either handle it as an error (output,
but nothing matching $pattern) or as okay (got a line matching
$pattern). or maybe I'll find some other, more elegant solution that
doesn't involve doing sysread/select and manual buffering..

>
>> +        my $unix = $state->{disk_import}->{socket};
>> +        unlink $unix;
>> +        delete $state->{sockets}->{$unix};
>> +        delete $state->{disk_import};
>> +
>> +        return {
>> +            status => "error",
>> +        };
>> +    } else {
>> +        return {
>> +            status => "pending",
>> +            msg => $result,
>> +        };
>> +    }
>> +}
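for reference, the drain-after-waitpid variant could look roughly like
this (completely untested sketch, helper name made up - not the actual
patch):

# once the child has exited, its end of the pipe is closed, so reading
# $fh until EOF cannot block anymore and returns whatever was still
# buffered
sub drain_disk_import_output {
    my ($fh, $pattern) = @_;

    my $volid;
    while (defined(my $line = <$fh>)) {
        print "disk-import: $line";
        # remember the volid if one of the remaining lines matches
        $volid = $1 if $line =~ $pattern;
    }
    return $volid;
}

called from the elsif branch after waitpid() returned the PID, with $fh
and $pattern as in handle_query_disk_import - a defined return value
maps to 'complete', undef to 'error'.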