From: Fiona Ebner
To: Proxmox VE development discussion, Markus Frank
Date: Wed, 31 Jan 2024 16:02:06 +0100
Message-ID: <2c95ac42-2085-47ea-b5b4-97cd8f8d2cd0@proxmox.com>
In-Reply-To: <20231108085254.53574-5-m.frank@proxmox.com>
References: <20231108085254.53574-1-m.frank@proxmox.com> <20231108085254.53574-5-m.frank@proxmox.com>
Subject: Re: [pve-devel] [PATCH qemu-server v8 4/7] feature #1027: virtio-fs support

On 08.11.23 at 09:52, Markus Frank wrote:
> add support for sharing directories with a guest vm
>
> virtio-fs needs virtiofsd to be started.
> In order to start virtiofsd as a process (despite being a daemon it is does not run
> in the background), a double-fork is used.
>
> virtiofsd should close itself together with qemu.
>
> There are the parameters dirid
> and the optional parameters direct-io & cache.
> Additionally the xattr & acl parameter overwrite the
> directory mapping settings for xattr & acl.
>
> The dirid gets mapped to the path on the current node
> and is also used as a mount-tag (name used to mount the
> device on the guest).
>
> example config:
> ```
> virtiofs0: foo,direct-io=1,cache=always,acl=1
> virtiofs1: dirid=bar,cache=never,xattr=1
> ```
>
> For information on the optional parameters see there:
> https://gitlab.com/virtio-fs/virtiofsd/-/blob/main/README.md
>
> Signed-off-by: Markus Frank
> ---
>  PVE/QemuServer.pm        | 185 +++++++++++++++++++++++++++++++++++++++
>  PVE/QemuServer/Memory.pm |  25 ++++--
>  debian/control           |   1 +

I'd like to have the change to debian/control as a separate preparatory
patch.

>  3 files changed, 205 insertions(+), 6 deletions(-)
>
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 2895675..92580df 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -43,6 +43,7 @@ use PVE::PBSClient;
>  use PVE::RESTEnvironment qw(log_warn);
>  use PVE::RPCEnvironment;
>  use PVE::Storage;
> +use PVE::Mapping::Dir;

So the missing include of PVE::Storage::Plugin in PVE::Mapping::Dir I
mentioned in the guest-common patch is the reason it's not sorted
alphabetically here ;)

>  use PVE::SysFSTools;
>  use PVE::Systemd;
>  use PVE::Tools qw(run_command file_read_firstline file_get_contents dir_glob_foreach get_host_arch $IPV6RE);
> @@ -277,6 +278,42 @@ my $rng_fmt = {
>      },
>  };
>

I'd like to have the format/helpers/etc. (e.g. config_to_command could
also call a helper) live in a dedicated module
PVE/QemuServer/Virtiofs.pm. We should aim to make PVE/QemuServer.pm
smaller, not bigger.

> +my $virtiofs_fmt = {
> +    'dirid' => {
> +        type => 'string',
> +        default_key => 1,
> +        description => "Mapping identifier of the directory mapping to be"
> +            ." shared with the guest. Also used as a mount tag inside the VM.",
> +        format_description => 'mapping-id',
> +        format => 'pve-configid',
> +    },
> +    'cache' => {
> +        type => 'string',
> +        description => "The caching policy the file system should use"
> +            ." (auto, always, never).",

Style nit: can fit on one line with the 100 character limit

> +        format_description => "virtiofs-cache",

format_description is (usually) used to tell a user what the string
should look like if it has a special format, e.g. base64, cidr. It's not
needed here when there already is an enum, and "virtiofs-xyz" is not
really helping to clarify this.

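
As a rough illustration of the two remarks on the 'cache' entry, it
might end up looking something like the sketch below. This is only a
sketch: the stand-alone variable $cache_prop is hypothetical and exists
just to make the snippet runnable on its own; in the patch the entry
lives inside $virtiofs_fmt.

```perl
use strict;
use warnings;

# Hypothetical stand-alone copy of the 'cache' property, for illustration only.
my $cache_prop = {
    type => 'string',
    # fits within the 100 character limit on a single line
    description => "The caching policy the file system should use (auto, always, never).",
    # the enum already documents the allowed values, so no format_description
    enum => [qw(auto always never)],
    optional => 1,
};

print join(', ', $cache_prop->{enum}->@*), "\n";
```
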
> +        enum => [qw(auto always never)],
> +        optional => 1,

Missing default

> +    },
> +    'direct-io' => {
> +        type => 'boolean',
> +        description => "Honor the O_DIRECT flag passed down by guest applications",
> +        format_description => "virtiofs-directio",

Similar here

> +        optional => 1,

Missing default

> +    },
> +    xattr => {
> +        type => 'boolean',
> +        description => "Enable support for extended attributes.",

Could mention it's an override for the value coming from the mapping.

> +        optional => 1,

Missing default, i.e. "use value from mapping"

> +    },
> +    acl => {
> +        type => 'boolean',
> +        description => "Enable support for posix ACLs (implies --xattr).",

Could mention it's an override for the value coming from the mapping.

> +        optional => 1,

Missing default, i.e. "use value from mapping"

> +    },
> +};
> +PVE::JSONSchema::register_format('pve-qm-virtiofs', $virtiofs_fmt);
> +
>  my $meta_info_fmt = {
>      'ctime' => {
>          type => 'integer',
> @@ -839,6 +876,7 @@ while (my ($k, $v) = each %$confdesc) {
>  }
>
>  my $MAX_NETS = 32;
> +my $MAX_VIRTIOFS = 10;

Is there a specific reason for ten or just to have an initial limit
that's not too small? Does it still work nicely with all ten?

>  my $MAX_SERIAL_PORTS = 4;
>  my $MAX_PARALLEL_PORTS = 3;
>
> @@ -948,6 +986,21 @@ my $netdesc = {
>
>  PVE::JSONSchema::register_standard_option("pve-qm-net", $netdesc);
>
> +my $virtiofsdesc = {
> +    optional => 1,
> +    type => 'string', format => $virtiofs_fmt,
> +    description => "share files between host and guest",

Nit: s/files/a directory/ is slightly more precise?

> +};
> +PVE::JSONSchema::register_standard_option("pve-qm-virtiofs", $virtiofsdesc);
> +
> +sub max_virtiofs {
> +    return $MAX_VIRTIOFS;
> +}
> +
> +for (my $i = 0; $i < $MAX_VIRTIOFS; $i++) {
> +    $confdesc->{"virtiofs$i"} = $virtiofsdesc;
> +}
> +
>  my $ipconfig_fmt = {
>      ip => {
>          type => 'string',
> @@ -4055,6 +4108,23 @@ sub config_to_command {
>          push @$devices, '-device', $netdevicefull;
>      }
>
> +    my $virtiofs_enabled = 0;
> +    for (my $i = 0; $i < $MAX_VIRTIOFS; $i++) {
> +        my $opt = "virtiofs$i";
> +
> +        next if !$conf->{$opt};
> +        my $virtiofs = parse_property_string('pve-qm-virtiofs', $conf->{$opt});
> +        next if !$virtiofs;
> +
> +        check_virtiofs_config ($conf, $virtiofs);

Style nit: space between function name and parenthesis

> +
> +        push @$devices, '-chardev', "socket,id=virtfs$i,path=/var/run/virtiofsd/vm$vmid-fs$i";
> +        push @$devices, '-device', 'vhost-user-fs-pci,queue-size=1024'

Any specific reason for queue-size=1024? Better performance than the
default 128?

> +            .",chardev=virtfs$i,tag=$virtiofs->{dirid}";
> +
> +        $virtiofs_enabled = 1;
> +    }
> +
>      if ($conf->{ivshmem}) {
>          my $ivshmem = parse_property_string($ivshmem_fmt, $conf->{ivshmem});
>
> @@ -4114,6 +4184,14 @@ sub config_to_command {
>      }
>      push @$machineFlags, "type=${machine_type_min}";
>
> +    if ($virtiofs_enabled && !$conf->{numa}) {
> +        # kvm: '-machine memory-backend' and '-numa memdev' properties are
> +        # mutually exclusive
> +        push @$devices, '-object', 'memory-backend-memfd,id=virtiofs-mem'
> +            .",size=$conf->{memory}M,share=on";
> +        push @$machineFlags, 'memory-backend=virtiofs-mem';
> +    }

I'd like to have this handled in the PVE::QemuServer::Memory::config()
call (need to additionally pass along $machineFlags of course).

> +
>      push @$cmd, @$devices;
>      push @$cmd, '-rtc', join(',', @$rtcFlags) if scalar(@$rtcFlags);
>      push @$cmd, '-machine', join(',', @$machineFlags) if scalar(@$machineFlags);
> @@ -4140,6 +4218,96 @@ sub config_to_command {
>      return wantarray ?
($cmd, $vollist, $spice_port, $pci_devices) : $cmd;
>  }
>
> +sub check_virtiofs_config {

Since this dies, maybe assert_ instead of check_?

> +    my ($conf, $virtiofs) = @_;
> +    my $dir_cfg = PVE::Mapping::Dir::config()->{ids}->{$virtiofs->{dirid}};
> +    my $node_list = PVE::Mapping::Dir::find_on_current_node($virtiofs->{dirid});
> +
> +    my $acl = $virtiofs->{'acl'} // $dir_cfg->{'acl'};
> +    if ($acl && windows_version($conf->{ostype})) {
> +        log_warn(
> +            "Please disable ACLs for virtiofs on Windows VMs, otherwise"
> +            ." the virtiofs shared directory cannot be mounted.\n"

Ah, great, the warning is already here :)

Nit: no need for "\n" with log_warn()

> +        );
> +    }
> +
> +    if (!$node_list || scalar($node_list->@*) != 1) {
> +        die "virtiofs needs exactly one mapping for this node\n";
> +    }
> +
> +    eval {
> +        PVE::Mapping::Dir::assert_valid($node_list->[0]);
> +    };

Style nit: eval block could be all on one line

> +    if (my $err = $@) {
> +        die "Directory Mapping invalid: $err\n";
> +    }
> +}
> +
> +sub start_virtiofs {
> +    my ($vmid, $fsid, $virtiofs) = @_;
> +
> +    my $dir_cfg = PVE::Mapping::Dir::config()->{ids}->{$virtiofs->{dirid}};
> +    my $node_list = PVE::Mapping::Dir::find_on_current_node($virtiofs->{dirid});
> +
> +    # Default to dir config xattr & acl settings
> +    my $xattr = $virtiofs->{xattr} // $dir_cfg->{xattr};
> +    my $acl = $virtiofs->{'acl'} // $dir_cfg->{'acl'};
> +
> +    my $node_cfg = $node_list->[0];
> +    my $path = $node_cfg->{path};
> +    my $socket_path_root = "/var/run/virtiofsd";

I think you can also just use /run instead of /var/run. Could also live
in /run/qemu-server/virtiofsd instead of being stand-alone, but both are
fine by me.

> +    mkdir $socket_path_root;
> +    my $socket_path = "$socket_path_root/vm$vmid-fs$fsid";
> +    unlink($socket_path);
> +    my $socket = IO::Socket::UNIX->new(
> +        Type => SOCK_STREAM,
> +        Local => $socket_path,
> +        Listen => 1,
> +    ) or die "cannot create socket - $!\n";
> +
> +    my $flags = fcntl($socket, F_GETFD, 0)
> +        or die "failed to get file descriptor flags: $!\n";
> +    fcntl($socket, F_SETFD, $flags & ~FD_CLOEXEC)
> +        or die "failed to remove FD_CLOEXEC from file descriptor\n";
> +
> +    my $fd = $socket->fileno();
> +
> +    my $virtiofsd_bin = '/usr/libexec/virtiofsd';
> +
> +    my $pid = fork();
> +    if ($pid == 0) {
> +        setsid();
> +        $0 = "task pve-vm$vmid-virtiofs$fsid";
> +        for my $fd_loop (3 .. POSIX::sysconf( &POSIX::_SC_OPEN_MAX )) {

Is there no better way to avoid this large number of close() calls (most
of which are not needed)?

> +            POSIX::close($fd_loop) if ($fd_loop != $fd);

Style nit: no need for the parentheses with post-if

> +        }
> +
> +        my $pid2 = fork();
> +        if ($pid2 == 0) {
> +            my $cmd = [$virtiofsd_bin, "--fd=$fd", "--shared-dir=$path"];
> +            push @$cmd, '--xattr' if $xattr;
> +            push @$cmd, '--posix-acl' if $acl;
> +            push @$cmd, '--announce-submounts' if ($node_cfg->{submounts});
> +            push @$cmd, '--allow-direct-io' if ($virtiofs->{'direct-io'});
> +            push @$cmd, "--cache=$virtiofs->{'cache'}" if ($virtiofs->{'cache'});
> +            push @$cmd, '--syslog';
> +            exec(@$cmd);
> +        } elsif (!defined($pid2)) {
> +            die "could not fork to start virtiofsd\n";
> +        } else {
> +            POSIX::_exit(0);
> +        }
> +    } elsif (!defined($pid)) {
> +        die "could not fork to start virtiofsd\n";
> +    } else {
> +        waitpid($pid, 0);
> +    }
> +
> +    # return socket to keep it alive,
> +    # so that qemu will wait for virtiofsd to start

Nit: s/qemu/QEMU

> +    return $socket;
> +}
> +
>  sub check_rng_source {
>      my ($source) = @_;
>
> @@ -5835,6 +6003,18 @@ sub vm_start_nolock {
>          PVE::Tools::run_fork sub {
>              PVE::Systemd::enter_systemd_scope($vmid, "Proxmox VE VM $vmid", %systemd_properties);
>
> +            my
@virtiofs_sockets;
> +            for (my $i = 0; $i < $MAX_VIRTIOFS; $i++) {
> +                my $opt = "virtiofs$i";
> +
> +                next if !$conf->{$opt};
> +                my $virtiofs = parse_property_string('pve-qm-virtiofs', $conf->{$opt});
> +                next if !$virtiofs;
> +
> +                my $virtiofs_socket = start_virtiofs($vmid, $i, $virtiofs);
> +                push @virtiofs_sockets, $virtiofs_socket;
> +            }
> +
>              my $tpmpid;
>              if ((my $tpm = $conf->{tpmstate0}) && !PVE::QemuConfig->is_template($conf)) {
>                  # start the TPM emulator so QEMU can connect on start
> @@ -5849,6 +6029,11 @@ sub vm_start_nolock {
>                  }
>                  die "QEMU exited with code $exitcode\n";
>              }
> +
> +            foreach my $virtiofs_socket (@virtiofs_sockets) {

Style nit: for

> +                shutdown($virtiofs_socket, 2);
> +                close($virtiofs_socket);
> +            }

Shouldn't this also be done when QEMU start fails?

>          };
>      };
>
> diff --git a/PVE/QemuServer/Memory.pm b/PVE/QemuServer/Memory.pm
> index f365f2d..647595a 100644
> --- a/PVE/QemuServer/Memory.pm
> +++ b/PVE/QemuServer/Memory.pm
> @@ -367,6 +367,16 @@ sub config {
>
>      die "numa needs to be enabled to use hugepages" if $conf->{hugepages} && !$conf->{numa};
>
> +    my $virtiofs_enabled = 0;
> +    for (my $i = 0; $i < PVE::QemuServer::max_virtiofs(); $i++) {

This is one reason it should live in its own module.
PVE/QemuServer/Memory.pm should not include or call into
PVE/QemuServer.pm, that would be cyclic and can lead to strange issues
down the line.

> +        my $opt = "virtiofs$i";
> +        next if !$conf->{$opt};
> +        my $virtiofs = PVE::JSONSchema::parse_property_string('pve-qm-virtiofs', $conf->{$opt});
> +        if ($virtiofs) {
> +            $virtiofs_enabled = 1;
> +        }
> +    }
> +
>      if ($conf->{numa}) {
>
>          my $numa_totalmemory = undef;
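
To make the point about the cyclic dependency a bit more concrete, here
is a minimal sketch of what a dedicated module could offer, so that
Memory.pm never needs to reach into PVE/QemuServer.pm. The module name
is the one suggested earlier in this review; the helper
virtiofs_enabled() is hypothetical and not part of the patch. To stay
self-contained, the sketch only checks whether a virtiofsN key is set,
whereas the patch additionally runs each entry through
parse_property_string().

```perl
package PVE::QemuServer::Virtiofs; # name as proposed above, contents are a sketch

use strict;
use warnings;

my $MAX_VIRTIOFS = 10;

sub max_virtiofs {
    return $MAX_VIRTIOFS;
}

# Hypothetical helper: returns 1 if any virtiofsN key is set in the VM config.
# Assumption: presence of the key is enough; real code would parse the entry.
sub virtiofs_enabled {
    my ($conf) = @_;

    for (my $i = 0; $i < $MAX_VIRTIOFS; $i++) {
        return 1 if $conf->{"virtiofs$i"};
    }
    return 0;
}

package main;

# Example use, as both QemuServer.pm and Memory.pm could call it:
my $conf = { memory => 2048, virtiofs1 => 'dirid=bar,cache=never,xattr=1' };
print "virtiofs enabled\n" if PVE::QemuServer::Virtiofs::virtiofs_enabled($conf);
```

With something along these lines, both PVE/QemuServer.pm and
PVE/QemuServer/Memory.pm would depend on the small module only, and
neither on each other.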