public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: Fiona Ebner <f.ebner@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
	Markus Frank <m.frank@proxmox.com>
Subject: Re: [pve-devel] [PATCH qemu-server v8 4/7] feature #1027: virtio-fs support
Date: Wed, 31 Jan 2024 16:02:06 +0100	[thread overview]
Message-ID: <2c95ac42-2085-47ea-b5b4-97cd8f8d2cd0@proxmox.com> (raw)
In-Reply-To: <20231108085254.53574-5-m.frank@proxmox.com>

Am 08.11.23 um 09:52 schrieb Markus Frank:
> add support for sharing directories with a guest vm
> 
> virtio-fs needs virtiofsd to be started.
> In order to start virtiofsd as a process (despite being a daemon it is does not run
> in the background), a double-fork is used.
> 
> virtiofsd should close itself together with qemu.
> 
> There are the parameters dirid
> and the optional parameters direct-io & cache.
> Additionally the xattr & acl parameter overwrite the
> directory mapping settings for xattr & acl.
> 
> The dirid gets mapped to the path on the current node
> and is also used as a mount-tag (name used to mount the
> device on the guest).
> 
> example config:
> ```
> virtiofs0: foo,direct-io=1,cache=always,acl=1
> virtiofs1: dirid=bar,cache=never,xattr=1
> ```
> 
> For information on the optional parameters see there:
> https://gitlab.com/virtio-fs/virtiofsd/-/blob/main/README.md
> 
> Signed-off-by: Markus Frank <m.frank@proxmox.com>
> ---
>  PVE/QemuServer.pm        | 185 +++++++++++++++++++++++++++++++++++++++
>  PVE/QemuServer/Memory.pm |  25 ++++--

>  debian/control           |   1 +

I'd like to have the change to debian/control as a separate preparatory
patch.

>  3 files changed, 205 insertions(+), 6 deletions(-)
> 
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 2895675..92580df 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -43,6 +43,7 @@ use PVE::PBSClient;
>  use PVE::RESTEnvironment qw(log_warn);
>  use PVE::RPCEnvironment;
>  use PVE::Storage;
> +use PVE::Mapping::Dir;

So the missing include of PVE::Storage::Plugin in PVE::Mapping::Dir I
mentioned in the guest-common patch is the reason it's not sorted
alphabetically here ;)

>  use PVE::SysFSTools;
>  use PVE::Systemd;
>  use PVE::Tools qw(run_command file_read_firstline file_get_contents dir_glob_foreach get_host_arch $IPV6RE);
> @@ -277,6 +278,42 @@ my $rng_fmt = {
>      },
>  };
>  

I'd like to have the format/helpers/etc. (e.g. config_to_command could
also call a helper) live in a dedicated module
PVE/QemuServer/Virtiofs.pm. We should aim to make PVE/QemuServer.pm
smaller, not bigger.

> +my $virtiofs_fmt = {
> +    'dirid' => {
> +	type => 'string',
> +	default_key => 1,
> +	description => "Mapping identifier of the directory mapping to be"
> +	    ." shared with the guest. Also used as a mount tag inside the VM.",
> +	format_description => 'mapping-id',
> +	format => 'pve-configid',
> +    },
> +    'cache' => {
> +	type => 'string',
> +	description => "The caching policy the file system should use"
> +	    ." (auto, always, never).",

Style nit: can fit on one line with the 100 character limit

> +	format_description => "virtiofs-cache",

format_description is (usually) used to tell a user how the string
should look like if it has a special format, e.g. base64, cidr. It's not
needed here when there already is an enum, and "virtiofs-xyz" is not
really helping to clarify this.

> +	enum => [qw(auto always never)],
> +	optional => 1,

Missing default

> +    },
> +    'direct-io' => {
> +	type => 'boolean',
> +	description => "Honor the O_DIRECT flag passed down by guest applications",
> +	format_description => "virtiofs-directio",

Similar here

> +	optional => 1,

Missing default

> +    },
> +    xattr => {
> +	type => 'boolean',
> +	description => "Enable support for extended attributes.",

Could mention it's an override for the value coming from the mapping.

> +	optional => 1,

Missing default, i.e. "use value from mapping"

> +    },
> +    acl => {
> +	type => 'boolean',
> +	description => "Enable support for posix ACLs (implies --xattr).",

Could mention it's an override for the value coming from the mapping.

> +	optional => 1,

Missing default, i.e. "use value from mapping"

> +    },
> +};
> +PVE::JSONSchema::register_format('pve-qm-virtiofs', $virtiofs_fmt);
> +
>  my $meta_info_fmt = {
>      'ctime' => {
>  	type => 'integer',
> @@ -839,6 +876,7 @@ while (my ($k, $v) = each %$confdesc) {
>  }
>  
>  my $MAX_NETS = 32;
> +my $MAX_VIRTIOFS = 10;

Is there a specific reason for ten or just to have an initial limit
that's not too small? Does it still work nicely with all ten?

>  my $MAX_SERIAL_PORTS = 4;
>  my $MAX_PARALLEL_PORTS = 3;
>  
> @@ -948,6 +986,21 @@ my $netdesc = {
>  
>  PVE::JSONSchema::register_standard_option("pve-qm-net", $netdesc);
>  
> +my $virtiofsdesc = {
> +    optional => 1,
> +    type => 'string', format => $virtiofs_fmt,
> +    description => "share files between host and guest",

Nit: s/files/a directory/ is slightly more precise?

> +};
> +PVE::JSONSchema::register_standard_option("pve-qm-virtiofs", $virtiofsdesc);
> +
> +sub max_virtiofs {
> +    return $MAX_VIRTIOFS;
> +}
> +
> +for (my $i = 0; $i < $MAX_VIRTIOFS; $i++)  {
> +    $confdesc->{"virtiofs$i"} = $virtiofsdesc;
> +}
> +
>  my $ipconfig_fmt = {
>      ip => {
>  	type => 'string',
> @@ -4055,6 +4108,23 @@ sub config_to_command {
>  	push @$devices, '-device', $netdevicefull;
>      }
>  
> +    my $virtiofs_enabled = 0;
> +    for (my $i = 0; $i < $MAX_VIRTIOFS; $i++) {
> +	my $opt = "virtiofs$i";
> +
> +	next if !$conf->{$opt};
> +	my $virtiofs = parse_property_string('pve-qm-virtiofs', $conf->{$opt});
> +	next if !$virtiofs;
> +
> +	check_virtiofs_config ($conf, $virtiofs);

Style nit: space between function name and parenthesis

> +
> +	push @$devices, '-chardev', "socket,id=virtfs$i,path=/var/run/virtiofsd/vm$vmid-fs$i";
> +	push @$devices, '-device', 'vhost-user-fs-pci,queue-size=1024'

Any specific reason for queue-size=1024? Better performance than the
default 128?

> +	    .",chardev=virtfs$i,tag=$virtiofs->{dirid}";
> +
> +	$virtiofs_enabled = 1;
> +    }
> +
>      if ($conf->{ivshmem}) {
>  	my $ivshmem = parse_property_string($ivshmem_fmt, $conf->{ivshmem});
>  
> @@ -4114,6 +4184,14 @@ sub config_to_command {
>      }
>      push @$machineFlags, "type=${machine_type_min}";
>  
> +    if ($virtiofs_enabled && !$conf->{numa}) {
> +	# kvm: '-machine memory-backend' and '-numa memdev' properties are
> +	# mutually exclusive
> +	push @$devices, '-object', 'memory-backend-memfd,id=virtiofs-mem'
> +	    .",size=$conf->{memory}M,share=on";
> +	push @$machineFlags, 'memory-backend=virtiofs-mem';
> +    }

I'd like to have this handled in the PVE::QemuServer::Memory::config()
call (need to additionally pass along $machineFlags of course).

> +
>      push @$cmd, @$devices;
>      push @$cmd, '-rtc', join(',', @$rtcFlags) if scalar(@$rtcFlags);
>      push @$cmd, '-machine', join(',', @$machineFlags) if scalar(@$machineFlags);
> @@ -4140,6 +4218,96 @@ sub config_to_command {
>      return wantarray ? ($cmd, $vollist, $spice_port, $pci_devices) : $cmd;
>  }
>  
> +sub check_virtiofs_config {

Since this dies, maybe assert_ instead of check_?

> +    my ($conf, $virtiofs) = @_;
> +    my $dir_cfg = PVE::Mapping::Dir::config()->{ids}->{$virtiofs->{dirid}};
> +    my $node_list = PVE::Mapping::Dir::find_on_current_node($virtiofs->{dirid});
> +
> +    my $acl = $virtiofs->{'acl'} // $dir_cfg->{'acl'};
> +    if ($acl && windows_version($conf->{ostype})) {
> +	log_warn(
> +	    "Please disable ACLs for virtiofs on Windows VMs, otherwise"
> +	    ." the virtiofs shared directory cannot be mounted.\n"

A great, the warning is already here :)
Nit: no need for "\n" with log_warn()

> +	);
> +    }
> +
> +    if (!$node_list || scalar($node_list->@*) != 1) {
> +	die "virtiofs needs exactly one mapping for this node\n";
> +    }
> +
> +    eval {
> +	PVE::Mapping::Dir::assert_valid($node_list->[0]);
> +    };

Style nit: eval block could be all on one line

> +    if (my $err = $@) {
> +	die "Directory Mapping invalid: $err\n";
> +    }
> +}
> +
> +sub start_virtiofs {
> +    my ($vmid, $fsid, $virtiofs) = @_;
> +
> +    my $dir_cfg = PVE::Mapping::Dir::config()->{ids}->{$virtiofs->{dirid}};
> +    my $node_list = PVE::Mapping::Dir::find_on_current_node($virtiofs->{dirid});
> +
> +    # Default to dir config xattr & acl settings
> +    my $xattr = $virtiofs->{xattr} // $dir_cfg->{xattr};
> +    my $acl = $virtiofs->{'acl'} // $dir_cfg->{'acl'};
> +
> +    my $node_cfg = $node_list->[0];
> +    my $path = $node_cfg->{path};
> +    my $socket_path_root = "/var/run/virtiofsd";

I think you can also just use /run instead of /var/run. Could also live
in /run/qemu-server/virtiofsd instead of being stand-alone, but both are
fine by me.

> +    mkdir $socket_path_root;
> +    my $socket_path = "$socket_path_root/vm$vmid-fs$fsid";
> +    unlink($socket_path);
> +    my $socket = IO::Socket::UNIX->new(
> +	Type => SOCK_STREAM,
> +	Local => $socket_path,
> +	Listen => 1,
> +    ) or die "cannot create socket - $!\n";
> +
> +    my $flags = fcntl($socket, F_GETFD, 0)
> +	or die "failed to get file descriptor flags: $!\n";
> +    fcntl($socket, F_SETFD, $flags & ~FD_CLOEXEC)
> +	or die "failed to remove FD_CLOEXEC from file descriptor\n";
> +
> +    my $fd = $socket->fileno();
> +
> +    my $virtiofsd_bin = '/usr/libexec/virtiofsd';
> +
> +    my $pid = fork();
> +    if ($pid == 0) {
> +	setsid();
> +	$0 = "task pve-vm$vmid-virtiofs$fsid";
> +	for my $fd_loop (3 .. POSIX::sysconf( &POSIX::_SC_OPEN_MAX )) {

Is there no better way to avoid this large number of close() calls (most
of which are not needed)?

> +	    POSIX::close($fd_loop) if ($fd_loop != $fd);

Style nit: no need for the parentheses with post-if

> +	}
> +
> +	my $pid2 = fork();
> +	if ($pid2 == 0) {
> +	    my $cmd = [$virtiofsd_bin, "--fd=$fd", "--shared-dir=$path"];
> +	    push @$cmd, '--xattr' if $xattr;
> +	    push @$cmd, '--posix-acl' if $acl;
> +	    push @$cmd, '--announce-submounts' if ($node_cfg->{submounts});
> +	    push @$cmd, '--allow-direct-io' if ($virtiofs->{'direct-io'});
> +	    push @$cmd, "--cache=$virtiofs->{'cache'}" if ($virtiofs->{'cache'});
> +	    push @$cmd, '--syslog';
> +	    exec(@$cmd);
> +	} elsif (!defined($pid2)) {
> +	    die "could not fork to start virtiofsd\n";
> +	} else {
> +	    POSIX::_exit(0);
> +	}
> +    } elsif (!defined($pid)) {
> +	die "could not fork to start virtiofsd\n";
> +    } else {
> +	waitpid($pid, 0);
> +    }
> +
> +    # return socket to keep it alive,
> +    # so that qemu will wait for virtiofsd to start

Nit: s/qemu/QEMU

> +    return $socket;
> +}
> +
>  sub check_rng_source {
>      my ($source) = @_;
>  
> @@ -5835,6 +6003,18 @@ sub vm_start_nolock {
>  	PVE::Tools::run_fork sub {
>  	    PVE::Systemd::enter_systemd_scope($vmid, "Proxmox VE VM $vmid", %systemd_properties);
>  
> +	    my @virtiofs_sockets;
> +	    for (my $i = 0; $i < $MAX_VIRTIOFS; $i++) {
> +		my $opt = "virtiofs$i";
> +
> +		next if !$conf->{$opt};
> +		my $virtiofs = parse_property_string('pve-qm-virtiofs', $conf->{$opt});
> +		next if !$virtiofs;
> +
> +		my $virtiofs_socket = start_virtiofs($vmid, $i, $virtiofs);
> +		push @virtiofs_sockets, $virtiofs_socket;
> +	    }
> +
>  	    my $tpmpid;
>  	    if ((my $tpm = $conf->{tpmstate0}) && !PVE::QemuConfig->is_template($conf)) {
>  		# start the TPM emulator so QEMU can connect on start
> @@ -5849,6 +6029,11 @@ sub vm_start_nolock {
>  		}
>  		die "QEMU exited with code $exitcode\n";
>  	    }
> +
> +	    foreach my $virtiofs_socket (@virtiofs_sockets) {

Style nit: for

> +		shutdown($virtiofs_socket, 2);
> +		close($virtiofs_socket);
> +	    }

Shouldn't this also be done when QEMU start fails?

>  	};
>      };
>  
> diff --git a/PVE/QemuServer/Memory.pm b/PVE/QemuServer/Memory.pm
> index f365f2d..647595a 100644
> --- a/PVE/QemuServer/Memory.pm
> +++ b/PVE/QemuServer/Memory.pm
> @@ -367,6 +367,16 @@ sub config {
>  
>      die "numa needs to be enabled to use hugepages" if $conf->{hugepages} && !$conf->{numa};
>  
> +    my $virtiofs_enabled = 0;
> +    for (my $i = 0; $i < PVE::QemuServer::max_virtiofs(); $i++) {

This is one reason it shoulb live in its own module.
PVE/QemuServer/Memory.pm should not include or call into
PVE/QemuServer.pm, that would be cyclic and can lead to strange issues
down the line.

> +	my $opt = "virtiofs$i";
> +	next if !$conf->{$opt};
> +	my $virtiofs = PVE::JSONSchema::parse_property_string('pve-qm-virtiofs', $conf->{$opt});
> +	if ($virtiofs) {
> +	    $virtiofs_enabled = 1;
> +	}
> +    }
> +
>      if ($conf->{numa}) {
>  
>  	my $numa_totalmemory = undef;




  reply	other threads:[~2024-01-31 15:02 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-08  8:52 [pve-devel] [PATCH cluster/guest-common/docs/qemu-server/manager v8 0/7] virtiofs Markus Frank
2023-11-08  8:52 ` [pve-devel] [PATCH cluster v8 1/7] add mapping/dir.cfg for resource mapping Markus Frank
2024-01-31 12:01   ` Fiona Ebner
2023-11-08  8:52 ` [pve-devel] [PATCH guest-common v8 2/7] add Dir mapping config Markus Frank
2024-01-31 12:01   ` Fiona Ebner
2024-01-31 13:42     ` Markus Frank
2024-01-31 13:53       ` Fiona Ebner
2024-01-31 14:00         ` Fiona Ebner
2024-01-31 14:15           ` Markus Frank
2024-01-31 13:02   ` Fiona Ebner
2023-11-08  8:52 ` [pve-devel] [PATCH docs v8 3/7] added shared filesystem doc for virtio-fs Markus Frank
2024-01-31 13:26   ` Fiona Ebner
2024-01-31 13:34   ` Fiona Ebner
2023-11-08  8:52 ` [pve-devel] [PATCH qemu-server v8 4/7] feature #1027: virtio-fs support Markus Frank
2024-01-31 15:02   ` Fiona Ebner [this message]
2024-02-13 11:52     ` Markus Frank
2024-02-13 12:04       ` Fiona Ebner
2023-11-08  8:52 ` [pve-devel] [PATCH qemu-server v8 5/7] Permission check for virtiofs directory access Markus Frank
2024-01-31 15:23   ` Fiona Ebner
2023-11-08  8:52 ` [pve-devel] [PATCH qemu-server v8 6/7] check_local_resources: virtiofs Markus Frank
2024-01-31 15:35   ` Fiona Ebner
2024-02-22 10:44     ` Markus Frank
2023-11-08  8:52 ` [pve-devel] [PATCH manager v8 7/7] api: add resource map api endpoints for directories Markus Frank
2024-01-31 15:56   ` Fiona Ebner
2024-01-30 12:31 ` [pve-devel] [PATCH cluster/guest-common/docs/qemu-server/manager v8 0/7] virtiofs Markus Frank
2024-01-31 12:01 ` Fiona Ebner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2c95ac42-2085-47ea-b5b4-97cd8f8d2cd0@proxmox.com \
    --to=f.ebner@proxmox.com \
    --cc=m.frank@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal