all lists on lists.proxmox.com
 help / color / mirror / Atom feed
From: Fiona Ebner <f.ebner@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
	Markus Frank <m.frank@proxmox.com>
Subject: Re: [pve-devel] [PATCH qemu-server v8 4/7] feature #1027: virtio-fs support
Date: Wed, 31 Jan 2024 16:02:06 +0100	[thread overview]
Message-ID: <2c95ac42-2085-47ea-b5b4-97cd8f8d2cd0@proxmox.com> (raw)
In-Reply-To: <20231108085254.53574-5-m.frank@proxmox.com>

Am 08.11.23 um 09:52 schrieb Markus Frank:
> add support for sharing directories with a guest vm
> 
> virtio-fs needs virtiofsd to be started.
> In order to start virtiofsd as a process (despite being a daemon it is does not run
> in the background), a double-fork is used.
> 
> virtiofsd should close itself together with qemu.
> 
> There are the parameters dirid
> and the optional parameters direct-io & cache.
> Additionally the xattr & acl parameter overwrite the
> directory mapping settings for xattr & acl.
> 
> The dirid gets mapped to the path on the current node
> and is also used as a mount-tag (name used to mount the
> device on the guest).
> 
> example config:
> ```
> virtiofs0: foo,direct-io=1,cache=always,acl=1
> virtiofs1: dirid=bar,cache=never,xattr=1
> ```
> 
> For information on the optional parameters see there:
> https://gitlab.com/virtio-fs/virtiofsd/-/blob/main/README.md
> 
> Signed-off-by: Markus Frank <m.frank@proxmox.com>
> ---
>  PVE/QemuServer.pm        | 185 +++++++++++++++++++++++++++++++++++++++
>  PVE/QemuServer/Memory.pm |  25 ++++--

>  debian/control           |   1 +

I'd like to have the change to debian/control as a separate preparatory
patch.

>  3 files changed, 205 insertions(+), 6 deletions(-)
> 
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 2895675..92580df 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -43,6 +43,7 @@ use PVE::PBSClient;
>  use PVE::RESTEnvironment qw(log_warn);
>  use PVE::RPCEnvironment;
>  use PVE::Storage;
> +use PVE::Mapping::Dir;

So the missing include of PVE::Storage::Plugin in PVE::Mapping::Dir I
mentioned in the guest-common patch is the reason it's not sorted
alphabetically here ;)

>  use PVE::SysFSTools;
>  use PVE::Systemd;
>  use PVE::Tools qw(run_command file_read_firstline file_get_contents dir_glob_foreach get_host_arch $IPV6RE);
> @@ -277,6 +278,42 @@ my $rng_fmt = {
>      },
>  };
>  

I'd like to have the format/helpers/etc. (e.g. config_to_command could
also call a helper) live in a dedicated module
PVE/QemuServer/Virtiofs.pm. We should aim to make PVE/QemuServer.pm
smaller, not bigger.

> +my $virtiofs_fmt = {
> +    'dirid' => {
> +	type => 'string',
> +	default_key => 1,
> +	description => "Mapping identifier of the directory mapping to be"
> +	    ." shared with the guest. Also used as a mount tag inside the VM.",
> +	format_description => 'mapping-id',
> +	format => 'pve-configid',
> +    },
> +    'cache' => {
> +	type => 'string',
> +	description => "The caching policy the file system should use"
> +	    ." (auto, always, never).",

Style nit: can fit on one line with the 100 character limit

> +	format_description => "virtiofs-cache",

format_description is (usually) used to tell a user how the string
should look like if it has a special format, e.g. base64, cidr. It's not
needed here when there already is an enum, and "virtiofs-xyz" is not
really helping to clarify this.

> +	enum => [qw(auto always never)],
> +	optional => 1,

Missing default

> +    },
> +    'direct-io' => {
> +	type => 'boolean',
> +	description => "Honor the O_DIRECT flag passed down by guest applications",
> +	format_description => "virtiofs-directio",

Similar here

> +	optional => 1,

Missing default

> +    },
> +    xattr => {
> +	type => 'boolean',
> +	description => "Enable support for extended attributes.",

Could mention it's an override for the value coming from the mapping.

> +	optional => 1,

Missing default, i.e. "use value from mapping"

> +    },
> +    acl => {
> +	type => 'boolean',
> +	description => "Enable support for posix ACLs (implies --xattr).",

Could mention it's an override for the value coming from the mapping.

> +	optional => 1,

Missing default, i.e. "use value from mapping"

> +    },
> +};
> +PVE::JSONSchema::register_format('pve-qm-virtiofs', $virtiofs_fmt);
> +
>  my $meta_info_fmt = {
>      'ctime' => {
>  	type => 'integer',
> @@ -839,6 +876,7 @@ while (my ($k, $v) = each %$confdesc) {
>  }
>  
>  my $MAX_NETS = 32;
> +my $MAX_VIRTIOFS = 10;

Is there a specific reason for ten or just to have an initial limit
that's not too small? Does it still work nicely with all ten?

>  my $MAX_SERIAL_PORTS = 4;
>  my $MAX_PARALLEL_PORTS = 3;
>  
> @@ -948,6 +986,21 @@ my $netdesc = {
>  
>  PVE::JSONSchema::register_standard_option("pve-qm-net", $netdesc);
>  
> +my $virtiofsdesc = {
> +    optional => 1,
> +    type => 'string', format => $virtiofs_fmt,
> +    description => "share files between host and guest",

Nit: s/files/a directory/ is slightly more precise?

> +};
> +PVE::JSONSchema::register_standard_option("pve-qm-virtiofs", $virtiofsdesc);
> +
> +sub max_virtiofs {
> +    return $MAX_VIRTIOFS;
> +}
> +
> +for (my $i = 0; $i < $MAX_VIRTIOFS; $i++)  {
> +    $confdesc->{"virtiofs$i"} = $virtiofsdesc;
> +}
> +
>  my $ipconfig_fmt = {
>      ip => {
>  	type => 'string',
> @@ -4055,6 +4108,23 @@ sub config_to_command {
>  	push @$devices, '-device', $netdevicefull;
>      }
>  
> +    my $virtiofs_enabled = 0;
> +    for (my $i = 0; $i < $MAX_VIRTIOFS; $i++) {
> +	my $opt = "virtiofs$i";
> +
> +	next if !$conf->{$opt};
> +	my $virtiofs = parse_property_string('pve-qm-virtiofs', $conf->{$opt});
> +	next if !$virtiofs;
> +
> +	check_virtiofs_config ($conf, $virtiofs);

Style nit: space between function name and parenthesis

> +
> +	push @$devices, '-chardev', "socket,id=virtfs$i,path=/var/run/virtiofsd/vm$vmid-fs$i";
> +	push @$devices, '-device', 'vhost-user-fs-pci,queue-size=1024'

Any specific reason for queue-size=1024? Better performance than the
default 128?

> +	    .",chardev=virtfs$i,tag=$virtiofs->{dirid}";
> +
> +	$virtiofs_enabled = 1;
> +    }
> +
>      if ($conf->{ivshmem}) {
>  	my $ivshmem = parse_property_string($ivshmem_fmt, $conf->{ivshmem});
>  
> @@ -4114,6 +4184,14 @@ sub config_to_command {
>      }
>      push @$machineFlags, "type=${machine_type_min}";
>  
> +    if ($virtiofs_enabled && !$conf->{numa}) {
> +	# kvm: '-machine memory-backend' and '-numa memdev' properties are
> +	# mutually exclusive
> +	push @$devices, '-object', 'memory-backend-memfd,id=virtiofs-mem'
> +	    .",size=$conf->{memory}M,share=on";
> +	push @$machineFlags, 'memory-backend=virtiofs-mem';
> +    }

I'd like to have this handled in the PVE::QemuServer::Memory::config()
call (need to additionally pass along $machineFlags of course).

> +
>      push @$cmd, @$devices;
>      push @$cmd, '-rtc', join(',', @$rtcFlags) if scalar(@$rtcFlags);
>      push @$cmd, '-machine', join(',', @$machineFlags) if scalar(@$machineFlags);
> @@ -4140,6 +4218,96 @@ sub config_to_command {
>      return wantarray ? ($cmd, $vollist, $spice_port, $pci_devices) : $cmd;
>  }
>  
> +sub check_virtiofs_config {

Since this dies, maybe assert_ instead of check_?

> +    my ($conf, $virtiofs) = @_;
> +    my $dir_cfg = PVE::Mapping::Dir::config()->{ids}->{$virtiofs->{dirid}};
> +    my $node_list = PVE::Mapping::Dir::find_on_current_node($virtiofs->{dirid});
> +
> +    my $acl = $virtiofs->{'acl'} // $dir_cfg->{'acl'};
> +    if ($acl && windows_version($conf->{ostype})) {
> +	log_warn(
> +	    "Please disable ACLs for virtiofs on Windows VMs, otherwise"
> +	    ." the virtiofs shared directory cannot be mounted.\n"

A great, the warning is already here :)
Nit: no need for "\n" with log_warn()

> +	);
> +    }
> +
> +    if (!$node_list || scalar($node_list->@*) != 1) {
> +	die "virtiofs needs exactly one mapping for this node\n";
> +    }
> +
> +    eval {
> +	PVE::Mapping::Dir::assert_valid($node_list->[0]);
> +    };

Style nit: eval block could be all on one line

> +    if (my $err = $@) {
> +	die "Directory Mapping invalid: $err\n";
> +    }
> +}
> +
> +sub start_virtiofs {
> +    my ($vmid, $fsid, $virtiofs) = @_;
> +
> +    my $dir_cfg = PVE::Mapping::Dir::config()->{ids}->{$virtiofs->{dirid}};
> +    my $node_list = PVE::Mapping::Dir::find_on_current_node($virtiofs->{dirid});
> +
> +    # Default to dir config xattr & acl settings
> +    my $xattr = $virtiofs->{xattr} // $dir_cfg->{xattr};
> +    my $acl = $virtiofs->{'acl'} // $dir_cfg->{'acl'};
> +
> +    my $node_cfg = $node_list->[0];
> +    my $path = $node_cfg->{path};
> +    my $socket_path_root = "/var/run/virtiofsd";

I think you can also just use /run instead of /var/run. Could also live
in /run/qemu-server/virtiofsd instead of being stand-alone, but both are
fine by me.

> +    mkdir $socket_path_root;
> +    my $socket_path = "$socket_path_root/vm$vmid-fs$fsid";
> +    unlink($socket_path);
> +    my $socket = IO::Socket::UNIX->new(
> +	Type => SOCK_STREAM,
> +	Local => $socket_path,
> +	Listen => 1,
> +    ) or die "cannot create socket - $!\n";
> +
> +    my $flags = fcntl($socket, F_GETFD, 0)
> +	or die "failed to get file descriptor flags: $!\n";
> +    fcntl($socket, F_SETFD, $flags & ~FD_CLOEXEC)
> +	or die "failed to remove FD_CLOEXEC from file descriptor\n";
> +
> +    my $fd = $socket->fileno();
> +
> +    my $virtiofsd_bin = '/usr/libexec/virtiofsd';
> +
> +    my $pid = fork();
> +    if ($pid == 0) {
> +	setsid();
> +	$0 = "task pve-vm$vmid-virtiofs$fsid";
> +	for my $fd_loop (3 .. POSIX::sysconf( &POSIX::_SC_OPEN_MAX )) {

Is there no better way to avoid this large number of close() calls (most
of which are not needed)?

> +	    POSIX::close($fd_loop) if ($fd_loop != $fd);

Style nit: no need for the parentheses with post-if

> +	}
> +
> +	my $pid2 = fork();
> +	if ($pid2 == 0) {
> +	    my $cmd = [$virtiofsd_bin, "--fd=$fd", "--shared-dir=$path"];
> +	    push @$cmd, '--xattr' if $xattr;
> +	    push @$cmd, '--posix-acl' if $acl;
> +	    push @$cmd, '--announce-submounts' if ($node_cfg->{submounts});
> +	    push @$cmd, '--allow-direct-io' if ($virtiofs->{'direct-io'});
> +	    push @$cmd, "--cache=$virtiofs->{'cache'}" if ($virtiofs->{'cache'});
> +	    push @$cmd, '--syslog';
> +	    exec(@$cmd);
> +	} elsif (!defined($pid2)) {
> +	    die "could not fork to start virtiofsd\n";
> +	} else {
> +	    POSIX::_exit(0);
> +	}
> +    } elsif (!defined($pid)) {
> +	die "could not fork to start virtiofsd\n";
> +    } else {
> +	waitpid($pid, 0);
> +    }
> +
> +    # return socket to keep it alive,
> +    # so that qemu will wait for virtiofsd to start

Nit: s/qemu/QEMU

> +    return $socket;
> +}
> +
>  sub check_rng_source {
>      my ($source) = @_;
>  
> @@ -5835,6 +6003,18 @@ sub vm_start_nolock {
>  	PVE::Tools::run_fork sub {
>  	    PVE::Systemd::enter_systemd_scope($vmid, "Proxmox VE VM $vmid", %systemd_properties);
>  
> +	    my @virtiofs_sockets;
> +	    for (my $i = 0; $i < $MAX_VIRTIOFS; $i++) {
> +		my $opt = "virtiofs$i";
> +
> +		next if !$conf->{$opt};
> +		my $virtiofs = parse_property_string('pve-qm-virtiofs', $conf->{$opt});
> +		next if !$virtiofs;
> +
> +		my $virtiofs_socket = start_virtiofs($vmid, $i, $virtiofs);
> +		push @virtiofs_sockets, $virtiofs_socket;
> +	    }
> +
>  	    my $tpmpid;
>  	    if ((my $tpm = $conf->{tpmstate0}) && !PVE::QemuConfig->is_template($conf)) {
>  		# start the TPM emulator so QEMU can connect on start
> @@ -5849,6 +6029,11 @@ sub vm_start_nolock {
>  		}
>  		die "QEMU exited with code $exitcode\n";
>  	    }
> +
> +	    foreach my $virtiofs_socket (@virtiofs_sockets) {

Style nit: for

> +		shutdown($virtiofs_socket, 2);
> +		close($virtiofs_socket);
> +	    }

Shouldn't this also be done when QEMU start fails?

>  	};
>      };
>  
> diff --git a/PVE/QemuServer/Memory.pm b/PVE/QemuServer/Memory.pm
> index f365f2d..647595a 100644
> --- a/PVE/QemuServer/Memory.pm
> +++ b/PVE/QemuServer/Memory.pm
> @@ -367,6 +367,16 @@ sub config {
>  
>      die "numa needs to be enabled to use hugepages" if $conf->{hugepages} && !$conf->{numa};
>  
> +    my $virtiofs_enabled = 0;
> +    for (my $i = 0; $i < PVE::QemuServer::max_virtiofs(); $i++) {

This is one reason it shoulb live in its own module.
PVE/QemuServer/Memory.pm should not include or call into
PVE/QemuServer.pm, that would be cyclic and can lead to strange issues
down the line.

> +	my $opt = "virtiofs$i";
> +	next if !$conf->{$opt};
> +	my $virtiofs = PVE::JSONSchema::parse_property_string('pve-qm-virtiofs', $conf->{$opt});
> +	if ($virtiofs) {
> +	    $virtiofs_enabled = 1;
> +	}
> +    }
> +
>      if ($conf->{numa}) {
>  
>  	my $numa_totalmemory = undef;




  reply	other threads:[~2024-01-31 15:02 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-08  8:52 [pve-devel] [PATCH cluster/guest-common/docs/qemu-server/manager v8 0/7] virtiofs Markus Frank
2023-11-08  8:52 ` [pve-devel] [PATCH cluster v8 1/7] add mapping/dir.cfg for resource mapping Markus Frank
2024-01-31 12:01   ` Fiona Ebner
2023-11-08  8:52 ` [pve-devel] [PATCH guest-common v8 2/7] add Dir mapping config Markus Frank
2024-01-31 12:01   ` Fiona Ebner
2024-01-31 13:42     ` Markus Frank
2024-01-31 13:53       ` Fiona Ebner
2024-01-31 14:00         ` Fiona Ebner
2024-01-31 14:15           ` Markus Frank
2024-01-31 13:02   ` Fiona Ebner
2023-11-08  8:52 ` [pve-devel] [PATCH docs v8 3/7] added shared filesystem doc for virtio-fs Markus Frank
2024-01-31 13:26   ` Fiona Ebner
2024-01-31 13:34   ` Fiona Ebner
2023-11-08  8:52 ` [pve-devel] [PATCH qemu-server v8 4/7] feature #1027: virtio-fs support Markus Frank
2024-01-31 15:02   ` Fiona Ebner [this message]
2024-02-13 11:52     ` Markus Frank
2024-02-13 12:04       ` Fiona Ebner
2023-11-08  8:52 ` [pve-devel] [PATCH qemu-server v8 5/7] Permission check for virtiofs directory access Markus Frank
2024-01-31 15:23   ` Fiona Ebner
2023-11-08  8:52 ` [pve-devel] [PATCH qemu-server v8 6/7] check_local_resources: virtiofs Markus Frank
2024-01-31 15:35   ` Fiona Ebner
2024-02-22 10:44     ` Markus Frank
2023-11-08  8:52 ` [pve-devel] [PATCH manager v8 7/7] api: add resource map api endpoints for directories Markus Frank
2024-01-31 15:56   ` Fiona Ebner
2024-01-30 12:31 ` [pve-devel] [PATCH cluster/guest-common/docs/qemu-server/manager v8 0/7] virtiofs Markus Frank
2024-01-31 12:01 ` Fiona Ebner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2c95ac42-2085-47ea-b5b4-97cd8f8d2cd0@proxmox.com \
    --to=f.ebner@proxmox.com \
    --cc=m.frank@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal