From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <f.gruenbichler@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id 7BB75E8B0
 for <pve-devel@lists.proxmox.com>; Wed, 19 Jul 2023 14:09:26 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id 5DF6B69B1
 for <pve-devel@lists.proxmox.com>; Wed, 19 Jul 2023 14:08:56 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [94.136.29.106])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS
 for <pve-devel@lists.proxmox.com>; Wed, 19 Jul 2023 14:08:52 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 8629540F8A
 for <pve-devel@lists.proxmox.com>; Wed, 19 Jul 2023 14:08:52 +0200 (CEST)
Date: Wed, 19 Jul 2023 14:08:45 +0200
From: Fabian =?iso-8859-1?q?Gr=FCnbichler?= <f.gruenbichler@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
References: <20230706105421.54949-1-m.frank@proxmox.com>
 <20230706105421.54949-5-m.frank@proxmox.com>
In-Reply-To: <20230706105421.54949-5-m.frank@proxmox.com>
MIME-Version: 1.0
User-Agent: astroid/0.16.0 (https://github.com/astroidmail/astroid)
Message-Id: <1689767039.4bobtqbdw7.astroid@yuna.none>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-SPAM-LEVEL: Spam detection results:  0
 AWL 0.070 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 DMARC_MISSING             0.1 Missing DMARC policy
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
 T_SCC_BODY_TEXT_LINE    -0.01 -
 URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See
 http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more
 information. [memory.pm, qemuserver.pm, gitlab.com, proxmox.com]
Subject: Re: [pve-devel] [PATCH qemu-server v6 1/3] feature #1027: virtio-fs
 support
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Wed, 19 Jul 2023 12:09:26 -0000

On July 6, 2023 12:54 pm, Markus Frank wrote:
> adds support for sharing directorys with a guest vm
>=20
> virtio-fs needs virtiofsd to be started.
>=20
> In order to start virtiofsd as a process (despite being a daemon it is do=
es not run
> in the background), a double-fork is used.
>=20
> virtiofsd should close itself together with qemu.
>=20
> There are the parameters dirid & tag
> and the optional parameters direct-io & cache.
>=20
> The dirid gets mapped to the path on the current node.
> The tag parameter is for choosing the tag-name that is used with the
> mount command.
>=20
> example config:
> ```
> virtiofs0: foo,tag=3Dtag1,direct-io=3D1,cache=3Dalways
> virtiofs1: dirid=3Dbar,tag=3Dtag2,cache=3Dnever
> ```
>=20
> For information on the optional parameters see there:
> https://gitlab.com/virtio-fs/virtiofsd/-/blob/main/README.md

some information why virtiofsd is incompatible with hugepages, and why
it needs a different memdev identifier would be nice to have, either in
the commit message, or as a comment..

nits/questions below mainly if you respin with the guest-common related
feedback adressed, if not, these can obviously be done as a follow up as
well - but please also see the cover letter reply, the problems listed
there likely mean this patch in particular still needs some rework
anyway!

>=20
> Signed-off-by: Markus Frank <m.frank@proxmox.com>
> ---
>  PVE/QemuServer.pm        | 157 +++++++++++++++++++++++++++++++++++++++
>  PVE/QemuServer/Memory.pm |  25 +++++--
>  debian/control           |   1 +
>  3 files changed, 177 insertions(+), 6 deletions(-)
>=20
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index 940cdac..3a8b4c5 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -43,6 +43,7 @@ use PVE::PBSClient;
>  use PVE::RESTEnvironment qw(log_warn);
>  use PVE::RPCEnvironment;
>  use PVE::Storage;
> +use PVE::Mapping::DIR;
>  use PVE::SysFSTools;
>  use PVE::Systemd;
>  use PVE::Tools qw(run_command file_read_firstline file_get_contents dir_=
glob_foreach get_host_arch $IPV6RE);
> @@ -276,6 +277,35 @@ my $rng_fmt =3D {
>      },
>  };
> =20
> +my $virtiofs_fmt =3D {
> +    'dirid' =3D> {
> +	type =3D> 'string',
> +	default_key =3D> 1,
> +	description =3D> "dirid of directory you want to share with the guest V=
M",
> +	format_description =3D> "virtiofs-dirid",
> +    },

the other two mapping types call this 'mapping', set a format and
different format_description. maybe we should be consistent?

nit: the description could just be

"Mapping identifier of the directory mapping to be shared with the
guest."

or something similar, "dirid of directory" is rather redundant, and
"you" seems wrong for a property description.

> +    'tag' =3D> {
> +	type =3D> 'string',
> +	description =3D> "tag name for mounting in the guest VM",
> +	format_description =3D> "virtiofs-tag",
> +    },
> +    'cache' =3D> {
> +	type =3D> 'string',
> +	description =3D> "The caching policy the file system should use"
> +	    ." (auto, always, never).",
> +	format_description =3D> "virtiofs-cache",
> +	enum =3D> [qw(auto always never)],
> +	optional =3D> 1,
> +    },
> +    'direct-io' =3D> {
> +	type =3D> 'boolean',
> +	description =3D> "Honor the O_DIRECT flag passed down by guest applicat=
ions",
> +	format_description =3D> "virtiofs-directio",
> +	optional =3D> 1,
> +    },

what about 'ro'?

should we also add 'xattr', 'acl' and the submount part here to control
whether those are actually enabled for this particular guest?

> +};
> +PVE::JSONSchema::register_format('pve-qm-virtiofs', $virtiofs_fmt);
> +
>  my $meta_info_fmt =3D {
>      'ctime' =3D> {
>  	type =3D> 'integer',
> @@ -838,6 +868,7 @@ while (my ($k, $v) =3D each %$confdesc) {
>  }
> =20
>  my $MAX_NETS =3D 32;
> +my $MAX_VIRTIOFS =3D 10;
>  my $MAX_SERIAL_PORTS =3D 4;
>  my $MAX_PARALLEL_PORTS =3D 3;
>  my $MAX_NUMA =3D 8;
> @@ -982,6 +1013,21 @@ my $netdesc =3D {
> =20
>  PVE::JSONSchema::register_standard_option("pve-qm-net", $netdesc);
> =20
> +my $virtiofsdesc =3D {
> +    optional =3D> 1,
> +    type =3D> 'string', format =3D> $virtiofs_fmt,
> +    description =3D> "share files between host and guest",
> +};
> +PVE::JSONSchema::register_standard_option("pve-qm-virtiofs", $virtiofsde=
sc);
> +
> +sub max_virtiofs {
> +    return $MAX_VIRTIOFS;
> +}
> +
> +for (my $i =3D 0; $i < $MAX_VIRTIOFS; $i++)  {
> +    $confdesc->{"virtiofs$i"} =3D $virtiofsdesc;
> +}
> +
>  my $ipconfig_fmt =3D {
>      ip =3D> {
>  	type =3D> 'string',
> @@ -4100,6 +4146,25 @@ sub config_to_command {
>  	push @$devices, '-device', $netdevicefull;
>      }
> =20
> +    my $onevirtiofs =3D 0;

nit: my $virtiofs_enabled , or something similar ('onevirtiofs' sounds like
there is exactly one virtiofs configured).

> +    for (my $i =3D 0; $i < $MAX_VIRTIOFS; $i++) {
> +	my $virtiofsstr =3D "virtiofs$i";

nit: variable naming could be better here - we usually call this "$opt"
or "$key" ;)

> +
> +	next if !$conf->{$virtiofsstr};
> +	my $virtiofs =3D parse_property_string('pve-qm-virtiofs', $conf->{$virt=
iofsstr});
> +	next if !$virtiofs;
> +
> +	push @$devices, '-chardev', "socket,id=3Dvirtfs$i,path=3D/var/run/virti=
ofsd/vm$vmid-fs$i";
> +	push @$devices, '-device', 'vhost-user-fs-pci,queue-size=3D1024'
> +	    .",chardev=3Dvirtfs$i,tag=3D$virtiofs->{tag}";
> +
> +	$onevirtiofs =3D 1;
> +    }
> +
> +    if ($onevirtiofs && $conf->{hugepages}){
> +	die "hugepages not supported in combination with virtiofs\n";
> +    }
> +
>      if ($conf->{ivshmem}) {
>  	my $ivshmem =3D parse_property_string($ivshmem_fmt, $conf->{ivshmem});
> =20
> @@ -4159,6 +4224,14 @@ sub config_to_command {
>      }
>      push @$machineFlags, "type=3D${machine_type_min}";
> =20
> +    if ($onevirtiofs && !$conf->{numa}) {
> +	# kvm: '-machine memory-backend' and '-numa memdev' properties are
> +	# mutually exclusive
> +	push @$devices, '-object', 'memory-backend-file,id=3Dvirtiofs-mem'
> +	    .",size=3D$conf->{memory}M,mem-path=3D/dev/shm,share=3Don";
> +	push @$machineFlags, 'memory-backend=3Dvirtiofs-mem';
> +    }
> +
>      push @$cmd, @$devices;
>      push @$cmd, '-rtc', join(',', @$rtcFlags) if scalar(@$rtcFlags);
>      push @$cmd, '-machine', join(',', @$machineFlags) if scalar(@$machin=
eFlags);
> @@ -4185,6 +4258,72 @@ sub config_to_command {
>      return wantarray ? ($cmd, $vollist, $spice_port, $pci_devices) : $cm=
d;
>  }
> =20
> +sub start_virtiofs {
> +    my ($vmid, $fsid, $virtiofs) =3D @_;
> +
> +    my $dir_list =3D PVE::Mapping::DIR::find_on_current_node($virtiofs->=
{dirid});
> +
> +    if (!$dir_list || scalar($dir_list->@*) !=3D 1) {
> +	die "virtiofs needs exactly one mapping for this node\n";
> +    }
> +
> +    eval {
> +	PVE::Mapping::DIR::assert_valid($dir_list->[0]);
> +    };
> +    if (my $err =3D $@) {
> +	die "Directory Mapping invalid: $err\n";
> +    }
> +
> +    my $dir_cfg =3D $dir_list->[0];
> +    my $path =3D $dir_cfg->{path};
> +    my $socket_path_root =3D "/var/run/virtiofsd";
> +    mkdir $socket_path_root;
> +    my $socket_path =3D "$socket_path_root/vm$vmid-fs$fsid";
> +    unlink($socket_path);
> +    my $socket =3D IO::Socket::UNIX->new(
> +	Type =3D> SOCK_STREAM,
> +	Local =3D> $socket_path,
> +	Listen =3D> 1,
> +    ) or die "cannot create socket - $!\n";
> +
> +    my $flags =3D fcntl($socket, F_GETFD, 0)
> +	or die "failed to get file descriptor flags: $!\n";
> +    fcntl($socket, F_SETFD, $flags & ~FD_CLOEXEC)
> +	or die "failed to remove FD_CLOEXEC from file descriptor\n";
> +
> +    my $fd =3D $socket->fileno();
> +
> +    my $virtiofsd_bin =3D '/usr/libexec/virtiofsd';
> +
> +    my $pid =3D fork();
> +    if ($pid =3D=3D 0) {
> +	for my $fd_loop (3 .. POSIX::sysconf( &POSIX::_SC_OPEN_MAX )) {
> +	    POSIX::close($fd_loop) if ($fd_loop !=3D $fd);
> +	}
> +	my $pid2 =3D fork();
> +	if ($pid2 =3D=3D 0) {
> +	    my $cmd =3D [$virtiofsd_bin, "--fd=3D$fd", "--shared-dir=3D$path"];
> +	    push @$cmd, '--xattr' if ($dir_cfg->{xattr});
> +	    push @$cmd, '--posix-acl' if ($dir_cfg->{acl});
> +	    push @$cmd, '--announce-submounts' if ($dir_cfg->{submounts});
> +	    push @$cmd, '--allow-direct-io' if ($virtiofs->{'direct-io'});
> +	    push @$cmd, "--cache=3D$virtiofs->{'cache'}" if ($virtiofs->{'cache=
'});
> +	    run_command($cmd);
> +	    POSIX::_exit(0);
> +	} elsif (!defined($pid2)) {
> +	    die "could not fork to start virtiofsd\n";
> +	} else {
> +	    POSIX::_exit(0);
> +	}
> +    } elsif (!defined($pid)) {
> +	die "could not fork to start virtiofsd\n";
> +    }
> +
> +    # return socket to keep it alive,
> +    # so that qemu will wait for virtiofsd to start
> +    return $socket;
> +}
> +
>  sub check_rng_source {
>      my ($source) =3D @_;
> =20
> @@ -5740,6 +5879,19 @@ sub vm_start_nolock {
>      my ($cmd, $vollist, $spice_port, $pci_devices) =3D config_to_command=
($storecfg, $vmid,
>  	$conf, $defaults, $forcemachine, $forcecpu, $params->{'pbs-backing'});
> =20
> +    my @sockets;
> +    for (my $i =3D 0; $i < $MAX_VIRTIOFS; $i++) {
> +	my $virtiofsstr =3D "virtiofs$i";
> +
> +	next if !$conf->{$virtiofsstr};
> +	my $virtiofs =3D parse_property_string('pve-qm-virtiofs', $conf->{$virt=
iofsstr});
> +	next if !$virtiofs;
> +
> +
> +	my $socket =3D start_virtiofs($vmid, $i, $virtiofs);
> +	push @sockets, $socket;
> +    }
> +
>      my $migration_ip;
>      my $get_migration_ip =3D sub {
>  	my ($nodename) =3D @_;
> @@ -6093,6 +6245,11 @@ sub vm_start_nolock {
> =20
>      PVE::GuestHelpers::exec_hookscript($conf, $vmid, 'post-start');
> =20
> +    foreach my $socket (@sockets) {
> +	shutdown($socket, 2);
> +	close($socket);
> +    }
> +
>      return $res;
>  }
> =20
> diff --git a/PVE/QemuServer/Memory.pm b/PVE/QemuServer/Memory.pm
> index 0601dd6..3b58b36 100644
> --- a/PVE/QemuServer/Memory.pm
> +++ b/PVE/QemuServer/Memory.pm
> @@ -278,6 +278,16 @@ sub config {
> =20
>      die "numa needs to be enabled to use hugepages" if $conf->{hugepages=
} && !$conf->{numa};
> =20
> +    my $onevirtiofs =3D 0;
> +    for (my $i =3D 0; $i < PVE::QemuServer::max_virtiofs(); $i++) {
> +	my $virtiofsstr =3D "virtiofs$i";
> +	next if !$conf->{$virtiofsstr};
> +	my $virtiofs =3D PVE::JSONSchema::parse_property_string('pve-qm-virtiof=
s', $conf->{$virtiofsstr});
> +	if ($virtiofs) {
> +	    $onevirtiofs =3D 1;
> +	}
> +    }
> +
>      if ($conf->{numa}) {
> =20
>  	my $numa_totalmemory =3D undef;
> @@ -290,7 +300,8 @@ sub config {
>  	    my $numa_memory =3D $numa->{memory};
>  	    $numa_totalmemory +=3D $numa_memory;
> =20
> -	    my $mem_object =3D print_mem_object($conf, "ram-node$i", $numa_memo=
ry);
> +	    my $memdev =3D $onevirtiofs ? "virtiofs-mem$i" : "ram-node$i";
> +	    my $mem_object =3D print_mem_object($conf, $memdev, $numa_memory);
> =20
>  	    # cpus
>  	    my $cpulists =3D $numa->{cpus};
> @@ -315,7 +326,7 @@ sub config {
>  	    }
> =20
>  	    push @$cmd, '-object', $mem_object;
> -	    push @$cmd, '-numa', "node,nodeid=3D$i,cpus=3D$cpus,memdev=3Dram-no=
de$i";
> +	    push @$cmd, '-numa', "node,nodeid=3D$i,cpus=3D$cpus,memdev=3D$memde=
v";
>  	}
> =20
>  	die "total memory for NUMA nodes must be equal to vm static memory\n"
> @@ -329,13 +340,13 @@ sub config {
>  		die "host NUMA node$i doesn't exist\n"
>  		    if !host_numanode_exists($i) && $conf->{hugepages};
> =20
> -		my $mem_object =3D print_mem_object($conf, "ram-node$i", $numa_memory)=
;
> -		push @$cmd, '-object', $mem_object;
> -
>  		my $cpus =3D ($cores * $i);
>  		$cpus .=3D "-" . ($cpus + $cores - 1) if $cores > 1;
> =20
> -		push @$cmd, '-numa', "node,nodeid=3D$i,cpus=3D$cpus,memdev=3Dram-node$=
i";
> +		my $memdev =3D $onevirtiofs ? "virtiofs-mem$i" : "ram-node$i";
> +		my $mem_object =3D print_mem_object($conf, $memdev, $numa_memory);
> +		push @$cmd, '-object', $mem_object;
> +		push @$cmd, '-numa', "node,nodeid=3D$i,cpus=3D$cpus,memdev=3D$memdev";
>  	    }
>  	}
>      }
> @@ -364,6 +375,8 @@ sub print_mem_object {
>  	my $path =3D hugepages_mount_path($hugepages_size);
> =20
>  	return "memory-backend-file,id=3D$id,size=3D${size}M,mem-path=3D$path,s=
hare=3Don,prealloc=3Dyes";
> +    } elsif ($id =3D~ m/^virtiofs-mem/) {
> +	return "memory-backend-file,id=3D$id,size=3D${size}M,mem-path=3D/dev/sh=
m,share=3Don";
>      } else {
>  	return "memory-backend-ram,id=3D$id,size=3D${size}M";
>      }
> diff --git a/debian/control b/debian/control
> index 49f67b2..f008a9b 100644
> --- a/debian/control
> +++ b/debian/control
> @@ -53,6 +53,7 @@ Depends: dbus,
>           socat,
>           swtpm,
>           swtpm-tools,
> +         virtiofsd,
>           ${misc:Depends},
>           ${perl:Depends},
>           ${shlibs:Depends},
> --=20
> 2.39.2
>=20
>=20
>=20
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>=20
>=20
>=20