From: Stoiko Ivanov
To: Thomas Lamprecht
Cc: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Date: Thu, 18 Feb 2021 14:15:56 +0100
Message-ID: <20210218141556.215b9fce@rosa.proxmox.com>
In-Reply-To: <9004b4ad-eb2e-72a0-0c38-9392a50e1660@proxmox.com>
References: <20210218113328.14488-1-s.ivanov@proxmox.com> <9004b4ad-eb2e-72a0-0c38-9392a50e1660@proxmox.com>
Subject: Re: [pve-devel] [RFC storage]
 zfspoolplugin: check if mounted instead of imported

Thanks for looking at the patch!

On Thu, 18 Feb 2021 13:20:55 +0100
Thomas Lamprecht wrote:

> On 18.02.21 12:33, Stoiko Ivanov wrote:
> > This patch addresses an issue we recently saw on a production machine:
> > * after booting, a ZFS pool failed to get imported (due to an empty
> >   /etc/zfs/zpool.cache)
> > * pvestatd/guest-startall eventually tried to import the pool
> > * the pool was imported, yet the datasets of the pool remained
> >   unmounted
> > 
> > A bit of debugging showed that `zpool import` is not atomic; in fact
> > it does fork+exec `mount` with appropriate parameters. If an import
> > ran longer than the hardcoded timeout of 15s, it could happen that
> > the pool got imported, but the zpool command (and its forks) got
> > terminated due to timing out.
> > 
> > Reproducing this is straightforward by setting (drastic) bw+iops
> > limits on a guest's disk (which contains a zpool) - e.g.:
> > `qm set 100 -scsi1 wd:vm-100-disk-1,iops_rd=10,iops_rd_max=20,\
> > iops_wr=15,iops_wr_max=20,mbps_rd=10,mbps_rd_max=15,mbps_wr=10,mbps_wr_max=15`
> > Afterwards, running `timeout 15 zpool import` resulted in that
> > situation in the guest on my machine.
> > 
> > The patch changes the check in activate_storage for the
> > ZFSPoolPlugin to check whether the configured 'pool' (which can also
> > be a sub-dataset) is mounted (via `zfs get mounted`).
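The mounted check discussed here can be sketched as follows. This is an illustrative Python stand-in for the Perl closure in the patch, not the patch itself; the injectable `run` callback replaces PVE's zfs_request helper so the three-way result (None/False/True) is easy to see:

```python
import subprocess

def dataset_mounted(dataset, run=None):
    """Return True if mounted, False if not, None if `zfs get` failed.

    The three-way result mirrors the patch: None means the pool is most
    likely not imported yet; False means imported but not mounted.
    `run` is injectable for testing; by default it shells out to zfs.
    """
    if run is None:
        def run(args):
            return subprocess.check_output(args, text=True)
    try:
        # -H: no header line, -o value: print only the property value
        out = run(["zfs", "get", "-H", "-o", "value", "mounted", dataset])
    except Exception:
        return None
    return out.strip() == "yes"
```

Dependency injection here is only for testability; the real code simply wraps the zfs call in an eval and checks `defined`.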
> > 
> > * In case this errors out, we try to import the pool (and run
> >   `zfs mount -a` if the dataset is not mounted after importing)
> > * In case it succeeds but reports the dataset as not mounted,
> >   we run `zfs mount -a`
> > 
> > The choice of running `zfs mount -a` has potential for regression,
> > in case someone manually unmounts some dataset after booting the
> > system (but does not set the canmount option).
> 
> Why not
> # zfs mount
> ?

That would only mount the dataset itself and not anything beneath it,
which is different from what `zpool import` does (without the -N
option). AFAICT there is no way to recursively mount everything below a
dataset (short of manually walking through the tree).

While we mitigated those issues with container-datasets recently (by
explicitly mounting them before starting the container), this does not
hold true for manually created datasets (e.g. for setting different
properties for certain guests, or for storing backups...).

(In case of the original reproducer, there was a PBS datastore defined
on a sub-dataset.)

> > 
> > Signed-off-by: Stoiko Ivanov
> > ---
> > * sending as RFC since I have the feeling that I miss quite a few
> >   corner-cases.
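Manually walking the tree, as mentioned above, could look roughly like this. A Python sketch only (the dataset names and the `run` callback are made up for illustration); it relies on `zfs list -r -H -o name,mounted,canmount`, which prints one tab-separated line per dataset:

```python
def mount_recursive(dataset, run):
    """Mount `dataset` and every unmounted dataset beneath it.

    `run(args)` executes a command and returns its stdout. Datasets
    with canmount=off are skipped, matching their documented intent.
    """
    out = run(["zfs", "list", "-r", "-H", "-o", "name,mounted,canmount", dataset])
    mounted = []
    for line in out.splitlines():
        name, is_mounted, canmount = line.split("\t")
        if canmount == "off":
            continue  # dataset opted out of mounting
        if is_mounted != "yes":
            run(["zfs", "mount", name])
            mounted.append(name)
    return mounted
```

`zfs mount -a` avoids this walk entirely, at the cost of touching every pool on the system rather than just the storage's subtree.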
> > * an alternative to getting the mounted property via `zfs get` might
> >   be parsing /proc/mounts (but `zfs get mounted` does mostly just
> >   that, + stat'ing files in /sys, and 2 (potentially blocking) ioctls)
> > * could reproduce the issue with ZFS 2.0.1 and 0.8.6 - so my current
> >   guess is the issue on the production box might be related to some
> >   other performance regression in its disk-subsystem (maybe
> >   introduced with the new kernel)
> > * huge thanks to Dominik for many suggestions (including the idea of
> >   limiting the disk-speed via our stack instead of dm-delay :)
> > 
> >  PVE/Storage/ZFSPoolPlugin.pm | 25 +++++++++++++++++++++----
> >  1 file changed, 21 insertions(+), 4 deletions(-)
> > 
> > diff --git a/PVE/Storage/ZFSPoolPlugin.pm b/PVE/Storage/ZFSPoolPlugin.pm
> > index 105d802..32ad1c9 100644
> > --- a/PVE/Storage/ZFSPoolPlugin.pm
> > +++ b/PVE/Storage/ZFSPoolPlugin.pm
> > @@ -525,8 +525,17 @@ sub activate_storage {
> >      my ($class, $storeid, $scfg, $cache) = @_;
> >  
> >      # Note: $scfg->{pool} can include dataset /
> > -    my $pool = $scfg->{pool};
> > -    $pool =~ s!/.*$!!;
> > +    my $dataset = $scfg->{pool};
> > +    my $pool = ($dataset =~ s!/.*$!!r);
> > +
> > +    my $dataset_mounted = sub {
> > +        my $mounted = eval { $class->zfs_get_properties($scfg, 'mounted', "$dataset") };
> 
> quotes don't do anything here, just get you a "undefined value in
> string ..." warning if undefined (and the underlying run_command does
> not change behavior either).
> 
> I.e.,
> 
> # perl -we 'use PVE::Tools qw(run_command); my $arg; run_command(["touch", "$arg"])'
> Use of uninitialized value $arg in string at -e line 1.
> touch: cannot touch '': No such file or directory
> command 'touch ''' failed: exit code 1
> 
> vs.
> 
> # perl -we 'use PVE::Tools qw(run_command); my $arg; run_command(["touch", $arg])'
> Use of uninitialized value $_[1] in exec at /usr/share/perl/5.28/IPC/Open3.pm line 178.
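A loose Python analogue of the quoting pitfall Thomas points out (purely an illustration; the discussion above is about Perl's undef, where both variants happen to end up as an empty string). In Python the difference is starker: interpolation silently fabricates an argument, while passing the missing value through fails loudly before any command runs:

```python
import subprocess

arg = None

# String interpolation hides the missing value: the command would run
# with the literal string "None" as its argument.
interpolated = ["echo", f"{arg}"]
assert interpolated[1] == "None"

# Passing the value through unchanged raises TypeError before the
# command is executed, which makes the bug much easier to spot.
try:
    subprocess.run(["echo", arg])
    raised = False
except TypeError:
    raised = True
assert raised
```

Either way, the lesson matches the review comment: don't add quotes that suggest they do something they don't.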
> touch: cannot touch '': No such file or directory
> command 'touch ''' failed: exit code 1
> 
> I like to avoid using quotes in such cases to avoid suggesting that
> they have any use.

sorry - that was a bit shoddy copying - will fix that in a v2

> > +        if ($@) {
> > +            warn "$@\n";
> > +            return undef;
> > +        }
> 
> I know this is just copied, but both should just use a simple
>  warn "$@\n" if ($@);
> 
> as the return is correctly using a defined check anyway

sounds good

> > +        return defined($mounted) && $mounted =~ m/^yes$/;
> > +    };
> >  
> >      my $pool_imported = sub {
> >          my @param = ('-o', 'name', '-H', "$pool");
> > @@ -538,13 +547,21 @@ sub activate_storage {
> >          return defined($res) && $res =~ m/$pool/;
> >      };
> >  
> > -    if (!$pool_imported->()) {
> > +    if (!defined($dataset_mounted->())) {
> 
> don't complicate the boolean, drop the defined.

In that case the return value of $dataset_mounted->() is ternary:
* undefined - error during `zfs get` -> we most likely need to import
  the pool
* false - pool is imported, but dataset is not mounted -> we need to
  mount only (the elsif branch)
* true - everything is as it should be

Coming to think about it: this would be problematic if the storage has
a 'pool' which points to a dataset configured with 'canmount=off' (see
the description of 'canmount' in zfsprops(8) for when this might make
sense): with the current check (if the pool is imported) this would
work fine; with the patch in place this would result in each
activate_storage() calling `zfs mount -a`.

My expectation is that this does not happen in the wild (manually
setting 'canmount' to off for the root-dataset of a storage) - but it
is technically possible. Or maybe I'm missing the use-case for this?

> > +        # error from zfs get mounted - pool import necessary
> >          # import can only be done if not yet imported!
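The ternary branching described above can be sketched like this. Python pseudocode of the intended control flow only; the three callables are placeholders for the Perl closures and zfs_request calls in the patch:

```python
def activate_storage(dataset_mounted, import_pool, mount_all):
    """Sketch: the three-way result of the mounted check selects
    between import+mount, mount-only, or doing nothing at all.

    dataset_mounted() returns True, False, or None (error during
    `zfs get`, i.e. the pool is most likely not imported yet).
    """
    mounted = dataset_mounted()
    if mounted is None:
        # error from `zfs get` - import the pool first
        import_pool()
        if not dataset_mounted():
            mount_all()
    elif not mounted:
        # pool imported, dataset not mounted - mount only
        mount_all()
    # mounted is True: everything already as it should be
    return True
```

This also makes the canmount=off concern visible: for such a dataset, `dataset_mounted()` would return False on every call, so every activation would take the mount-only branch.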
> >          my @param = ('-d', '/dev/disk/by-id/', '-o', 'cachefile=none', "$pool");
> >          eval { $class->zfs_request($scfg, undef, 'zpool_import', @param) };
> >          if (my $err = $@) {
> >              # just could've raced with another import, so recheck if it is imported
> > -            die "could not activate storage '$storeid', $@\n" if !$pool_imported->();
> > +            die "could not activate storage '$storeid', $err\n" if !$pool_imported->();
> > +        }
> > +        if (!$dataset_mounted->()) {
> > +            eval { $class->zfs_request($scfg, undef, 'mount', '-a') };
> > +            if (my $err = $@) {
> > +                # just could've raced with another import, so recheck if it is imported
> 
> but you do not (re-)check anything?

will refactor that a bit in the v2

> > +                die "could not activate storage '$storeid', $err\n";
> > +            }
> >          }
> >      }
> >      return 1;
> > 
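The re-check Thomas asks for could look something like this sketch (Python for illustration; `dataset_mounted` and `mount_all` stand in for the Perl closures): after a failed `zfs mount -a`, verify whether the dataset is mounted anyway before dying, since a concurrent activation may have raced us and done the work:

```python
def mount_with_recheck(storeid, dataset_mounted, mount_all):
    """Run `zfs mount -a` (via mount_all) but tolerate a failure if a
    concurrent activation already got the dataset mounted."""
    try:
        mount_all()
    except Exception as err:
        # could've raced with another activation - only give up if the
        # dataset is still not mounted after the failed attempt
        if not dataset_mounted():
            raise RuntimeError(
                f"could not activate storage '{storeid}', {err}")
    return True
```

This mirrors the existing `$pool_imported->()` recheck after a failed import, which is the pattern the comment in the patch promises but the posted code does not yet implement.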