From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by lists.proxmox.com (Postfix) with ESMTPS id 7CE786BA55 for ; Fri, 19 Feb 2021 13:46:29 +0100 (CET) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 7256918C89 for ; Fri, 19 Feb 2021 13:46:29 +0100 (CET) Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com [212.186.127.180]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by firstgate.proxmox.com (Proxmox) with ESMTPS id D653E18C72 for ; Fri, 19 Feb 2021 13:46:27 +0100 (CET) Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1]) by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 983E04495F for ; Fri, 19 Feb 2021 13:46:27 +0100 (CET) From: Stoiko Ivanov To: pve-devel@lists.proxmox.com Date: Fri, 19 Feb 2021 13:45:43 +0100 Message-Id: <20210219124544.12593-3-s.ivanov@proxmox.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20210219124544.12593-1-s.ivanov@proxmox.com> References: <20210219124544.12593-1-s.ivanov@proxmox.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-SPAM-LEVEL: Spam detection results: 0 AWL 0.062 Adjusted score from AWL reputation of From: address KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment RCVD_IN_DNSWL_MED -2.3 Sender listed at https://www.dnswl.org/, medium trust SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more information. [zfspoolplugin.pm] Subject: [pve-devel] [PATCH storage v2 2/3] zfspoolplugin: check if mounted instead of imported X-BeenThere: pve-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox VE development discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Feb 2021 12:46:29 -0000 This patch addresses an issue we recently saw on a production machine: * after booting a ZFS pool failed to get imported (due to an empty /etc/zfs/zpool.cache) * pvestatd/guest-startall eventually tried to import the pool * the pool was imported, yet the datasets of the pool remained not mounted A bit of debugging showed that `zpool import ` is not atomic, in fact it does fork+exec `mount` with appropriate parameters. If an import ran longer than the hardcoded timeout of 15s, it could happen that the pool got imported, but the zpool command (and its forks) got terminated due to timing out. reproducing this is straight-forward by setting (drastic) bw+iops limits on a guests' disk (which contains a zpool) - e.g.: `qm set 100 -scsi1 wd:vm-100-disk-1,iops_rd=10,iops_rd_max=20,\ iops_wr=15,iops_wr_max=20,mbps_rd=10,mbps_rd_max=15,mbps_wr=10,\ mbps_wr_max=15` afterwards running `timeout 15 zpool import ` resulted in that situation in the guest on my machine The patch changes the check in activate_storage for the ZFSPoolPlugin, to check if any dataset below the 'pool' (which can also be a sub-dataset) is mounted by parsing /proc/mounts: * this is cheaper than running `zfs get` or `zpool list` * it catches a properly imported and mounted pool in case the root-dataset has 'canmount' set to off (or noauto), as long as any dataset below is mounted After trying to import the pool, we also run `zfs mount -a` (in case another check of /proc/mounts fails). Potential for regression: * running `zfs mount -a` is problematic, if a dataset is manually umounted after booting (without setting 'canmount') * a pool without any mounted dataset (no mountpoint property set and only zvols) - will result in repeated calls to `zfs mount -a` both of the above seem unlikely and should not occur, if using our tooling. Signed-off-by: Stoiko Ivanov --- PVE/Storage/ZFSPoolPlugin.pm | 27 ++++++++++++++++++++++++--- 1 file changed, 24 insertions(+), 3 deletions(-) diff --git a/PVE/Storage/ZFSPoolPlugin.pm b/PVE/Storage/ZFSPoolPlugin.pm index be7b1b9..a524515 100644 --- a/PVE/Storage/ZFSPoolPlugin.pm +++ b/PVE/Storage/ZFSPoolPlugin.pm @@ -7,6 +7,7 @@ use IO::File; use Net::IP; use POSIX; +use PVE::ProcFSTools; use PVE::RPCEnvironment; use PVE::Storage::Plugin; use PVE::Tools qw(run_command); @@ -525,8 +526,26 @@ sub activate_storage { my ($class, $storeid, $scfg, $cache) = @_; # Note: $scfg->{pool} can include dataset / - my $pool = $scfg->{pool}; - $pool =~ s!/.*$!!; + my $dataset = $scfg->{pool}; + my $pool = ($dataset =~ s!/.*$!!r); + + my $dataset_mounted = sub { + my $mounted = 0; + my $dataset_dec = PVE::ProcFSTools::decode_mount($dataset); + my $mounts = eval { PVE::ProcFSTools::parse_proc_mounts() }; + warn "$@\n" if $@; + foreach my $mp (@$mounts) { + my ($what, $dir, $fs) = @$mp; + next if $fs ne 'zfs'; + # check for root-dataset of storage or any child-dataset. + # root-dataset could have 'canmount=off'. If any child is mounted + # heuristically assume that `zfs mount -a` was successful + next if $what !~ m!^$dataset_dec(?:/|$)!; + $mounted = 1; + last; + } + return $mounted; + }; my $pool_imported = sub { my @param = ('-o', 'name', '-H', $pool); @@ -536,7 +555,7 @@ sub activate_storage { return defined($res) && $res =~ m/$pool/; }; - if (!$pool_imported->()) { + if (!$dataset_mounted->()) { # import can only be done if not yet imported! my @param = ('-d', '/dev/disk/by-id/', '-o', 'cachefile=none', $pool); eval { $class->zfs_request($scfg, undef, 'zpool_import', @param) }; @@ -544,6 +563,8 @@ sub activate_storage { # just could've raced with another import, so recheck if it is imported die "could not activate storage '$storeid', $err\n" if !$pool_imported->(); } + eval { $class->zfs_request($scfg, undef, 'mount', '-a') }; + die "could not activate storage '$storeid', $@\n" if $@; } return 1; } -- 2.20.1