From: Stoiko Ivanov <s.ivanov@proxmox.com>
To: pve-devel@lists.proxmox.com
Date: Thu, 18 Feb 2021 17:36:35 +0100
Message-Id: <20210218163635.28639-1-s.ivanov@proxmox.com>
X-Mailer: git-send-email 2.20.1
Subject: [pve-devel] [PATCH storage] zfspoolplugin: check if mounted instead of imported

This patch addresses an issue we recently saw on a production machine:
* after booting, a ZFS pool failed to get imported (due to an empty
  /etc/zfs/zpool.cache)
* pvestatd/guest-startall eventually tried to import the pool
* the pool was imported, yet the datasets of the pool remained unmounted

A bit of debugging showed that `zpool import <pool>` is not atomic: it
forks and execs `mount` with the appropriate parameters. If an import runs
longer than the hardcoded timeout of 15s, it can happen that the pool gets
imported, but the zpool command (and its forks) is terminated due to
timing out.
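
To make the failure mode concrete, here is a minimal, hypothetical Perl
sketch (not the actual PVE::Tools code; run_with_timeout and the pool name
'tank' are made up for illustration) of running the import under a hard
timeout: if the alarm fires after `zpool import` has registered the pool
but before the mount calls it forked have finished, the pool stays
imported with unmounted datasets.

use strict;
use warnings;

# Hypothetical sketch: run an external command, kill it if it exceeds
# the given timeout (similar in spirit to the 15s limit described above).
sub run_with_timeout {
    my ($timeout, @cmd) = @_;

    my $pid = fork() // die "fork failed: $!";
    if ($pid == 0) {
        exec(@cmd) or die "exec '@cmd' failed: $!";
    }

    local $SIG{ALRM} = sub {
        kill 'KILL', $pid;          # command terminated mid-way on timeout
        waitpid($pid, 0);
        die "timeout after ${timeout}s\n";
    };
    alarm($timeout);
    waitpid($pid, 0);
    alarm(0);

    return $? == 0;
}

# e.g.: run_with_timeout(15, 'zpool', 'import', 'tank');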
Reproducing this is straightforward by setting (drastic) bandwidth and IOPS
limits on a guest's disk (which contains a zpool), e.g.:
`qm set 100 -scsi1 wd:vm-100-disk-1,iops_rd=10,iops_rd_max=20,\
iops_wr=15,iops_wr_max=20,mbps_rd=10,mbps_rd_max=15,mbps_wr=10,\
mbps_wr_max=15`
Afterwards, running `timeout 15 zpool import <pool>` reproduced that
situation in the guest on my machine.

The patch changes the check in activate_storage for the ZFSPoolPlugin to
look at the 'mounted' and 'canmount' properties of the configured 'pool'
(which can also be a sub-dataset):
* if the `zfs get` fails, we try to import the pool (and run `zfs mount -a`
  if the dataset is not mounted after importing)
* if canmount != 'yes', we assume everything is OK
* if it reports the dataset as not mounted, we run `zfs mount -a`

The choice of running `zfs mount -a` has potential for regression in case
someone manually unmounts a dataset after booting the system (but does not
set the canmount option).

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
---
rfc -> v1:
* incorporated Thomas' feedback (thanks!):
** we now check both properties, mounted and canmount, and only run
   `zfs mount -a` if the dataset is not mounted but should be by config
** this prevents the code from constantly running `zfs mount -a` in case
   the system has canmount=off on the root dataset
** however, in that case, if the (child) datasets were not mounted due to
   a timeout, they will not get mounted (as before)

 PVE/Storage/ZFSPoolPlugin.pm | 38 ++++++++++++++++++++++++++++--------
 1 file changed, 30 insertions(+), 8 deletions(-)
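
For reference, a minimal standalone sketch of the property check the patch
relies on. It assumes (an assumption for illustration, not the plugin's
actual helper) that zfs_get_properties boils down to a
`zfs get -o value -Hp <props> <dataset>` call, and uses a made-up dataset
name:

#!/usr/bin/perl
use strict;
use warnings;

# Sketch of the mounted/canmount decision: `zfs get -o value -Hp` prints
# one value per line, in the order the properties were requested.
my $dataset = 'tank/vmdata';    # hypothetical storage dataset

my $out = `zfs get -o value -Hp mounted,canmount $dataset 2>/dev/null`;
my ($mounted, $canmount) = split /\n/, $out;

if (!defined($mounted)) {
    print "zfs get failed - pool probably not imported yet\n";
} elsif ($canmount =~ /^(?:off|noauto)$/) {
    print "canmount=$canmount - nothing to do\n";
} elsif ($mounted ne 'yes') {
    print "dataset not mounted - would run 'zfs mount -a'\n";
} else {
    print "dataset is mounted\n";
}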
diff --git a/PVE/Storage/ZFSPoolPlugin.pm b/PVE/Storage/ZFSPoolPlugin.pm
index 105d802..3d25e6a 100644
--- a/PVE/Storage/ZFSPoolPlugin.pm
+++ b/PVE/Storage/ZFSPoolPlugin.pm
@@ -525,27 +525,49 @@ sub activate_storage {
     my ($class, $storeid, $scfg, $cache) = @_;
 
     # Note: $scfg->{pool} can include dataset <pool>/<dataset>
-    my $pool = $scfg->{pool};
-    $pool =~ s!/.*$!!;
+    my $dataset = $scfg->{pool};
+    my $pool = ($dataset =~ s!/.*$!!r);
 
-    my $pool_imported = sub {
-        my @param = ('-o', 'name', '-H', "$pool");
-        my $res = eval { $class->zfs_request($scfg, undef, 'zpool_list', @param) };
+    my $dataset_mounted = sub {
+        my ($mounted, $canmount) = eval {
+            $class->zfs_get_properties($scfg, 'mounted,canmount', $dataset)
+        };
         if ($@) {
             warn "$@\n";
             return undef;
         }
+        #canmount is off|noauto, zfs get worked - assume pool is correctly imported
+        return 'yes' if $canmount =~ /^(?:off|noauto)$/;
+        return $mounted;
+    };
+
+    my $mount_all = sub {
+        eval { $class->zfs_request($scfg, undef, 'mount', '-a') };
+        die "could not activate storage '$storeid', $@\n" if $@;
+    };
+
+    my $pool_imported = sub {
+        my @param = ('-o', 'name', '-H', "$pool");
+        my $res = eval { $class->zfs_request($scfg, undef, 'zpool_list', @param) };
+        warn "$@\n" if $@;
+
         return defined($res) && $res =~ m/$pool/;
     };
 
-    if (!$pool_imported->()) {
+    my $mounted = $dataset_mounted->();
+    if (!$mounted) {
+        # error from zfs get mounted - pool import necessary
         # import can only be done if not yet imported!
-        my @param = ('-d', '/dev/disk/by-id/', '-o', 'cachefile=none', "$pool");
+        my @param = ('-d', '/dev/disk/by-id/', '-o', 'cachefile=none', $pool);
         eval { $class->zfs_request($scfg, undef, 'zpool_import', @param) };
         if (my $err = $@) {
             # just could've raced with another import, so recheck if it is imported
-            die "could not activate storage '$storeid', $@\n" if !$pool_imported->();
+            die "could not activate storage '$storeid', $err\n" if !$pool_imported->();
         }
+        $mount_all->() if !$dataset_mounted->();
+    } elsif ($mounted =~ /^no$/) {
+        # best effort mount everything
+        $mount_all->();
     }
     return 1;
 }
-- 
2.20.1