From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <t.lamprecht@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id A4629627D2
 for <pve-devel@lists.proxmox.com>; Tue, 22 Feb 2022 09:44:58 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id 9AA6C21E89
 for <pve-devel@lists.proxmox.com>; Tue, 22 Feb 2022 09:44:58 +0100 (CET)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [94.136.29.106])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS id D84DC21E7E
 for <pve-devel@lists.proxmox.com>; Tue, 22 Feb 2022 09:44:57 +0100 (CET)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id A3959419F5
 for <pve-devel@lists.proxmox.com>; Tue, 22 Feb 2022 09:44:51 +0100 (CET)
Message-ID: <d3fb75b3-7b08-51a1-1c53-35ab743efaa9@proxmox.com>
Date: Tue, 22 Feb 2022 09:44:50 +0100
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:98.0) Gecko/20100101
 Thunderbird/98.0
Content-Language: en-US
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
 Aaron Lauterer <a.lauterer@proxmox.com>
References: <20220218113827.1415641-1-a.lauterer@proxmox.com>
 <20220218113827.1415641-5-a.lauterer@proxmox.com>
From: Thomas Lamprecht <t.lamprecht@proxmox.com>
In-Reply-To: <20220218113827.1415641-5-a.lauterer@proxmox.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-SPAM-LEVEL: Spam detection results:  0
 AWL 0.058 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 NICE_REPLY_A           -0.001 Looks like a legit reply (A)
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
 T_SCC_BODY_TEXT_LINE    -0.01 -
 URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See
 http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more
 information. [osd.pm, mon.pm, mds.pm]
Subject: Re: [pve-devel] [PATCH manager 4/6] api: mon: mds: osd: add safety
 check endpoints
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion <pve-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pve-devel/>
List-Post: <mailto:pve-devel@lists.proxmox.com>
List-Help: <mailto:pve-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel>, 
 <mailto:pve-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Tue, 22 Feb 2022 08:44:58 -0000

On 18.02.22 12:38, Aaron Lauterer wrote:
> for mon, mds and osd: ok-to-stop
> mon: ok-to-rm
> osd: safe-to-destroy
> 
> Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
> ---
> 
> I added the OSD safe-to-destroy endpoint even though I have no immediate
> use for it as of now. But plan to include it in the OSD panel once I
> have an idea how to do so in a sensible manner.
> 
> The main problem with safe-to-destroy is, that it only works if the OSD
> is still running and by the time we enable the destroy button, the OSD
> needs to be stopped.


above isn't really meta info for review but rather important for the commit
message itself.


In general I see lots of repetition, and in this case I'd rather have a single
enpoint that accepts one (or maybe better a list of) service-type(s), and an
action (stop/destroy) let's encode in the name (or at least description) that
it's a heuristical check, besides things that we possible miss to observe we
could never make it 100% safe as we cannot lock the whole ceph cluster between
checking and doing an operation, so this will always be a TOCTOU race that 
expects the admins to have some change management so that they do not interfere
with each others maintenance work.

So either `/nodes/<nodename>/ceph/cmd-safety-heuristic` or drop the heuristic
from the path and just refer to that detail in the description (which shows up
in the api viewer, so should be good enough) `/nodes/<nodename>/ceph/cmd-safety`

params could be: node, type, id and command

> 
>  PVE/API2/Ceph/MDS.pm |  50 ++++++++++++++++++++++
>  PVE/API2/Ceph/MON.pm | 100 +++++++++++++++++++++++++++++++++++++++++++
>  PVE/API2/Ceph/OSD.pm | 100 +++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 250 insertions(+)
> 
> diff --git a/PVE/API2/Ceph/MDS.pm b/PVE/API2/Ceph/MDS.pm
> index 1cb0b74f..2ed3cf5a 100644
> --- a/PVE/API2/Ceph/MDS.pm
> +++ b/PVE/API2/Ceph/MDS.pm
> @@ -230,4 +230,54 @@ __PACKAGE__->register_method ({
>      }
>  });
>  
> +__PACKAGE__->register_method ({
> +    name => 'oktostop',

FYI: the name is usable to call this directly via code API2::Ceph::MDS->oktostop(),
so it would be good to use the perl method naming convention.

> +    path => '{name}/ok-to-stop',
> +    method => 'GET',
> +    description => "Check if it is ok to stop the MON",

"heuristic check if..."

> +    proxyto => 'node',
> +    protected => 1,
> +    permissions => {
> +	check => ['perm', '/', [ 'Sys.Modify' ]],
> +    },

hmm, why Sys.Modify, I mean this doesn't do anything so Sys.Audit could be argued
too, or did you have something other in mind here (e.g., consistency with some other,
similarly categorized api endpoint)? 

> +    parameters => {
> +	additionalProperties => 0,
> +	properties => {
> +	    node => get_standard_option('pve-node'),
> +	    name => {
> +		type => 'string',
> +		pattern => PVE::Ceph::Services::SERVICE_REGEX,
> +		description => 'The name (ID) of the mds',
> +	    },
> +	},
> +    },
> +    returns => {
> +	type => "object",

please actually describe the return format for such simple api calls

> +    },
> +    code => sub {
> +	my ($param) = @_;
> +
> +	PVE::Ceph::Tools::check_ceph_inited();
> +
> +	my $name = $param->{name};
> +	my $rados = PVE::RADOS->new();
> +
> +	my $result = {
> +	    ok_to_stop => 0,
> +	    status => '',
> +	};
> +
> +	my $code;
> +	my $status;
> +	eval {
> +	    ($code, $status) = $rados->mon_command({ prefix => 'mds ok-to-stop', format => 'plain', ids => [ $name ] }, 1);
> +	};


eval's return their last expression (which can but doesn has to use a trailing semicolon)
so you could do:

my ($code, $status) = eval {
    $rados->mon_command(...)
};

> +	die $@ if $@;

ok, why bother with eval'ing if you die 1:1 in any error case anyway??

> +
> +	$result->{ok_to_stop} = $code == 0 ? 1 : 0;
> +	$result->{status} = $status;

for such simple logic we can avoid the intermediate $result and return directly,
that shaves off about 6 lines from this api call and please also use the kebab-case
for new properties, both in input parameters or output result.

return {
    'ok-to-stop' => $code == 0 ? 1 : 0,
    status => $status,
};

> +
> +	return $result;
> +    }});
> +
>  1;